2006-02-02 08:26:19

by Pekka Enberg

Subject: [PATCH] slab leak detector (Was: Size-128 slab leak)

Hi,

Here's a version that uses dbg_userword() instead of overriding bufctls
and adds a CONFIG_DEBUG_SLAB_LEAK config option. Upside is that this works
with the slab verifier patch and is less invasive. Downside is that now
some slabs don't get leak reports (those that don't get SLAB_STORE_USER
enabled in kmem_cache_create). However, I think we should improve the
dbg_userword() mechanism instead if we need leak reports for all caches.

Pekka

From: Manfred Spraul <[email protected]>

Maintenance work from Alexander Nyberg <[email protected]>

With the patch applied,

echo "size-32 0 0 0" > /proc/slabinfo

walks the objects in the size-32 slab, printing out the calling address
of whoever allocated that object.

It is for leak detection.

Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Pekka Enberg <[email protected]>
---

lib/Kconfig.debug | 12 ++++++++++++
mm/slab.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 61 insertions(+)

Index: 2.6-git/mm/slab.c
===================================================================
--- 2.6-git.orig/mm/slab.c
+++ 2.6-git/mm/slab.c
@@ -3669,6 +3669,54 @@ struct seq_operations slabinfo_op = {
 	.show = s_show,
 };
 
+#ifdef CONFIG_DEBUG_SLAB_LEAK
+
+static void print_slab_last_users(struct kmem_cache *cache, struct slab *slab)
+{
+	int i;
+
+	for (i = 0; i < cache->num; i++) {
+		void *obj = slab->s_mem + cache->buffer_size * i;
+		unsigned long sym = (unsigned long) *dbg_userword(cache, obj);
+
+		printk("obj %p/%d: %p", slab, i, (void *)sym);
+		print_symbol(" <%s>", sym);
+		printk("\n");
+	}
+}
+
+static void print_cache_last_users(struct kmem_cache *cache)
+{
+	int node;
+
+	if (!(cache->flags & SLAB_STORE_USER))
+		return;
+
+	check_irq_on();
+	spin_lock_irq(&cache->spinlock);
+	for_each_online_node(node) {
+		struct kmem_list3 *lists = cache->nodelists[node];
+		struct list_head *q;
+
+		spin_lock(&lists->list_lock);
+
+		list_for_each(q, &lists->slabs_full) {
+			struct slab *slab = list_entry(q, struct slab, list);
+			print_slab_last_users(cache, slab);
+		}
+		spin_unlock(&lists->list_lock);
+	}
+	spin_unlock_irq(&cache->spinlock);
+}
+
+#else
+
+static void print_cache_last_users(struct kmem_cache *cache)
+{
+}
+
+#endif
+
 #define MAX_SLABINFO_WRITE 128
 /**
  * slabinfo_write - Tuning for the slab allocator
@@ -3709,6 +3757,7 @@ ssize_t slabinfo_write(struct file *file
 			if (limit < 1 ||
 			    batchcount < 1 ||
 			    batchcount > limit || shared < 0) {
+				print_cache_last_users(cachep);
 				res = 0;
 			} else {
 				res = do_tune_cpucache(cachep, limit,
Index: 2.6-git/lib/Kconfig.debug
===================================================================
--- 2.6-git.orig/lib/Kconfig.debug
+++ 2.6-git/lib/Kconfig.debug
@@ -85,6 +85,18 @@ config DEBUG_SLAB
 	  allocation as well as poisoning memory on free to catch use of freed
 	  memory. This can make kmalloc/kfree-intensive workloads much slower.
 
+config DEBUG_SLAB_LEAK
+	bool "Debug memory leaks"
+	depends on DEBUG_SLAB
+	help
+	  Say Y here to have the kernel track last user of a slab object which
+	  can be used to detect memory leaks. With this config option enabled,
+
+	      echo "size-32 0 0 0" > /proc/slabinfo
+
+	  walks the objects in the size-32 slab, printing out the calling
+	  address of whoever allocated that object.
+
 config DEBUG_PREEMPT
 	bool "Debug preemptible kernel"
 	depends on DEBUG_KERNEL && PREEMPT
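
To illustrate the mechanism the changelog above describes, here is a minimal,
self-contained userspace sketch (not kernel code; the toy_* names and the
fixed-size pool are invented for illustration): each object carries one extra
"user word" recording the allocation caller, and the leak report simply walks
the objects and prints that word, which is essentially what the patch does via
dbg_userword() when SLAB_STORE_USER is set.

#include <stdio.h>
#include <stdlib.h>

#define POOL_SIZE 4

struct toy_obj {
	void *payload;		/* what the caller actually gets */
	void *last_user;	/* caller address, like *dbg_userword() */
	int in_use;
};

static struct toy_obj pool[POOL_SIZE];

/* noinline so the recorded return address points at the real call site */
static __attribute__((noinline)) void *toy_alloc(size_t size)
{
	int i;

	for (i = 0; i < POOL_SIZE; i++) {
		if (!pool[i].in_use) {
			pool[i].payload = malloc(size);
			/* record the allocation site in the user word */
			pool[i].last_user = __builtin_return_address(0);
			pool[i].in_use = 1;
			return pool[i].payload;
		}
	}
	return NULL;
}

/* The "leak report": walk all in-use objects and print who allocated them. */
static void toy_report(void)
{
	int i;

	for (i = 0; i < POOL_SIZE; i++) {
		if (pool[i].in_use)
			printf("obj %p/%d: %p\n", (void *)&pool[i], i,
			       pool[i].last_user);
	}
}

int main(void)
{
	toy_alloc(32);	/* never freed -> shows up in the report */
	toy_alloc(32);
	toy_report();
	return 0;
}

The kernel patch additionally resolves the stored address to a symbol name
with print_symbol(), which a userspace toy obviously cannot do.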


2006-02-02 08:45:09

by Andrew Morton

Subject: Re: [PATCH] slab leak detector (Was: Size-128 slab leak)

Pekka J Enberg <[email protected]> wrote:
>
> Here's a version that uses dbg_userword() instead of overriding bufctls
> and adds a CONFIG_DEBUG_SLAB_LEAK config option. Upside is that this works
> with the slab verifier patch and is less invasive.

What is the slab verifier patch?

> Downside is that now
> some slabs don't get leak reports (those that don't get SLAB_STORE_USER
> enabled in kmem_cache_create).

Which slabs are those? SLAB_HWCACHE_ALIGN? If so, that's quite a lot of
them (more than needed, probably).

2006-02-02 08:55:15

by Dave Jones

Subject: Re: [PATCH] slab leak detector (Was: Size-128 slab leak)

On Thu, Feb 02, 2006 at 12:44:15AM -0800, Andrew Morton wrote:
> Pekka J Enberg <[email protected]> wrote:
> >
> > Here's a version that uses dbg_userword() instead of overriding bufctls
> > and adds a CONFIG_DEBUG_SLAB_LEAK config option. Upside is that this works
> > with the slab verifier patch and is less invasive.
>
> What is the slab verifier patch?

See the thread "Re: 2.6.16rc1-git4 slab corruption". Pekka posted a rediffed
variant of a patch Manfred came up with a while back, which periodically
scans slabs for corruption. In the past it has caught bugs where we scribbled
in the redzones of long-lived slab objects that weren't picked up by regular
SLAB_DEBUG.

Dave

2006-02-02 08:59:58

by Pekka Enberg

Subject: Re: [PATCH] slab leak detector (Was: Size-128 slab leak)

Pekka J Enberg <[email protected]> wrote:
> > Here's a version that uses dbg_userword() instead of overriding bufctls
> > and adds a CONFIG_DEBUG_SLAB_LEAK config option. Upside is that this works
> > with the slab verifier patch and is less invasive.

On Thu, 2 Feb 2006, Andrew Morton wrote:
> What is the slab verifier patch?

See the included (untested) patch. I got it from Davej, who got it from
Manfred, and cleaned it up a bit. The verifier checks slab red zones during
cache_reap() to catch slab corruption early.

Pekka J Enberg <[email protected]> wrote:
> > Downside is that now
> > some slabs don't get leak reports (those that don't get SLAB_STORE_USER
> > enabled in kmem_cache_create).

On Thu, 2 Feb 2006, Andrew Morton wrote:
> Which slabs are those? SLAB_HWCACHE_ALIGN? If so, that's quite a lot of
> them (more than needed, probably).

It's those caches that don't get red-zoning now. I am not sure how far that
extends, but kmem_cache_create() tries to avoid the worst cases, where the
extra debug words would cause too much internal fragmentation.

Pekka

---

mm/slab.c | 134 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 131 insertions(+), 3 deletions(-)

Index: 2.6-git/mm/slab.c
===================================================================
--- 2.6-git.orig/mm/slab.c
+++ 2.6-git/mm/slab.c
@@ -202,7 +202,7 @@
 
 typedef unsigned int kmem_bufctl_t;
 #define BUFCTL_END	(((kmem_bufctl_t)(~0U))-0)
-#define BUFCTL_FREE	(((kmem_bufctl_t)(~0U))-1)
+#define BUFCTL_ALLOC	(((kmem_bufctl_t)(~0U))-1)
 #define SLAB_LIMIT	(((kmem_bufctl_t)(~0U))-2)
 
 /* Max number of objs-per-slab for caches which use off-slab slabs.
@@ -430,6 +430,11 @@ struct kmem_cache {
 	 */
 	int obj_offset;
 	int obj_size;
+
+	/*
+	 * Time for next cache verification.
+	 */
+	unsigned long next_verify;
 #endif
 };
 
@@ -445,6 +450,7 @@ struct kmem_cache {
  */
 #define REAPTIMEOUT_CPUC	(2*HZ)
 #define REAPTIMEOUT_LIST3	(4*HZ)
+#define REDZONETIMEOUT		(300*HZ)
 
 #if STATS
 #define STATS_INC_ACTIVE(x)	((x)->num_active++)
@@ -1923,6 +1929,11 @@ kmem_cache_create (const char *name, siz
 		cachep->limit = BOOT_CPUCACHE_ENTRIES;
 	}
 
+#if DEBUG
+	cachep->next_verify = jiffies + REDZONETIMEOUT +
+			((unsigned long)cachep/L1_CACHE_BYTES)%REDZONETIMEOUT;
+#endif
+
 	/* cache setup completed, link it into the list */
 	list_add(&cachep->next, &cache_chain);
 	unlock_cpu_hotplug();
@@ -2251,7 +2262,7 @@ static void *slab_get_obj(struct kmem_ca
 	slabp->inuse++;
 	next = slab_bufctl(slabp)[slabp->free];
 #if DEBUG
-	slab_bufctl(slabp)[slabp->free] = BUFCTL_FREE;
+	slab_bufctl(slabp)[slabp->free] = BUFCTL_ALLOC;
 	WARN_ON(slabp->nodeid != nodeid);
 #endif
 	slabp->free = next;
@@ -2268,7 +2279,7 @@ static void slab_put_obj(struct kmem_cac
 	/* Verify that the slab belongs to the intended node */
 	WARN_ON(slabp->nodeid != nodeid);
 
-	if (slab_bufctl(slabp)[objnr] != BUFCTL_FREE) {
+	if (slab_bufctl(slabp)[objnr] != BUFCTL_ALLOC) {
 		printk(KERN_ERR "slab: double free detected in cache "
 				"'%s', objp %p\n", cachep->name, objp);
 		BUG();
@@ -3266,6 +3277,116 @@ static int alloc_kmemlist(struct kmem_ca
 	return err;
 }
 
+#if DEBUG
+
+static void verify_slab_redzone(struct kmem_cache *cache, struct slab *slab)
+{
+	int i;
+
+	if (!(cache->flags & SLAB_RED_ZONE))
+		return;
+
+#ifdef CONFIG_DEBUG_PAGEALLOC
+	/* Page alloc debugging on for this cache. Mapping & Unmapping happens
+	 * without any locking, thus parallel checks are impossible.
+	 */
+	if ((cache->buffer_size % PAGE_SIZE) == 0 && OFF_SLAB(cache))
+		return;
+#endif
+
+	for (i = 0; i < cache->num; i++) {
+		unsigned long red1, red2;
+		void *obj = slab->s_mem + cache->buffer_size * i;
+
+		red1 = *dbg_redzone1(cache, obj);
+		red2 = *dbg_redzone2(cache, obj);
+
+		/* Simplest case: redzones marked as inactive. */
+		if (red1 == RED_INACTIVE && red2 == RED_INACTIVE)
+			continue;
+
+		/*
+		 * Tricky case: if the bufctl value is BUFCTL_ALLOC, then
+		 * the object is either allocated or somewhere in a cpu
+		 * cache. The cpu caches are lockless and there might be
+		 * a concurrent alloc/free call, thus we must accept random
+		 * combinations of RED_ACTIVE and _INACTIVE
+		 */
+		if (slab_bufctl(slab)[i] == BUFCTL_ALLOC &&
+				(red1 == RED_INACTIVE || red1 == RED_ACTIVE) &&
+				(red2 == RED_INACTIVE || red2 == RED_ACTIVE))
+			continue;
+
+		printk(KERN_ERR "slab %s: redzone mismatch in slab %p,"
+			" obj %p, bufctl 0x%lx\n", cache->name, slab, obj,
+			slab_bufctl(slab)[i]);
+
+		print_objinfo(cache, obj, 2);
+	}
+}
+
+static void print_invalid_slab(const char *list_name, struct kmem_cache *cache,
+		struct slab *slab)
+{
+	printk(KERN_ERR "slab %s: invalid slab found in %s list at %p (%d/%d).\n",
+		cache->name, list_name, slab, slab->inuse, cache->num);
+}
+
+static void verify_nodelists(struct kmem_cache *cache,
+		struct kmem_list3 *lists)
+{
+	struct list_head *q;
+	struct slab *slab;
+
+	list_for_each(q, &lists->slabs_full) {
+		slab = list_entry(q, struct slab, list);
+
+		if (slab->inuse != cache->num)
+			print_invalid_slab("full", cache, slab);
+
+		check_slabp(cache, slab);
+		verify_slab_redzone(cache, slab);
+	}
+	list_for_each(q, &lists->slabs_partial) {
+		slab = list_entry(q, struct slab, list);
+
+		if (slab->inuse == cache->num || slab->inuse == 0)
+			print_invalid_slab("partial", cache, slab);
+
+		check_slabp(cache, slab);
+		verify_slab_redzone(cache, slab);
+	}
+	list_for_each(q, &lists->slabs_free) {
+		slab = list_entry(q, struct slab, list);
+
+		if (slab->inuse != 0)
+			print_invalid_slab("free", cache, slab);
+
+		check_slabp(cache, slab);
+		verify_slab_redzone(cache, slab);
+	}
+}
+
+/*
+ * Perform a self test on all slabs from a cache
+ */
+static void verify_cache(struct kmem_cache *cache)
+{
+	int node;
+
+	check_spinlock_acquired(cache);
+
+	for_each_online_node(node) {
+		struct kmem_list3 *lists = cache->nodelists[node];
+
+		if (!lists)
+			continue;
+		verify_nodelists(cache, lists);
+	}
+}
+
+#endif
+
 struct ccupdate_struct {
 	struct kmem_cache *cachep;
 	struct array_cache *new[NR_CPUS];
@@ -3446,6 +3567,13 @@ static void cache_reap(void *unused)
 		drain_array_locked(searchp, cpu_cache_get(searchp), 0,
 				numa_node_id());
 
+#if DEBUG
+		if (time_before(searchp->next_verify, jiffies)) {
+			searchp->next_verify = jiffies + REDZONETIMEOUT;
+			verify_cache(searchp);
+		}
+#endif
+
 		if (time_after(l3->next_reap, jiffies))
 			goto next_unlock;


2006-02-03 06:35:11

by Manfred Spraul

Subject: Re: [PATCH] slab leak detector (Was: Size-128 slab leak)

Andrew Morton wrote:

>> Downside is that now
>> some slabs don't get leak reports (those that don't get SLAB_STORE_USER
>> enabled in kmem_cache_create).
>
> Which slabs are those? SLAB_HWCACHE_ALIGN? If so, that's quite a lot of
> them (more than needed, probably).

Slabs with 4 kB or larger objects.

--
Manfred

2006-02-03 11:53:26

by Pekka Enberg

Subject: Re: [PATCH] slab leak detector (Was: Size-128 slab leak)

Andrew Morton wrote:
> > Which slabs are those? SLAB_HWCACHE_ALIGN? If so, that's quite a lot of
> > them (more than needed, probably).

On 2/3/06, Manfred Spraul <[email protected]> wrote:
> Slabs with 4 kB or larger objects.

Hmm. The relevant check is:

	if ((size < 4096
		|| fls(size - 1) == fls(size - 1 + 3 * BYTES_PER_WORD)))

So for example, when object size is 4097, we _will_ get redzoning.
Shouldn't that be 5 * BYTES_PER_WORD btw?
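
For what it's worth, the arithmetic can be checked with a quick userspace
test, using a naive stand-in for the kernel's fls() and assuming
BYTES_PER_WORD is 4 (sizeof(void *) on a 32-bit build): a 4096-byte object
crosses into the next fls() bucket once the three debug words are added and
so gets no red zones, while a 4097-byte object stays in the same bucket and
keeps them, as noted above.

#include <stdio.h>

#define BYTES_PER_WORD 4	/* assumption: 32-bit, sizeof(void *) == 4 */

/* Naive stand-in for the kernel's fls(): position of the highest set bit,
 * counting from 1; returns 0 for x == 0. */
static int fls(unsigned int x)
{
	int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

int main(void)
{
	unsigned int sizes[] = { 4000, 4096, 4097, 8192 };
	unsigned int i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		unsigned int size = sizes[i];
		int redzone = size < 4096 ||
			fls(size - 1) == fls(size - 1 + 3 * BYTES_PER_WORD);

		printf("size %5u: %s\n", size,
		       redzone ? "gets red zones" : "no red zones");
	}
	return 0;
}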

Pekka