2006-01-31 02:50:25

by Kevin O'Connor

[permalink] [raw]
Subject: Size-128 slab leak

Hi,

I have an annoying slab leak on my kernel. Every day, I lose about
50Megs of memory to the leak. It seems to be related to disk
accesses, because the count only goes up noticeable around 4:00am when
the system locate utility runs.

I can tell there is a leak because /proc/slabinfo shows "size-128"
growing continuously. For example, it currently reads:

size-128 4086041 4106550 128 30 1 : tunables 120 60 8 : slabdata 136885 136885 0

The machine is a vanilla lkml kernel:

Linux double 2.6.15 #1 SMP Wed Jan 4 23:13:51 EST 2006 x86_64 x86_64 x86_64 GNU/Linux

I've noticed this bug on a 2.6.14 kernel also. This machine is using
libata (sata_uli) along with reiserfs, ext3, and lvm. I'm interested
in finding ways of diagnosing this problem. I can provide more
information on demand. Please CC me on any replies.

Thanks,
-Kevin


Attachments:
(No filename) (872.00 B)
lspci-20060130 (1.63 kB)
cpuinfo-20060130 (1.25 kB)
lsmod-20060130 (2.82 kB)
slabinfo-20060130.gz (2.36 kB)
Download all attachments

2006-02-02 07:11:07

by Andrew Morton

[permalink] [raw]
Subject: Re: Size-128 slab leak

"Kevin O'Connor" <[email protected]> wrote:
>
> I have an annoying slab leak on my kernel. Every day, I lose about
> 50Megs of memory to the leak. It seems to be related to disk
> accesses, because the count only goes up noticeable around 4:00am when
> the system locate utility runs.
>
> I can tell there is a leak because /proc/slabinfo shows "size-128"
> growing continuously. For example, it currently reads:
>
> size-128 4086041 4106550 128 30 1 : tunables 120 60 8 : slabdata 136885 136885 0
>
> The machine is a vanilla lkml kernel:
>
> Linux double 2.6.15 #1 SMP Wed Jan 4 23:13:51 EST 2006 x86_64 x86_64 x86_64 GNU/Linux
>
> I've noticed this bug on a 2.6.14 kernel also. This machine is using
> libata (sata_uli) along with reiserfs, ext3, and lvm. I'm interested
> in finding ways of diagnosing this problem.

-mm kernels have a patch (slab-leak-detector.patch) which will help.
Here's a version for 2.6.16-rc1. It requires CONFIG_DEBUG_SLAB. Thanks.


From: Manfred Spraul <[email protected]>

Maintenance work from Alexander Nyberg <[email protected]>

With the patch applied,

echo "size-4096 0 0 0" > /proc/slabinfo

walks the objects in the size-4096 slab, printing out the calling address
of whoever allocated that object.

It is for leak detection.

Signed-off-by: Andrew Morton <[email protected]>
---

mm/slab.c | 46 +++++++++++++++++++++++++++++++++++++++++++---
1 files changed, 43 insertions(+), 3 deletions(-)

diff -puN mm/slab.c~slab-leak-detector mm/slab.c
--- devel/mm/slab.c~slab-leak-detector 2006-02-01 23:05:08.000000000 -0800
+++ devel-akpm/mm/slab.c 2006-02-01 23:07:43.000000000 -0800
@@ -198,7 +198,7 @@
* is less than 512 (PAGE_SIZE<<3), but greater than 256.
*/

-typedef unsigned int kmem_bufctl_t;
+typedef unsigned long kmem_bufctl_t;
#define BUFCTL_END (((kmem_bufctl_t)(~0U))-0)
#define BUFCTL_FREE (((kmem_bufctl_t)(~0U))-1)
#define SLAB_LIMIT (((kmem_bufctl_t)(~0U))-2)
@@ -2393,7 +2393,7 @@ static void check_slabp(kmem_cache_t *ca
i < sizeof(slabp) + cachep->num * sizeof(kmem_bufctl_t);
i++) {
if ((i % 16) == 0)
- printk("\n%03x:", i);
+ printk("\n%04lx:", i);
printk(" %02x", ((unsigned char *)slabp)[i]);
}
printk("\n");
@@ -2550,6 +2550,15 @@ static void *cache_alloc_debugcheck_afte
*dbg_redzone1(cachep, objp) = RED_ACTIVE;
*dbg_redzone2(cachep, objp) = RED_ACTIVE;
}
+ {
+ int objnr;
+ struct slab *slabp;
+
+ slabp = page_get_slab(virt_to_page(objp));
+
+ objnr = (objp - slabp->s_mem) / cachep->objsize;
+ slab_bufctl(slabp)[objnr] = (unsigned long)caller;
+ }
objp += obj_dbghead(cachep);
if (cachep->ctor && cachep->flags & SLAB_POISON) {
unsigned long ctor_flags = SLAB_CTOR_CONSTRUCTOR;
@@ -2691,7 +2700,7 @@ static void free_block(kmem_cache_t *cac
check_spinlock_acquired_node(cachep, node);
check_slabp(cachep, slabp);

-#if DEBUG
+#if 0
/* Verify that the slab belongs to the intended node */
WARN_ON(slabp->nodeid != node);

@@ -3573,6 +3582,36 @@ struct seq_operations slabinfo_op = {
.show = s_show,
};

+static void do_dump_slabp(kmem_cache_t *cachep)
+{
+#if DEBUG
+ struct list_head *q;
+ int node;
+
+ check_irq_on();
+ spin_lock_irq(&cachep->spinlock);
+ for_each_online_node(node) {
+ struct kmem_list3 *rl3 = cachep->nodelists[node];
+ spin_lock(&rl3->list_lock);
+
+ list_for_each(q, &rl3->slabs_full) {
+ int i;
+ struct slab *slabp = list_entry(q, struct slab, list);
+
+ for (i = 0; i < cachep->num; i++) {
+ unsigned long sym = slab_bufctl(slabp)[i];
+
+ printk("obj %p/%d: %p", slabp, i, (void *)sym);
+ print_symbol(" <%s>", sym);
+ printk("\n");
+ }
+ }
+ spin_unlock(&rl3->list_lock);
+ }
+ spin_unlock_irq(&cachep->spinlock);
+#endif
+}
+
#define MAX_SLABINFO_WRITE 128
/**
* slabinfo_write - Tuning for the slab allocator
@@ -3612,6 +3651,7 @@ ssize_t slabinfo_write(struct file *file
if (limit < 1 ||
batchcount < 1 ||
batchcount > limit || shared < 0) {
+ do_dump_slabp(cachep);
res = 0;
} else {
res = do_tune_cpucache(cachep, limit,
_

2006-02-02 07:32:32

by Pekka Enberg

[permalink] [raw]
Subject: Re: Size-128 slab leak

Hi,

On 2/2/06, Andrew Morton <[email protected]> wrote:
> @@ -2550,6 +2550,15 @@ static void *cache_alloc_debugcheck_afte
> *dbg_redzone1(cachep, objp) = RED_ACTIVE;
> *dbg_redzone2(cachep, objp) = RED_ACTIVE;
> }
> + {
> + int objnr;
> + struct slab *slabp;
> +
> + slabp = page_get_slab(virt_to_page(objp));
> +
> + objnr = (objp - slabp->s_mem) / cachep->objsize;
> + slab_bufctl(slabp)[objnr] = (unsigned long)caller;
> + }

We already have last caller in dbg_userword. Manfred, is there a
reason we're not using it?

Pekka

2006-02-03 04:02:24

by Kevin O'Connor

[permalink] [raw]
Subject: Re: Size-128 slab leak

On Wed, Feb 01, 2006 at 11:10:01PM -0800, Andrew Morton wrote:
> "Kevin O'Connor" <[email protected]> wrote:
> >
> > I have an annoying slab leak on my kernel. Every day, I lose about
> > 50Megs of memory to the leak. It seems to be related to disk
> > accesses, because the count only goes up noticeable around 4:00am when
> > the system locate utility runs.
> >
> > I can tell there is a leak because /proc/slabinfo shows "size-128"
> > growing continuously. For example, it currently reads:
>
> -mm kernels have a patch (slab-leak-detector.patch) which will help.
> Here's a version for 2.6.16-rc1. It requires CONFIG_DEBUG_SLAB. Thanks.

Thanks Andrew.

I've applied the patch and found the leak. It's in kzalloc. :-)

With kzalloc inlined, however, it appears that selinux is the likely
culprit. I would not have expected that.

After running updatedb I got 23530 occurrences of:

kernel: obj ffff81003f04f000/12: ffffffff801ed7b7 <selinux_inode_alloc_security+0x37/0x100>

I'm not sure how to debug selinux issues, but at least I can disable
it.

-Kevin

2006-02-03 06:27:53

by Manfred Spraul

[permalink] [raw]
Subject: Re: Size-128 slab leak

Pekka Enberg wrote:

>We already have last caller in dbg_userword. Manfred, is there a
>reason we're not using it?
>
>
>
IIRC only due to historical reasons: dbg_userword is newer than this patch.

--
Manfred

2006-02-03 13:15:44

by Stephen Smalley

[permalink] [raw]
Subject: Re: Size-128 slab leak

On Thu, 2006-02-02 at 23:00 -0500, Kevin O'Connor wrote:
> Thanks Andrew.
>
> I've applied the patch and found the leak. It's in kzalloc. :-)
>
> With kzalloc inlined, however, it appears that selinux is the likely
> culprit. I would not have expected that.
>
> After running updatedb I got 23530 occurrences of:
>
> kernel: obj ffff81003f04f000/12: ffffffff801ed7b7 <selinux_inode_alloc_security+0x37/0x100>
>
> I'm not sure how to debug selinux issues, but at least I can disable
> it.

Hmm...that allocation call occurs upon alloc_inode() via
security_inode_alloc, and the associated free call occurs upon
destroy_inode() via security_inode_free. However, when Jeff Mahoney
introduced the support for "private inodes" (S_PRIVATE flag) to support
reiserfs xattrs-as-files, he added the IS_PRIVATE guards to both
security_inode_alloc and security_inode_free. I think that this ends up
causing SELinux to allocate a security structure for every reiserfs
inode including private inodes since they are not marked until later by
reiserfs, while preventing SELinux from ever freeing the security
structure for the private inodes. Note that
selinux_inode_free_security() should be safe even for the private
inodes, as it doesn't assume any other initialization beyond the
allocation-time initialization. Patch below.

BTW, Jeff - I assume you know about the unrelated breakage in
SELinux/reiserfs support that was introduced by the changes for atomic
security labeling of inodes in 2.6.14. If you care, you might want to
use the same workaround upstreamed for xfs for 2.6.16. But I understand
that SELinux is not a priority in SuSE presently.

BTW, Chris - are you still serving as LSM maintainer? MAINTAINERS still
lists your osdl address.

---

Remove private inode tests from security_inode_alloc and security_inode_free,
as we otherwise end up leaking inode security structures for private inodes.

Signed-off-by: Stephen Smalley <[email protected]>

---

include/linux/security.h | 4 ----
1 file changed, 4 deletions(-)

Index: linux-2.6/include/linux/security.h
===================================================================
RCS file: /nfshome/pal/CVS/linux-2.6/include/linux/security.h,v
retrieving revision 1.50
diff -u -p -r1.50 security.h
--- linux-2.6/include/linux/security.h 3 Jan 2006 16:36:48 -0000 1.50
+++ linux-2.6/include/linux/security.h 3 Feb 2006 12:50:57 -0000
@@ -1437,15 +1437,11 @@ static inline void security_sb_post_pivo

static inline int security_inode_alloc (struct inode *inode)
{
- if (unlikely (IS_PRIVATE (inode)))
- return 0;
return security_ops->inode_alloc_security (inode);
}

static inline void security_inode_free (struct inode *inode)
{
- if (unlikely (IS_PRIVATE (inode)))
- return;
security_ops->inode_free_security (inode);
}



--
Stephen Smalley
National Security Agency

2006-02-03 15:13:28

by James Morris

[permalink] [raw]
Subject: Re: Size-128 slab leak

On Fri, 3 Feb 2006, Stephen Smalley wrote:

> Signed-off-by: Stephen Smalley <[email protected]>

Acked-by: James Morris <[email protected]>


--
James Morris
<[email protected]>

2006-02-04 01:15:36

by Kevin O'Connor

[permalink] [raw]
Subject: Re: Size-128 slab leak

On Fri, Feb 03, 2006 at 08:21:12AM -0500, Stephen Smalley wrote:
> On Thu, 2006-02-02 at 23:00 -0500, Kevin O'Connor wrote:
> > After running updatedb I got 23530 occurrences of:
> >
> > kernel: obj ffff81003f04f000/12: ffffffff801ed7b7 <selinux_inode_alloc_security+0x37/0x100>
> >
> Hmm...that allocation call occurs upon alloc_inode() via
> security_inode_alloc, and the associated free call occurs upon
> destroy_inode() via security_inode_free. However, when Jeff Mahoney
> introduced the support for "private inodes" (S_PRIVATE flag) to support
> reiserfs xattrs-as-files, he added the IS_PRIVATE guards to both
> security_inode_alloc and security_inode_free. I think that this ends up
> causing SELinux to allocate a security structure for every reiserfs
> inode including private inodes since they are not marked until later by
> reiserfs, while preventing SELinux from ever freeing the security
> structure for the private inodes. Note that
> selinux_inode_free_security() should be safe even for the private
> inodes, as it doesn't assume any other initialization beyond the
> allocation-time initialization. Patch below.

Hi Stephen,

I've applied your patch. It seems to be working. (Multiple runs of
updatedb no longer grow the size-128 slab.)

Thanks,
-Kevin

2006-02-14 04:52:50

by Jeff Mahoney

[permalink] [raw]
Subject: Re: Size-128 slab leak

Stephen Smalley wrote:
> BTW, Jeff - I assume you know about the unrelated breakage in
> SELinux/reiserfs support that was introduced by the changes for atomic
> security labeling of inodes in 2.6.14. If you care, you might want to
> use the same workaround upstreamed for xfs for 2.6.16. But I understand
> that SELinux is not a priority in SuSE presently.

Hi Stephen -

Thanks for the reminder on this one. I have a series of patches that
rework the reiserfs xattr implementation. One of the patches adds
journalling for xattrs, including putting the inherited ACLs in the same
transaction as the object creation. Adding the SELinux attribute to the
same transaction wouldn't be too much of a problem.

I should have a patch tomorrow, though it will depend on the xattr patchset.

-Jeff

--
Jeff Mahoney
SUSE Labs