2006-03-27 05:51:04

by Andi Kleen

Subject: dcache leak in 2.6.16-git8


A 2GB x86-64 desktop system here is currently swapping itself to death after
a few days uptime.

Some investigation shows this:

inode_cache 1287 1337 568 7 1 : tunables 54 27 8 : slabdata 191 191 0
dentry_cache 1867436 1867643 208 19 1 : tunables 120 60 8 : slabdata 98297 98297 0

Going to reboot it now.

-Andi


2006-03-27 11:43:24

by Bharata B Rao

Subject: Re: dcache leak in 2.6.16-git8

On Mon, Mar 27, 2006 at 07:50:20AM +0200, Andi Kleen wrote:
>
> A 2GB x86-64 desktop system here is currently swapping itself to death after
> a few days uptime.
>
> Some investigation shows this:
>
> inode_cache 1287 1337 568 7 1 : tunables 54 27 8 : slabdata 191 191 0
> dentry_cache 1867436 1867643 208 19 1 : tunables 120 60 8 : slabdata 98297 98297 0
>

Would it be possible to try out these experimental patches, which
gather some stats from the dentry cache?

Patch 1: dcache_stats.patch
- shows some dentry cache stats in /proc/meminfo.

Patch 2: cache_shrink_stats.patch (currently in -mm)
- applies on top of the 1st patch; shows stats on the shrinking of
slab caches in /proc/slabinfo.

The patches are against 2.6.16-git8. I have just verified that they
work on x86_64; I haven't really stress-tested them to check whether
they hold up for a reasonable time.

Regards,
Bharata.


Attachments:
dcache_stats.patch (14.54 kB)
cache_shrink_stats.patch (8.38 kB)

2006-03-27 16:22:48

by Andi Kleen

Subject: Re: dcache leak in 2.6.16-git8

On Monday 27 March 2006 13:48, Bharata B Rao wrote:
> On Mon, Mar 27, 2006 at 07:50:20AM +0200, Andi Kleen wrote:
> >
> > A 2GB x86-64 desktop system here is currently swapping itself to death after
> > a few days uptime.
> >
> > Some investigation shows this:
> >
> > inode_cache 1287 1337 568 7 1 : tunables 54 27 8 : slabdata 191 191 0
> > dentry_cache 1867436 1867643 208 19 1 : tunables 120 60 8 : slabdata 98297 98297 0
> >
>
> Would it be possible to try out these experimental patches, which
> gather some stats from the dentry cache?

It should be trivial to reproduce by other people. Biggest workload
is kernel compiles and quilt.

After a few hours with -git12 it's already at

dentry_cache 947013 952014 208 19 1 : tunables 120 60 8 : slabdata 50100 50106 480

and starting to go into swap.

I can't imagine I'm the only one seeing this?

I have a few x86-64 patches applied too, but they don't change anything
in this area.

-Andi

2006-03-28 03:00:48

by Andrew Morton

Subject: Re: dcache leak in 2.6.16-git8

Andi Kleen <[email protected]> wrote:
>
> On Monday 27 March 2006 13:48, Bharata B Rao wrote:
> > On Mon, Mar 27, 2006 at 07:50:20AM +0200, Andi Kleen wrote:
> > >
> > > A 2GB x86-64 desktop system here is currently swapping itself to death after
> > > a few days uptime.
> > >
> > > Some investigation shows this:
> > >
> > > inode_cache 1287 1337 568 7 1 : tunables 54 27 8 : slabdata 191 191 0
> > > dentry_cache 1867436 1867643 208 19 1 : tunables 120 60 8 : slabdata 98297 98297 0
> > >
> >
> > Would it be possible to try out these experimental patches, which
> > gather some stats from the dentry cache?
>
> It should be trivial to reproduce by other people. Biggest workload
> is kernel compiles and quilt.
>
> After a few hours with -git12 it's already at
>
> dentry_cache 947013 952014 208 19 1 : tunables 120 60 8 : slabdata 50100 50106 480
>
> and starting to go into swap.
>
> I can't imagine I'm the only one seeing this?
>
> I have a few x86-64 patches applied too, but they don't change anything
> in this area.

I don't think I can reproduce this on x86 uniproc. (avtab_node_cache
is a different story - maintainers separately pinged).

I'd expect pretty much everything we have in there now was under test in
-mm for quite some time - any obvious leaks would have been noticed. I'd
be suspecting recent changes in perhaps audit or nfs, at a guess. Or
something weird.

Which filesystems are in use?

2006-03-28 07:20:39

by Andi Kleen

Subject: Re: dcache leak in 2.6.16-git8

On Tuesday 28 March 2006 05:00, Andrew Morton wrote:
> Andi Kleen <[email protected]> wrote:
> >
> > On Monday 27 March 2006 13:48, Bharata B Rao wrote:
> > > On Mon, Mar 27, 2006 at 07:50:20AM +0200, Andi Kleen wrote:
> > > >
> > > > A 2GB x86-64 desktop system here is currently swapping itself to death after
> > > > a few days uptime.
> > > >
> > > > Some investigation shows this:
> > > >
> > > > inode_cache 1287 1337 568 7 1 : tunables 54 27 8 : slabdata 191 191 0
> > > > dentry_cache 1867436 1867643 208 19 1 : tunables 120 60 8 : slabdata 98297 98297 0
> > > >
> > >
> > > Would it be possible to try out these experimental patches, which
> > > gather some stats from the dentry cache?
> >
> > It should be trivial to reproduce by other people. Biggest workload
> > is kernel compiles and quilt.
> >
> > After a few hours with -git12 it's already at
> >
> > dentry_cache 947013 952014 208 19 1 : tunables 120 60 8 : slabdata 50100 50106 480
> >
> > and starting to go into swap.
> >
> > I can't imagine I'm the only one seeing this?
> >
> > I have a few x86-64 patches applied too, but they don't change anything
> > in this area.
>
> I don't think I can reproduce this on x86 uniproc. (avtab_node_cache
> is a different story - maintainers separately pinged).

This is x86-64 dual core.

>
> I'd expect pretty much everything we have in there now was under test in
> -mm for quite some time - any obvious leaks would have been noticed. I'd
> be suspecting recent changes in perhaps audit or nfs, at a guess. Or
> something weird.
>
> Which filesystems are in use?

ext3, NFS (not much)

-Andi

2006-03-29 22:27:07

by Andi Kleen

Subject: Re: dcache leak in 2.6.16-git8 II

On Tuesday 28 March 2006 05:00, Andrew Morton wrote:
> Andi Kleen <[email protected]> wrote:
> >
> > On Monday 27 March 2006 13:48, Bharata B Rao wrote:
> > > On Mon, Mar 27, 2006 at 07:50:20AM +0200, Andi Kleen wrote:
> > > >
> > > > A 2GB x86-64 desktop system here is currently swapping itself to death after
> > > > a few days uptime.
> > > >
> > > > Some investigation shows this:
> > > >
> > > > inode_cache 1287 1337 568 7 1 : tunables 54 27 8 : slabdata 191 191 0
> > > > dentry_cache 1867436 1867643 208 19 1 : tunables 120 60 8 : slabdata 98297 98297 0
> > > >
> > >
> > > Would it be possible to try out these experimental patches, which
> > > gather some stats from the dentry cache?
> >
> > It should be trivial to reproduce by other people. Biggest workload
> > is kernel compiles and quilt.
> >
> > After a few hours with -git12 it's already at
> >
> > dentry_cache 947013 952014 208 19 1 : tunables 120 60 8 : slabdata 50100 50106 480
> >
> > and starting to go into swap.
> >
> > I can't imagine I'm the only one seeing this?
> >
> > I have a few x86-64 patches applied too, but they don't change anything
> > in this area.
>
> I don't think I can reproduce this on x86 uniproc. (avtab_node_cache
> is a different story - maintainers separately pinged).


I got another OOM from this with -git13. Unfortunately it seems to take a few days at least
to trigger.

dentry_cache 999168 1024594 208 19 1 : tunables 120 60 8 : slabdata 53926 53926 0 : shrinker stat 18522624 8871000

Hrm, this one is interesting:

sock_inode_cache 996784 996805 704 5 1 : tunables 54 27 8 : slabdata 199361 199361 0

Most of the leaked dentries seem to be sockets. I didn't notice this earlier.

This was with the debugging patches applied btw.

So maybe we have a socket leak?

I've still got a copy of /proc in case anybody wants more information.

-Andi

The meminfo debugging output was:

nr_dentries/page nr_pages nr_inuse
0 0 0
1 139 139
2 115 230
3 99 297
4 103 412
5 84 419
6 80 480
7 75 519
8 65 520
9 52 466
10 58 580
11 55 605
12 73 868
13 124 1610
14 184 2576
15 313 4685
16 518 8288
17 1338 22744
18 4591 82633
19 45843 870758
20 0 0
21 0 0
22 0 0
23 0 0
24 0 0
25 0 0
26 0 0
27 0 0
28 0 0
29 0 0
Total: 53909 998829
dcache lru: total 232 inuse 1


2006-03-29 22:48:08

by Andrew Morton

Subject: Re: dcache leak in 2.6.16-git8 II

Andi Kleen <[email protected]> wrote:
>
> On Tuesday 28 March 2006 05:00, Andrew Morton wrote:
> > Andi Kleen <[email protected]> wrote:
> > >
> > > On Monday 27 March 2006 13:48, Bharata B Rao wrote:
> > > > On Mon, Mar 27, 2006 at 07:50:20AM +0200, Andi Kleen wrote:
> > > > >
> > > > > A 2GB x86-64 desktop system here is currently swapping itself to death after
> > > > > a few days uptime.
> > > > >
> > > > > Some investigation shows this:
> > > > >
> > > > > inode_cache 1287 1337 568 7 1 : tunables 54 27 8 : slabdata 191 191 0
> > > > > dentry_cache 1867436 1867643 208 19 1 : tunables 120 60 8 : slabdata 98297 98297 0
> > > > >
> > > >
> > > > Would it be possible to try out these experimental patches, which
> > > > gather some stats from the dentry cache?
> > >
> > > It should be trivial to reproduce by other people. Biggest workload
> > > is kernel compiles and quilt.
> > >
> > > After a few hours with -git12 it's already at
> > >
> > > dentry_cache 947013 952014 208 19 1 : tunables 120 60 8 : slabdata 50100 50106 480
> > >
> > > and starting to go into swap.
> > >
> > > I can't imagine I'm the only one seeing this?
> > >
> > > I have a few x86-64 patches applied too, but they don't change anything
> > > in this area.
> >
> > I don't think I can reproduce this on x86 uniproc. (avtab_node_cache
> > is a different story - maintainers separately pinged).
>
>
> I got another OOM from this with -git13. Unfortunately it seems to take a few days at least
> to trigger.
>
> dentry_cache 999168 1024594 208 19 1 : tunables 120 60 8 : slabdata 53926 53926 0 : shrinker stat 18522624 8871000
>
> Hrm, this one is interesting:
>
> sock_inode_cache 996784 996805 704 5 1 : tunables 54 27 8 : slabdata 199361 199361 0
>
> Most of the leaked dentries seem to be sockets. I didn't notice this earlier.
>
> This was with the debugging patches applied btw.
>
> So maybe we have a socket leak?

It looks that way. Didn't someone else report a sock_inode_cache leak?

> I've still got a copy of /proc in case anybody wants more information.

We have this fancy new /proc/slab_allocators now, it might show something
interesting. It needs CONFIG_DEBUG_SLAB_LEAK.

2006-03-29 22:53:59

by Andi Kleen

Subject: Re: dcache leak in 2.6.16-git8 II

On Thursday 30 March 2006 00:50, Andrew Morton wrote:

> It looks that way. Didn't someone else report a sock_inode_cache leak?

Didn't see it.

> > I've still got a copy of /proc in case anybody wants more information.
>
> We have this fancy new /proc/slab_allocators now, it might show something
> interesting. It needs CONFIG_DEBUG_SLAB_LEAK.

I didn't have that enabled unfortunately. I can try it on the next round.

-Andi

2006-03-30 06:43:40

by Balbir Singh

Subject: Re: dcache leak in 2.6.16-git8 II

On Thu, Mar 30, 2006 at 12:53:24AM +0200, Andi Kleen wrote:
> On Thursday 30 March 2006 00:50, Andrew Morton wrote:
>
> > It looks that way. Didn't someone else report a sock_inode_cache leak?
>
> Didn't see it.
>
> > > I've still got a copy of /proc in case anybody wants more information.
> >
> > We have this fancy new /proc/slab_allocators now, it might show something
> > interesting. It needs CONFIG_DEBUG_SLAB_LEAK.
>
> I didn't have that enabled unfortunately. I can try it on the next round.
>
> -Andi
>

There is also a new sysctl for dropping caches, vm.drop_caches. It would
be interesting to see whether it can free up some of the dcache memory
for you.

Balbir

2006-03-30 09:51:05

by Al Viro

Subject: Re: dcache leak in 2.6.16-git8 II

On Thu, Mar 30, 2006 at 12:26:58AM +0200, Andi Kleen wrote:
> dentry_cache 999168 1024594 208 19 1 : tunables 120 60 8 : slabdata 53926 53926 0 : shrinker stat 18522624 8871000
>
> Hrm, this one is interesting:
>
> sock_inode_cache 996784 996805 704 5 1 : tunables 54 27 8 : slabdata 199361 199361 0
>
> Most of the leaked dentries seem to be sockets. I didn't notice this earlier.

ITYM "all". You've got 2384 non-socket dentries, which is about what I'd
expect on a severely pressured, busy system...

> This was with the debugging patches applied btw.
>
> So maybe we have a socket leak?

Looks like it. Note: /proc/slab_allocators won't help here; all allocations
into that cache are done from sock_alloc_inode(), which is what will be shown.
Not useful... Moreover, the call chain is predictable several steps deeper
than that: sock_alloc_inode() (as ->alloc_inode()) from alloc_inode() from
new_inode() from sock_alloc().

FWIW... One thing that might be useful here:

a) slab_set_creator(objp, cachep, address): a no-op unless DEBUG_SLAB_LEAK
is set, otherwise

	void slab_set_creator(void *objp, struct kmem_cache *cachep, void *address)
	{
		if (cachep->flags & SLAB_STORE_USER)
			*dbg_userword(cachep, objp) = address;
	}

(has to be a function in mm/slab.c; exported).

b) slab_charge_here(objp, cachep), also a function in mm/slab.c (exported);
note that it takes no address argument, it charges the object to the place
that calls it:

	void slab_charge_here(void *objp, struct kmem_cache *cachep)
	{
		slab_set_creator(objp, cachep, __builtin_return_address(0));
	}

c)	#define slab_charge_caller(objp, cachep) \
		slab_set_creator((objp), (cachep), __builtin_return_address(0))


Then we can do the following: in sock_alloc() have

	slab_charge_caller(container_of(inode, struct socket_alloc, vfs_inode),
			   sock_inode_cachep);

and _then_ /proc/slab_allocators will charge these guys to the callers of
sock_alloc(); if you need to pursue it further, you can always slap
more slab_charge_...() where needed.

2006-03-30 10:13:05

by Al Viro

Subject: Re: dcache leak in 2.6.16-git8 II

On Thu, Mar 30, 2006 at 10:50:48AM +0100, Al Viro wrote:
> FWIW... One thing that might be useful here:

Here's what I had in mind:

Allow explicitly marking allocated objects as "allocated here", so that they'll
show up that way for all slab debugging purposes. New helpers:

	slab_charge_here(objp, cachep)
	slab_charge_caller(objp, cachep)

mark the object as allocated, respectively, by the place where
slab_charge_here() is called and by the caller of the function that calls
slab_charge_caller().

This is useful when the call chain leading to an allocation in a given cache
always ends the same way, making normal caller accounting uninformative.
E.g. allocation of a struct socket is always done via sock_alloc() =>
new_inode() => alloc_inode() => sock_alloc_inode() => kmem_cache_alloc().
The last step has no chance of giving any useful information about the
caller; adding slab_charge_caller() in sock_alloc() gives a much more
useful picture.

Signed-off-by: Al Viro <[email protected]>
----

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 3af03b1..6cc2f96 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -151,6 +151,16 @@ static inline void *kcalloc(size_t n, si
extern void kfree(const void *);
extern unsigned int ksize(const void *);

+#ifndef CONFIG_DEBUG_SLAB
+#define slab_set_creator(objp, cachep, address)
+#define slab_charge_here(objp, cachep)
+#else
+extern void slab_set_creator(void *objp, struct kmem_cache *cachep, void *address);
+extern void slab_charge_here(void *objp, struct kmem_cache *cachep);
+#endif
+#define slab_charge_caller(objp, cachep) \
+ slab_set_creator((objp), (cachep), __builtin_return_address(0))
+
#ifdef CONFIG_NUMA
extern void *kmem_cache_alloc_node(kmem_cache_t *, gfp_t flags, int node);
extern void *kmalloc_node(size_t size, gfp_t flags, int node);
@@ -189,6 +199,10 @@ void kfree(const void *m);
unsigned int ksize(const void *m);
unsigned int kmem_cache_size(struct kmem_cache *c);

+#define slab_set_creator(objp, cachep, address)
+#define slab_charge_here(objp, cachep)
+#define slab_charge_caller(objp, cachep)
+
static inline void *kcalloc(size_t n, size_t size, gfp_t flags)
{
return __kzalloc(n * size, flags);
diff --git a/mm/slab.c b/mm/slab.c
index 4cbf8bb..db21301 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3144,6 +3144,23 @@ void *kmem_cache_zalloc(struct kmem_cach
}
EXPORT_SYMBOL(kmem_cache_zalloc);

+#ifdef CONFIG_DEBUG_SLAB
+void slab_set_creator(void *objp, struct kmem_cache *cachep, void *address)
+{
+ if (cachep->flags & SLAB_STORE_USER)
+ *dbg_userword(cachep, objp) = address;
+}
+
+EXPORT_SYMBOL(slab_set_creator);
+
+void slab_charge_here(void *objp, struct kmem_cache *cachep)
+{
+ slab_set_creator(objp, cachep, __builtin_return_address(0));
+}
+EXPORT_SYMBOL(slab_charge_here);
+
+#endif
+
/**
* kmem_ptr_validate - check if an untrusted pointer might
* be a slab entry.
diff --git a/net/socket.c b/net/socket.c
index fcd77ea..0c4d61b 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -517,6 +517,9 @@ static struct socket *sock_alloc(void)
if (!inode)
return NULL;

+ slab_charge_caller(container_of(inode, struct socket_alloc, vfs_inode),
+ sock_inode_cachep);
+
sock = SOCKET_I(inode);

inode->i_mode = S_IFSOCK|S_IRWXUGO;

2006-03-30 10:36:57

by Al Viro

Subject: Re: dcache leak in 2.6.16-git8 II

BTW, it allows even funnier stuff: instead of "I'd been allocated by <place>"
you can do "I'd passed through <place>". E.g. if an object has different
states, you can slap slab_charge_here() into its state transitions, and
/proc/slab_allocators will count the states separately, showing how many
objects are in each state, etc.