2006-03-26 21:16:09

by Adrian Bridgett

Subject: 2.6.16-mm1 leaks in dvb playback

I've had this problem for a little while (probably since 2.6.14/15
era) but I've only recently spent some time to figure out what's been
going wrong.

There is a patch in the -mm series which causes leaks in both
sock_inode_cache and dentry_cache for me during DVB playback (thanks
to slabtop).

2.6.16 is fine (no leakage), 2.6.16-mm1 has this problem (~ 2MB/s in
each cache).

I'm using dvbstream and sending the output to /dev/null; the dvb modules
loaded are dvb_usb_vp7045, dvb_usb, dvb_core and dvb_pll. It's an EHCI USB
device running on a Dell Latitude D600.

Turning on some debugging and looking at /proc/slab_allocators and
/proc/page_owners shows that the most prevalent page owners are:

(5363 out of 5631)
Page allocated via order 0, mask 0xd0
[0xc0161079] poison_obj+41
[0xc0162355] cache_alloc_refill+981
[0xc0161889] cache_alloc_debugcheck_after+169
[0xc01d5169] vsnprintf+857
[0xc0247eec] lock_sock+204
[0xc0244999] sock_alloc_inode+25
[0xc0161f73] kmem_cache_alloc+131
[0xc027a4b4] inet_csk_accept+436

(1989 out of 2734)
Page allocated via order 0, mask 0xd0
[0xc0162355] cache_alloc_refill+981
[0xc0161079] poison_obj+41
[0xc0161079] poison_obj+41
[0xc01d5169] vsnprintf+857
[0xc0182341] d_alloc+33
[0xc0161f73] kmem_cache_alloc+131
[0xc0182341] d_alloc+33
[0xc02461e0] sock_attach_fd+96

I don't normally read LKML so best to Cc me in case I forget :-)

Many thanks,

Adrian


2006-03-28 01:21:38

by Andrew Morton

Subject: Re: 2.6.16-mm1 leaks in dvb playback

Adrian Bridgett wrote:
>
> I've had this problem for a little while (probably since 2.6.14/15
> era) but I've only recently spent some time to figure out what's been
> going wrong.
>
> There is a patch in the -mm series which causes leaks in both
> sock_inode_cache and dentry_cache for me during DVB playback (thanks
> to slabtop).
>
> 2.6.16 is fine (no leakage), 2.6.16-mm1 has this problem (~ 2MB/s in
> each cache).

Do you mean that the problem has been present in -mm kernels since the
2.6.14/15 timeframe, and not in mainline?

> I'm using dvbstream and sending the output to /dev/null, dvb modules
> loaded are dvb_usb_vp7045, dvb_usb, dvb_core, dvb_pll. It's an EHCI USB
> device running on a Dell D600 latitude.
>
> turning on some debugging and looking at /proc/slab_allocators and
> /proc/page_owners shows that the most prevalent page owners are:
>
> (5363 out of 5631)
> Page allocated via order 0, mask 0xd0
> [0xc0161079] poison_obj+41
> [0xc0162355] cache_alloc_refill+981
> [0xc0161889] cache_alloc_debugcheck_after+169
> [0xc01d5169] vsnprintf+857
> [0xc0247eec] lock_sock+204
> [0xc0244999] sock_alloc_inode+25
> [0xc0161f73] kmem_cache_alloc+131
> [0xc027a4b4] inet_csk_accept+436
>
> (1989 out of 2734)
> Page allocated via order 0, mask 0xd0
> [0xc0162355] cache_alloc_refill+981
> [0xc0161079] poison_obj+41
> [0xc0161079] poison_obj+41
> [0xc01d5169] vsnprintf+857
> [0xc0182341] d_alloc+33
> [0xc0161f73] kmem_cache_alloc+131
> [0xc0182341] d_alloc+33
> [0xc02461e0] sock_attach_fd+96
>

Strange. Are you sure that they really leak? Doing

echo 3 > /proc/sys/vm/drop_caches

doesn't make them go away?
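
One way to make that check less dependent on eyeballing slabtop is to snapshot /proc/slabinfo before and after dropping caches and compare the two caches by name. The following is a minimal sketch, not something posted in this thread: the helper names (parse_slabinfo, cache_bytes, growth) are made up for illustration, and it assumes the slabinfo 2.x column layout (name, active_objs, num_objs, objsize, ...).

```python
def parse_slabinfo(text):
    """Map cache name -> (active_objs, num_objs, objsize) from slabinfo 2.x text."""
    caches = {}
    for line in text.splitlines():
        # Skip blank lines, the version banner and the '# name ...' header.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        active_objs, num_objs, objsize = (int(f) for f in fields[1:4])
        caches[fields[0]] = (active_objs, num_objs, objsize)
    return caches


def cache_bytes(caches, name):
    """Approximate memory held by one cache: allocated objects * object size."""
    _active, num_objs, objsize = caches[name]
    return num_objs * objsize


def growth(before, after, names=("sock_inode_cache", "dentry_cache")):
    """Bytes gained per cache between two snapshots (negative means it shrank)."""
    return {n: cache_bytes(after, n) - cache_bytes(before, n)
            for n in names if n in before and n in after}
```

Assumed usage: read /proc/slabinfo (usually as root) into a string, run the drop_caches command above, read it again, and pass both parsed snapshots to growth(). A cache that is merely caching should shrink sharply; one that keeps its size, or keeps growing, is more likely to be leaking.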

2006-03-28 07:02:34

by Adrian Bridgett

Subject: Re: 2.6.16-mm1 leaks in dvb playback

On Mon, Mar 27, 2006 at 17:23:56 -0800 (-0800), Andrew Morton wrote:
>
> Do you mean that the problem has been present in -mm kernels since the
> 2.6.14/15 timeframe, and not in mainline?

Correct.

> Strange. Are you sure that they really leak? Doing
>
> echo 3 > /proc/sys/vm/drop_caches
>
> doesn't make them go away?

dentry_cache drops a little bit, but the vast majority stays.
sock_inode_cache I didn't notice drop. If I don't reboot every
15/20mins the machine suddenly starts thrashing like mad and then
effectively locks up :-(

Last night I tried reverting the dvb-core ringbuffer part of -mm1 and
that didn't seem to help at all.

I've just tried 2.6.16 with just the origin.patch from -mm1 and that
has the same leak in it. So it looks like I should have spotted this
earlier, before it was pushed into 2.6.16+. Just double-checked, and
in 2.6.16 sock_inode_cache isn't even on the slabtop screen.

I suppose that leads to a new question - what's the easiest way to
start to break down origin.patch, and do you know of any likely
culprits? I see Andi Kleen has seen dentry_cache leaking on x86_64
(this machine is x86(_32) uniprocessor).

Thanks for your help,

Adrian

2006-03-28 07:16:36

by Andrew Morton

Subject: Re: 2.6.16-mm1 leaks in dvb playback

adrian wrote:
>
> On Mon, Mar 27, 2006 at 17:23:56 -0800 (-0800), Andrew Morton wrote:
> >
> > Do you mean that the problem has been present in -mm kernels since the
> > 2.6.14/15 timeframe, and not in mainline?
>
> Correct.
>
> > Strange. Are you sure that they really leak? Doing
> >
> > echo 3 > /proc/sys/vm/drop_caches
> >
> > doesn't make them go away?
>
> dentry_cache drops a little bit, but the vast majority stays.
> sock_inode_cache I didn't notice drop. If I don't reboot every
> 15/20mins the machine suddenly starts thrashing like mad and then
> effectively locks up :-(
>
> Last night I tried reverting the dvb-core ringbuffer part of -mm1 and
> that didn't seem to help at all.
>
> I've just tried 2.6.16 with just the origin.patch from -mm1 and that
> has the same leak in it. So it looks like I should have spotted this
> earlier, before it was pushed into 2.6.16+. Just double-checked, and
> in 2.6.16 sock_inode_cache isn't even on the slabtop screen.
>
> I suppose that leads to a new question - what's the easiest way to
> start to break down origin.patch, and do you know of any likely
> culprits? I see Andi Kleen has seen dentry_cache leaking on x86_64
> (this machine is x86(_32) uniprocessor).
>

It's unlikely that the sock_inode_cache leak is related to the dcache leak,
but we won't know until we know...

As for breaking down origin.patch: that's all in Linus's tree now, so
git-bisect is the way to do that.

As for a culprit: don't know, sorry. I'd be surprised if there were _any_
patches which were in 2.6.14-mmX all the way through to 2.6.16-rc6-mmx and
which suddenly got merged into 2.6.16-git. Maybe someone was sitting on
something for that long in one of the git trees, but I'd be surprised.
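
The idea behind bisection is a binary search over an ordered series of changes for the first one that shows the leak. A rough sketch of that search follows, under the simplifying assumption of a linear history in which every change after the culprit is also bad (git-bisect itself works on the real commit graph). The names first_bad and is_bad are hypothetical; is_bad stands in for "build, boot, run dvbstream, watch slabtop".

```python
def first_bad(changes, is_bad):
    """Return the index of the first change for which is_bad() is true.

    Assumes changes[0..k-1] are good and changes[k..] are bad for some k,
    and that the last change is already known to be bad.
    """
    lo, hi = 0, len(changes) - 1      # invariant: the first bad index is in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(changes[mid]):
            hi = mid                  # culprit is mid or something earlier
        else:
            lo = mid + 1              # culprit comes after mid
    return lo
```

With N candidate changes this needs roughly log2(N) test builds, which is why bisecting is practical even when origin.patch covers a large number of commits.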

2006-03-28 07:23:46

by Adrian Bridgett

Subject: Re: 2.6.16-mm1 leaks in dvb playback

On Tue, Mar 28, 2006 at 08:02:20 +0100 (+0100), adrian wrote:
> I suppose that leads to a new question - what's the easiest way to
> start to break down origin.patch and do you know of any likely

I suppose starting with 2.6.15-rc5-mm3 would be a good start (I'll
double check 2.6.15-rc5 + origin patch from 2.6.15-rc5-mm3 is okay and
then bisect the remainder).

Adrian

2006-03-29 23:37:28

by Adrian Bridgett

Subject: Re: 2.6.16-mm1 leaks in dvb playback

On Mon, Mar 27, 2006 at 23:16:30 -0800 (-0800), Andrew Morton wrote:
> It's unlikely that the sock_inode_cache leak is related to the dcache leak,
> but we won't know until we know...

Looks like this might be the same issue as "dcache leak in 2.6.16-git8
(II)"...

I think I've found the patch which causes the leak - it was the
"use fget_light() in net/socket.c" patch. I can't see anything
obviously wrong, although the patch changes the code so that in
sys_sendto and sys_recvfrom it now does a sockfd_put(sock) if the
sock_from_file call fails which didn't use to happen. That seems to
agree more with other bits of code, but I've no idea what is the right
thing to do.

One item I spotted whilst perusing the code is that in net/core/sock.c
in compat_sock_common_getsockopt, it checks if
sk->sk_prot->compat_setsockopt is NULL before calling
sk->sk_prot->compat_getsockopt (set vs get).

I'll try and confirm tomorrow with a nice fresh build. The command
I'm using to test is "dvbstream -f 650166.670 -v 570 -a 571 -o >
/dev/null"

Thanks,

Adrian

2006-03-30 00:05:04

by Andrew Morton

Subject: Re: 2.6.16-mm1 leaks in dvb playback

adrian wrote:
>
> On Mon, Mar 27, 2006 at 23:16:30 -0800 (-0800), Andrew Morton wrote:
> > It's unlikely that the sock_inode_cache leak is related to the dcache leak,
> > but we won't know until we know...
>
> Looks like this might be the same issue as "dcache leak in 2.6.16-git8
> (II)"...
>
> I think I've found the patch which causes the leak - it was the
> "use fget_light() in net/socket.c" patch. I can't see anything
> obviously wrong, although the patch changes the code so that in
> sys_sendto and sys_recvfrom it now does a sockfd_put(sock) if the
> sock_from_file call fails which didn't use to happen. That seems to
> agree more with other bits of code, but I've no idea what is the right
> thing to do.

OK, thanks, that helps heaps.

The code does look OK, so if that's the source of the leak then something
subtle might be happening.

If it _is_ a fget/fput thing then I'd expect files_cache to be leaking too.

> One item I spotted whilst perusing the code is that in net/core/sock.c
> in compat_sock_common_getsockopt, it checks if
> sk->sk_prot->compat_setsockopt is NULL before calling
> sk->sk_prot->compat_getsockopt (set vs get).

Ah. I guess nobody ever implements compat_getsockopt without also
implementing compat_setsockopt but yes, it'd be tidier to actually check
the thing we're about to call ;)

> I'll try and confirm tomorrow with a nice fresh build. The command
> I'm using to test is "dvbstream -f 650166.670 -v 570 -a 571 -o >
> /dev/null"

Thanks again.

2006-03-30 00:45:29

by Adrian Bridgett

Subject: Re: 2.6.16-mm1 leaks in dvb playback

On Wed, Mar 29, 2006 at 16:06:48 -0800 (-0800), Andrew Morton wrote:
> adrian wrote:
> >
> > On Mon, Mar 27, 2006 at 23:16:30 -0800 (-0800), Andrew Morton wrote:
> > > It's unlikely that the sock_inode_cache leak is related to the dcache leak,
> > > but we won't know until we know...
> >
> > Looks like this might be the same issue as "dcache leak in 2.6.16-git8
> > (II)"...
> >
> > I think I've found the patch which causes the leak - it was the
> > "use fget_light() in net/socket.c" patch. I can't see anything
> > I'll try and confirm tomorrow with a nice fresh build. The command

Well, I've just tried 2.6.16-mm1 with that patch reverted and it still
leaks, so I must be mistaken, I'm afraid. I'll go and trawl through the
builds I've done and try a simpler config, as I reverted back to a
bigger config to confirm it.

Sorry - I had gone through carefully and that was the one that made
the difference. Back to some more builds :-(

Adrian

2006-03-30 22:58:42

by Adrian Bridgett

Subject: Re: 2.6.16-mm1 leaks in dvb playback (found)

What I thought was just one patch was actually two and it was the
other patch causing the problem - "Do not lose accepted socket when
-ENFILE/-EMFILE".

Most of the patch seems to be just a restructuring - I guess that
leaves the sys_accept changes that are leaking the memory.

I've no idea how the code is supposed to work, so large rocks of salt
required :-) The code now does a sock_alloc_fd, and the error cases
now do a "put_filp" and "put_unused_fd" if the alloc succeeded.
However in the normal case, nothing gets freed (I guess that's the
memory leak). OTOH the description of the patch is:

"Try to allocate the struct file and an unused file
descriptor before we try to pull a newly accepted
socket out of the protocol layer."

http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=39d8c1b6fbaeb8d6adec4a8c08365cc9eaca6ae4

Cheers,

Adrian

2006-03-30 23:11:37

by Adrian Bridgett

Subject: Re: 2.6.16-mm1 leaks in dvb playback (found)

On Thu, Mar 30, 2006 at 23:58:30 +0100 (+0100), adrian wrote:
> What I thought was just one patch was actually two and it was the
> other patch causing the problem - "Do not lose accepted socket when
> -ENFILE/-EMFILE".

Hmm - it looks like it was meant to be reverted in 2.6.16-rc1-mm4,5 FWIW.

Adrian

2006-03-30 23:28:07

by David Miller

Subject: Re: 2.6.16-mm1 leaks in dvb playback (found)

From: Adrian Bridgett
Date: Fri, 31 Mar 2006 00:11:31 +0100

> On Thu, Mar 30, 2006 at 23:58:30 +0100 (+0100), adrian wrote:
> > What I thought was just one patch was actually two and it was the
> > other patch causing the problem - "Do not lose accepted socket when
> > -ENFILE/-EMFILE".
>
> Hmm - it looks like it was meant to be reverted in 2.6.16-rc1-mm4,5 FWIW.

So is the current version in Linus's tree causing this problem?

Yes, partway through the initial development of that patch
there were many bad leaks, but those were all fixed later.

2006-03-31 01:22:41

by Andi Kleen

Subject: Re: 2.6.16-mm1 leaks in dvb playback (found)

On Thu, Mar 30, 2006 at 03:28:21PM -0800, David S. Miller wrote:
> From: Adrian Bridgett
> Date: Fri, 31 Mar 2006 00:11:31 +0100
>
> > On Thu, Mar 30, 2006 at 23:58:30 +0100 (+0100), adrian wrote:
> > > What I thought was just one patch was actually two and it was the
> > > other patch causing the problem - "Do not lose accepted socket when
> > > -ENFILE/-EMFILE".
> >
> > Hmm - it looks like it was meant to be reverted in 2.6.16-rc1-mm4,5 FWIW.
>
> So is the current version in Linus's tree causing this problem?

I don't know if it's this patch, but I definitely see a socket
leak here with Linus's tree, at least since -git8. Unfortunately it takes a day
or two to really trigger, so a binary search would be difficult.

-Andi

2006-03-31 07:29:18

by Adrian Bridgett

Subject: Re: 2.6.16-mm1 leaks in dvb playback (found)

On Fri, Mar 31, 2006 at 03:22:35 +0200 (+0200), Andi Kleen wrote:
> On Thu, Mar 30, 2006 at 03:28:21PM -0800, David S. Miller wrote:
> > From: Adrian Bridgett
> > Date: Fri, 31 Mar 2006 00:11:31 +0100
> >
> > > Hmm - it looks like it was meant to be reverted in 2.6.16-rc1-mm4,5 FWIW.
> >
> > So is the current version in Linus's tree causing this problem?

I'm taking 2.6.16(.0) adding -mm1. When running dvbstream I get
dentry_cache and sock_inode_cache leaking about 4MB/s.

I then revert this ENFILE/EMFILE patch and both leaks stop.

I've just compiled up 2.6.16-git18 and that leaks in an identical
manner. I'm surprised no-one else has seen it, so I'd been putting
it down to the specific hardware (dvb-usb-vp7045), but then I saw that
it was a bad memory leak and then that 2.6.16 was fine, and finally
started trying to isolate the patch that broke it for me (tm).

Maybe it's just exposed a bug in dvb-usb-vp7045, but given that it
appears to add a sock_alloc_fd without any matching deallocate code
AFAICT...

Adrian

2006-03-31 07:48:04

by David Miller

Subject: Re: 2.6.16-mm1 leaks in dvb playback (found)

From: Adrian Bridgett
Date: Fri, 31 Mar 2006 08:28:59 +0100

> On Fri, Mar 31, 2006 at 03:22:35 +0200 (+0200), Andi Kleen wrote:
> > On Thu, Mar 30, 2006 at 03:28:21PM -0800, David S. Miller wrote:
> > > From: Adrian Bridgett
> > > Date: Fri, 31 Mar 2006 00:11:31 +0100
> > >
> > > > Hmm - it looks like it was meant to be reverted in 2.6.16-rc1-mm4,5 FWIW.
> > >
> > > So is the current version in Linus's tree causing this problem?
>
> I'm taking 2.6.16(.0) adding -mm1. When running dvbstream I get
> dentry_cache and sock_inode_cache leaking about 4MB/s.
>
> I then revert this ENFILE/EMFILE patch and both leaks stop.

As I stated, there was a bug in the initial patch, which subsequent
patches fix.

Can you try Linus's current tree to see if the problem is there?

2006-03-31 09:54:56

by Adrian Bridgett

Subject: Re: 2.6.16-mm1 leaks in dvb playback (found)

On Thu, Mar 30, 2006 at 23:48:23 -0800 (-0800), David S. Miller wrote:
> As I stated, there was a bug in the initial patch, which subsequent
> patches fix.
>
> Can you try Linus's current tree to see if the problem is there?

2.6.16-git18 still has the problem (30th March).

Adrian

2006-03-31 10:08:17

by David Miller

Subject: Re: 2.6.16-mm1 leaks in dvb playback (found)

From: Adrian Bridgett
Date: Fri, 31 Mar 2006 10:54:43 +0100

> On Thu, Mar 30, 2006 at 23:48:23 -0800 (-0800), David S. Miller wrote:
> > As I stated, there was a bug in the initial patch, which subsequent
> > patches fix.
> >
> > Can you try Linus's current tree to see if the problem is there?
>
> 2.6.16-git18 still has the problem (30th March).

Strange, can you strace the process and follow the socket
operations your application performs? Something unique
is occurring in that app since there have not been other
reports of this problem that I am aware of.

Thanks.

2006-03-31 18:47:44

by Adrian Bridgett

Subject: Re: 2.6.16-mm1 leaks in dvb playback (found)

On Fri, Mar 31, 2006 at 02:07:27 -0800 (-0800), David S. Miller wrote:
> Strange, can you strace the process and follow the socket
> operations your application performs? Something unique
> is occurring in that app since there have not been other
> reports of this problem that I am aware of.

The strace output is the same when run under both kernels (a leaking
one and a non-leaking one):

7384 socket(PF_FILE, SOCK_STREAM, 0) = 7
7384 connect(7, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = 0
7384 recvmsg(7, {msg_name(0)=NULL, msg_iov(1)=[{"hosts\0", 6}],
msg_controllen=16, {cmsg_len=16, cmsg_level=SOL_SOCKET,
cmsg_type=SCM_RIGHTS, {8}}, msg_flags=0}, MSG_NOSIGNAL) = 6
7384 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 7
7384 setsockopt(7, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
7384 bind(7, {sa_family=AF_INET, sin_port=htons(12345), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
7384 listen(7, 1) = 0
7384 accept(7, 0x81546e4, [0]) = -1 EAGAIN (Resource temporarily unavailable)
7384 accept(7, 0x81546e4, [0]) = -1 EAGAIN (Resource temporarily unavai
[repeated lots and lots ....]

So I guess I was _completely_ wrong and that it's the failure case
that is leaking.

It also looks like dvbstream is being a tad silly so I'll go and have
a look at that (userspace stuff I stand a chance with).
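
The trace shows accept() being called over and over on a non-blocking listening socket and returning EAGAIN each time, so every failed call goes through the kernel's accept failure path. A userspace program can avoid spinning like that by waiting until the listener is readable before calling accept(). A minimal sketch of that pattern in Python follows; it is not dvbstream's code, and accept_when_ready and make_listener are made-up helper names (the default port is simply the one seen in the trace).

```python
import selectors
import socket


def accept_when_ready(listener, timeout):
    """Wait up to `timeout` seconds for a pending connection, then accept it.

    Returns (conn, addr), or None if nothing arrived in time. accept() is
    only called once the kernel reports the listener readable, so there is
    no tight loop of EAGAIN failures.
    """
    with selectors.DefaultSelector() as sel:
        sel.register(listener, selectors.EVENT_READ)
        if not sel.select(timeout):
            return None               # timed out; accept() was never attempted
    try:
        return listener.accept()
    except BlockingIOError:
        return None                   # connection vanished between select and accept


def make_listener(port=12345, backlog=1):
    """Non-blocking TCP listener, mirroring the socket/bind/listen calls in the trace."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("0.0.0.0", port))
    s.listen(backlog)
    s.setblocking(False)
    return s
```

This would only hide the leak rather than fix it, since the kernel should not lose memory on a failed accept() in the first place, but it does stop the application from hammering that path.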

Thanks,

Adrian