2011-04-17 16:11:08

by Linus Torvalds

Subject: Re: [105/105] nfsd4: fix oops on lock failure

On Sun, Apr 17, 2011 at 8:03 AM, OGAWA Hirofumi
<[email protected]> wrote:
>
> I'm looking into a filp leak on a recent kernel. Well, anyway,
> 23fcf2ec93fb8573a653408316af599939ff9a8e looks strange, and I think it may
> be one of the causes.

Hmm. Your patch looks correct to me. Added Neil and linux-nfs.

Bruce? Neil?

Linus

---
> [PATCH] nfsd4: Fix filp leak
>
> 23fcf2ec93fb8573a653408316af599939ff9a8e (nfsd4: fix oops on lock failure)
>
> The above patch breaks the free path for stp->st_file. If stp was inserted
> into sop->so_stateids, we have to drop the stp->st_file reference, because
> that reference is taken independently of stp->st_file->fi_fds[].
>
> Signed-off-by: OGAWA Hirofumi <[email protected]>
> ---
>
>  fs/nfsd/nfs4state.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff -puN fs/nfsd/nfs4state.c~nfsd4-filp-leak-fix fs/nfsd/nfs4state.c
> --- linux-2.6/fs/nfsd/nfs4state.c~nfsd4-filp-leak-fix	2011-04-17 20:45:45.000000000 +0900
> +++ linux-2.6-hirofumi/fs/nfsd/nfs4state.c	2011-04-17 20:59:53.000000000 +0900
> @@ -402,8 +402,8 @@ static void free_generic_stateid(struct
>  	if (stp->st_access_bmap) {
>  		oflag = nfs4_access_bmap_to_omode(stp);
>  		nfs4_file_put_access(stp->st_file, oflag);
> -		put_nfs4_file(stp->st_file);
>  	}
> +	put_nfs4_file(stp->st_file);
>  	kmem_cache_free(stateid_slab, stp);
>  }
>
> _
> --
> OGAWA Hirofumi <[email protected]>
>

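The imbalance described in the quoted patch can be illustrated with a small userspace model. This is only a sketch: the model_* types and functions below are hypothetical stand-ins, not the kernel's structures, and the model assumes (as the patch description states) that the file reference is taken whenever a stateid is set up, independently of its access bits.

```c
#include <stdlib.h>

/* Hypothetical userspace stand-ins; not the kernel's data structures. */
struct model_file {
	int refcount;		/* models the nfs4_file reference count */
	int access_count;	/* models the open-access (fi_fds[]) side */
};

struct model_stateid {
	struct model_file *st_file;
	unsigned int st_access_bmap;
};

/* Setting up a stateid always takes a file reference (model assumption). */
static struct model_stateid *model_alloc_stateid(struct model_file *fp,
						 unsigned int bmap)
{
	struct model_stateid *stp = malloc(sizeof(*stp));

	if (!stp)
		return NULL;
	fp->refcount++;
	if (bmap)
		fp->access_count++;
	stp->st_file = fp;
	stp->st_access_bmap = bmap;
	return stp;
}

/* Free path before the fix: the reference is dropped only inside the branch. */
static void model_free_stateid_before(struct model_stateid *stp)
{
	if (stp->st_access_bmap) {
		stp->st_file->access_count--;
		stp->st_file->refcount--;
	}
	free(stp);
}

/* Free path after the fix: the reference is dropped unconditionally. */
static void model_free_stateid_after(struct model_stateid *stp)
{
	if (stp->st_access_bmap)
		stp->st_file->access_count--;
	stp->st_file->refcount--;
	free(stp);
}

/* Returns the references left over after one alloc/free cycle, or -1 on
 * allocation failure. */
int model_leaked_refs(int with_fix, unsigned int bmap)
{
	struct model_file fp = { 0, 0 };
	struct model_stateid *stp = model_alloc_stateid(&fp, bmap);

	if (!stp)
		return -1;
	if (with_fix)
		model_free_stateid_after(stp);
	else
		model_free_stateid_before(stp);
	return fp.refcount;
}
```

In this model, a stateid freed with an empty access bitmap leaves one reference behind on the old free path and none on the patched one; with a non-empty bitmap both paths balance.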

2011-04-18 21:12:34

by OGAWA Hirofumi

Subject: Re: [105/105] nfsd4: fix oops on lock failure

OGAWA Hirofumi <[email protected]> writes:

> OGAWA Hirofumi <[email protected]> writes:
>
>>> commit 5152c8a947359758862d4631863e68e83ec01048
>>> Author: J. Bruce Fields <[email protected]>
>>> Date: Fri Apr 15 18:08:26 2011 -0400
>>>
>>> nfsd4: fix struct file leak on delegation
>>>
>>> Introduced by acfdf5c383b38f7f4dddae41b97c97f1ae058f49.
>>>
>>> Cc: [email protected]
>>> Reported-by: Gerhard Heift <[email protected]>
>>> Signed-off-by: J. Bruce Fields <[email protected]>
>>>
>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>>> index aa309aa..c79a983 100644
>>> --- a/fs/nfsd/nfs4state.c
>>> +++ b/fs/nfsd/nfs4state.c
>>> @@ -258,6 +258,7 @@ static void nfs4_put_deleg_lease(struct nfs4_file *fp)
>>>  	if (atomic_dec_and_test(&fp->fi_delegees)) {
>>>  		vfs_setlease(fp->fi_deleg_file, F_UNLCK, &fp->fi_lease);
>>>  		fp->fi_lease = NULL;
>>> +		fput(fp->fi_deleg_file);
>>>  		fp->fi_deleg_file = NULL;
>>>  	}
>>>  }
>
> For now, I feel this explain filp leak on my system. the leak is
> increased slowly (filp, cred_jar, and no nfs* slabs), and leak is on
> nfs server side.
>
> I'll start test of this patch, and see what happens.

OK. The filp slabs are still increasing slightly (I'm not sure yet
whether this is a real filp leak on the system), but comparing before
and after the patch, the graph of filp slabs is clearly different.

As far as I can tell, the patches are fine.

Thanks.
--
OGAWA Hirofumi <[email protected]>
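
One way to collect the numbers behind such a before/after comparison is to sample the active object count of the filp cache from /proc/slabinfo at intervals. The helper below is a minimal sketch, not the tooling used in this thread: slab_active_objs() is a hypothetical name, and it assumes the usual slabinfo layout in which each cache line starts with the cache name followed by the active object count.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan slabinfo-style text and return the
 * <active_objs> column for the named cache, or -1 if it is not found.
 * Header lines are skipped because their second field is not a number.
 */
long slab_active_objs(const char *text, const char *cache)
{
	const char *line = text;

	while (*line) {
		const char *end = strchr(line, '\n');
		size_t len = end ? (size_t)(end - line) : strlen(line);
		char buf[256];
		char name[64];
		long active;

		/* Copy one line so sscanf() cannot run into the next one. */
		if (len >= sizeof(buf))
			len = sizeof(buf) - 1;
		memcpy(buf, line, len);
		buf[len] = '\0';

		if (sscanf(buf, "%63s %ld", name, &active) == 2 &&
		    strcmp(name, cache) == 0)
			return active;

		line = end ? end + 1 : line + strlen(line);
	}
	return -1;
}
```

A caller would read /proc/slabinfo into a buffer (this may require root on some kernels), call slab_active_objs(buffer, "filp"), and log the value over time; a count that levels off rather than growing without bound is what the report above describes.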

2011-04-19 08:22:05

by OGAWA Hirofumi

Subject: Re: [105/105] nfsd4: fix oops on lock failure

OGAWA Hirofumi <[email protected]> writes:

>> I'll start test of this patch, and see what happens.
>
> OK. Although filp slabs are still slightly increasing (I'm not sure yet
> whether this is leak of filp on system). But watching before/after
> patch, the graph of filp slabs is clearly different.
>
> As far as I can say patches are fine.

The slight increase stopped at 2200-2300, so the filp leak seems to be fixed.

Thanks.
--
OGAWA Hirofumi <[email protected]>

2011-04-19 20:44:07

by J. Bruce Fields

Subject: Re: [105/105] nfsd4: fix oops on lock failure

On Tue, Apr 19, 2011 at 05:21:57PM +0900, OGAWA Hirofumi wrote:
> OGAWA Hirofumi <[email protected]> writes:
>
> >> I'll start test of this patch, and see what happens.
> >
> > OK. Although filp slabs are still slightly increasing (I'm not sure yet
> > whether this is leak of filp on system). But watching before/after
> > patch, the graph of filp slabs is clearly different.
> >
> > As far as I can say patches are fine.
>
> slightly increasing was stopped at 2200-2300. filp leak seems to be fixed.

Another thing to check is whether you can always unmount the exported
filesystem on the server after running your test. So something like:

service nfs stop
umount /exports/fs

should always succeed; if you get an inexplicable EBUSY on the final
unmount then we likely still have a leak someplace.

--b.

2011-04-18 15:43:06

by J. Bruce Fields

Subject: Re: [105/105] nfsd4: fix oops on lock failure

On Mon, Apr 18, 2011 at 11:32:36AM -0400, J. Bruce Fields wrote:
> On Sun, Apr 17, 2011 at 09:10:12AM -0700, Linus Torvalds wrote:
> > On Sun, Apr 17, 2011 at 8:03 AM, OGAWA Hirofumi
> > <[email protected]> wrote:
> > >
> > > I'm looking filp leak on recent kernel. Well, anyway,
>
> Does this fix it?

(But, yes, I think your patch is almost certainly right as well.
Gah--I've been introducing a depressing number of regressions lately.)

--b.

>
> --b.
>
> commit 5152c8a947359758862d4631863e68e83ec01048
> Author: J. Bruce Fields <[email protected]>
> Date: Fri Apr 15 18:08:26 2011 -0400
>
> nfsd4: fix struct file leak on delegation
>
> Introduced by acfdf5c383b38f7f4dddae41b97c97f1ae058f49.
>
> Cc: [email protected]
> Reported-by: Gerhard Heift <[email protected]>
> Signed-off-by: J. Bruce Fields <[email protected]>
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index aa309aa..c79a983 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -258,6 +258,7 @@ static void nfs4_put_deleg_lease(struct nfs4_file *fp)
>  	if (atomic_dec_and_test(&fp->fi_delegees)) {
>  		vfs_setlease(fp->fi_deleg_file, F_UNLCK, &fp->fi_lease);
>  		fp->fi_lease = NULL;
> +		fput(fp->fi_deleg_file);
>  		fp->fi_deleg_file = NULL;
>  	}
>  }

2011-04-18 17:00:24

by Linus Torvalds

Subject: Re: [105/105] nfsd4: fix oops on lock failure

On Mon, Apr 18, 2011 at 9:39 AM, OGAWA Hirofumi
<[email protected]> wrote:
>
> For now, I feel this explain filp leak on my system. the leak is
> increased slowly (filp, cred_jar, and no nfs* slabs), and leak is on
> nfs server side.
>
> I'll start test of this patch, and see what happens.

Can somebody ping/remind me when that is verified - preferably about
_both_ patches, even if it turns out that the first one by Ogawa
wasn't the one that caused the problem?

Or can I just assume that the fix will be in Bruce's pull requests some day?

Linus

2011-04-18 15:33:04

by J. Bruce Fields

Subject: Re: [105/105] nfsd4: fix oops on lock failure

On Sun, Apr 17, 2011 at 09:10:12AM -0700, Linus Torvalds wrote:
> On Sun, Apr 17, 2011 at 8:03 AM, OGAWA Hirofumi
> <[email protected]> wrote:
> >
> > I'm looking filp leak on recent kernel. Well, anyway,

Does this fix it?

--b.

commit 5152c8a947359758862d4631863e68e83ec01048
Author: J. Bruce Fields <[email protected]>
Date: Fri Apr 15 18:08:26 2011 -0400

nfsd4: fix struct file leak on delegation

Introduced by acfdf5c383b38f7f4dddae41b97c97f1ae058f49.

Cc: [email protected]
Reported-by: Gerhard Heift <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index aa309aa..c79a983 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -258,6 +258,7 @@ static void nfs4_put_deleg_lease(struct nfs4_file *fp)
 	if (atomic_dec_and_test(&fp->fi_delegees)) {
 		vfs_setlease(fp->fi_deleg_file, F_UNLCK, &fp->fi_lease);
 		fp->fi_lease = NULL;
+		fput(fp->fi_deleg_file);
 		fp->fi_deleg_file = NULL;
 	}
 }
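
The effect of the added fput() can be sketched with a userspace model. Everything named model_* here is hypothetical, and the model assumes that the first delegation on a file pins one struct file reference which the last delegation to go away is responsible for releasing; it is an illustration of the leak pattern, not the kernel code.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's data structures. */
struct model_filp {
	int f_count;			/* models the struct file refcount */
};

struct model_nfs4_file {
	int fi_delegees;		/* number of outstanding delegations */
	struct model_filp *fi_deleg_file;
};

/* Model assumption: the first delegation pins the file with one reference. */
static void model_get_deleg(struct model_nfs4_file *fp, struct model_filp *filp)
{
	if (fp->fi_delegees++ == 0) {
		filp->f_count++;
		fp->fi_deleg_file = filp;
	}
}

/* The last delegation to go away clears the pointer; only the fixed
 * variant also drops the pinned reference first. */
static void model_put_deleg(struct model_nfs4_file *fp, int with_fix)
{
	if (--fp->fi_delegees == 0) {
		if (with_fix)
			fp->fi_deleg_file->f_count--;	/* models the added fput() */
		fp->fi_deleg_file = NULL;
	}
}

/* Returns the file references left after taking and returning
 * `delegations` delegations. */
int model_deleg_leak(int delegations, int with_fix)
{
	struct model_filp filp = { 0 };
	struct model_nfs4_file fp = { 0, NULL };
	int i;

	for (i = 0; i < delegations; i++)
		model_get_deleg(&fp, &filp);
	for (i = 0; i < delegations; i++)
		model_put_deleg(&fp, with_fix);
	return filp.f_count;
}
```

Without the release step the model loses one file reference each time the delegation count drops back to zero, which is consistent with a slow leak that only shows up in the filp slab.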

2011-04-18 16:39:26

by OGAWA Hirofumi

Subject: Re: [105/105] nfsd4: fix oops on lock failure

OGAWA Hirofumi <[email protected]> writes:

>> commit 5152c8a947359758862d4631863e68e83ec01048
>> Author: J. Bruce Fields <[email protected]>
>> Date: Fri Apr 15 18:08:26 2011 -0400
>>
>> nfsd4: fix struct file leak on delegation
>>
>> Introduced by acfdf5c383b38f7f4dddae41b97c97f1ae058f49.
>>
>> Cc: [email protected]
>> Reported-by: Gerhard Heift <[email protected]>
>> Signed-off-by: J. Bruce Fields <[email protected]>
>>
>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> index aa309aa..c79a983 100644
>> --- a/fs/nfsd/nfs4state.c
>> +++ b/fs/nfsd/nfs4state.c
>> @@ -258,6 +258,7 @@ static void nfs4_put_deleg_lease(struct nfs4_file *fp)
>>  	if (atomic_dec_and_test(&fp->fi_delegees)) {
>>  		vfs_setlease(fp->fi_deleg_file, F_UNLCK, &fp->fi_lease);
>>  		fp->fi_lease = NULL;
>> +		fput(fp->fi_deleg_file);
>>  		fp->fi_deleg_file = NULL;
>>  	}
>>  }

For now, I think this explains the filp leak on my system. The leak
grows slowly (filp and cred_jar slabs, but no nfs* slabs), and it is on
the NFS server side.

I'll start testing this patch and see what happens.

Thanks.
--
OGAWA Hirofumi <[email protected]>

2011-04-18 16:08:08

by OGAWA Hirofumi

Subject: Re: [105/105] nfsd4: fix oops on lock failure

"J. Bruce Fields" <[email protected]> writes:

> On Sun, Apr 17, 2011 at 09:10:12AM -0700, Linus Torvalds wrote:
>> On Sun, Apr 17, 2011 at 8:03 AM, OGAWA Hirofumi
>> <[email protected]> wrote:
>> >
>> > I'm looking filp leak on recent kernel. Well, anyway,
>
> Does this fix it?

It seems not, unfortunately. The filp leak is not very fast, and no
obvious way to reproduce it is known yet. Well, sigh, I'm starting to add
debugging code for it...
--
OGAWA Hirofumi <[email protected]>

2011-04-19 21:17:11

by OGAWA Hirofumi

Subject: Re: [105/105] nfsd4: fix oops on lock failure

"J. Bruce Fields" <[email protected]> writes:

> On Tue, Apr 19, 2011 at 05:21:57PM +0900, OGAWA Hirofumi wrote:
>> OGAWA Hirofumi <[email protected]> writes:
>>
>> >> I'll start test of this patch, and see what happens.
>> >
>> > OK. Although filp slabs are still slightly increasing (I'm not sure yet
>> > whether this is leak of filp on system). But watching before/after
>> > patch, the graph of filp slabs is clearly different.
>> >
>> > As far as I can say patches are fine.
>>
>> slightly increasing was stopped at 2200-2300. filp leak seems to be fixed.
>
> Another thing to check is whether you can always unmount the exported
> filesystem on the server after running your test. So something like:
>
> service nfs stop
> umount /exports/fs
>
> should always succeed; if you get an inexplicable EBUSY on the final
> unmount then we likely still have a leak someplace.

It succeeded.
--
OGAWA Hirofumi <[email protected]>

2011-04-18 16:10:29

by OGAWA Hirofumi

Subject: Re: [105/105] nfsd4: fix oops on lock failure

OGAWA Hirofumi <[email protected]> writes:

> "J. Bruce Fields" <[email protected]> writes:
>
>> On Sun, Apr 17, 2011 at 09:10:12AM -0700, Linus Torvalds wrote:
>>> On Sun, Apr 17, 2011 at 8:03 AM, OGAWA Hirofumi
>>> <[email protected]> wrote:
>>> >
>>> > I'm looking filp leak on recent kernel. Well, anyway,
>>
>> Does this fix it?
>
> It seems to be no, unfortunately. filp leak is not so fast, and obvious
> reproduce process is not known yet. Well, sigh, I'm starting to add
> debugging code for it...

Ah, I missed the patch itself. I'll take a look.

> commit 5152c8a947359758862d4631863e68e83ec01048
> Author: J. Bruce Fields <[email protected]>
> Date: Fri Apr 15 18:08:26 2011 -0400
>
> nfsd4: fix struct file leak on delegation
>
> Introduced by acfdf5c383b38f7f4dddae41b97c97f1ae058f49.
>
> Cc: [email protected]
> Reported-by: Gerhard Heift <[email protected]>
> Signed-off-by: J. Bruce Fields <[email protected]>
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index aa309aa..c79a983 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -258,6 +258,7 @@ static void nfs4_put_deleg_lease(struct nfs4_file *fp)
>  	if (atomic_dec_and_test(&fp->fi_delegees)) {
>  		vfs_setlease(fp->fi_deleg_file, F_UNLCK, &fp->fi_lease);
>  		fp->fi_lease = NULL;
> +		fput(fp->fi_deleg_file);
>  		fp->fi_deleg_file = NULL;
>  	}
>  }
--
OGAWA Hirofumi <[email protected]>

2011-04-18 17:16:43

by J. Bruce Fields

Subject: Re: [105/105] nfsd4: fix oops on lock failure

On Mon, Apr 18, 2011 at 09:59:26AM -0700, Linus Torvalds wrote:
> On Mon, Apr 18, 2011 at 9:39 AM, OGAWA Hirofumi
> <[email protected]> wrote:
> >
> > For now, I feel this explain filp leak on my system. the leak is
> > increased slowly (filp, cred_jar, and no nfs* slabs), and leak is on
> > nfs server side.
> >
> > I'll start test of this patch, and see what happens.
>
> Can somebody ping/remind me when that is verified - preferably about
> _both_ patches, even if it turns out that the first one by Ogawa
> wasn't the one that caused the problem?
>
> Or can I just assume that the fix will be in Bruce's pull requests some day?

I'll send a pull request when it's sorted out, thanks.

--b.


2011-04-18 18:21:23

by Greg KH

Subject: Re: [stable] [105/105] nfsd4: fix oops on lock failure

On Mon, Apr 18, 2011 at 01:16:43PM -0400, J. Bruce Fields wrote:
> On Mon, Apr 18, 2011 at 09:59:26AM -0700, Linus Torvalds wrote:
> > On Mon, Apr 18, 2011 at 9:39 AM, OGAWA Hirofumi
> > <[email protected]> wrote:
> > >
> > > For now, I feel this explain filp leak on my system. the leak is
> > > increased slowly (filp, cred_jar, and no nfs* slabs), and leak is on
> > > nfs server side.
> > >
> > > I'll start test of this patch, and see what happens.
> >
> > Can somebody ping/remind me when that is verified - preferably about
> > _both_ patches, even if it turns out that the first one by Ogawa
> > wasn't the one that caused the problem?
> >
> > Or can I just assume that the fix will be in Bruce's pull requests some day?
>
> I'll send a pull request when it's sorted out, thanks.

Please tag it for stable as well so I know to pick it up.

thanks,

greg k-h