2013-07-15 13:10:32

by Niels de Vos

[permalink] [raw]
Subject: [PATCH] fuse: fix occasional dentry leak when readdirplus is used

In case d_lookup() returns a dentry with d_inode == NULL, the dentry is
not returned with dput(). This results in triggering a BUG() in
shrink_dcache_for_umount_subtree():

BUG: Dentry ...{i=0,n=...} still in use (1) [unmount of fuse fuse]

Reported-by: Justin Clift <[email protected]>
Signed-off-by: Niels de Vos <[email protected]>

--
Reproducing the BUG() on kernels with fuse that support READDIRPLUS can
be done with the GlusterFS tests:
- http://www.gluster.org/community/documentation/index.php/Using_the_Gluster_Test_Framework

After some stressing of the VFS and fuse mountpoints, bug-860663.t will
hit the BUG(). It does not happen on running this test stand-alone.
---
fs/fuse/dir.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 0eda527..da67a15 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1246,7 +1246,9 @@ static int fuse_direntplus_link(struct file *file,
if (err)
goto out;
dput(dentry);
- dentry = NULL;
+ } else if (dentry) {
+ /* this dentry does not have a d_inode, just drop it */
+ dput(dentry);
}

dentry = d_alloc(parent, &name);
--
1.7.1


2013-07-15 20:04:00

by Brian Foster

[permalink] [raw]
Subject: Re: [fuse-devel] [PATCH] fuse: fix occasional dentry leak when readdirplus is used

On 07/15/2013 08:59 AM, Niels de Vos wrote:
> In case d_lookup() returns a dentry with d_inode == NULL, the dentry is
> not returned with dput(). This results in triggering a BUG() in
> shrink_dcache_for_umount_subtree():
>
> BUG: Dentry ...{i=0,n=...} still in use (1) [unmount of fuse fuse]
>
> Reported-by: Justin Clift <[email protected]>
> Signed-off-by: Niels de Vos <[email protected]>
>
> --
> Reproducing the BUG() on kernels with fuse that support READDIRPLUS can
> be done with the GlusterFS tests:
> - http://www.gluster.org/community/documentation/index.php/Using_the_Gluster_Test_Framework
>
> After some stressing of the VFS and fuse mountpoints, bug-860663.t will
> hit the BUG(). It does not happen on running this test stand-alone.

Hi Neils,

FYI, this is fairly easy to reproduce on-demand with gluster:

- mount a volume to two local mountpoints (i.e., I used a single
storage/posix translator volume):
glusterfs --volfile=./test.vol /mnt/{1,2} --use-readdirp=1
- create a negative dentry in one mountpoint:
ls /mnt/1/file (results in ENOENT)
- create the file via the second mountpoint:
touch /mnt/2/file
- run a readdirp on the first mountpoint:
ls /mnt/1/
- umount /mnt/2 /mnt/1

> ---
> fs/fuse/dir.c | 4 +++-
> 1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 0eda527..da67a15 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -1246,7 +1246,9 @@ static int fuse_direntplus_link(struct file *file,
> if (err)
> goto out;
> dput(dentry);
> - dentry = NULL;
> + } else if (dentry) {
> + /* this dentry does not have a d_inode, just drop it */
> + dput(dentry);
> }

I'm not really familiar with the dcache code, but is it appropriate to
also d_invalidate() the dentry in this case (as the previous code block
does)? Perhaps Miklos or somebody more familiar with dcache can confirm...

Brian

>
> dentry = d_alloc(parent, &name);
>

2013-07-16 10:39:50

by Niels de Vos

[permalink] [raw]
Subject: Re: [fuse-devel] [PATCH] fuse: fix occasional dentry leak when readdirplus is used

On Mon, Jul 15, 2013 at 04:08:22PM -0400, Brian Foster wrote:
> On 07/15/2013 08:59 AM, Niels de Vos wrote:
> > In case d_lookup() returns a dentry with d_inode == NULL, the dentry is
> > not returned with dput(). This results in triggering a BUG() in
> > shrink_dcache_for_umount_subtree():
> >
> > BUG: Dentry ...{i=0,n=...} still in use (1) [unmount of fuse fuse]
> >
> > Reported-by: Justin Clift <[email protected]>
> > Signed-off-by: Niels de Vos <[email protected]>
> >
> > --
> > Reproducing the BUG() on kernels with fuse that support READDIRPLUS can
> > be done with the GlusterFS tests:
> > - http://www.gluster.org/community/documentation/index.php/Using_the_Gluster_Test_Framework
> >
> > After some stressing of the VFS and fuse mountpoints, bug-860663.t will
> > hit the BUG(). It does not happen on running this test stand-alone.
>
> Hi Neils,
>
> FYI, this is fairly easy to reproduce on-demand with gluster:
>
> - mount a volume to two local mountpoints (i.e., I used a single
> storage/posix translator volume):
> glusterfs --volfile=./test.vol /mnt/{1,2} --use-readdirp=1
> - create a negative dentry in one mountpoint:
> ls /mnt/1/file (results in ENOENT)
> - create the file via the second mountpoint:
> touch /mnt/2/file
> - run a readdirp on the first mountpoint:
> ls /mnt/1/
> - umount /mnt/2 /mnt/1

Thanks, that definitely makes it easier to verify the fix.

> > ---
> > fs/fuse/dir.c | 4 +++-
> > 1 files changed, 3 insertions(+), 1 deletions(-)
> >
> > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> > index 0eda527..da67a15 100644
> > --- a/fs/fuse/dir.c
> > +++ b/fs/fuse/dir.c
> > @@ -1246,7 +1246,9 @@ static int fuse_direntplus_link(struct file *file,
> > if (err)
> > goto out;
> > dput(dentry);
> > - dentry = NULL;
> > + } else if (dentry) {
> > + /* this dentry does not have a d_inode, just drop it */
> > + dput(dentry);
> > }
>
> I'm not really familiar with the dcache code, but is it appropriate to
> also d_invalidate() the dentry in this case (as the previous code block
> does)? Perhaps Miklos or somebody more familiar with dcache can confirm...

I do not *think* d_invalidate() is needed. The vmcores I have seem where
this BUG() happened, only have dentry->d_flags = 0x18 which translates
to (DCACHE_OP_DELETE | DCACHE_OP_PRUNE) and d_subdirs as an empty list.
d_invalidate() only calls __d_drop(), which only does something when the
dentry is hashed.

I am not sure if a dentry can be hashed, but still does not have a valid
non-NULL d_inode. If that is the case, d_invalidate() should indeed be
called.

Thanks,
Niels

> Brian
>
> >
> > dentry = d_alloc(parent, &name);
> >
>

--
Niels de Vos
Sr. Software Maintenance Engineer
Support Engineering Group
Red Hat Global Support Services

2013-07-16 13:18:38

by Brian Foster

[permalink] [raw]
Subject: Re: [fuse-devel] [PATCH] fuse: fix occasional dentry leak when readdirplus is used

On 07/16/2013 06:39 AM, Niels de Vos wrote:
> On Mon, Jul 15, 2013 at 04:08:22PM -0400, Brian Foster wrote:
>> On 07/15/2013 08:59 AM, Niels de Vos wrote:
...
>
>>> ---
>>> fs/fuse/dir.c | 4 +++-
>>> 1 files changed, 3 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
>>> index 0eda527..da67a15 100644
>>> --- a/fs/fuse/dir.c
>>> +++ b/fs/fuse/dir.c
>>> @@ -1246,7 +1246,9 @@ static int fuse_direntplus_link(struct file *file,
>>> if (err)
>>> goto out;
>>> dput(dentry);
>>> - dentry = NULL;
>>> + } else if (dentry) {
>>> + /* this dentry does not have a d_inode, just drop it */
>>> + dput(dentry);
>>> }
>>
>> I'm not really familiar with the dcache code, but is it appropriate to
>> also d_invalidate() the dentry in this case (as the previous code block
>> does)? Perhaps Miklos or somebody more familiar with dcache can confirm...
>
> I do not *think* d_invalidate() is needed. The vmcores I have seem where
> this BUG() happened, only have dentry->d_flags = 0x18 which translates
> to (DCACHE_OP_DELETE | DCACHE_OP_PRUNE) and d_subdirs as an empty list.
> d_invalidate() only calls __d_drop(), which only does something when the
> dentry is hashed.
>

I ran a little experiment to invoke a d_lookup() after the
fuse_direntplus_link() function has done its work in this particular
case. I do receive the newly allocated dentry.

Subsequently, I hacked __d_lookup_rcu() to continue iterating the list
after finding an entry to check for duplicates and print a message. What
I see is that after the dput() (via a subsequent lookup resulting from a
getxattr call), the negative dentry is still hashed and on the list (my
printk() kicks in after the existing d_unhashed() check). The flags for
both dentries are
(DCACHE_OP_REVALIDATE|DCACHE_REFERENCED|DCACHE_RCUACCESS). I'm not aware
of the steps involved in the original reproducer, but the simple
multi-mount test leads to this state. Running a d_invalidate() clears it
up. So perhaps it's required in some cases and not all.

That said, some of those flags seem to simply specify whether a dentry
op handler is defined for the particular operation or not.

> I am not sure if a dentry can be hashed, but still does not have a valid
> non-NULL d_inode. If that is the case, d_invalidate() should indeed be
> called.
>

I'm not sure why it would need to have a valid inode. A dentry with a
NULL inode is valid, no? I think the question is whether the above state
(multiple, hashed dentries) can be valid for whatever reason. It
certainly looks suspicious.

Brian

> Thanks,
> Niels
>
>> Brian
>>
>>>
>>> dentry = d_alloc(parent, &name);
>>>
>>
>

2013-07-16 16:14:59

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [fuse-devel] [PATCH] fuse: fix occasional dentry leak when readdirplus is used

On Tue, Jul 16, 2013 at 09:15:16AM -0400, Brian Foster wrote:

> I'm not sure why it would need to have a valid inode. A dentry with a
> NULL inode is valid, no?

It is valid, yes. It's called a "negative" dentry, which caches the information
that the file does not exist.

> I think the question is whether the above state (multiple, hashed dentries)
> can be valid for whatever reason. It certainly looks suspicious.

That state is not valid. So we certainly need to unhash the negative dentry
first, before hashing a new one. We could also reuse the old dentry, but that
is just an optimization.

Below patch fixes several issues with that function. Could you please give it a
go?

Thanks,
Miklos


diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 0eda527..a8208c5 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1223,28 +1223,45 @@ static int fuse_direntplus_link(struct file *file,
if (name.name[1] == '.' && name.len == 2)
return 0;
}
+
+ if (invalid_nodeid(o->nodeid))
+ return -EIO;
+ if (!fuse_valid_type(o->attr.mode))
+ return -EIO;
+
fc = get_fuse_conn(dir);

name.hash = full_name_hash(name.name, name.len);
dentry = d_lookup(parent, &name);
- if (dentry && dentry->d_inode) {
+ if (dentry) {
inode = dentry->d_inode;
- if (get_node_id(inode) == o->nodeid) {
+ if (!inode) {
+ d_drop(dentry);
+ } else if (get_node_id(inode) != o->nodeid ||
+ ((o->attr.mode ^ inode->i_mode) & S_IFMT)) {
+ err = d_invalidate(dentry);
+ if (err)
+ goto out;
+ } else if (is_bad_inode(inode)) {
+ err = -EIO;
+ goto out;
+ } else {
struct fuse_inode *fi;
fi = get_fuse_inode(inode);
spin_lock(&fc->lock);
fi->nlookup++;
spin_unlock(&fc->lock);

+ fuse_change_attributes(inode, &o->attr,
+ entry_attr_timeout(o),
+ attr_version);
+
/*
* The other branch to 'found' comes via fuse_iget()
* which bumps nlookup inside
*/
goto found;
}
- err = d_invalidate(dentry);
- if (err)
- goto out;
dput(dentry);
dentry = NULL;
}
@@ -1259,25 +1276,30 @@ static int fuse_direntplus_link(struct file *file,
if (!inode)
goto out;

- alias = d_materialise_unique(dentry, inode);
- err = PTR_ERR(alias);
- if (IS_ERR(alias))
- goto out;
+ if (S_ISDIR(inode->i_mode)) {
+ mutex_lock(&fc->inst_mutex);
+ alias = fuse_d_add_directory(dentry, inode);
+ mutex_unlock(&fc->inst_mutex);
+ err = PTR_ERR(alias);
+ if (IS_ERR(alias)) {
+ iput(inode);
+ goto out;
+ }
+ } else {
+ alias = d_splice_alias(inode, dentry);
+ }
+
if (alias) {
dput(dentry);
dentry = alias;
}

found:
- fuse_change_attributes(inode, &o->attr, entry_attr_timeout(o),
- attr_version);
-
fuse_change_entry_timeout(dentry, o);

err = 0;
out:
- if (dentry)
- dput(dentry);
+ dput(dentry);
return err;
}

2013-07-16 17:46:34

by Brian Foster

[permalink] [raw]
Subject: Re: [fuse-devel] [PATCH] fuse: fix occasional dentry leak when readdirplus is used

On 07/16/2013 12:14 PM, Miklos Szeredi wrote:
> On Tue, Jul 16, 2013 at 09:15:16AM -0400, Brian Foster wrote:
>
>> I'm not sure why it would need to have a valid inode. A dentry with a
>> NULL inode is valid, no?
>
> It is valid, yes. It's called a "negative" dentry, which caches the information
> that the file does not exist.
>
>> I think the question is whether the above state (multiple, hashed dentries)
>> can be valid for whatever reason. It certainly looks suspicious.
>
> That state is not valid. So we certainly need to unhash the negative dentry
> first, before hashing a new one. We could also reuse the old dentry, but that
> is just an optimization.
>

Thanks Miklos.

> Below patch fixes several issues with that function. Could you please give it a
> go?
>

This patch looks good and fixes the issue for me. It might be good to
give Neils a chance to beat on it as well, but otherwise:

Tested-by: Brian Foster <[email protected]>

Brian

> Thanks,
> Miklos
>
>
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 0eda527..a8208c5 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -1223,28 +1223,45 @@ static int fuse_direntplus_link(struct file *file,
> if (name.name[1] == '.' && name.len == 2)
> return 0;
> }
> +
> + if (invalid_nodeid(o->nodeid))
> + return -EIO;
> + if (!fuse_valid_type(o->attr.mode))
> + return -EIO;
> +
> fc = get_fuse_conn(dir);
>
> name.hash = full_name_hash(name.name, name.len);
> dentry = d_lookup(parent, &name);
> - if (dentry && dentry->d_inode) {
> + if (dentry) {
> inode = dentry->d_inode;
> - if (get_node_id(inode) == o->nodeid) {
> + if (!inode) {
> + d_drop(dentry);
> + } else if (get_node_id(inode) != o->nodeid ||
> + ((o->attr.mode ^ inode->i_mode) & S_IFMT)) {
> + err = d_invalidate(dentry);
> + if (err)
> + goto out;
> + } else if (is_bad_inode(inode)) {
> + err = -EIO;
> + goto out;
> + } else {
> struct fuse_inode *fi;
> fi = get_fuse_inode(inode);
> spin_lock(&fc->lock);
> fi->nlookup++;
> spin_unlock(&fc->lock);
>
> + fuse_change_attributes(inode, &o->attr,
> + entry_attr_timeout(o),
> + attr_version);
> +
> /*
> * The other branch to 'found' comes via fuse_iget()
> * which bumps nlookup inside
> */
> goto found;
> }
> - err = d_invalidate(dentry);
> - if (err)
> - goto out;
> dput(dentry);
> dentry = NULL;
> }
> @@ -1259,25 +1276,30 @@ static int fuse_direntplus_link(struct file *file,
> if (!inode)
> goto out;
>
> - alias = d_materialise_unique(dentry, inode);
> - err = PTR_ERR(alias);
> - if (IS_ERR(alias))
> - goto out;
> + if (S_ISDIR(inode->i_mode)) {
> + mutex_lock(&fc->inst_mutex);
> + alias = fuse_d_add_directory(dentry, inode);
> + mutex_unlock(&fc->inst_mutex);
> + err = PTR_ERR(alias);
> + if (IS_ERR(alias)) {
> + iput(inode);
> + goto out;
> + }
> + } else {
> + alias = d_splice_alias(inode, dentry);
> + }
> +
> if (alias) {
> dput(dentry);
> dentry = alias;
> }
>
> found:
> - fuse_change_attributes(inode, &o->attr, entry_attr_timeout(o),
> - attr_version);
> -
> fuse_change_entry_timeout(dentry, o);
>
> err = 0;
> out:
> - if (dentry)
> - dput(dentry);
> + dput(dentry);
> return err;
> }
>
>

2013-07-17 11:20:25

by Niels de Vos

[permalink] [raw]
Subject: Re: [fuse-devel] [PATCH] fuse: fix occasional dentry leak when readdirplus is used

On Tue, Jul 16, 2013 at 06:14:58PM +0200, Miklos Szeredi wrote:
> On Tue, Jul 16, 2013 at 09:15:16AM -0400, Brian Foster wrote:
>
> > I'm not sure why it would need to have a valid inode. A dentry with a
> > NULL inode is valid, no?
>
> It is valid, yes. It's called a "negative" dentry, which caches the information
> that the file does not exist.
>
> > I think the question is whether the above state (multiple, hashed dentries)
> > can be valid for whatever reason. It certainly looks suspicious.
>
> That state is not valid. So we certainly need to unhash the negative dentry
> first, before hashing a new one. We could also reuse the old dentry, but that
> is just an optimization.
>
> Below patch fixes several issues with that function. Could you please give it a
> go?

Thanks Miklos! This looks much cleaner and the additional error checking
surely reduces the chance of any further problems. I've added only one
little comment in the patch below.

I can confirm that your patch fixes the original panic/BUG() with both
Brian's test-case and the GlusterFS regression tests.

Feel free to add my Tested-by/Reviewed-by signoff for this patch.

Cheers,
Niels


>
> Thanks,
> Miklos
>
>
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 0eda527..a8208c5 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -1223,28 +1223,45 @@ static int fuse_direntplus_link(struct file *file,
> if (name.name[1] == '.' && name.len == 2)
> return 0;
> }
> +
> + if (invalid_nodeid(o->nodeid))
> + return -EIO;
> + if (!fuse_valid_type(o->attr.mode))
> + return -EIO;
> +
> fc = get_fuse_conn(dir);
>
> name.hash = full_name_hash(name.name, name.len);
> dentry = d_lookup(parent, &name);
> - if (dentry && dentry->d_inode) {
> + if (dentry) {
> inode = dentry->d_inode;
> - if (get_node_id(inode) == o->nodeid) {
> + if (!inode) {
> + d_drop(dentry);
> + } else if (get_node_id(inode) != o->nodeid ||
> + ((o->attr.mode ^ inode->i_mode) & S_IFMT)) {
> + err = d_invalidate(dentry);
> + if (err)
> + goto out;
> + } else if (is_bad_inode(inode)) {
> + err = -EIO;
> + goto out;
> + } else {
> struct fuse_inode *fi;
> fi = get_fuse_inode(inode);
> spin_lock(&fc->lock);
> fi->nlookup++;
> spin_unlock(&fc->lock);
>
> + fuse_change_attributes(inode, &o->attr,
> + entry_attr_timeout(o),
> + attr_version);
> +
> /*
> * The other branch to 'found' comes via fuse_iget()
> * which bumps nlookup inside
> */
> goto found;
> }
> - err = d_invalidate(dentry);
> - if (err)
> - goto out;
> dput(dentry);
> dentry = NULL;

You might want to remove that 'dentry = NULL' line, as the result of
d_alloc() is assigned to dentry on line below the if-statement.

> }
> @@ -1259,25 +1276,30 @@ static int fuse_direntplus_link(struct file *file,
> if (!inode)
> goto out;
>
> - alias = d_materialise_unique(dentry, inode);
> - err = PTR_ERR(alias);
> - if (IS_ERR(alias))
> - goto out;
> + if (S_ISDIR(inode->i_mode)) {
> + mutex_lock(&fc->inst_mutex);
> + alias = fuse_d_add_directory(dentry, inode);
> + mutex_unlock(&fc->inst_mutex);
> + err = PTR_ERR(alias);
> + if (IS_ERR(alias)) {
> + iput(inode);
> + goto out;
> + }
> + } else {
> + alias = d_splice_alias(inode, dentry);
> + }
> +
> if (alias) {
> dput(dentry);
> dentry = alias;
> }
>
> found:
> - fuse_change_attributes(inode, &o->attr, entry_attr_timeout(o),
> - attr_version);
> -
> fuse_change_entry_timeout(dentry, o);
>
> err = 0;
> out:
> - if (dentry)
> - dput(dentry);
> + dput(dentry);
> return err;
> }
>

2013-07-17 13:04:51

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [fuse-devel] [PATCH] fuse: fix occasional dentry leak when readdirplus is used

On Wed, Jul 17, 2013 at 1:20 PM, Niels de Vos <[email protected]> wrote:
> I can confirm that your patch fixes the original panic/BUG() with both
> Brian's test-case and the GlusterFS regression tests.
>
> Feel free to add my Tested-by/Reviewed-by signoff for this patch.

Thanks for the review and testing, patches pushed to

git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git for-next

Thanks,
Miklos