2009-12-02 20:00:14

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 0/2] vfs: plug some holes involving LAST_BIND symlinks and file bind mounts (try #6)

This patchset is another attempt to add missing dentry revalidations to
certain places in the lookup codepath. The main difference from the last
patchset is:

* I've dropped the patch that added permissions checks when chasing
LAST_BIND symlinks. There's a lot disagreement about whether the
current behavior is even a bug. I'd prefer to see more concensus on
that point before we do anything here.

* I've rejiggered the error handling in this codepath to ensure that the
bind mounts could still be unmounted. Testing showed that returning an
error in do_lookup when a bind-mounted dentry went stale made it
un-unmountable.

Comments and suggestions appreciated...

Jeff Layton (2):
vfs: force reval of target when following LAST_BIND symlinks
vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT
filesystems

fs/namei.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 55 insertions(+), 1 deletions(-)


2009-12-02 20:00:14

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 1/2] vfs: force reval of target when following LAST_BIND symlinks

procfs-style symlinks return a last_type of LAST_BIND without an actual
path string. This causes __follow_link to skip calling __vfs_follow_link
and so the dentry isn't revalidated.

This is a problem when the link target sits on NFSv4 as it depends on
the VFS to revalidate the dentry before using it on an open call. Ensure
that this occurs by forcing a revalidation of the target dentry of
LAST_BIND symlinks.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/namei.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 45 insertions(+), 0 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 3374917..339789e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -414,6 +414,46 @@ do_revalidate(struct dentry *dentry, struct nameidata *nd)
}

/*
+ * force_reval_path - force revalidation of a dentry
+ *
+ * In some situations the path walking code will trust dentries without
+ * revalidating them. This causes problems for filesystems that depend on
+ * d_revalidate to handle file opens (e.g. NFSv4). When FS_REVAL_DOT is set
+ * (which indicates that it's possible for the dentry to go stale), force
+ * a d_revalidate call before proceeding.
+ *
+ * Returns 0 if the revalidation was successful. If the revalidation fails,
+ * either return the error returned by d_revalidate or -ESTALE if the
+ * revalidation it just returned 0. If d_revalidate returns 0, we attempt to
+ * invalidate the dentry. It's up to the caller to handle putting references
+ * to the path if necessary.
+ */
+static int
+force_reval_path(struct path *path, struct nameidata *nd)
+{
+ int status;
+ struct dentry *dentry = path->dentry;
+
+ /*
+ * only check on filesystems where it's possible for the dentry to
+ * become stale. It's assumed that if this flag is set then the
+ * d_revalidate op will also be defined.
+ */
+ if (!(dentry->d_sb->s_type->fs_flags & FS_REVAL_DOT))
+ return 0;
+
+ status = dentry->d_op->d_revalidate(dentry, nd);
+ if (status > 0)
+ return 0;
+
+ if (!status) {
+ d_invalidate(dentry);
+ status = -ESTALE;
+ }
+ return status;
+}
+
+/*
* Internal lookup() using the new generic dcache.
* SMP-safe
*/
@@ -641,6 +681,11 @@ static __always_inline int __do_follow_link(struct path *path, struct nameidata
error = 0;
if (s)
error = __vfs_follow_link(nd, s);
+ else if (nd->last_type == LAST_BIND) {
+ error = force_reval_path(&nd->path, nd);
+ if (error)
+ path_put(&nd->path);
+ }
if (dentry->d_inode->i_op->put_link)
dentry->d_inode->i_op->put_link(dentry, nd, cookie);
}
--
1.5.5.6

2009-12-02 20:00:10

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 2/2] vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT filesystems

In the case of a bind mounted file, the path walking code will assume
that the cached dentry that was bind mounted is valid. This is a problem
problem for NFSv4 in a way that's similar to LAST_BIND symlinks.

Fix this by revalidating the dentry if FS_FOLLOW_DOT is set and
__follow_mount returns true.

Note that in the non-open codepath, we cannot return an error to the
lookup if the revalidation fails. Doing so will leave a bind mount in
a state such that we can't unmount it. In that case we'll just have to
settle for d_invalidating it (which should mostly turn out to be a
d_drop in this case) and returning success.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/namei.c | 11 ++++++++++-
1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 339789e..0d55b6f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -851,7 +851,13 @@ static int do_lookup(struct nameidata *nd, struct qstr *name,
done:
path->mnt = mnt;
path->dentry = dentry;
- __follow_mount(path);
+
+ /*
+ * cannot return the error returned by force_reval_path as that can
+ * make it impossible to unmount a bind mounted dentry if it's stale.
+ */
+ if (__follow_mount(path))
+ force_reval_path(path, nd);
return 0;

need_lookup:
@@ -1840,6 +1846,9 @@ do_last:
error = -ELOOP;
if (flag & O_NOFOLLOW)
goto exit_dput;
+ error = force_reval_path(&path, &nd);
+ if (error)
+ goto exit_dput;
}

error = -ENOENT;
--
1.5.5.6

2009-12-02 23:53:49

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 1/2] vfs: force reval of target when following LAST_BIND symlinks

Jeff Layton <[email protected]> writes:

> procfs-style symlinks return a last_type of LAST_BIND without an actual
> path string. This causes __follow_link to skip calling __vfs_follow_link
> and so the dentry isn't revalidated.
>
> This is a problem when the link target sits on NFSv4 as it depends on
> the VFS to revalidate the dentry before using it on an open call. Ensure
> that this occurs by forcing a revalidation of the target dentry of
> LAST_BIND symlinks.

This seems reasonable to me, and in a quick run through I can't see any
problems.

Acked-by: "Eric W. Biederman" <[email protected]>

>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/namei.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 45 insertions(+), 0 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 3374917..339789e 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -414,6 +414,46 @@ do_revalidate(struct dentry *dentry, struct nameidata *nd)
> }
>
> /*
> + * force_reval_path - force revalidation of a dentry
> + *
> + * In some situations the path walking code will trust dentries without
> + * revalidating them. This causes problems for filesystems that depend on
> + * d_revalidate to handle file opens (e.g. NFSv4). When FS_REVAL_DOT is set
> + * (which indicates that it's possible for the dentry to go stale), force
> + * a d_revalidate call before proceeding.
> + *
> + * Returns 0 if the revalidation was successful. If the revalidation fails,
> + * either return the error returned by d_revalidate or -ESTALE if the
> + * revalidation it just returned 0. If d_revalidate returns 0, we attempt to
> + * invalidate the dentry. It's up to the caller to handle putting references
> + * to the path if necessary.
> + */
> +static int
> +force_reval_path(struct path *path, struct nameidata *nd)
> +{
> + int status;
> + struct dentry *dentry = path->dentry;
> +
> + /*
> + * only check on filesystems where it's possible for the dentry to
> + * become stale. It's assumed that if this flag is set then the
> + * d_revalidate op will also be defined.
> + */
> + if (!(dentry->d_sb->s_type->fs_flags & FS_REVAL_DOT))
> + return 0;
> +
> + status = dentry->d_op->d_revalidate(dentry, nd);
> + if (status > 0)
> + return 0;
> +
> + if (!status) {
> + d_invalidate(dentry);
> + status = -ESTALE;
> + }
> + return status;
> +}
> +
> +/*
> * Internal lookup() using the new generic dcache.
> * SMP-safe
> */
> @@ -641,6 +681,11 @@ static __always_inline int __do_follow_link(struct path *path, struct nameidata
> error = 0;
> if (s)
> error = __vfs_follow_link(nd, s);
> + else if (nd->last_type == LAST_BIND) {
> + error = force_reval_path(&nd->path, nd);
> + if (error)
> + path_put(&nd->path);
> + }
> if (dentry->d_inode->i_op->put_link)
> dentry->d_inode->i_op->put_link(dentry, nd, cookie);
> }
> --
> 1.5.5.6

2009-12-03 00:02:03

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 2/2] vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT filesystems

Jeff Layton <[email protected]> writes:

> In the case of a bind mounted file, the path walking code will assume
> that the cached dentry that was bind mounted is valid. This is a problem
> problem for NFSv4 in a way that's similar to LAST_BIND symlinks.
>
> Fix this by revalidating the dentry if FS_FOLLOW_DOT is set and
> __follow_mount returns true.
>
> Note that in the non-open codepath, we cannot return an error to the
> lookup if the revalidation fails. Doing so will leave a bind mount in
> a state such that we can't unmount it. In that case we'll just have to
> settle for d_invalidating it (which should mostly turn out to be a
> d_drop in this case) and returning success.

Looks reasonable to me. I wonder a little if we care about follow_mount
as well as __follow_mount.

This seems reasonable to me.

Acked-by: "Eric W. Biederman" <[email protected]>

>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/namei.c | 11 ++++++++++-
> 1 files changed, 10 insertions(+), 1 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 339789e..0d55b6f 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -851,7 +851,13 @@ static int do_lookup(struct nameidata *nd, struct qstr *name,
> done:
> path->mnt = mnt;
> path->dentry = dentry;
> - __follow_mount(path);
> +
> + /*
> + * cannot return the error returned by force_reval_path as that can
> + * make it impossible to unmount a bind mounted dentry if it's stale.
> + */
> + if (__follow_mount(path))
> + force_reval_path(path, nd);
> return 0;
>
> need_lookup:
> @@ -1840,6 +1846,9 @@ do_last:
> error = -ELOOP;
> if (flag & O_NOFOLLOW)
> goto exit_dput;
> + error = force_reval_path(&path, &nd);
> + if (error)
> + goto exit_dput;
> }
>
> error = -ENOENT;
> --
> 1.5.5.6

2009-12-03 01:23:45

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 2/2] vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT filesystems

On Wed, 02 Dec 2009 16:01:59 -0800
[email protected] (Eric W. Biederman) wrote:

> Jeff Layton <[email protected]> writes:
>
> > In the case of a bind mounted file, the path walking code will assume
> > that the cached dentry that was bind mounted is valid. This is a problem
> > problem for NFSv4 in a way that's similar to LAST_BIND symlinks.
> >
> > Fix this by revalidating the dentry if FS_FOLLOW_DOT is set and
> > __follow_mount returns true.
> >
> > Note that in the non-open codepath, we cannot return an error to the
> > lookup if the revalidation fails. Doing so will leave a bind mount in
> > a state such that we can't unmount it. In that case we'll just have to
> > settle for d_invalidating it (which should mostly turn out to be a
> > d_drop in this case) and returning success.
>
> Looks reasonable to me. I wonder a little if we care about follow_mount
> as well as __follow_mount.
>

Good question.

It does sort of seem like it. There's also follow_down too...

OTOH, we have a reproducible testcases that the two patches in this
series fix. I'm not aware of anything that's technically "broken" by
the lack of revalidations in the other codepaths so I'm hesitant to add
them in unless and until we do.

> This seems reasonable to me.
>
> Acked-by: "Eric W. Biederman" <[email protected]>
>
> >
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > fs/namei.c | 11 ++++++++++-
> > 1 files changed, 10 insertions(+), 1 deletions(-)
> >
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 339789e..0d55b6f 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -851,7 +851,13 @@ static int do_lookup(struct nameidata *nd, struct qstr *name,
> > done:
> > path->mnt = mnt;
> > path->dentry = dentry;
> > - __follow_mount(path);
> > +
> > + /*
> > + * cannot return the error returned by force_reval_path as that can
> > + * make it impossible to unmount a bind mounted dentry if it's stale.
> > + */
> > + if (__follow_mount(path))
> > + force_reval_path(path, nd);
> > return 0;
> >
> > need_lookup:
> > @@ -1840,6 +1846,9 @@ do_last:
> > error = -ELOOP;
> > if (flag & O_NOFOLLOW)
> > goto exit_dput;
> > + error = force_reval_path(&path, &nd);
> > + if (error)
> > + goto exit_dput;
> > }
> >
> > error = -ENOENT;
> > --
> > 1.5.5.6


--
Jeff Layton <[email protected]>

2009-12-03 10:39:16

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH 1/2] vfs: force reval of target when following LAST_BIND symlinks

On Wed, 2 Dec 2009, Jeff Layton wrote:
> procfs-style symlinks return a last_type of LAST_BIND without an actual
> path string. This causes __follow_link to skip calling __vfs_follow_link
> and so the dentry isn't revalidated.
>
> This is a problem when the link target sits on NFSv4 as it depends on
> the VFS to revalidate the dentry before using it on an open call. Ensure
> that this occurs by forcing a revalidation of the target dentry of
> LAST_BIND symlinks.

An added advantage is that the symlinks in /proc will show the
invalidated path as "(deleted)". Maybe proc_pid_readlink() should
also call force_reval_path() to get an up-to-date state of the path?

Acked-by: Miklos Szeredi <[email protected]>

>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/namei.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 45 insertions(+), 0 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 3374917..339789e 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -414,6 +414,46 @@ do_revalidate(struct dentry *dentry, struct nameidata *nd)
> }
>
> /*
> + * force_reval_path - force revalidation of a dentry
> + *
> + * In some situations the path walking code will trust dentries without
> + * revalidating them. This causes problems for filesystems that depend on
> + * d_revalidate to handle file opens (e.g. NFSv4). When FS_REVAL_DOT is set
> + * (which indicates that it's possible for the dentry to go stale), force
> + * a d_revalidate call before proceeding.
> + *
> + * Returns 0 if the revalidation was successful. If the revalidation fails,
> + * either return the error returned by d_revalidate or -ESTALE if the
> + * revalidation it just returned 0. If d_revalidate returns 0, we attempt to
> + * invalidate the dentry. It's up to the caller to handle putting references
> + * to the path if necessary.
> + */
> +static int
> +force_reval_path(struct path *path, struct nameidata *nd)
> +{
> + int status;
> + struct dentry *dentry = path->dentry;
> +
> + /*
> + * only check on filesystems where it's possible for the dentry to
> + * become stale. It's assumed that if this flag is set then the
> + * d_revalidate op will also be defined.
> + */
> + if (!(dentry->d_sb->s_type->fs_flags & FS_REVAL_DOT))
> + return 0;
> +
> + status = dentry->d_op->d_revalidate(dentry, nd);
> + if (status > 0)
> + return 0;
> +
> + if (!status) {
> + d_invalidate(dentry);
> + status = -ESTALE;
> + }
> + return status;
> +}
> +
> +/*
> * Internal lookup() using the new generic dcache.
> * SMP-safe
> */
> @@ -641,6 +681,11 @@ static __always_inline int __do_follow_link(struct path *path, struct nameidata
> error = 0;
> if (s)
> error = __vfs_follow_link(nd, s);
> + else if (nd->last_type == LAST_BIND) {
> + error = force_reval_path(&nd->path, nd);
> + if (error)
> + path_put(&nd->path);
> + }
> if (dentry->d_inode->i_op->put_link)
> dentry->d_inode->i_op->put_link(dentry, nd, cookie);
> }
> --
> 1.5.5.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2009-12-03 10:58:43

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH 2/2] vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT filesystems

On Wed, 2 Dec 2009, Jeff Layton wrote:
> In the case of a bind mounted file, the path walking code will assume
> that the cached dentry that was bind mounted is valid. This is a problem
> problem for NFSv4 in a way that's similar to LAST_BIND symlinks.
>
> Fix this by revalidating the dentry if FS_FOLLOW_DOT is set and
> __follow_mount returns true.
>
> Note that in the non-open codepath, we cannot return an error to the
> lookup if the revalidation fails. Doing so will leave a bind mount in
> a state such that we can't unmount it. In that case we'll just have to
> settle for d_invalidating it (which should mostly turn out to be a
> d_drop in this case) and returning success.

The only worry I have is that this adds an extra branch in a very hot
codepath (do_lookup). An error can't be returned, as you note, and
for bind mounted directories d_invalidate() will not succeed: the
directory is busy, it's referenced by the mount. So basically the
only thing this does is working around the NFSv4 issue. But Trond has
a proper solution to that, and a temporary solution could be added to
do_filp_open() rather than burdening do_lookup() with it, no?

Thanks,
Miklos

>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/namei.c | 11 ++++++++++-
> 1 files changed, 10 insertions(+), 1 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 339789e..0d55b6f 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -851,7 +851,13 @@ static int do_lookup(struct nameidata *nd, struct qstr *name,
> done:
> path->mnt = mnt;
> path->dentry = dentry;
> - __follow_mount(path);
> +
> + /*
> + * cannot return the error returned by force_reval_path as that can
> + * make it impossible to unmount a bind mounted dentry if it's stale.
> + */
> + if (__follow_mount(path))
> + force_reval_path(path, nd);
> return 0;
>
> need_lookup:
> @@ -1840,6 +1846,9 @@ do_last:
> error = -ELOOP;
> if (flag & O_NOFOLLOW)
> goto exit_dput;
> + error = force_reval_path(&path, &nd);
> + if (error)
> + goto exit_dput;
> }
>
> error = -ENOENT;
> --
> 1.5.5.6
>
>

2009-12-03 11:11:47

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 2/2] vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT filesystems

Miklos Szeredi <[email protected]> writes:

> On Wed, 2 Dec 2009, Jeff Layton wrote:
>> In the case of a bind mounted file, the path walking code will assume
>> that the cached dentry that was bind mounted is valid. This is a problem
>> problem for NFSv4 in a way that's similar to LAST_BIND symlinks.
>>
>> Fix this by revalidating the dentry if FS_FOLLOW_DOT is set and
>> __follow_mount returns true.
>>
>> Note that in the non-open codepath, we cannot return an error to the
>> lookup if the revalidation fails. Doing so will leave a bind mount in
>> a state such that we can't unmount it. In that case we'll just have to
>> settle for d_invalidating it (which should mostly turn out to be a
>> d_drop in this case) and returning success.
>
> The only worry I have is that this adds an extra branch in a very hot
> codepath (do_lookup). An error can't be returned, as you note, and
> for bind mounted directories d_invalidate() will not succeed: the
> directory is busy, it's referenced by the mount.

Not true. d_mountpoint is false, so d_invalidate can succeed.

> So basically the
> only thing this does is working around the NFSv4 issue.

No, this should catch other cases where we have a dentry goes
stale as well, and lets the distributed filesystem handle it.

It is probably worth a benchmark to ease the concerns about the hotpath.
I expect the cpu will predict the branch as unlikely and we won't see
any difference.

Eric

2009-12-03 11:19:53

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH 2/2] vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT filesystems

On Thu, 03 Dec 2009, Eric W. Biederman wrote:
> > The only worry I have is that this adds an extra branch in a very hot
> > codepath (do_lookup). An error can't be returned, as you note, and
> > for bind mounted directories d_invalidate() will not succeed: the
> > directory is busy, it's referenced by the mount.
>
> Not true. d_mountpoint is false, so d_invalidate can succeed.

Have a look at the code. d_invalidate() doesn't check for a
mountpoint, it checks the refcount. It needs to keep the directory
dentry hashed if it's in any way reachable other than from the cache
(file descriptor, cwd, mount, etc).

Thanks,
Miklos

2009-12-03 11:56:48

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 2/2] vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT filesystems

Miklos Szeredi <[email protected]> writes:

> On Thu, 03 Dec 2009, Eric W. Biederman wrote:
>> > The only worry I have is that this adds an extra branch in a very hot
>> > codepath (do_lookup). An error can't be returned, as you note, and
>> > for bind mounted directories d_invalidate() will not succeed: the
>> > directory is busy, it's referenced by the mount.
>>
>> Not true. d_mountpoint is false, so d_invalidate can succeed.
>
> Have a look at the code. d_invalidate() doesn't check for a
> mountpoint, it checks the refcount. It needs to keep the directory
> dentry hashed if it's in any way reachable other than from the cache
> (file descriptor, cwd, mount, etc).

Ah. I thought you were thinking about the mandatory have_submounts()
check in dentry->d_op->d_revalidate().

I expect the generic d_invalidate will simply hit the:
spin_lock(&dcache_lock);
if (d_unhashed(dentry)) {
spin_unlock(&dcache_lock);
return 0;
}

After the distributed filesystem has called d_drop in
dentry->d_op->d_revalidate (when appropriate.

Eric


2009-12-03 15:20:52

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 2/2] vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT filesystems

On Thu, 03 Dec 2009 11:58:43 +0100
Miklos Szeredi <[email protected]> wrote:

> On Wed, 2 Dec 2009, Jeff Layton wrote:
> > In the case of a bind mounted file, the path walking code will assume
> > that the cached dentry that was bind mounted is valid. This is a problem
> > problem for NFSv4 in a way that's similar to LAST_BIND symlinks.
> >
> > Fix this by revalidating the dentry if FS_FOLLOW_DOT is set and
> > __follow_mount returns true.
> >
> > Note that in the non-open codepath, we cannot return an error to the
> > lookup if the revalidation fails. Doing so will leave a bind mount in
> > a state such that we can't unmount it. In that case we'll just have to
> > settle for d_invalidating it (which should mostly turn out to be a
> > d_drop in this case) and returning success.
>
> The only worry I have is that this adds an extra branch in a very hot
> codepath (do_lookup). An error can't be returned, as you note, and
> for bind mounted directories d_invalidate() will not succeed: the
> directory is busy, it's referenced by the mount. So basically the
> only thing this does is working around the NFSv4 issue. But Trond has
> a proper solution to that, and a temporary solution could be added to
> do_filp_open() rather than burdening do_lookup() with it, no?
>

(re-adding Trond. I forgot to cc him on this latest set)

Self-NAK on this patch...

That's my main worry too, and sadly it doesn't seem to be unfounded.
This patch adds a lot of extra d_revalidate calls here. I think it's
going to be too expensive to do this.

Unfortunately, adding the code to do_filp_open alone doesn't really
help. The path walking code in there is just for the O_CREAT case. If
we're not creating the file, we call into path_lookup_open which
eventually calls do_lookup.

What we probably need to do is only make it d_revalidate when it looks
like the final dentry is bind mounted. I'm not sure how to tell that
though and even if I could, I'm still leery of adding this to such a
hot codepath.

The only problem I've identified that this fixes is with file bind
mounts and I don't get the impression they're that common. Maybe the
best thing is to just fix the LAST_BIND symlink case for now and wait
for Trond or Al's overhaul of this code.

--
Jeff Layton <[email protected]>

2009-12-03 15:21:29

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH 2/2] vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT filesystems

On Thu, 03 Dec 2009, Eric W. Biederman wrote:
> Ah. I thought you were thinking about the mandatory have_submounts()
> check in dentry->d_op->d_revalidate().
>
> I expect the generic d_invalidate will simply hit the:
> spin_lock(&dcache_lock);
> if (d_unhashed(dentry)) {
> spin_unlock(&dcache_lock);
> return 0;
> }
>
> After the distributed filesystem has called d_drop in
> dentry->d_op->d_revalidate (when appropriate.

Ah, right, NFS's d_revalidate does do d_drop(). Okay, I have no
problem with the patch, as long as path lookup performance doesn't
show a regression.

Thanks,
Miklos

2009-12-03 18:35:29

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 2/2] vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT filesystems

Jeff Layton <[email protected]> writes:

> On Thu, 03 Dec 2009 11:58:43 +0100
> Miklos Szeredi <[email protected]> wrote:
>
>> On Wed, 2 Dec 2009, Jeff Layton wrote:
>> > In the case of a bind mounted file, the path walking code will assume
>> > that the cached dentry that was bind mounted is valid. This is a problem
>> > problem for NFSv4 in a way that's similar to LAST_BIND symlinks.
>> >
>> > Fix this by revalidating the dentry if FS_FOLLOW_DOT is set and
>> > __follow_mount returns true.
>> >
>> > Note that in the non-open codepath, we cannot return an error to the
>> > lookup if the revalidation fails. Doing so will leave a bind mount in
>> > a state such that we can't unmount it. In that case we'll just have to
>> > settle for d_invalidating it (which should mostly turn out to be a
>> > d_drop in this case) and returning success.
>>
>> The only worry I have is that this adds an extra branch in a very hot
>> codepath (do_lookup). An error can't be returned, as you note, and
>> for bind mounted directories d_invalidate() will not succeed: the
>> directory is busy, it's referenced by the mount. So basically the
>> only thing this does is working around the NFSv4 issue. But Trond has
>> a proper solution to that, and a temporary solution could be added to
>> do_filp_open() rather than burdening do_lookup() with it, no?
>>
>
> (re-adding Trond. I forgot to cc him on this latest set)
>
> Self-NAK on this patch...
>
> That's my main worry too, and sadly it doesn't seem to be unfounded.
> This patch adds a lot of extra d_revalidate calls here. I think it's
> going to be too expensive to do this.

How so? We should only see extra calls if we follow a mount point.
Currently we call d_revalidate on every path component.

> The only problem I've identified that this fixes is with file bind
> mounts and I don't get the impression they're that common. Maybe the
> best thing is to just fix the LAST_BIND symlink case for now and wait
> for Trond or Al's overhaul of this code.

Well right now following mount points breaks the VFS contract that we
will revalidate all dentries before we use them. That breaking of the
contract breaks NFS.

I don't know what else d_revalidate is good for. On the sysfs side
I only use it to unhash the dentry. Something we don't care about
from the do_lookup side of things if we have a bind mount.

I'm not clear what kind of changes revalidating a deleted but open
file will give you on NFS.

Eric

2009-12-03 19:16:22

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 2/2] vfs: force reval on dentry of bind mounted files on FS_REVAL_DOT filesystems

On Thu, 03 Dec 2009 10:35:21 -0800
[email protected] (Eric W. Biederman) wrote:

> Jeff Layton <[email protected]> writes:
>
> > On Thu, 03 Dec 2009 11:58:43 +0100
> > Miklos Szeredi <[email protected]> wrote:
> >
> >> On Wed, 2 Dec 2009, Jeff Layton wrote:
> >> > In the case of a bind mounted file, the path walking code will assume
> >> > that the cached dentry that was bind mounted is valid. This is a problem
> >> > problem for NFSv4 in a way that's similar to LAST_BIND symlinks.
> >> >
> >> > Fix this by revalidating the dentry if FS_FOLLOW_DOT is set and
> >> > __follow_mount returns true.
> >> >
> >> > Note that in the non-open codepath, we cannot return an error to the
> >> > lookup if the revalidation fails. Doing so will leave a bind mount in
> >> > a state such that we can't unmount it. In that case we'll just have to
> >> > settle for d_invalidating it (which should mostly turn out to be a
> >> > d_drop in this case) and returning success.
> >>
> >> The only worry I have is that this adds an extra branch in a very hot
> >> codepath (do_lookup). An error can't be returned, as you note, and
> >> for bind mounted directories d_invalidate() will not succeed: the
> >> directory is busy, it's referenced by the mount. So basically the
> >> only thing this does is working around the NFSv4 issue. But Trond has
> >> a proper solution to that, and a temporary solution could be added to
> >> do_filp_open() rather than burdening do_lookup() with it, no?
> >>
> >
> > (re-adding Trond. I forgot to cc him on this latest set)
> >
> > Self-NAK on this patch...
> >
> > That's my main worry too, and sadly it doesn't seem to be unfounded.
> > This patch adds a lot of extra d_revalidate calls here. I think it's
> > going to be too expensive to do this.
>
> How so? We should only see extra calls if we follow a mount point.
> Currently we call d_revalidate on every path component.
>
> > The only problem I've identified that this fixes is with file bind
> > mounts and I don't get the impression they're that common. Maybe the
> > best thing is to just fix the LAST_BIND symlink case for now and wait
> > for Trond or Al's overhaul of this code.
>
> Well right now following mount points breaks the VFS contract that we
> will revalidate all dentries before we use them. That breaking of the
> contract breaks NFS.
>
> I don't know what else d_revalidate is good for. On the sysfs side
> I only use it to unhash the dentry. Something we don't care about
> from the do_lookup side of things if we have a bind mount.
>
> I'm not clear what kind of changes revalidating a deleted but open
> file will give you on NFS.
>

My concern here is based mainly on a simple experiment I did to fire off
a printk and do a dump_stack every time we call d_revalidate from
force_reval_path. Even a very small set of operations caused that to
get called many, many times, mostly from do_lookup.

You're correct that the current behavior breaks the d_revalidate
"contract". What I'm not sure of is whether that truly breaks anything
beyond file bind mounts.

If it doesn't then we have to ask ourselves -- is it worth a potential
performance hit in a very hot codepath to fix bind mounted files that
live on NFSv4? My current thinking is "no".

Much of that decision is based upon an assumption that file bind mounts
are rarely ever used. If that's the case, then it's probably more
prudent to wait until the VFS has been fixed so that NFSv4 no longer
needs to depend on d_revalidate this way.

The LAST_BIND symlink case is a little more straightforward since the
fix is more targeted. I think that should be reasonable for 2.6.33.

Cheers,
--
Jeff Layton <[email protected]>