On 08/01/2013 12:39 PM, Miklos Szeredi wrote:
> On Tue, Jul 30, 2013 at 03:30:45PM -0400, Ric Wheeler wrote:
>> On 07/30/2013 12:16 PM, Miklos Szeredi wrote:
>>>>> fs/fuse/dir.c | 4 ++++
>>>>> 1 file changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
>>>>> index a1d9047..83c217e 100644
>>>>> --- a/fs/fuse/dir.c
>>>>> +++ b/fs/fuse/dir.c
>>>>> @@ -226,6 +226,10 @@ static int fuse_dentry_revalidate(struct dentry
>>>>> *entry, unsigned int flags)
>>>>> if (!err) {
>>>>> struct fuse_inode *fi = get_fuse_inode(inode);
>>>>> if (outarg.nodeid != get_node_id(inode)) {
>>>>> + if (!have_submounts(entry)) {
>>>>> + shrink_dcache_parent(entry);
>>>>> + d_drop(entry);
>>>>> + }
>>> Doing d_drop() on a subtree really has problems. Take this example:
>>>
>>> cd /mnt/foo/bar
>>> mkdir my-mount-point
>>> [->d_revalidate() drops "/mnt/foo"]
>>> mount whatever on ./my-mount-point
>>> cd /
>>>
>>> At this point that mount is not accessible in any way. The only way
>>> to umount it is to lazy umount the parent mount. If the parent mount
>>> was root, then that's not a practical thing to do.
>>>
>>> AFAICS nothing prevents this from happening on NFS and root privs are
>>> not required (e.g. mounting a fuse fs on an NFS home dir).
>>>
> Here's a patch that tries to address the mount issue in NFS (and in FUSE in the
> future). I haven't yet tested it, just interested in whether the concept looks
> OK or not.
>
> Not sure if the ENOENT is the best error. Also I have a little fear that this
> will cause a regression in some weird setup. Possibly we should use a new
> dentry flag for unhashing the directory dentry specifically to invalidate it.
>
> Thanks,
> Miklos
Adding in the NFS list for broader review.
I thought that the issue was primarily in FUSE itself, not seen in practice in NFS?
Ric
>
>
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 87bdb53..5c51217 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -1103,6 +1103,34 @@ rename_retry:
> }
> EXPORT_SYMBOL(have_submounts);
>
> +static bool __has_unlinked_ancestor(struct dentry *dentry)
> +{
> + struct dentry *this;
> +
> + for (this = dentry; !IS_ROOT(this); this = this->d_parent) {
> + if (d_unhashed(this))
> + return true;
> + }
> + return false;
> +}
> +
> +/*
> + * Called by mount code to check if the mountpoint is reachable (e.g. NFS can
> + * unhash a directory dentry and then the complete subtree can become
> + * unreachable).
> + */
> +bool has_unlinked_ancestor(struct dentry *dentry)
> +{
> + bool found;
> +
> + /* Need exclusion wrt. have_submounts() */
> + write_seqlock(&rename_lock);
> + found = __has_unlinked_ancestor(dentry);
> + write_sequnlock(&rename_lock);
> +
> + return found;
> +}
> +
> /*
> * Search the dentry child list of the specified parent,
> * and move any unused dentries to the end of the unused
> diff --git a/fs/internal.h b/fs/internal.h
> index 7c5f01c..d232355 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -126,6 +126,7 @@ extern int invalidate_inodes(struct super_block *, bool);
> * dcache.c
> */
> extern struct dentry *__d_alloc(struct super_block *, const struct qstr *);
> +extern bool has_unlinked_ancestor(struct dentry *dentry);
>
> /*
> * read_write.c
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 7b1ca9b..bb92a9c 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -634,6 +634,15 @@ static struct mountpoint *new_mountpoint(struct dentry *dentry)
> }
> dentry->d_flags |= DCACHE_MOUNTED;
> spin_unlock(&dentry->d_lock);
> +
> + if (has_unlinked_ancestor(dentry)) {
> + spin_lock(&dentry->d_lock);
> + dentry->d_flags &= ~DCACHE_MOUNTED;
> + spin_unlock(&dentry->d_lock);
> + kfree(mp);
> + return ERR_PTR(-ENOENT);
> + }
> +
> mp->m_dentry = dentry;
> mp->m_count = 1;
> list_add(&mp->m_hash, chain);
On Thu, Aug 1, 2013 at 8:45 PM, Ric Wheeler <[email protected]> wrote:
>
> Adding in the NFS list for broader review.
>
> I thought that the issue was primarily in FUSE itself, not seen in practice
> in NFS?
I haven't tested NFS (have but a single notebook here) but AFAICS the
issue of not fuse specific but is inherent in the fact that NFS does
d_drop() on a non-empty directory dentry. It does check for submounts
*before*, but it doesn't do anything about submounts *after* (or
during) the d_drop(). It's probably not something people complain
about, because they choose mountpoints to be pretty stable points in
the tree that don't get removed or moved around.
Thanks,
Miklos
>>
>> diff --git a/fs/dcache.c b/fs/dcache.c
>> index 87bdb53..5c51217 100644
>> --- a/fs/dcache.c
>> +++ b/fs/dcache.c
>> @@ -1103,6 +1103,34 @@ rename_retry:
>> }
>> EXPORT_SYMBOL(have_submounts);
>> +static bool __has_unlinked_ancestor(struct dentry *dentry)
>> +{
>> + struct dentry *this;
>> +
>> + for (this = dentry; !IS_ROOT(this); this = this->d_parent) {
>> + if (d_unhashed(this))
>> + return true;
>> + }
>> + return false;
>> +}
>> +
>> +/*
>> + * Called by mount code to check if the mountpoint is reachable (e.g. NFS
>> can
>> + * unhash a directory dentry and then the complete subtree can become
>> + * unreachable).
>> + */
>> +bool has_unlinked_ancestor(struct dentry *dentry)
>> +{
>> + bool found;
>> +
>> + /* Need exclusion wrt. have_submounts() */
>> + write_seqlock(&rename_lock);
>> + found = __has_unlinked_ancestor(dentry);
>> + write_sequnlock(&rename_lock);
>> +
>> + return found;
>> +}
>> +
>> /*
>> * Search the dentry child list of the specified parent,
>> * and move any unused dentries to the end of the unused
>> diff --git a/fs/internal.h b/fs/internal.h
>> index 7c5f01c..d232355 100644
>> --- a/fs/internal.h
>> +++ b/fs/internal.h
>> @@ -126,6 +126,7 @@ extern int invalidate_inodes(struct super_block *,
>> bool);
>> * dcache.c
>> */
>> extern struct dentry *__d_alloc(struct super_block *, const struct qstr
>> *);
>> +extern bool has_unlinked_ancestor(struct dentry *dentry);
>> /*
>> * read_write.c
>> diff --git a/fs/namespace.c b/fs/namespace.c
>> index 7b1ca9b..bb92a9c 100644
>> --- a/fs/namespace.c
>> +++ b/fs/namespace.c
>> @@ -634,6 +634,15 @@ static struct mountpoint *new_mountpoint(struct
>> dentry *dentry)
>> }
>> dentry->d_flags |= DCACHE_MOUNTED;
>> spin_unlock(&dentry->d_lock);
>> +
>> + if (has_unlinked_ancestor(dentry)) {
>> + spin_lock(&dentry->d_lock);
>> + dentry->d_flags &= ~DCACHE_MOUNTED;
>> + spin_unlock(&dentry->d_lock);
>> + kfree(mp);
>> + return ERR_PTR(-ENOENT);
>> + }
>> +
>> mp->m_dentry = dentry;
>> mp->m_count = 1;
>> list_add(&mp->m_hash, chain);
>
>
On Fri, Aug 2, 2013 at 1:43 PM, Jeff Layton <[email protected]> wrote:
> Ok, took me a couple of times to look over the code but I think I
> understand the problem now...
>
> IIUC, then this patch should only ever cause this to return -ENOENT in
> a situation similar to the one in Anand's reproducer, right? The
> mountpoint-to-be was unlinked in another tree, and thus we found it to
> be invalid in the tree that we're mounting in. If so, then the dentry
> didn't exist at some point during the race window. Returning -ENOENT
> seems reasonable to me in that situation.
Yes, that's one part of it and ENOENT fits perfectly.
The other part is when the subtree is moved on another host. Yes, NFS
can reconnect it, but only if it is accessed through the new location.
Until then it will be inaccessible and the new location of the
mountpoint not discoverable through /proc/mounts or in any way without
outside knowledge.
And there was a pre-existing mount under the moved directory we don't
allow the d_drop in this "move" case either, and the mount is
accessible through the old name. I seem to recall that there was a
discussion about this back then and Linus was quite adamant about
mountpoints not being allowed to be dissolved or moved without an
explicit action on the localhost (i.e. something that happens on
remote hosts shouldn't affect the status or location of mounts on the
localhost).
So what happens in this case:
host1: cd /nfs/foo/bar
host2: mv /nfs/foo /nfs/old-foo
host2: mkdir /nfs/foo
host1: ls /nfs/foo [drops "old-foo" and adds a new foo dentry]
host1: mkdir bin [cwd is now not accessible from root]
host1: mount --bind /bin ./bin [???]
Currently that last one succeeds, with my patch it gives ENOENT, but
that's not the best error, since the mountpoint does exist.
Thanks,
Miklos
On Fri, 2 Aug 2013 16:30:23 +0200
Miklos Szeredi <[email protected]> wrote:
> On Fri, Aug 2, 2013 at 1:43 PM, Jeff Layton <[email protected]> wrote:
> > Ok, took me a couple of times to look over the code but I think I
> > understand the problem now...
> >
> > IIUC, then this patch should only ever cause this to return -ENOENT in
> > a situation similar to the one in Anand's reproducer, right? The
> > mountpoint-to-be was unlinked in another tree, and thus we found it to
> > be invalid in the tree that we're mounting in. If so, then the dentry
> > didn't exist at some point during the race window. Returning -ENOENT
> > seems reasonable to me in that situation.
>
> Yes, that's one part of it and ENOENT fits perfectly.
>
> The other part is when the subtree is moved on another host. Yes, NFS
> can reconnect it, but only if it is accessed through the new location.
> Until then it will be inaccessible and the new location of the
> mountpoint not discoverable through /proc/mounts or in any way without
> outside knowledge.
>
> And there was a pre-existing mount under the moved directory we don't
> allow the d_drop in this "move" case either, and the mount is
> accessible through the old name. I seem to recall that there was a
> discussion about this back then and Linus was quite adamant about
> mountpoints not being allowed to be dissolved or moved without an
> explicit action on the localhost (i.e. something that happens on
> remote hosts shouldn't affect the status or location of mounts on the
> localhost).
>
> So what happens in this case:
>
> host1: cd /nfs/foo/bar
> host2: mv /nfs/foo /nfs/old-foo
> host2: mkdir /nfs/foo
> host1: ls /nfs/foo [drops "old-foo" and adds a new foo dentry]
> host1: mkdir bin [cwd is now not accessible from root]
> host1: mount --bind /bin ./bin [???]
>
> Currently that last one succeeds, with my patch it gives ENOENT, but
> that's not the best error, since the mountpoint does exist.
>
Ok, good point...
That's a tricky situation. We're rejecting the mount there because we
can't _currently_ reach the mountpoint from root. It could become
reachable later though, at which point you could mount on there just
fine...
It almost sounds like it could use a new error code (EUNREACH?). Or,
maybe you could repurpose ENOLINK?
In any case, I'd be inclined not to worry about it and just go with
-ENOENT there. If someone complains we could consider a new error for
that case later.
--
Jeff Layton <[email protected]>
On Fri, 2 Aug 2013 11:02:25 +0200
Miklos Szeredi <[email protected]> wrote:
> On Thu, Aug 1, 2013 at 8:45 PM, Ric Wheeler <[email protected]> wrote:
> >
> > Adding in the NFS list for broader review.
> >
> > I thought that the issue was primarily in FUSE itself, not seen in practice
> > in NFS?
>
> I haven't tested NFS (have but a single notebook here) but AFAICS the
> issue of not fuse specific but is inherent in the fact that NFS does
> d_drop() on a non-empty directory dentry. It does check for submounts
> *before*, but it doesn't do anything about submounts *after* (or
> during) the d_drop(). It's probably not something people complain
> about, because they choose mountpoints to be pretty stable points in
> the tree that don't get removed or moved around.
>
> Thanks,
> Miklos
>
> >>
> >> diff --git a/fs/dcache.c b/fs/dcache.c
> >> index 87bdb53..5c51217 100644
> >> --- a/fs/dcache.c
> >> +++ b/fs/dcache.c
> >> @@ -1103,6 +1103,34 @@ rename_retry:
> >> }
> >> EXPORT_SYMBOL(have_submounts);
> >> +static bool __has_unlinked_ancestor(struct dentry *dentry)
> >> +{
> >> + struct dentry *this;
> >> +
> >> + for (this = dentry; !IS_ROOT(this); this = this->d_parent) {
> >> + if (d_unhashed(this))
> >> + return true;
> >> + }
> >> + return false;
> >> +}
> >> +
> >> +/*
> >> + * Called by mount code to check if the mountpoint is reachable (e.g. NFS
> >> can
> >> + * unhash a directory dentry and then the complete subtree can become
> >> + * unreachable).
> >> + */
> >> +bool has_unlinked_ancestor(struct dentry *dentry)
> >> +{
> >> + bool found;
> >> +
> >> + /* Need exclusion wrt. have_submounts() */
> >> + write_seqlock(&rename_lock);
> >> + found = __has_unlinked_ancestor(dentry);
> >> + write_sequnlock(&rename_lock);
> >> +
> >> + return found;
> >> +}
> >> +
> >> /*
> >> * Search the dentry child list of the specified parent,
> >> * and move any unused dentries to the end of the unused
> >> diff --git a/fs/internal.h b/fs/internal.h
> >> index 7c5f01c..d232355 100644
> >> --- a/fs/internal.h
> >> +++ b/fs/internal.h
> >> @@ -126,6 +126,7 @@ extern int invalidate_inodes(struct super_block *,
> >> bool);
> >> * dcache.c
> >> */
> >> extern struct dentry *__d_alloc(struct super_block *, const struct qstr
> >> *);
> >> +extern bool has_unlinked_ancestor(struct dentry *dentry);
> >> /*
> >> * read_write.c
> >> diff --git a/fs/namespace.c b/fs/namespace.c
> >> index 7b1ca9b..bb92a9c 100644
> >> --- a/fs/namespace.c
> >> +++ b/fs/namespace.c
> >> @@ -634,6 +634,15 @@ static struct mountpoint *new_mountpoint(struct
> >> dentry *dentry)
> >> }
> >> dentry->d_flags |= DCACHE_MOUNTED;
> >> spin_unlock(&dentry->d_lock);
> >> +
> >> + if (has_unlinked_ancestor(dentry)) {
> >> + spin_lock(&dentry->d_lock);
> >> + dentry->d_flags &= ~DCACHE_MOUNTED;
> >> + spin_unlock(&dentry->d_lock);
> >> + kfree(mp);
> >> + return ERR_PTR(-ENOENT);
> >> + }
> >> +
> >> mp->m_dentry = dentry;
> >> mp->m_count = 1;
> >> list_add(&mp->m_hash, chain);
> >
> >
Ok, took me a couple of times to look over the code but I think I
understand the problem now...
IIUC, then this patch should only ever cause this to return -ENOENT in
a situation similar to the one in Anand's reproducer, right? The
mountpoint-to-be was unlinked in another tree, and thus we found it to
be invalid in the tree that we're mounting in. If so, then the dentry
didn't exist at some point during the race window. Returning -ENOENT
seems reasonable to me in that situation.
It might cause some strange setups to regress, but aren't those already
broken anyway? Or does the fact that NFS uses d_materialise_* helpers to
hook these dentries back into the tree help paper over that?
--
Jeff Layton <[email protected]>