2014-01-15 15:17:53

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH] dcache: fix d_splice_alias handling of aliases

From: "J. Bruce Fields" <[email protected]>

d_splice_alias can create duplicate directory aliases (in the !new
case), or (in the new case) d_move without holding appropriate locks.

d_materialise_unique deals with both of these problems. (The latter
seems to be dealt by trylocks (see __d_unalias), which look like they
could cause spurious lookup failures--but that's at least better than
corrupting the dcache.)

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/dcache.c | 25 +------------------------
1 file changed, 1 insertion(+), 24 deletions(-)

Only lightly tested.... If this is right, then we can also just ditch
d_splice_alias completely, and clean up the various d_find_alias's.

I think the only reason we have both d_splice_alias and
d_materialise_unique is that the former was written for exportable
filesystems and the latter for distributed filesystems.

But we have at least one exportable filesystem (fuse) using
d_materialise_unique. And I doubt d_splice_alias was ever completely
correct even for on-disk filesystems.

Am I missing some subtlety?

--b.

diff --git a/fs/dcache.c b/fs/dcache.c
index 4bdb300..da82fa9 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1926,33 +1926,10 @@ EXPORT_SYMBOL(d_obtain_alias);
*/
struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)
{
- struct dentry *new = NULL;
-
if (IS_ERR(inode))
return ERR_CAST(inode);

- if (inode && S_ISDIR(inode->i_mode)) {
- spin_lock(&inode->i_lock);
- new = __d_find_alias(inode, 1);
- if (new) {
- BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED));
- spin_unlock(&inode->i_lock);
- security_d_instantiate(new, inode);
- d_move(new, dentry);
- iput(inode);
- } else {
- /* already taking inode->i_lock, so d_add() by hand */
- __d_instantiate(dentry, inode);
- spin_unlock(&inode->i_lock);
- security_d_instantiate(dentry, inode);
- d_rehash(dentry);
- }
- } else {
- d_instantiate(dentry, inode);
- if (d_unhashed(dentry))
- d_rehash(dentry);
- }
- return new;
+ return d_materialise_unique(dentry, inode);
}
EXPORT_SYMBOL(d_splice_alias);

--
1.7.9.5


2014-01-15 17:34:09

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] dcache: fix d_splice_alias handling of aliases

On Wed, 2014-01-15 at 10:17 -0500, J. Bruce Fields wrote:
> From: "J. Bruce Fields" <[email protected]>
>
> d_splice_alias can create duplicate directory aliases (in the !new
> case), or (in the new case) d_move without holding appropriate locks.

It can d_move, because the dentry is known to be disconnected, i.e. it
doesn't have a parent for which we could obtain the lock.

> d_materialise_unique deals with both of these problems. (The latter
> seems to be dealt by trylocks (see __d_unalias), which look like they
> could cause spurious lookup failures--but that's at least better than
> corrupting the dcache.)
>
> Signed-off-by: J. Bruce Fields <[email protected]>
> ---
> fs/dcache.c | 25 +------------------------
> 1 file changed, 1 insertion(+), 24 deletions(-)
>
> Only lightly tested.... If this is right, then we can also just ditch
> d_splice_alias completely, and clean up the various d_find_alias's.
>
> I think the only reason we have both d_splice_alias and
> d_materialise_unique is that the former was written for exportable
> filesystems and the latter for distributed filesystems.
>
> But we have at least one exportable filesystem (fuse) using
> d_materialise_unique. And I doubt d_splice_alias was ever completely
> correct even for on-disk filesystems.
>
> Am I missing some subtlety?

One subtle difference is that for a non-directory d_splice_alias() will
reconnect a DCACHE_DISCONNECTED dentry if one exists, while
d_materialise_unique() will not.

Does this matter in practice? The small number of extra dentries
probably does not matter.

Thanks,
Miklos


>
> --b.
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 4bdb300..da82fa9 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -1926,33 +1926,10 @@ EXPORT_SYMBOL(d_obtain_alias);
> */
> struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)
> {
> - struct dentry *new = NULL;
> -
> if (IS_ERR(inode))
> return ERR_CAST(inode);
>
> - if (inode && S_ISDIR(inode->i_mode)) {
> - spin_lock(&inode->i_lock);
> - new = __d_find_alias(inode, 1);
> - if (new) {
> - BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED));
> - spin_unlock(&inode->i_lock);
> - security_d_instantiate(new, inode);
> - d_move(new, dentry);
> - iput(inode);
> - } else {
> - /* already taking inode->i_lock, so d_add() by hand */
> - __d_instantiate(dentry, inode);
> - spin_unlock(&inode->i_lock);
> - security_d_instantiate(dentry, inode);
> - d_rehash(dentry);
> - }
> - } else {
> - d_instantiate(dentry, inode);
> - if (d_unhashed(dentry))
> - d_rehash(dentry);
> - }
> - return new;
> + return d_materialise_unique(dentry, inode);
> }
> EXPORT_SYMBOL(d_splice_alias);
>

2014-01-15 17:57:31

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] dcache: fix d_splice_alias handling of aliases

On Wed, Jan 15, 2014 at 06:34:56PM +0100, Miklos Szeredi wrote:
> On Wed, 2014-01-15 at 10:17 -0500, J. Bruce Fields wrote:
> > From: "J. Bruce Fields" <[email protected]>
> >
> > d_splice_alias can create duplicate directory aliases (in the !new
> > case), or (in the new case) d_move without holding appropriate locks.
>
> It can d_move, because the dentry is known to be disconnected, i.e. it
> doesn't have a parent for which we could obtain the lock.

DCACHE_DISCONNECTED doesn't mean that.

When you lookup a dentry by filehandle that dentry is initially marked
DCACHE_DISCONNECTED. It is cleared only after reconnect_path() has
verified that the dentry is reachable all the way from the root.

So !DCACHE_DISCONNECTED implies that the dentry is connected all the way
up to the root, but the converse is not true.

This has been a source of confusion, but it is explained in
Documentation/filesystems/nfs/Exporting. Recently I've cleaned up a few
odd uses of DCACHE_DISCONNECTED and rewritten reconnect_path(), partly
as an attempt to clarify the situation.

Let me know if any of that doesn't look right to you....

> > d_materialise_unique deals with both of these problems. (The latter
> > seems to be dealt by trylocks (see __d_unalias), which look like they
> > could cause spurious lookup failures--but that's at least better than
> > corrupting the dcache.)
> >
> > Signed-off-by: J. Bruce Fields <[email protected]>
> > ---
> > fs/dcache.c | 25 +------------------------
> > 1 file changed, 1 insertion(+), 24 deletions(-)
> >
> > Only lightly tested.... If this is right, then we can also just ditch
> > d_splice_alias completely, and clean up the various d_find_alias's.
> >
> > I think the only reason we have both d_splice_alias and
> > d_materialise_unique is that the former was written for exportable
> > filesystems and the latter for distributed filesystems.
> >
> > But we have at least one exportable filesystem (fuse) using
> > d_materialise_unique. And I doubt d_splice_alias was ever completely
> > correct even for on-disk filesystems.
> >
> > Am I missing some subtlety?
>
> One subtle difference is that for a non-directory d_splice_alias() will
> reconnect a DCACHE_DISCONNECTED dentry if one exists, while
> d_materialise_unique() will not.

Actually until f80de2cde10350b8d146e375ff8b634e72e6a827 "dcache: don't
clear DCACHE_DISCONNECTED too early", it was the reverse:
d_materialise_unique cleared DISCONNECTED and d_splice_alias (correctly)
did not.

The only place where it should be cleared is reconnect_path().

> Does this matter in practice? The small number of extra dentries
> probably does not matter.

Directories are assumed to have unique aliases. When they don't, the
kernel can deadlock or crash.

--b.

2014-01-15 18:25:19

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] dcache: fix d_splice_alias handling of aliases

On Wed, Jan 15, 2014 at 6:57 PM, J. Bruce Fields <[email protected]> wrote:
> On Wed, Jan 15, 2014 at 06:34:56PM +0100, Miklos Szeredi wrote:
>> On Wed, 2014-01-15 at 10:17 -0500, J. Bruce Fields wrote:
>> > From: "J. Bruce Fields" <[email protected]>
>> >
>> > d_splice_alias can create duplicate directory aliases (in the !new
>> > case), or (in the new case) d_move without holding appropriate locks.
>>
>> It can d_move, because the dentry is known to be disconnected, i.e. it
>> doesn't have a parent for which we could obtain the lock.
>
> DCACHE_DISCONNECTED doesn't mean that.

You're right, but I'm also right, because __d_find_alias() will check
IS_ROOT() too. So only "root" disconnected dentries will be moved.

>
> When you lookup a dentry by filehandle that dentry is initially marked
> DCACHE_DISCONNECTED. It is cleared only after reconnect_path() has
> verified that the dentry is reachable all the way from the root.
>
> So !DCACHE_DISCONNECTED implies that the dentry is connected all the way
> up to the root, but the converse is not true.
>
> This has been a source of confusion, but it is explained in
> Documentation/filesystems/nfs/Exporting. Recently I've cleaned up a few
> odd uses of DCACHE_DISCONNECTED and rewritten reconnect_path(), partly
> as an attempt to clarify the situation.
>
> Let me know if any of that doesn't look right to you....
>
>> > d_materialise_unique deals with both of these problems. (The latter
>> > seems to be dealt by trylocks (see __d_unalias), which look like they
>> > could cause spurious lookup failures--but that's at least better tha
>> > corrupting the dcache.)
>> >
>> > Signed-off-by: J. Bruce Fields <[email protected]>
>> > ---
>> > fs/dcache.c | 25 +------------------------
>> > 1 file changed, 1 insertion(+), 24 deletions(-)
>> >
>> > Only lightly tested.... If this is right, then we can also just ditch
>> > d_splice_alias completely, and clean up the various d_find_alias's.
>> >
>> > I think the only reason we have both d_splice_alias and
>> > d_materialise_unique is that the former was written for exportable
>> > filesystems and the latter for distributed filesystems.
>> >
>> > But we have at least one exportable filesystem (fuse) using
>> > d_materialise_unique. And I doubt d_splice_alias was ever completely
>> > correct even for on-disk filesystems.
>> >
>> > Am I missing some subtlety?
>>
>> One subtle difference is that for a non-directory d_splice_alias() will
>> reconnect a DCACHE_DISCONNECTED dentry if one exists, while
>> d_materialise_unique() will not.
>
> Actually until f80de2cde10350b8d146e375ff8b634e72e6a827 "dcache: don't
> clear DCACHE_DISCONNECTED too early", it was the reverse:
> d_materialise_unique cleared DISCONNECTED and d_splice_alias (correctly)
> did not.
>
> The only place where it should be cleared is reconnect_path().
>
>> Does this matter in practice? The small number of extra dentries
>> probably does not matter.
>
> Directories are assumed to have unique aliases. When they don't, the
> kernel can deadlock or crash.

What I meant is that d_materialise_unique() will currently not reuse
disconnected *nondirectory* dentries, hence there may be more aliases
than necessary. This could easily be fixed, though.

Thanks,
Miklos


>
> --b.

2014-01-16 15:41:36

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] dcache: fix d_splice_alias handling of aliases

On Wed, Jan 15, 2014 at 07:25:11PM +0100, Miklos Szeredi wrote:
> On Wed, Jan 15, 2014 at 6:57 PM, J. Bruce Fields <[email protected]> wrote:
> > On Wed, Jan 15, 2014 at 06:34:56PM +0100, Miklos Szeredi wrote:
> >> On Wed, 2014-01-15 at 10:17 -0500, J. Bruce Fields wrote:
> >> > From: "J. Bruce Fields" <[email protected]>
> >> >
> >> > d_splice_alias can create duplicate directory aliases (in the !new
> >> > case), or (in the new case) d_move without holding appropriate locks.
> >>
> >> It can d_move, because the dentry is known to be disconnected, i.e. it
> >> doesn't have a parent for which we could obtain the lock.
> >
> > DCACHE_DISCONNECTED doesn't mean that.
>
> You're right, but I'm also right, because __d_find_alias() will check
> IS_ROOT() too. So only "root" disconnected dentries will be moved.

You're right, I forgot that check.

> >> One subtle difference is that for a non-directory d_splice_alias() will
> >> reconnect a DCACHE_DISCONNECTED dentry if one exists, while
> >> d_materialise_unique() will not.
...
> >> Does this matter in practice? The small number of extra dentries
> >> probably does not matter.
> >
> > Directories are assumed to have unique aliases. When they don't, the
> > kernel can deadlock or crash.
>
> What I meant is that d_materialise_unique() will currently not reuse
> disconnected *nondirectory* dentries, hence there may be more aliases
> than necessary. This could easily be fixed, though.

And, sorry, I did miss that you said "non-directory". But I think you
have that backwards: d_splice_alias looks like:

if (inode && S_ISDIR(inode->i_mode)) {
...
} else {
d_instantiate(dentry, inode);
if (d_unhashed(dentry))
d_rehash(dentry);
}

So it ignores any existing aliases in the non-directory case.

d_materialise_unique by contrast calls __d_instantiate_unique, which
looks like it should avoid adding duplicates.

So I think switching everyone to d_materialiase_unique would result in
fewer dentries. But I've never seen any complaint about the issue and
like you don't see a reason this would matter much either way.

--b.

2014-01-16 16:10:22

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] dcache: fix d_splice_alias handling of aliases

On Wed, Jan 15, 2014 at 10:17:49AM -0500, bfields wrote:
> From: "J. Bruce Fields" <[email protected]>
>
> d_splice_alias can create duplicate directory aliases (in the !new
> case), or (in the new case) d_move without holding appropriate locks.
>
> d_materialise_unique deals with both of these problems. (The latter
> seems to be dealt by trylocks (see __d_unalias), which look like they
> could cause spurious lookup failures--but that's at least better than
> corrupting the dcache.)
>
> Signed-off-by: J. Bruce Fields <[email protected]>
> ---
> fs/dcache.c | 25 +------------------------
> 1 file changed, 1 insertion(+), 24 deletions(-)
>
> Only lightly tested.... If this is right, then we can also just ditch
> d_splice_alias completely, and clean up the various d_find_alias's.
>
> I think the only reason we have both d_splice_alias and
> d_materialise_unique is that the former was written for exportable
> filesystems and the latter for distributed filesystems.
>
> But we have at least one exportable filesystem (fuse) using
> d_materialise_unique. And I doubt d_splice_alias was ever completely
> correct even for on-disk filesystems.
>
> Am I missing some subtlety?

Hm, I just noticed:

commit 0d0d110720d7960b77c03c9f2597faaff4b484ae
Author: Miklos Szeredi <[email protected]>
Date: Mon Sep 16 14:52:00 2013 +0200

GFS2: d_splice_alias() can't return error

unless it was given an IS_ERR(inode), which isn't the case here. So clean
up the unnecessary error handling in gfs2_create_inode().

This paves the way for real fixes (hence the stable Cc).

Signed-off-by: Miklos Szeredi <[email protected]>
Signed-off-by: Steven Whitehouse <[email protected]>
Cc: [email protected]

While the statement is true for the current implementation of
d_splice_alias, I don't think it's actually true for any correct
implementation of d_splice_alias, which must be able to return at least
-ELOOP in the directory case. Does gfs2 need fixing?

--b.

2014-01-16 16:13:34

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [PATCH] dcache: fix d_splice_alias handling of aliases

On Thu, Jan 16, 2014 at 4:41 PM, J. Bruce Fields <[email protected]> wrote:
> And, sorry, I did miss that you said "non-directory". But I think you
> have that backwards: d_splice_alias looks like:
>
> if (inode && S_ISDIR(inode->i_mode)) {
> ...
> } else {
> d_instantiate(dentry, inode);
> if (d_unhashed(dentry))
> d_rehash(dentry);
> }
>
> So it ignores any existing aliases in the non-directory case.

Okay.

>
> d_materialise_unique by contrast calls __d_instantiate_unique, which
> looks like it should avoid adding duplicates.
>
> So I think switching everyone to d_materialiase_unique would result in
> fewer dentries. But I've never seen any complaint about the issue and
> like you don't see a reason this would matter much either way.

So, yes, d_materialise_unique() looks like it has superior
functionality compared to d_splice_alias().

Thanks,
Miklos

2014-01-16 16:16:34

by Steven Whitehouse

[permalink] [raw]
Subject: Re: [PATCH] dcache: fix d_splice_alias handling of aliases

Hi,

On Thu, 2014-01-16 at 11:10 -0500, J. Bruce Fields wrote:
> On Wed, Jan 15, 2014 at 10:17:49AM -0500, bfields wrote:
> > From: "J. Bruce Fields" <[email protected]>
> >
> > d_splice_alias can create duplicate directory aliases (in the !new
> > case), or (in the new case) d_move without holding appropriate locks.
> >
> > d_materialise_unique deals with both of these problems. (The latter
> > seems to be dealt by trylocks (see __d_unalias), which look like they
> > could cause spurious lookup failures--but that's at least better than
> > corrupting the dcache.)
> >
> > Signed-off-by: J. Bruce Fields <[email protected]>
> > ---
> > fs/dcache.c | 25 +------------------------
> > 1 file changed, 1 insertion(+), 24 deletions(-)
> >
> > Only lightly tested.... If this is right, then we can also just ditch
> > d_splice_alias completely, and clean up the various d_find_alias's.
> >
> > I think the only reason we have both d_splice_alias and
> > d_materialise_unique is that the former was written for exportable
> > filesystems and the latter for distributed filesystems.
> >
> > But we have at least one exportable filesystem (fuse) using
> > d_materialise_unique. And I doubt d_splice_alias was ever completely
> > correct even for on-disk filesystems.
> >
> > Am I missing some subtlety?
>
> Hm, I just noticed:
>
> commit 0d0d110720d7960b77c03c9f2597faaff4b484ae
> Author: Miklos Szeredi <[email protected]>
> Date: Mon Sep 16 14:52:00 2013 +0200
>
> GFS2: d_splice_alias() can't return error
>
> unless it was given an IS_ERR(inode), which isn't the case here. So clean
> up the unnecessary error handling in gfs2_create_inode().
>
> This paves the way for real fixes (hence the stable Cc).
>
> Signed-off-by: Miklos Szeredi <[email protected]>
> Signed-off-by: Steven Whitehouse <[email protected]>
> Cc: [email protected]
>
> While the statement is true for the current implementation of
> d_splice_alias, I don't think it's actually true for any correct
> implementation of d_splice_alias, which must be able to return at least
> -ELOOP in the directory case. Does gfs2 need fixing?
>
> --b.

Yes, in that case, probably in two places,

Steve.

2014-01-16 16:44:17

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] dcache: fix d_splice_alias handling of aliases

On Thu, Jan 16, 2014 at 04:15:42PM +0000, Steven Whitehouse wrote:
> Hi,
>
> On Thu, 2014-01-16 at 11:10 -0500, J. Bruce Fields wrote:
> > On Wed, Jan 15, 2014 at 10:17:49AM -0500, bfields wrote:
> > > From: "J. Bruce Fields" <[email protected]>
> > >
> > > d_splice_alias can create duplicate directory aliases (in the !new
> > > case), or (in the new case) d_move without holding appropriate locks.
> > >
> > > d_materialise_unique deals with both of these problems. (The latter
> > > seems to be dealt by trylocks (see __d_unalias), which look like they
> > > could cause spurious lookup failures--but that's at least better than
> > > corrupting the dcache.)
> > >
> > > Signed-off-by: J. Bruce Fields <[email protected]>
> > > ---
> > > fs/dcache.c | 25 +------------------------
> > > 1 file changed, 1 insertion(+), 24 deletions(-)
> > >
> > > Only lightly tested.... If this is right, then we can also just ditch
> > > d_splice_alias completely, and clean up the various d_find_alias's.
> > >
> > > I think the only reason we have both d_splice_alias and
> > > d_materialise_unique is that the former was written for exportable
> > > filesystems and the latter for distributed filesystems.
> > >
> > > But we have at least one exportable filesystem (fuse) using
> > > d_materialise_unique. And I doubt d_splice_alias was ever completely
> > > correct even for on-disk filesystems.
> > >
> > > Am I missing some subtlety?
> >
> > Hm, I just noticed:
> >
> > commit 0d0d110720d7960b77c03c9f2597faaff4b484ae
> > Author: Miklos Szeredi <[email protected]>
> > Date: Mon Sep 16 14:52:00 2013 +0200
> >
> > GFS2: d_splice_alias() can't return error
> >
> > unless it was given an IS_ERR(inode), which isn't the case here. So clean
> > up the unnecessary error handling in gfs2_create_inode().
> >
> > This paves the way for real fixes (hence the stable Cc).
> >
> > Signed-off-by: Miklos Szeredi <[email protected]>
> > Signed-off-by: Steven Whitehouse <[email protected]>
> > Cc: [email protected]
> >
> > While the statement is true for the current implementation of
> > d_splice_alias, I don't think it's actually true for any correct
> > implementation of d_splice_alias, which must be able to return at least
> > -ELOOP in the directory case. Does gfs2 need fixing?
> >
> > --b.
>
> Yes, in that case, probably in two places,

Something like this?

(Except: is the inode cleanup right in the first chunk? And in the
second chunk the cleanup could maybe be organized better even if I got
it right....)

--b.

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 7119504..19e0924 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -585,6 +585,9 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
error = PTR_ERR(inode);
if (!IS_ERR(inode)) {
d = d_splice_alias(inode, dentry);
+ error = PTR_ERR(d);
+ if (IS_ERR(d))
+ goto fail_gunlock;
error = 0;
if (file) {
if (S_ISREG(inode->i_mode)) {
@@ -779,6 +782,11 @@ static struct dentry *__gfs2_lookup(struct inode *dir, struct dentry *dentry,
}

d = d_splice_alias(inode, dentry);
+ if (IS_ERR(d)) {
+ iput(inode);
+ gfs2_glock_dq_uninit(&gh);
+ return ERR_PTR(error);
+ }
if (file && S_ISREG(inode->i_mode))
error = finish_open(file, dentry, gfs2_open_common, opened);

2014-01-16 16:54:14

by Bob Peterson

[permalink] [raw]
Subject: Re: [PATCH] dcache: fix d_splice_alias handling of aliases

----- Original Message -----
| Something like this?
(snip)
| @@ -779,6 +782,11 @@ static struct dentry *__gfs2_lookup(struct inode *dir,
| struct dentry *dentry,
| }
|
| d = d_splice_alias(inode, dentry);
| + if (IS_ERR(d)) {
| + iput(inode);
| + gfs2_glock_dq_uninit(&gh);
| + return ERR_PTR(error);

---------------------------------^
Shouldn't that be ERR_PTR(d)?

Bob Peterson
Red Hat File Systems

2014-01-16 18:51:14

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] dcache: fix d_splice_alias handling of aliases

On Thu, Jan 16, 2014 at 11:54:07AM -0500, Bob Peterson wrote:
> ----- Original Message -----
> | Something like this?
> (snip)
> | @@ -779,6 +782,11 @@ static struct dentry *__gfs2_lookup(struct inode *dir,
> | struct dentry *dentry,
> | }
> |
> | d = d_splice_alias(inode, dentry);
> | + if (IS_ERR(d)) {
> | + iput(inode);
> | + gfs2_glock_dq_uninit(&gh);
> | + return ERR_PTR(error);
>
> ---------------------------------^
> Shouldn't that be ERR_PTR(d)?

Oops, yeah--well, actually just "d" I guess.

This is what I've got for what it's worth.

--b.

commit 6fba5295019b52a03d76c9e9570952466051a7a6
Author: J. Bruce Fields <[email protected]>
Date: Thu Jan 16 11:44:53 2014 -0500

gfs2: revert "GFS2: d_splice_alias() can't return error"

0d0d110720d7960b77c03c9f2597faaff4b484ae asserts that "d_splice_alias()
can't return error unless it was given an IS_ERR(inode)".

That was true of the implementation of d_splice_alias, but this is
really a problem with d_splice_alias: at a minimum it should be able to
return -ELOOP in the case where inserting the given dentry would cause a
directory loop.

Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 7119504..3f44902 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -585,6 +585,9 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
error = PTR_ERR(inode);
if (!IS_ERR(inode)) {
d = d_splice_alias(inode, dentry);
+ error = PTR_ERR(d);
+ if (IS_ERR(d))
+ goto fail_gunlock;
error = 0;
if (file) {
if (S_ISREG(inode->i_mode)) {
@@ -779,6 +782,11 @@ static struct dentry *__gfs2_lookup(struct inode *dir, struct dentry *dentry,
}

d = d_splice_alias(inode, dentry);
+ if (IS_ERR(d)) {
+ iput(inode);
+ gfs2_glock_dq_uninit(&gh);
+ return d;
+ }
if (file && S_ISREG(inode->i_mode))
error = finish_open(file, dentry, gfs2_open_common, opened);

2014-01-17 10:10:23

by Steven Whitehouse

[permalink] [raw]
Subject: Re: [PATCH] dcache: fix d_splice_alias handling of aliases

Hi,

On Thu, 2014-01-16 at 13:51 -0500, J. Bruce Fields wrote:
> On Thu, Jan 16, 2014 at 11:54:07AM -0500, Bob Peterson wrote:
> > ----- Original Message -----
> > | Something like this?
> > (snip)
> > | @@ -779,6 +782,11 @@ static struct dentry *__gfs2_lookup(struct inode *dir,
> > | struct dentry *dentry,
> > | }
> > |
> > | d = d_splice_alias(inode, dentry);
> > | + if (IS_ERR(d)) {
> > | + iput(inode);
> > | + gfs2_glock_dq_uninit(&gh);
> > | + return ERR_PTR(error);
> >
> > ---------------------------------^
> > Shouldn't that be ERR_PTR(d)?
>
> Oops, yeah--well, actually just "d" I guess.
>
> This is what I've got for what it's worth.
>
> --b.
>

I think the patch looks ok to me. Did you give it a spin to test it? If
so then I will put it in the GFS2 -nmw tree,

Steve.

> commit 6fba5295019b52a03d76c9e9570952466051a7a6
> Author: J. Bruce Fields <[email protected]>
> Date: Thu Jan 16 11:44:53 2014 -0500
>
> gfs2: revert "GFS2: d_splice_alias() can't return error"
>
> 0d0d110720d7960b77c03c9f2597faaff4b484ae asserts that "d_splice_alias()
> can't return error unless it was given an IS_ERR(inode)".
>
> That was true of the implementation of d_splice_alias, but this is
> really a problem with d_splice_alias: at a minimum it should be able to
> return -ELOOP in the case where inserting the given dentry would cause a
> directory loop.
>
> Signed-off-by: J. Bruce Fields <[email protected]>
>
> diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
> index 7119504..3f44902 100644
> --- a/fs/gfs2/inode.c
> +++ b/fs/gfs2/inode.c
> @@ -585,6 +585,9 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
> error = PTR_ERR(inode);
> if (!IS_ERR(inode)) {
> d = d_splice_alias(inode, dentry);
> + error = PTR_ERR(d);
> + if (IS_ERR(d))
> + goto fail_gunlock;
> error = 0;
> if (file) {
> if (S_ISREG(inode->i_mode)) {
> @@ -779,6 +782,11 @@ static struct dentry *__gfs2_lookup(struct inode *dir, struct dentry *dentry,
> }
>
> d = d_splice_alias(inode, dentry);
> + if (IS_ERR(d)) {
> + iput(inode);
> + gfs2_glock_dq_uninit(&gh);
> + return d;
> + }
> if (file && S_ISREG(inode->i_mode))
> error = finish_open(file, dentry, gfs2_open_common, opened);
>

2014-01-17 12:17:31

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] dcache: fix d_splice_alias handling of aliases

On Wed, Jan 15, 2014 at 10:17:49AM -0500, J. Bruce Fields wrote:
> From: "J. Bruce Fields" <[email protected]>
>
> d_splice_alias can create duplicate directory aliases (in the !new
> case), or (in the new case) d_move without holding appropriate locks.
>
> d_materialise_unique deals with both of these problems. (The latter
> seems to be dealt by trylocks (see __d_unalias), which look like they
> could cause spurious lookup failures--but that's at least better than
> corrupting the dcache.)

I'm a bit worried about those spurious failures, maybe we should
retry in that case?

Also looking over the changes I wonder if the explicit cecking for
aliases for every non-directory might have a major performance impact,
all the dcache growling already was a major issues in NFS workloads
years ago and I dumb it's become any better.

Also looking at this area I'd like to suggest that if you end up
merging the two I'd continue using the d_splice_alias name and
calling conventions.

Also the inode == NULL case really should be split out from
d_materialise_unique into a separate helper. It shares almost no
code, is entirely undocumented to the point that I don't really
understand what the purpose is, and the only caller that can get
there (fuse) already branches around that case in the caller anyway.

2014-01-17 15:39:51

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] dcache: fix d_splice_alias handling of aliases

On Fri, Jan 17, 2014 at 04:17:23AM -0800, Christoph Hellwig wrote:
> On Wed, Jan 15, 2014 at 10:17:49AM -0500, J. Bruce Fields wrote:
> > From: "J. Bruce Fields" <[email protected]>
> >
> > d_splice_alias can create duplicate directory aliases (in the !new
> > case), or (in the new case) d_move without holding appropriate locks.
> >
> > d_materialise_unique deals with both of these problems. (The latter
> > seems to be dealt by trylocks (see __d_unalias), which look like they
> > could cause spurious lookup failures--but that's at least better than
> > corrupting the dcache.)
>
> I'm a bit worried about those spurious failures, maybe we should
> retry in that case?

Maybe so. I'm not sure how. d_materialise_unique is called from lookup
and we'd need to at least drop the parent i_mutex to give a concurrent
rename a chance to progress.

I think NFS or cluster filesystem clients could hit this case with:

host A host B
--------- -------------------------
process 1 process 1 process 2
--------- --------- ---------

mkdir foo/X
mv foo/X bar/
stat bar/X mv baz qux


When (B,1) looks up X in bar it finds that X still has an alias in foo,
tries to rename that alias to bar/X, but can't because the current
baz->qux rename is holding the rename mutex. So __d_unalias and the
lookup return -EBUSY.

None of those operations are particularly fast, so I'm a bit surprised
we haven't already heard complaints. I must be missing some reason this
doesn't happen. I guess I should set up a test.

> Also looking over the changes I wonder if the explicit cecking for
> aliases for every non-directory might have a major performance impact,
> all the dcache growling already was a major issues in NFS workloads
> years ago and I dumb it's become any better.

This only happens on the first (uncached) lookup. So we've already
acquired a bunch of locks and probably done a round trip to a disk or a
server--is walking a (typically short) list really something to worry
about?

> Also looking at this area I'd like to suggest that if you end up
> merging the two I'd continue using the d_splice_alias name and
> calling conventions.

OK, I guess I don't care which one we keep.

> Also the inode == NULL case really should be split out from
> d_materialise_unique into a separate helper. It shares almost no
> code, is entirely undocumented to the point that I don't really
> understand what the purpose is, and the only caller that can get
> there (fuse) already branches around that case in the caller anyway.

I think I see what you mean, I can fix that.

--b.

2014-01-17 18:04:53

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] dcache: fix d_splice_alias handling of aliases

On Fri, Jan 17, 2014 at 10:04:30AM +0000, Steven Whitehouse wrote:
> Hi,
>
> On Thu, 2014-01-16 at 13:51 -0500, J. Bruce Fields wrote:
> > On Thu, Jan 16, 2014 at 11:54:07AM -0500, Bob Peterson wrote:
> > > ----- Original Message -----
> > > | Something like this?
> > > (snip)
> > > | @@ -779,6 +782,11 @@ static struct dentry *__gfs2_lookup(struct inode *dir,
> > > | struct dentry *dentry,
> > > | }
> > > |
> > > | d = d_splice_alias(inode, dentry);
> > > | + if (IS_ERR(d)) {
> > > | + iput(inode);
> > > | + gfs2_glock_dq_uninit(&gh);
> > > | + return ERR_PTR(error);
> > >
> > > ---------------------------------^
> > > Shouldn't that be ERR_PTR(d)?
> >
> > Oops, yeah--well, actually just "d" I guess.
> >
> > This is what I've got for what it's worth.
> >
> > --b.
> >
>
> I think the patch looks ok to me. Did you give it a spin to test it?

Sure, just
mkfs.gfs2 -p lock_nolock /dev/vdb
mount /dev/vdb /mnt/
cd cthon
NFSTESTDIR=/mnt/TMP ./runtests -a -f

Which doesn't test much of anything (we'd need to inject artificial
errors for now to test the error paths), but at least it doesn't blow up
in some obvious way....

> If so then I will put it in the GFS2 -nmw tree,

Thanks!

--b.

>
> Steve.
>
> > commit 6fba5295019b52a03d76c9e9570952466051a7a6
> > Author: J. Bruce Fields <[email protected]>
> > Date: Thu Jan 16 11:44:53 2014 -0500
> >
> > gfs2: revert "GFS2: d_splice_alias() can't return error"
> >
> > 0d0d110720d7960b77c03c9f2597faaff4b484ae asserts that "d_splice_alias()
> > can't return error unless it was given an IS_ERR(inode)".
> >
> > That was true of the implementation of d_splice_alias, but this is
> > really a problem with d_splice_alias: at a minimum it should be able to
> > return -ELOOP in the case where inserting the given dentry would cause a
> > directory loop.
> >
> > Signed-off-by: J. Bruce Fields <[email protected]>
> >
> > diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
> > index 7119504..3f44902 100644
> > --- a/fs/gfs2/inode.c
> > +++ b/fs/gfs2/inode.c
> > @@ -585,6 +585,9 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
> > error = PTR_ERR(inode);
> > if (!IS_ERR(inode)) {
> > d = d_splice_alias(inode, dentry);
> > + error = PTR_ERR(d);
> > + if (IS_ERR(d))
> > + goto fail_gunlock;
> > error = 0;
> > if (file) {
> > if (S_ISREG(inode->i_mode)) {
> > @@ -779,6 +782,11 @@ static struct dentry *__gfs2_lookup(struct inode *dir, struct dentry *dentry,
> > }
> >
> > d = d_splice_alias(inode, dentry);
> > + if (IS_ERR(d)) {
> > + iput(inode);
> > + gfs2_glock_dq_uninit(&gh);
> > + return d;
> > + }
> > if (file && S_ISREG(inode->i_mode))
> > error = finish_open(file, dentry, gfs2_open_common, opened);
> >
>
>

2014-01-17 21:04:12

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] dcache: fix d_splice_alias handling of aliases

On Fri, Jan 17, 2014 at 10:39:17AM -0500, J. Bruce Fields wrote:
> On Fri, Jan 17, 2014 at 04:17:23AM -0800, Christoph Hellwig wrote:
> > Also the inode == NULL case really should be split out from
> > d_materialise_unique into a separate helper. It shares almost no
> > code, is entirely undocumented to the point that I don't really
> > understand what the purpose is, and the only caller that can get
> > there (fuse) already branches around that case in the caller anyway.
>
> I think I see what you mean, I can fix that.

Actually:

- two callers (fuse and nfs) take advantage of the NULL case.

- d_splice_alias handles inode == NULL in the same way, and
almost every caller takes advantage of that.

So at least we wouldn't want to actually make the caller handle this
case.

But maybe there's still some opportunity for cleanup or documentation.

--b.

2014-01-17 21:27:25

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] dcache: fix d_splice_alias handling of aliases

On Fri, Jan 17, 2014 at 04:03:43PM -0500, J. Bruce Fields wrote:
> - d_splice_alias handles inode == NULL in the same way,

Actually, not exactly; simplifying a bit, in the NULL case they do:

d_splice_alias:

__d_instantiate(dentry, NULL);
security_d_instantiate(dentry, NULL);
if (d_unhashed(dentry))
d_rehash(dentry);

d_materialise_unique:

BUG_ON(!d_unhashed(dentry));

__d_instantiate(dentry, NULL);
d_rehash(dentry);
security_d_instantiate(dentry, NULL);

and a comment on d_splice_alias says

Cluster filesystems may call this function with a negative,
hashed dentry. In that case, we know that the inode will be a
regular file, and also this will only occur during atomic_open.

I don't understand those callers. But I guess it would be easy enough
to handle in d_materialise_unique.

--b.

2014-01-23 21:27:32

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH] dcache: make d_splice_alias use d_materialise_unique

From: "J. Bruce Fields" <[email protected]>

d_splice_alias can create duplicate directory aliases (in the !new
case), or (in the new case) d_move directories without holding
appropriate locks.

d_materialise_unique deals with both of these problems. (The latter
seems to be dealt by trylocks (see __d_unalias), which look like they
could cause spurious lookup failures--but that's at least better than
corrupting the dcache.)

We have to fix up a couple minor differences between d_splice_alias and
d_materialise_unique:
- d_splice_alias also handles IS_ERR(inode)
- d_splice_alias allows already-hashed dentries in one special
case.

We keep the d_splice_alias name and calling convention and deprecate
d_materialise_unique, which has fewer callers.

Also add some documentation.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/dcache.c | 96 ++++++++++++++++++++++-------------------------------------
1 file changed, 35 insertions(+), 61 deletions(-)

Here's a revised patch.

If it looks reasonable then there are some further minor simplifications
possible. (See git://linux-nfs.org/~bfields/linux-topics.git for-viro.)

diff --git a/fs/dcache.c b/fs/dcache.c
index 4bdb300..ec43194 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1765,7 +1765,7 @@ EXPORT_SYMBOL(d_instantiate_unique);
* @inode: inode to attach to this dentry
*
* Fill in inode information in the entry. If a directory alias is found, then
- * return an error (and drop inode). Together with d_materialise_unique() this
+ * return an error (and drop inode). Together with d_splice_alias() this
* guarantees that a directory inode may never have more than one alias.
*/
int d_instantiate_no_diralias(struct dentry *entry, struct inode *inode)
@@ -1905,58 +1905,6 @@ struct dentry *d_obtain_alias(struct inode *inode)
EXPORT_SYMBOL(d_obtain_alias);

/**
- * d_splice_alias - splice a disconnected dentry into the tree if one exists
- * @inode: the inode which may have a disconnected dentry
- * @dentry: a negative dentry which we want to point to the inode.
- *
- * If inode is a directory and has a 'disconnected' dentry (i.e. IS_ROOT and
- * DCACHE_DISCONNECTED), then d_move that in place of the given dentry
- * and return it, else simply d_add the inode to the dentry and return NULL.
- *
- * This is needed in the lookup routine of any filesystem that is exportable
- * (via knfsd) so that we can build dcache paths to directories effectively.
- *
- * If a dentry was found and moved, then it is returned. Otherwise NULL
- * is returned. This matches the expected return value of ->lookup.
- *
- * Cluster filesystems may call this function with a negative, hashed dentry.
- * In that case, we know that the inode will be a regular file, and also this
- * will only occur during atomic_open. So we need to check for the dentry
- * being already hashed only in the final case.
- */
-struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)
-{
- struct dentry *new = NULL;
-
- if (IS_ERR(inode))
- return ERR_CAST(inode);
-
- if (inode && S_ISDIR(inode->i_mode)) {
- spin_lock(&inode->i_lock);
- new = __d_find_alias(inode, 1);
- if (new) {
- BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED));
- spin_unlock(&inode->i_lock);
- security_d_instantiate(new, inode);
- d_move(new, dentry);
- iput(inode);
- } else {
- /* already taking inode->i_lock, so d_add() by hand */
- __d_instantiate(dentry, inode);
- spin_unlock(&inode->i_lock);
- security_d_instantiate(dentry, inode);
- d_rehash(dentry);
- }
- } else {
- d_instantiate(dentry, inode);
- if (d_unhashed(dentry))
- d_rehash(dentry);
- }
- return new;
-}
-EXPORT_SYMBOL(d_splice_alias);
-
-/**
* d_add_ci - lookup or allocate new dentry with case-exact name
* @inode: the inode case-insensitive lookup has found
* @dentry: the negative dentry that was passed to the parent's lookup func
@@ -2716,19 +2664,37 @@ static void __d_materialise_dentry(struct dentry *dentry, struct dentry *anon)
}

/**
- * d_materialise_unique - introduce an inode into the tree
- * @dentry: candidate dentry
+ * d_splice_alias - introduce an inode into the tree
* @inode: inode to bind to the dentry, to which aliases may be attached
+ * @dentry: candidate dentry
+ *
+ * Introduces a dentry into the tree, using either the provided dentry
+ * or, if appropriate, a preexisting alias for the same inode. Caller must
+ * hold the i_mutex of the parent directory.
+ *
+ * The inode may also be an error or NULL; in the former case the error is
+ * just passed through, in the latter we hash and instantiate the negative
+ * dentry. This allows filesystems to use d_splice_alias as the
+ * unconditional last step of their lookup method.
+ *
+ * d_splice_alias guarantees that directories will continue to have at
+ * most one alias, by moving an existing alias if necessary. If doing
+ * so would create a directory loop, it will fail with -ELOOP.
+ *
+ * d_splice_alias makes no such guarantee for files, but may still
+ * use a preexisting alias when it's convenient.
*
- * Introduces an dentry into the tree, substituting an extant disconnected
- * root directory alias in its place if there is one. Caller must hold the
- * i_mutex of the parent directory.
+ * Note that d_splice_alias normally expects an unhashed dentry, the single
+ * exception being that cluster filesystems may call this function
+ * during atomic_open with a negative hashed dentry, and with inode a
+ * regular file.
*/
-struct dentry *d_materialise_unique(struct dentry *dentry, struct inode *inode)
+struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)
{
struct dentry *actual;

- BUG_ON(!d_unhashed(dentry));
+ if (IS_ERR(inode))
+ return ERR_CAST(inode);

if (!inode) {
actual = dentry;
@@ -2788,7 +2754,8 @@ struct dentry *d_materialise_unique(struct dentry *dentry, struct inode *inode)

spin_lock(&actual->d_lock);
found:
- _d_rehash(actual);
+ if (d_unhashed(actual))
+ _d_rehash(actual);
spin_unlock(&actual->d_lock);
spin_unlock(&inode->i_lock);
out_nolock:
@@ -2800,6 +2767,13 @@ out_nolock:
iput(inode);
return actual;
}
+EXPORT_SYMBOL(d_splice_alias);
+
+/* deprecated synonym for d_splice_alias(inode, dentry): */
+struct dentry *d_materialise_unique(struct dentry *dentry, struct inode *inode)
+{
+ return d_splice_alias(inode, dentry);
+}
EXPORT_SYMBOL_GPL(d_materialise_unique);

static int prepend(char **buffer, int *buflen, const char *str, int namelen)
--
1.7.9.5

2014-01-31 18:43:05

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] dcache: make d_splice_alias use d_materialise_unique

On Thu, Jan 23, 2014 at 04:27:00PM -0500, J. Bruce Fields wrote:
> From: "J. Bruce Fields" <[email protected]>
>
> d_splice_alias can create duplicate directory aliases (in the !new
> case), or (in the new case) d_move directories without holding
> appropriate locks.

Details, please. In the new case, we have IS_ROOT() alias found;
what locks would that need? Note that d_materialise_unique() won't
bother with __d_unalias() in such case - it does what d_move() would've
done, without taking any mutex.

In the !new case, we'd need a preexisting dentry alias, complete with
parent. IOW, that's the case when directory already in the tree
has been found during lookup from another parent. In which case
we shouldn't be using d_splice_alias() at all, as it is (and it
certainly can't happen for any local fs).

Now, I agree that merging that with d_materialise_unique() might be
a good idea, but commit message is wrong as it, AFAICS.

2014-01-31 19:48:34

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] dcache: make d_splice_alias use d_materialise_unique

On Fri, Jan 31, 2014 at 06:42:58PM +0000, Al Viro wrote:
> On Thu, Jan 23, 2014 at 04:27:00PM -0500, J. Bruce Fields wrote:
> > From: "J. Bruce Fields" <[email protected]>
> >
> > d_splice_alias can create duplicate directory aliases (in the !new
> > case), or (in the new case) d_move directories without holding
> > appropriate locks.
>
> Details, please. In the new case, we have IS_ROOT() alias found;
> what locks would that need? Note that d_materialise_unique() won't
> bother with __d_unalias() in such case - it does what d_move() would've
> done, without taking any mutex.

Of course you're right, and Miklos had pointed this out already and I
forgot to update the changelog. Apologies!

> In the !new case, we'd need a preexisting dentry alias, complete with
> parent. IOW, that's the case when directory already in the tree
> has been found during lookup from another parent. In which case
> we shouldn't be using d_splice_alias() at all, as it is (and it
> certainly can't happen for any local fs).

Yes, except: won't a local filesystem will still hit this case on a
filesystem that's corrupted to link a directory into multiple parents?

Though in that case arguably the right behavior might be, say, WARN and
return -EIO.

> Now, I agree that merging that with d_materialise_unique() might be
> a good idea, but commit message is wrong as it, AFAICS.

Agreed, I'll fix and resend.

Though now I wonder whether it's worth keeping two different interfaces,
one for the case when finding a parent in a different directory is an
error and one for the case when it's normal and you'd just like it fixed
up.

(Then one remaining thing I don't understand is how to make that fixing
up reliable. Or is there some reason nobody hits the _EBUSY case of
__d_unalias?)

--b.

2014-02-06 17:04:18

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] dcache: make d_splice_alias use d_materialise_unique

On Fri, Jan 31, 2014 at 02:47:58PM -0500, J. Bruce Fields wrote:
> (Then one remaining thing I don't understand is how to make that fixing
> up reliable. Or is there some reason nobody hits the _EBUSY case of
> __d_unalias?)

In fact, a reproducer found thanks to Hideshi Yamaoka:

On server:

while true; do
mv /exports/DIR /exports/TO/DIR
mv /exports/TO/DIR /exports/DIR
done

On client:

mount -olookupcache=pos /mnt

while true; do
ls /mnt/TO;
done

Also on client:

while true; do
strace -e open cat /mnt/DIR/test.txt 2>&1 | grep EBUSY
done

Once all three of those loops are running I hit

open("/mnt/DIR/test.txt", O_RDONLY) = -1 EBUSY

very quickly.

The "lookupcache=pos" isn't really necessary but makes the reproducer
more reliable.

(Originally this was seen on a single client: the client itself was
doing the renames but also continually killing the second mv. I suspect
that means the client sends the RENAME but then fails to update its
dcache, the result being again that the client's dcache is out of sync
with the server's tree and hence lookup is stuck trying to grab a dentry
from another directory.)

Is there some solution short of making ->lookup callers drop the i_mutex
and retry???

--b.