2010-11-12 18:43:55

by J. Bruce Fields

[permalink] [raw]
Subject: lifetime of DCACHE_DISCONECTED dentries

DCACHE_DISCONECTED dentries aren't always getting destroyed as soon as
I'd have expected in the NFSv4 case.

I'm not sure what the right fix is; any ideas? The below at least
demonstrates the problem.

--b.

commit bb125ffbe5fc3f80ac7a5b20f51cc542c175cd49
Author: J. Bruce Fields <[email protected]>
Date: Thu Nov 11 19:22:00 2010 -0500

dput: free DCACHE_DISCONNECTED dentries sooner

DCACHE_DISCONECTED dentries are normally left around for the benefit of
future nfsd operations. But there's no point keeping them around once
the inode has been deleted.

Without this patch

client$ mount -tnfs4 server:/export/ /mnt/
client$ tail -f /mnt/FOO
...
server$ df -i /export
server$ rm /export/FOO
(^C the tail -f)
server$ df -i /export
server$ echo 2 >/proc/sys/vm/drop_caches
server$ df -i /export

the df's will show that the inode is not freed on the filesystem until
the last step, when it could have been freed after killing the client's
tail -f. On-disk data won't be deallocated either, leading to possible
spurious ENOSPC.

This occurs because when the client does the close, it arrives in a
compound with a putfh and a close, processed like:

- putfh: look up the filehandle. The only alias found for the
inode will be DCACHE_UNHASHED alias referenced by the filp
associated with the nfsd open. d_obtain_alias() doesn't like
this, so it creates a new DCACHE_DISCONECTED dentry and
returns that instead.
- close: closes the existing filp, which is destroyed
immediately by dput() since it's DCACHE_UNHASHED.
- end of the compound: release the reference to the current
filehandle, and dput() the new DCACHE_DISCONECTED dentry,
which gets put on the unused list instead of being destroyed
immediately.

Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/fs/dcache.c b/fs/dcache.c
index 28fa7e5..5132f13 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -241,6 +241,10 @@ repeat:
/* Unreachable? Get rid of it */
if (d_unhashed(dentry))
goto kill_it;
+ if (dentry->d_flags & DCACHE_DISCONNECTED
+ && dentry->d_inode
+ && dentry->d_inode->i_nlink == 0)
+ goto kill_it;
if (list_empty(&dentry->d_lru)) {
dentry->d_flags |= DCACHE_REFERENCED;
dentry_lru_add(dentry);


2010-11-15 17:48:38

by J. Bruce Fields

[permalink] [raw]
Subject: Re: lifetime of DCACHE_DISCONECTED dentries

On Sat, Nov 13, 2010 at 10:53:12PM +1100, Nick Piggin wrote:
> On Sat, Nov 13, 2010 at 5:43 AM, J. Bruce Fields <[email protected]> wrote:
> > DCACHE_DISCONECTED dentries aren't always getting destroyed as soon as
> > I'd have expected in the NFSv4 case.
> >
> > I'm not sure what the right fix is; any ideas?  The below at least
> > demonstrates the problem.
> >
> > --b.
> >
> > commit bb125ffbe5fc3f80ac7a5b20f51cc542c175cd49
> > Author: J. Bruce Fields <[email protected]>
> > Date:   Thu Nov 11 19:22:00 2010 -0500
> >
> >    dput: free DCACHE_DISCONNECTED dentries sooner
> >
> >    DCACHE_DISCONECTED dentries are normally left around for the benefit of
> >    future nfsd operations.  But there's no point keeping them around once
> >    the inode has been deleted.
> >
> >    Without this patch
> >
> >        client$ mount -tnfs4 server:/export/ /mnt/
> >        client$ tail -f /mnt/FOO
> >        ...
> >        server$ df -i /export
> >        server$ rm /export/FOO
> >        (^C the tail -f)
> >        server$ df -i /export
> >        server$ echo 2 >/proc/sys/vm/drop_caches
> >        server$ df -i /export
> >
> >    the df's will show that the inode is not freed on the filesystem until
> >    the last step, when it could have been freed after killing the client's
> >    tail -f.  On-disk data won't be deallocated either, leading to possible
> >    spurious ENOSPC.
> >
> >    This occurs because when the client does the close, it arrives in a
> >    compound with a putfh and a close, processed like:
> >
> >        - putfh: look up the filehandle.  The only alias found for the
> >          inode will be DCACHE_UNHASHED alias referenced by the filp
> >          associated with the nfsd open.  d_obtain_alias() doesn't like
> >          this, so it creates a new DCACHE_DISCONECTED dentry and
> >          returns that instead.
>
> This seems to be where the thing goes wrong. It isn't a hashed dentry at
> this point here, so d_obtain_alias should not be making one.

Sounds sensible. (But can you think of any actual bugs that will result
from trying to add a new hashed dentry in this case?)

> I think the inode i_nlink games are much more appropriate on this side of
> the equation, rather than the dput side (after all, d_obtain_alias is setting
> up an alias for the inode).
>
> Can you even put the link check into __d_find_alias?
>
> - if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) {
> + if (S_ISDIR(inode->i_mode) || !inode->i_nlink ||
> !d_unhashed(alias)) {
>
> Something like that?

The immediate result of that would be for the close rpc (or any rpc's
sent after the file was unlinked) to fail with ESTALE.

But nfsd already holds an open file in this case, and you could argue
that it should be using that from the start.

So, we could modify nfsd to add a hash mapping filehandles to the filp's
that it knows about, and have nfsd consult that hash before calling
dentry_to_fh.

--b.

2010-11-13 11:53:15

by Nicholas Piggin

[permalink] [raw]
Subject: Re: lifetime of DCACHE_DISCONECTED dentries

On Sat, Nov 13, 2010 at 5:43 AM, J. Bruce Fields <[email protected]> wrote:
> DCACHE_DISCONECTED dentries aren't always getting destroyed as soon as
> I'd have expected in the NFSv4 case.
>
> I'm not sure what the right fix is; any ideas? ?The below at least
> demonstrates the problem.
>
> --b.
>
> commit bb125ffbe5fc3f80ac7a5b20f51cc542c175cd49
> Author: J. Bruce Fields <[email protected]>
> Date: ? Thu Nov 11 19:22:00 2010 -0500
>
> ? ?dput: free DCACHE_DISCONNECTED dentries sooner
>
> ? ?DCACHE_DISCONECTED dentries are normally left around for the benefit of
> ? ?future nfsd operations. ?But there's no point keeping them around once
> ? ?the inode has been deleted.
>
> ? ?Without this patch
>
> ? ? ? ?client$ mount -tnfs4 server:/export/ /mnt/
> ? ? ? ?client$ tail -f /mnt/FOO
> ? ? ? ?...
> ? ? ? ?server$ df -i /export
> ? ? ? ?server$ rm /export/FOO
> ? ? ? ?(^C the tail -f)
> ? ? ? ?server$ df -i /export
> ? ? ? ?server$ echo 2 >/proc/sys/vm/drop_caches
> ? ? ? ?server$ df -i /export
>
> ? ?the df's will show that the inode is not freed on the filesystem until
> ? ?the last step, when it could have been freed after killing the client's
> ? ?tail -f. ?On-disk data won't be deallocated either, leading to possible
> ? ?spurious ENOSPC.
>
> ? ?This occurs because when the client does the close, it arrives in a
> ? ?compound with a putfh and a close, processed like:
>
> ? ? ? ?- putfh: look up the filehandle. ?The only alias found for the
> ? ? ? ? ?inode will be DCACHE_UNHASHED alias referenced by the filp
> ? ? ? ? ?associated with the nfsd open. ?d_obtain_alias() doesn't like
> ? ? ? ? ?this, so it creates a new DCACHE_DISCONECTED dentry and
> ? ? ? ? ?returns that instead.

This seems to be where the thing goes wrong. It isn't a hashed dentry at
this point here, so d_obtain_alias should not be making one.

I think the inode i_nlink games are much more appropriate on this side of
the equation, rather than the dput side (after all, d_obtain_alias is setting
up an alias for the inode).

Can you even put the link check into __d_find_alias?

- if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) {
+ if (S_ISDIR(inode->i_mode) || !inode->i_nlink ||
!d_unhashed(alias)) {

Something like that?

2010-11-29 19:32:49

by J. Bruce Fields

[permalink] [raw]
Subject: Re: lifetime of DCACHE_DISCONECTED dentries

On Mon, Nov 29, 2010 at 02:56:22PM +1100, Nick Piggin wrote:
> On Tue, Nov 16, 2010 at 5:45 PM, Nick Piggin <[email protected]> wrote:
> > On Tue, Nov 16, 2010 at 4:48 AM, J. Bruce Fields <[email protected]> wrote:
> >> On Sat, Nov 13, 2010 at 10:53:12PM +1100, Nick Piggin wrote:
> >>> On Sat, Nov 13, 2010 at 5:43 AM, J. Bruce Fields <[email protected]> wrote:
> >
> >>> >        - putfh: look up the filehandle.  The only alias found for the
> >>> >          inode will be DCACHE_UNHASHED alias referenced by the filp
> >>> >          associated with the nfsd open.  d_obtain_alias() doesn't like
> >>> >          this, so it creates a new DCACHE_DISCONECTED dentry and
> >>> >          returns that instead.
> >>>
> >>> This seems to be where the thing goes wrong. It isn't a hashed dentry at
> >>> this point here, so d_obtain_alias should not be making one.
> >>
> >> Sounds sensible.  (But can you think of any actual bugs that will result
> >> from trying to add a new hashed dentry in this case?)
> >
> > Well, this one? :)
> >
> >
> >>> I think the inode i_nlink games are much more appropriate on this side of
> >>> the equation, rather than the dput side (after all, d_obtain_alias is setting
> >>> up an alias for the inode).
> >>>
> >>> Can you even put the link check into __d_find_alias?
> >>>
> >>> -               if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) {
> >>> +               if (S_ISDIR(inode->i_mode) || !inode->i_nlink ||
> >>> !d_unhashed(alias)) {
> >>>
> >>> Something like that?
> >>
> >> The immediate result of that would be for the close rpc (or any rpc's
> >> sent after the file was unlinked) to fail with ESTALE.
> >
> > Why is that? Seems like it would be a bug, because a hashed dentry may
> > be unhashed at any time concurrently to nfsd operation, so it should be
> > able to tolerate that so long as it has a ref on the inode?
>
> Ping? Did you work out why nfs fails with ESTALE in that case? It seems
> to work in my testing (and do the right thing with freeing the inode).

Bah, sorry, I read too quickly, got the sense of the test backwards, and
thought you were suggesting __d_find_alias() shouldn't return an alias
in the i_nlink == 0 case!

Yes, agreed, that should solve my problem.

But what's the reason for the d_unhashed() check now? Could we get rid
of it entirely?

--b.

2010-11-16 06:45:07

by Nicholas Piggin

[permalink] [raw]
Subject: Re: lifetime of DCACHE_DISCONECTED dentries

On Tue, Nov 16, 2010 at 4:48 AM, J. Bruce Fields <[email protected]> wrote:
> On Sat, Nov 13, 2010 at 10:53:12PM +1100, Nick Piggin wrote:
>> On Sat, Nov 13, 2010 at 5:43 AM, J. Bruce Fields <[email protected]> wrote:

>> > ? ? ? ?- putfh: look up the filehandle. ?The only alias found for the
>> > ? ? ? ? ?inode will be DCACHE_UNHASHED alias referenced by the filp
>> > ? ? ? ? ?associated with the nfsd open. ?d_obtain_alias() doesn't like
>> > ? ? ? ? ?this, so it creates a new DCACHE_DISCONECTED dentry and
>> > ? ? ? ? ?returns that instead.
>>
>> This seems to be where the thing goes wrong. It isn't a hashed dentry at
>> this point here, so d_obtain_alias should not be making one.
>
> Sounds sensible. ?(But can you think of any actual bugs that will result
> from trying to add a new hashed dentry in this case?)

Well, this one? :)


>> I think the inode i_nlink games are much more appropriate on this side of
>> the equation, rather than the dput side (after all, d_obtain_alias is setting
>> up an alias for the inode).
>>
>> Can you even put the link check into __d_find_alias?
>>
>> - ? ? ? ? ? ? ? if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) {
>> + ? ? ? ? ? ? ? if (S_ISDIR(inode->i_mode) || !inode->i_nlink ||
>> !d_unhashed(alias)) {
>>
>> Something like that?
>
> The immediate result of that would be for the close rpc (or any rpc's
> sent after the file was unlinked) to fail with ESTALE.

Why is that? Seems like it would be a bug, because a hashed dentry may
be unhashed at any time concurrently to nfsd operation, so it should be
able to tolerate that so long as it has a ref on the inode?


> But nfsd already holds an open file in this case, and you could argue
> that it should be using that from the start.

Yes.


> So, we could modify nfsd to add a hash mapping filehandles to the filp's
> that it knows about, and have nfsd consult that hash before calling
> dentry_to_fh.

Could be an option. It would be a pity not just be able to use the alias list.
What exactly goes wrong when it gets an unhashed alias back?

2010-11-30 01:00:19

by Nicholas Piggin

[permalink] [raw]
Subject: Re: lifetime of DCACHE_DISCONECTED dentries

On Tue, Nov 30, 2010 at 6:32 AM, J. Bruce Fields <[email protected]> wrote:
> On Mon, Nov 29, 2010 at 02:56:22PM +1100, Nick Piggin wrote:
>> On Tue, Nov 16, 2010 at 5:45 PM, Nick Piggin <[email protected]> wrote:
>> > On Tue, Nov 16, 2010 at 4:48 AM, J. Bruce Fields <[email protected]> wrote:
>> >> On Sat, Nov 13, 2010 at 10:53:12PM +1100, Nick Piggin wrote:
>> >>> On Sat, Nov 13, 2010 at 5:43 AM, J. Bruce Fields <[email protected]> wrote:
>> >
>> >>> > ? ? ? ?- putfh: look up the filehandle. ?The only alias found for the
>> >>> > ? ? ? ? ?inode will be DCACHE_UNHASHED alias referenced by the filp
>> >>> > ? ? ? ? ?associated with the nfsd open. ?d_obtain_alias() doesn't like
>> >>> > ? ? ? ? ?this, so it creates a new DCACHE_DISCONECTED dentry and
>> >>> > ? ? ? ? ?returns that instead.
>> >>>
>> >>> This seems to be where the thing goes wrong. It isn't a hashed dentry at
>> >>> this point here, so d_obtain_alias should not be making one.
>> >>
>> >> Sounds sensible. ?(But can you think of any actual bugs that will result
>> >> from trying to add a new hashed dentry in this case?)
>> >
>> > Well, this one? :)
>> >
>> >
>> >>> I think the inode i_nlink games are much more appropriate on this side of
>> >>> the equation, rather than the dput side (after all, d_obtain_alias is setting
>> >>> up an alias for the inode).
>> >>>
>> >>> Can you even put the link check into __d_find_alias?
>> >>>
>> >>> - ? ? ? ? ? ? ? if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) {
>> >>> + ? ? ? ? ? ? ? if (S_ISDIR(inode->i_mode) || !inode->i_nlink ||
>> >>> !d_unhashed(alias)) {
>> >>>
>> >>> Something like that?
>> >>
>> >> The immediate result of that would be for the close rpc (or any rpc's
>> >> sent after the file was unlinked) to fail with ESTALE.
>> >
>> > Why is that? Seems like it would be a bug, because a hashed dentry may
>> > be unhashed at any time concurrently to nfsd operation, so it should be
>> > able to tolerate that so long as it has a ref on the inode?
>>
>> Ping? Did you work out why nfs fails with ESTALE in that case? It seems
>> to work in my testing (and do the right thing with freeing the inode).
>
> Bah, sorry, I read too quickly, got the sense of the test backwards, and
> thought you were suggesting __d_find_alias() shouldn't return an alias
> in the i_nlink == 0 case!
>
> Yes, agreed, that should solve my problem.

OK, good.

> But what's the reason for the d_unhashed() check now? ?Could we get rid
> of it entirely?

Well when the inode still has links I think we actually do want any new
references to go to hashed dentries. Definitely for d_splice_alias.

2010-11-30 18:39:43

by J. Bruce Fields

[permalink] [raw]
Subject: Re: lifetime of DCACHE_DISCONECTED dentries

On Tue, Nov 30, 2010 at 12:00:16PM +1100, Nick Piggin wrote:
> On Tue, Nov 30, 2010 at 6:32 AM, J. Bruce Fields <[email protected]> wrote:
> > On Mon, Nov 29, 2010 at 02:56:22PM +1100, Nick Piggin wrote:
> >> On Tue, Nov 16, 2010 at 5:45 PM, Nick Piggin <[email protected]> wrote:
> >> > On Tue, Nov 16, 2010 at 4:48 AM, J. Bruce Fields <[email protected]> wrote:
> >> >> On Sat, Nov 13, 2010 at 10:53:12PM +1100, Nick Piggin wrote:
> >> >>> On Sat, Nov 13, 2010 at 5:43 AM, J. Bruce Fields <[email protected]> wrote:
> >> >
> >> >>> >        - putfh: look up the filehandle.  The only alias found for the
> >> >>> >          inode will be DCACHE_UNHASHED alias referenced by the filp
> >> >>> >          associated with the nfsd open.  d_obtain_alias() doesn't like
> >> >>> >          this, so it creates a new DCACHE_DISCONECTED dentry and
> >> >>> >          returns that instead.
> >> >>>
> >> >>> This seems to be where the thing goes wrong. It isn't a hashed dentry at
> >> >>> this point here, so d_obtain_alias should not be making one.
> >> >>
> >> >> Sounds sensible.  (But can you think of any actual bugs that will result
> >> >> from trying to add a new hashed dentry in this case?)
> >> >
> >> > Well, this one? :)
> >> >
> >> >
> >> >>> I think the inode i_nlink games are much more appropriate on this side of
> >> >>> the equation, rather than the dput side (after all, d_obtain_alias is setting
> >> >>> up an alias for the inode).
> >> >>>
> >> >>> Can you even put the link check into __d_find_alias?
> >> >>>
> >> >>> -               if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) {
> >> >>> +               if (S_ISDIR(inode->i_mode) || !inode->i_nlink ||
> >> >>> !d_unhashed(alias)) {
> >> >>>
> >> >>> Something like that?
> >> >>
> >> >> The immediate result of that would be for the close rpc (or any rpc's
> >> >> sent after the file was unlinked) to fail with ESTALE.
> >> >
> >> > Why is that? Seems like it would be a bug, because a hashed dentry may
> >> > be unhashed at any time concurrently to nfsd operation, so it should be
> >> > able to tolerate that so long as it has a ref on the inode?
> >>
> >> Ping? Did you work out why nfs fails with ESTALE in that case? It seems
> >> to work in my testing (and do the right thing with freeing the inode).
> >
> > Bah, sorry, I read too quickly, got the sense of the test backwards, and
> > thought you were suggesting __d_find_alias() shouldn't return an alias
> > in the i_nlink == 0 case!
> >
> > Yes, agreed, that should solve my problem.
>
> OK, good.
>
> > But what's the reason for the d_unhashed() check now?  Could we get rid
> > of it entirely?
>
> Well when the inode still has links I think we actually do want any new
> references to go to hashed dentries. Definitely for d_splice_alias.

OK, makes sense. Should we stick a changelog on it and pass it along to
someone?

--b.

2010-11-29 03:56:24

by Nicholas Piggin

[permalink] [raw]
Subject: Re: lifetime of DCACHE_DISCONECTED dentries

On Tue, Nov 16, 2010 at 5:45 PM, Nick Piggin <[email protected]> wrote:
> On Tue, Nov 16, 2010 at 4:48 AM, J. Bruce Fields <[email protected]> wrote:
>> On Sat, Nov 13, 2010 at 10:53:12PM +1100, Nick Piggin wrote:
>>> On Sat, Nov 13, 2010 at 5:43 AM, J. Bruce Fields <[email protected]> wrote:
>
>>> > ? ? ? ?- putfh: look up the filehandle. ?The only alias found for the
>>> > ? ? ? ? ?inode will be DCACHE_UNHASHED alias referenced by the filp
>>> > ? ? ? ? ?associated with the nfsd open. ?d_obtain_alias() doesn't like
>>> > ? ? ? ? ?this, so it creates a new DCACHE_DISCONECTED dentry and
>>> > ? ? ? ? ?returns that instead.
>>>
>>> This seems to be where the thing goes wrong. It isn't a hashed dentry at
>>> this point here, so d_obtain_alias should not be making one.
>>
>> Sounds sensible. ?(But can you think of any actual bugs that will result
>> from trying to add a new hashed dentry in this case?)
>
> Well, this one? :)
>
>
>>> I think the inode i_nlink games are much more appropriate on this side of
>>> the equation, rather than the dput side (after all, d_obtain_alias is setting
>>> up an alias for the inode).
>>>
>>> Can you even put the link check into __d_find_alias?
>>>
>>> - ? ? ? ? ? ? ? if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) {
>>> + ? ? ? ? ? ? ? if (S_ISDIR(inode->i_mode) || !inode->i_nlink ||
>>> !d_unhashed(alias)) {
>>>
>>> Something like that?
>>
>> The immediate result of that would be for the close rpc (or any rpc's
>> sent after the file was unlinked) to fail with ESTALE.
>
> Why is that? Seems like it would be a bug, because a hashed dentry may
> be unhashed at any time concurrently to nfsd operation, so it should be
> able to tolerate that so long as it has a ref on the inode?

Ping? Did you work out why nfs fails with ESTALE in that case? It seems
to work in my testing (and do the right thing with freeing the inode).

2010-12-17 18:00:25

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 2/2] fs/dcache: allow __d_obtain_alias() to return unhashed dentries

From: J. Bruce Fields <[email protected]>

Without this patch

       client$ mount -tnfs4 server:/export/ /mnt/
       client$ tail -f /mnt/FOO
       ...
       server$ df -i /export
       server$ rm /export/FOO
       (^C the tail -f)
       server$ df -i /export
       server$ echo 2 >/proc/sys/vm/drop_caches
       server$ df -i /export

the df's will show that the inode is not freed on the filesystem until
the last step, when it could have been freed after killing the client's
tail -f.  On-disk data won't be deallocated either, leading to possible
spurious ENOSPC.

This occurs because when the client does the close, it arrives in a
compound with a putfh and a close, processed like:

       - putfh: look up the filehandle.  The only alias found for the
         inode will be DCACHE_UNHASHED alias referenced by the filp
         associated with the nfsd open.  d_obtain_alias() doesn't like
         this, so it creates a new DCACHE_DISCONECTED dentry and
         returns that instead.

Nick Piggin suggested fixing this by allowing d_obtain_alias to return
the unhashed dentry that is referenced by the filp, instead of making it
create a new dentry.

Leave __d_find_alias() alone to avoid changing behavior of other
callers.

Also nfsd doesn't need all the checks of __d_find_alias(); any dentry,
hashed or unhashed, disconnected or not, should work.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/dcache.c | 26 ++++++++++++++++++++++++--
1 files changed, 24 insertions(+), 2 deletions(-)

On Tue, Dec 14, 2010 at 05:01:02PM -0500, J. Bruce Fields wrote:
> On Mon, Dec 13, 2010 at 04:19:44PM +1100, Nick Piggin wrote:
> > Not sure where Al's hiding...
> >
> > But I would like to update the comments, and perhaps even a new
> > add a new function here (or new flag to __d_find_alias).
> >
> > AFAIKS, the callers are OK, however I suppose d_splice_alias and
> > d_materialise_unique should not have unlinked inodes at this point,
> > so at least a BUG_ON for them might be a good idea?
>
> That does sound safer. I'm pretty confused by the various
> __di_splice_alias callers. I'll go search through them and see if I can
> understand better....

I think a new __d_find_alias flag would just make it more confusing.
And it looks like all d_obtain_alias needs is something much simpler.
This works for me.

--b.

diff --git a/fs/dcache.c b/fs/dcache.c
index 5ed93cd..d5f9da4 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1135,6 +1135,28 @@ static inline struct hlist_head *d_hash(struct dentry *parent,
return dentry_hashtable + (hash & D_HASHMASK);
}

+static struct dentry * __d_find_any_alias(struct inode *inode)
+{
+ struct dentry *alias;
+
+ if (list_empty(&inode->i_dentry))
+ return NULL;
+ alias = list_first_entry(&inode->i_dentry, struct dentry, d_alias);
+ __dget_locked(alias);
+ return alias;
+}
+
+static struct dentry * d_find_any_alias(struct inode *inode)
+{
+ struct dentry *de;
+
+ spin_lock(&dcache_lock);
+ de = __d_find_alias(inode, 0);
+ spin_unlock(&dcache_lock);
+ return de;
+}
+
+
/**
* d_obtain_alias - find or allocate a dentry for a given inode
* @inode: inode to allocate the dentry for
@@ -1164,7 +1186,7 @@ struct dentry *d_obtain_alias(struct inode *inode)
if (IS_ERR(inode))
return ERR_CAST(inode);

- res = d_find_alias(inode);
+ res = d_find_any_alias(inode);
if (res)
goto out_iput;

@@ -1176,7 +1198,7 @@ struct dentry *d_obtain_alias(struct inode *inode)
tmp->d_parent = tmp; /* make sure dput doesn't croak */

spin_lock(&dcache_lock);
- res = __d_find_alias(inode, 0);
+ res = __d_find_any_alias(inode);
if (res) {
spin_unlock(&dcache_lock);
dput(tmp);
--
1.7.1


2010-12-19 14:53:13

by Nicholas Piggin

[permalink] [raw]
Subject: Re: [PATCH 2/2] fs/dcache: allow __d_obtain_alias() to return unhashed dentries

On Sun, Dec 19, 2010 at 3:16 AM, J. Bruce Fields <[email protected]> wrote:
> On Sat, Dec 18, 2010 at 01:01:18PM +1100, Nick Piggin wrote:
>> On Fri, Dec 17, 2010 at 01:00:23PM -0500, J. Bruce Fields wrote:
>> > From: J. Bruce Fields <[email protected]>

>> > On Tue, Dec 14, 2010 at 05:01:02PM -0500, J. Bruce Fields wrote:
>> > I think a new __d_find_alias flag would just make it more confusing.
>> > And it looks like all d_obtain_alias needs is something much simpler.
>> > This works for me.
>>
>> Well I don't really know if this is clearer. By convention,
>> __ prefix version should be same as non prefix version but just
>> take fewer locks. Why do you have the new d_find_any_alias()?
>
> Argh, apologies; with typo fixed.

OK that makes a lot more sense :)

> --b.
>
> commit 5a911af645aae9992d465d57c397237fb35d1f93
> Author: J. Bruce Fields <[email protected]>
> Date: ? Fri Dec 17 09:05:04 2010 -0500
>
> ? ?fs/dcache: allow __d_obtain_alias() to return unhashed dentries
>
> ? ?Without this patch
>
> ? ?? ? ? ?client$ mount -tnfs4 server:/export/ /mnt/
> ? ?? ? ? ?client$ tail -f /mnt/FOO
> ? ?? ? ? ?...
> ? ?? ? ? ?server$ df -i /export
> ? ?? ? ? ?server$ rm /export/FOO
> ? ?? ? ? ?(^C the tail -f)
> ? ?? ? ? ?server$ df -i /export
> ? ?? ? ? ?server$ echo 2 >/proc/sys/vm/drop_caches
> ? ?? ? ? ?server$ df -i /export
>
> ? ?the df's will show that the inode is not freed on the filesystem until
> ? ?the last step, when it could have been freed after killing the client's
> ? ?tail -f. ?On-disk data won't be deallocated either, leading to possible
> ? ?spurious ENOSPC.
>
> ? ?This occurs because when the client does the close, it arrives in a
> ? ?compound with a putfh and a close, processed like:
>
> ? ?? ? ? ?- putfh: look up the filehandle. ?The only alias found for the
> ? ?? ? ? ? ?inode will be DCACHE_UNHASHED alias referenced by the filp
> ? ?? ? ? ? ?associated with the nfsd open. ?d_obtain_alias() doesn't like
> ? ?? ? ? ? ?this, so it creates a new DCACHE_DISCONECTED dentry and
> ? ?? ? ? ? ?returns that instead.
>
> ? ?Nick Piggin suggested fixing this by allowing d_obtain_alias to return
> ? ?the unhashed dentry that is referenced by the filp, instead of making it
> ? ?create a new dentry.
>
> ? ?Leave __d_find_alias() alone to avoid changing behavior of other
> ? ?callers.
>
> ? ?Also nfsd doesn't need all the checks of __d_find_alias(); any dentry,
> ? ?hashed or unhashed, disconnected or not, should work.

Yeah, because when the call returns, the name could be concurrently
unhashed, but the nfsd operation must still work (either handle the race
or be unaffected by it) presumably. Therefore: hashed status should not
matter here.

Acked-by: Nick Piggin <[email protected]>

>
> ? ?Signed-off-by: J. Bruce Fields <[email protected]>
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 5ed93cd..5c014a5 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -1135,6 +1135,28 @@ static inline struct hlist_head *d_hash(struct dentry *parent,
> ? ? ? ?return dentry_hashtable + (hash & D_HASHMASK);
> ?}
>
> +static struct dentry * __d_find_any_alias(struct inode *inode)
> +{
> + ? ? ? struct dentry *alias;
> +
> + ? ? ? if (list_empty(&inode->i_dentry))
> + ? ? ? ? ? ? ? return NULL;
> + ? ? ? alias = list_first_entry(&inode->i_dentry, struct dentry, d_alias);
> + ? ? ? __dget_locked(alias);
> + ? ? ? return alias;
> +}
> +
> +static struct dentry * d_find_any_alias(struct inode *inode)
> +{
> + ? ? ? struct dentry *de;
> +
> + ? ? ? spin_lock(&dcache_lock);
> + ? ? ? de = __d_find_any_alias(inode);
> + ? ? ? spin_unlock(&dcache_lock);
> + ? ? ? return de;
> +}
> +
> +
> ?/**
> ?* d_obtain_alias - find or allocate a dentry for a given inode
> ?* @inode: inode to allocate the dentry for
> @@ -1164,7 +1186,7 @@ struct dentry *d_obtain_alias(struct inode *inode)
> ? ? ? ?if (IS_ERR(inode))
> ? ? ? ? ? ? ? ?return ERR_CAST(inode);
>
> - ? ? ? res = d_find_alias(inode);
> + ? ? ? res = d_find_any_alias(inode);
> ? ? ? ?if (res)
> ? ? ? ? ? ? ? ?goto out_iput;
>
> @@ -1176,7 +1198,7 @@ struct dentry *d_obtain_alias(struct inode *inode)
> ? ? ? ?tmp->d_parent = tmp; /* make sure dput doesn't croak */
>
> ? ? ? ?spin_lock(&dcache_lock);
> - ? ? ? res = __d_find_alias(inode, 0);
> + ? ? ? res = __d_find_any_alias(inode);
> ? ? ? ?if (res) {
> ? ? ? ? ? ? ? ?spin_unlock(&dcache_lock);
> ? ? ? ? ? ? ? ?dput(tmp);
>

2010-12-18 16:16:13

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 2/2] fs/dcache: allow __d_obtain_alias() to return unhashed dentries

On Sat, Dec 18, 2010 at 01:01:18PM +1100, Nick Piggin wrote:
> On Fri, Dec 17, 2010 at 01:00:23PM -0500, J. Bruce Fields wrote:
> > From: J. Bruce Fields <[email protected]>
> >
> > Without this patch
> >
> >        client$ mount -tnfs4 server:/export/ /mnt/
> >        client$ tail -f /mnt/FOO
> >        ...
> >        server$ df -i /export
> >        server$ rm /export/FOO
> >        (^C the tail -f)
> >        server$ df -i /export
> >        server$ echo 2 >/proc/sys/vm/drop_caches
> >        server$ df -i /export
> >
> > the df's will show that the inode is not freed on the filesystem until
> > the last step, when it could have been freed after killing the client's
> > tail -f.  On-disk data won't be deallocated either, leading to possible
> > spurious ENOSPC.
> >
> > This occurs because when the client does the close, it arrives in a
> > compound with a putfh and a close, processed like:
> >
> >        - putfh: look up the filehandle.  The only alias found for the
> >          inode will be DCACHE_UNHASHED alias referenced by the filp
> >          associated with the nfsd open.  d_obtain_alias() doesn't like
> >          this, so it creates a new DCACHE_DISCONECTED dentry and
> >          returns that instead.
> >
> > Nick Piggin suggested fixing this by allowing d_obtain_alias to return
> > the unhashed dentry that is referenced by the filp, instead of making it
> > create a new dentry.
> >
> > Leave __d_find_alias() alone to avoid changing behavior of other
> > callers.
> >
> > Also nfsd doesn't need all the checks of __d_find_alias(); any dentry,
> > hashed or unhashed, disconnected or not, should work.
> >
> > Signed-off-by: J. Bruce Fields <[email protected]>
> > ---
> > fs/dcache.c | 26 ++++++++++++++++++++++++--
> > 1 files changed, 24 insertions(+), 2 deletions(-)
> >
> > On Tue, Dec 14, 2010 at 05:01:02PM -0500, J. Bruce Fields wrote:
> > > On Mon, Dec 13, 2010 at 04:19:44PM +1100, Nick Piggin wrote:
> > > > Not sure where Al's hiding...
> > > >
> > > > But I would like to update the comments, and perhaps even a new
> > > > add a new function here (or new flag to __d_find_alias).
> > > >
> > > > AFAIKS, the callers are OK, however I suppose d_splice_alias and
> > > > d_materialise_unique should not have unlinked inodes at this point,
> > > > so at least a BUG_ON for them might be a good idea?
> > >
> > > That does sound safer. I'm pretty confused by the various
> > > __di_splice_alias callers. I'll go search through them and see if I can
> > > understand better....
> >
> > I think a new __d_find_alias flag would just make it more confusing.
> > And it looks like all d_obtain_alias needs is something much simpler.
> > This works for me.
>
> Well I don't really know if this is clearer. By convention,
> __ prefix version should be same as non prefix version but just
> take fewer locks. Why do you have the new d_find_any_alias()?

Argh, apologies; with typo fixed.

--b.

commit 5a911af645aae9992d465d57c397237fb35d1f93
Author: J. Bruce Fields <[email protected]>
Date: Fri Dec 17 09:05:04 2010 -0500

fs/dcache: allow __d_obtain_alias() to return unhashed dentries

Without this patch

       client$ mount -tnfs4 server:/export/ /mnt/
       client$ tail -f /mnt/FOO
       ...
       server$ df -i /export
       server$ rm /export/FOO
       (^C the tail -f)
       server$ df -i /export
       server$ echo 2 >/proc/sys/vm/drop_caches
       server$ df -i /export

the df's will show that the inode is not freed on the filesystem until
the last step, when it could have been freed after killing the client's
tail -f.  On-disk data won't be deallocated either, leading to possible
spurious ENOSPC.

This occurs because when the client does the close, it arrives in a
compound with a putfh and a close, processed like:

       - putfh: look up the filehandle.  The only alias found for the
         inode will be DCACHE_UNHASHED alias referenced by the filp
         associated with the nfsd open.  d_obtain_alias() doesn't like
         this, so it creates a new DCACHE_DISCONECTED dentry and
         returns that instead.

Nick Piggin suggested fixing this by allowing d_obtain_alias to return
the unhashed dentry that is referenced by the filp, instead of making it
create a new dentry.

Leave __d_find_alias() alone to avoid changing behavior of other
callers.

Also nfsd doesn't need all the checks of __d_find_alias(); any dentry,
hashed or unhashed, disconnected or not, should work.

Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/fs/dcache.c b/fs/dcache.c
index 5ed93cd..5c014a5 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1135,6 +1135,28 @@ static inline struct hlist_head *d_hash(struct dentry *parent,
return dentry_hashtable + (hash & D_HASHMASK);
}

+static struct dentry * __d_find_any_alias(struct inode *inode)
+{
+ struct dentry *alias;
+
+ if (list_empty(&inode->i_dentry))
+ return NULL;
+ alias = list_first_entry(&inode->i_dentry, struct dentry, d_alias);
+ __dget_locked(alias);
+ return alias;
+}
+
+static struct dentry * d_find_any_alias(struct inode *inode)
+{
+ struct dentry *de;
+
+ spin_lock(&dcache_lock);
+ de = __d_find_any_alias(inode);
+ spin_unlock(&dcache_lock);
+ return de;
+}
+
+
/**
* d_obtain_alias - find or allocate a dentry for a given inode
* @inode: inode to allocate the dentry for
@@ -1164,7 +1186,7 @@ struct dentry *d_obtain_alias(struct inode *inode)
if (IS_ERR(inode))
return ERR_CAST(inode);

- res = d_find_alias(inode);
+ res = d_find_any_alias(inode);
if (res)
goto out_iput;

@@ -1176,7 +1198,7 @@ struct dentry *d_obtain_alias(struct inode *inode)
tmp->d_parent = tmp; /* make sure dput doesn't croak */

spin_lock(&dcache_lock);
- res = __d_find_alias(inode, 0);
+ res = __d_find_any_alias(inode);
if (res) {
spin_unlock(&dcache_lock);
dput(tmp);

2010-12-18 02:01:25

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH 2/2] fs/dcache: allow __d_obtain_alias() to return unhashed dentries

On Fri, Dec 17, 2010 at 01:00:23PM -0500, J. Bruce Fields wrote:
> From: J. Bruce Fields <[email protected]>
>
> Without this patch
>
> ? ? ? ?client$ mount -tnfs4 server:/export/ /mnt/
> ? ? ? ?client$ tail -f /mnt/FOO
> ? ? ? ?...
> ? ? ? ?server$ df -i /export
> ? ? ? ?server$ rm /export/FOO
> ? ? ? ?(^C the tail -f)
> ? ? ? ?server$ df -i /export
> ? ? ? ?server$ echo 2 >/proc/sys/vm/drop_caches
> ? ? ? ?server$ df -i /export
>
> the df's will show that the inode is not freed on the filesystem until
> the last step, when it could have been freed after killing the client's
> tail -f. ?On-disk data won't be deallocated either, leading to possible
> spurious ENOSPC.
>
> This occurs because when the client does the close, it arrives in a
> compound with a putfh and a close, processed like:
>
> ? ? ? ?- putfh: look up the filehandle. ?The only alias found for the
> ? ? ? ? ?inode will be DCACHE_UNHASHED alias referenced by the filp
> ? ? ? ? ?associated with the nfsd open. ?d_obtain_alias() doesn't like
> ? ? ? ? ?this, so it creates a new DCACHE_DISCONECTED dentry and
> ? ? ? ? ?returns that instead.
>
> Nick Piggin suggested fixing this by allowing d_obtain_alias to return
> the unhashed dentry that is referenced by the filp, instead of making it
> create a new dentry.
>
> Leave __d_find_alias() alone to avoid changing behavior of other
> callers.
>
> Also nfsd doesn't need all the checks of __d_find_alias(); any dentry,
> hashed or unhashed, disconnected or not, should work.
>
> Signed-off-by: J. Bruce Fields <[email protected]>
> ---
> fs/dcache.c | 26 ++++++++++++++++++++++++--
> 1 files changed, 24 insertions(+), 2 deletions(-)
>
> On Tue, Dec 14, 2010 at 05:01:02PM -0500, J. Bruce Fields wrote:
> > On Mon, Dec 13, 2010 at 04:19:44PM +1100, Nick Piggin wrote:
> > > Not sure where Al's hiding...
> > >
> > > But I would like to update the comments, and perhaps even a new
> > > add a new function here (or new flag to __d_find_alias).
> > >
> > > AFAIKS, the callers are OK, however I suppose d_splice_alias and
> > > d_materialise_unique should not have unlinked inodes at this point,
> > > so at least a BUG_ON for them might be a good idea?
> >
> > That does sound safer. I'm pretty confused by the various
> > __di_splice_alias callers. I'll go search through them and see if I can
> > understand better....
>
> I think a new __d_find_alias flag would just make it more confusing.
> And it looks like all d_obtain_alias needs is something much simpler.
> This works for me.

Well I don't really know if this is clearer. By convention,
__ prefix version should be same as non prefix version but just
take fewer locks. Why do you have the new d_find_any_alias()?



>
> --b.
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 5ed93cd..d5f9da4 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -1135,6 +1135,28 @@ static inline struct hlist_head *d_hash(struct dentry *parent,
> return dentry_hashtable + (hash & D_HASHMASK);
> }
>
> +static struct dentry * __d_find_any_alias(struct inode *inode)
> +{
> + struct dentry *alias;
> +
> + if (list_empty(&inode->i_dentry))
> + return NULL;
> + alias = list_first_entry(&inode->i_dentry, struct dentry, d_alias);
> + __dget_locked(alias);
> + return alias;
> +}
> +
> +static struct dentry * d_find_any_alias(struct inode *inode)
> +{
> + struct dentry *de;
> +
> + spin_lock(&dcache_lock);
> + de = __d_find_alias(inode, 0);
> + spin_unlock(&dcache_lock);
> + return de;
> +}
> +
> +
> /**
> * d_obtain_alias - find or allocate a dentry for a given inode
> * @inode: inode to allocate the dentry for
> @@ -1164,7 +1186,7 @@ struct dentry *d_obtain_alias(struct inode *inode)
> if (IS_ERR(inode))
> return ERR_CAST(inode);
>
> - res = d_find_alias(inode);
> + res = d_find_any_alias(inode);
> if (res)
> goto out_iput;
>
> @@ -1176,7 +1198,7 @@ struct dentry *d_obtain_alias(struct inode *inode)
> tmp->d_parent = tmp; /* make sure dput doesn't croak */
>
> spin_lock(&dcache_lock);
> - res = __d_find_alias(inode, 0);
> + res = __d_find_any_alias(inode);
> if (res) {
> spin_unlock(&dcache_lock);
> dput(tmp);
> --
> 1.7.1

2010-12-13 05:19:50

by Nick Piggin

[permalink] [raw]
Subject: Re: [PATCH] nfsd4: allow __d_obtain_alias() to return unhashed dentries

On Fri, Dec 03, 2010 at 05:33:27PM -0500, J. Bruce Fields wrote:
> From: J. Bruce Fields <[email protected]>
>
> Without this patch
>
> ? ? ? ?client$ mount -tnfs4 server:/export/ /mnt/
> ? ? ? ?client$ tail -f /mnt/FOO
> ? ? ? ?...
> ? ? ? ?server$ df -i /export
> ? ? ? ?server$ rm /export/FOO
> ? ? ? ?(^C the tail -f)
> ? ? ? ?server$ df -i /export
> ? ? ? ?server$ echo 2 >/proc/sys/vm/drop_caches
> ? ? ? ?server$ df -i /export
>
> the df's will show that the inode is not freed on the filesystem until
> the last step, when it could have been freed after killing the client's
> tail -f. ?On-disk data won't be deallocated either, leading to possible
> spurious ENOSPC.
>
> This occurs because when the client does the close, it arrives in a
> compound with a putfh and a close, processed like:
>
> ? ? ? ?- putfh: look up the filehandle. ?The only alias found for the
> ? ? ? ? ?inode will be DCACHE_UNHASHED alias referenced by the filp
> ? ? ? ? ?associated with the nfsd open. ?d_obtain_alias() doesn't like
> ? ? ? ? ?this, so it creates a new DCACHE_DISCONECTED dentry and
> ? ? ? ? ?returns that instead.
>
> Nick Piggin suggested fixing this by allowing d_obtain_alias to return
> the unhashed dentry that is referenced by the filp, instead of making it
> create a new dentry.
>
> Cc: Nick Piggin <[email protected]>
> Signed-off-by: J. Bruce Fields <[email protected]>
> ---
> fs/dcache.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> On Tue, Nov 30, 2010 at 12:00:16PM +1100, Nick Piggin wrote:
> > On Tue, Nov 30, 2010 at 6:32 AM, J. Bruce Fields <[email protected]> wrote:
> > > On Mon, Nov 29, 2010 at 02:56:22PM +1100, Nick Piggin wrote:
> > >> On Tue, Nov 16, 2010 at 5:45 PM, Nick Piggin <[email protected]> wrote:
> > >> > On Tue, Nov 16, 2010 at 4:48 AM, J. Bruce Fields <[email protected]> wrote:
> > >> >> On Sat, Nov 13, 2010 at 10:53:12PM +1100, Nick Piggin wrote:
> > >> >>> Can you even put the link check into __d_find_alias?
> > >> >>>
> > >> >>> - ? ? ? ? ? ? ? if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) {
> > >> >>> + ? ? ? ? ? ? ? if (S_ISDIR(inode->i_mode) || !inode->i_nlink ||
> > >> >>> !d_unhashed(alias)) {
> > >> >>>
> > >> >>> Something like that?
> > >> >>
> > >> >> The immediate result of that would be for the close rpc (or any rpc's
> > >> >> sent after the file was unlinked) to fail with ESTALE.
> > >> >
> > >> > Why is that? Seems like it would be a bug, because a hashed dentry may
> > >> > be unhashed at any time concurrently to nfsd operation, so it should be
> > >> > able to tolerate that so long as it has a ref on the inode?
> > >>
> > >> Ping? Did you work out why nfs fails with ESTALE in that case? It seems
> > >> to work in my testing (and do the right thing with freeing the inode).
> > >
> > > Bah, sorry, I read too quickly, got the sense of the test backwards, and
> > > thought you were suggesting __d_find_alias() shouldn't return an alias
> > > in the i_nlink == 0 case!
> > >
> > > Yes, agreed, that should solve my problem.
> >
> > OK, good.
> >
> > > But what's the reason for the d_unhashed() check now? ?Could we get rid
> > > of it entirely?
> >
> > Well when the inode still has links I think we actually do want any new
> > references to go to hashed dentries. Definitely for d_splice_alias.
>
> So here's a version with a changelog; objections?

Not sure where Al's hiding...

But I would like to update the comments, and perhaps even a new
add a new function here (or new flag to __d_find_alias).

AFAIKS, the callers are OK, however I suppose d_splice_alias and
d_materialise_unique should not have unlinked inodes at this point,
so at least a BUG_ON for them might be a good idea?

>
> --b.
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 23702a9..afa8a0d 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -368,7 +368,7 @@ static struct dentry * __d_find_alias(struct inode *inode, int want_discon)
> next = tmp->next;
> prefetch(next);
> alias = list_entry(tmp, struct dentry, d_alias);
> - if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) {
> + if (S_ISDIR(inode->i_mode) || !inode->i_nlink || !d_unhashed(alias)) {
> if (IS_ROOT(alias) &&
> (alias->d_flags & DCACHE_DISCONNECTED))
> discon_alias = alias;

2010-12-27 23:46:51

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries

From: J. Bruce Fields <[email protected]>

Without this patch, inodes are not promptly freed on last close of an
unlinked file by an nfs client:

client$ mount -tnfs4 server:/export/ /mnt/
client$ tail -f /mnt/FOO
...
server$ df -i /export
server$ rm /export/FOO
(^C the tail -f)
server$ df -i /export
server$ echo 2 >/proc/sys/vm/drop_caches
server$ df -i /export

the df's will show that the inode is not freed on the filesystem until
the last step, when it could have been freed after killing the client's
tail -f.  On-disk data won't be deallocated either, leading to possible
spurious ENOSPC.

This occurs because when the client does the close, it arrives in a
compound with a putfh and a close, processed like:

- putfh: look up the filehandle.  The only alias found for the
inode will be DCACHE_UNHASHED alias referenced by the filp
this, so it creates a new DCACHE_DISCONECTED dentry and
returns that instead.
- close: closes the existing filp, which is destroyed
immediately by dput() since it's DCACHE_UNHASHED.
- end of the compound: release the reference
to the current filehandle, and dput() the new
DCACHE_DISCONECTED dentry, which gets put on the
unused list instead of being destroyed immediately.

Nick Piggin suggested fixing this by allowing d_obtain_alias to return
the unhashed dentry that is referenced by the filp, instead of making it
create a new dentry.

Leave __d_find_alias() alone to avoid changing behavior of other
callers.

Also nfsd doesn't need all the checks of __d_find_alias(); any dentry,
hashed or unhashed, disconnected or not, should work.

Acked-by: Nick Piggin <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/dcache.c | 26 ++++++++++++++++++++++++--
1 files changed, 24 insertions(+), 2 deletions(-)

On Mon, Dec 20, 2010 at 01:53:10AM +1100, Nick Piggin wrote:
> On Sun, Dec 19, 2010 at 3:16 AM, J. Bruce Fields <[email protected]> wrote:
> > On Sat, Dec 18, 2010 at 01:01:18PM +1100, Nick Piggin wrote:
> >> Well I don't really know if this is clearer. By convention,
> >> __ prefix version should be same as non prefix version but just
> >> take fewer locks. Why do you have the new d_find_any_alias()?
> >
> > Argh, apologies; with typo fixed.
>
> OK that makes a lot more sense :)

Thanks very much for your help!--b.

diff --git a/fs/dcache.c b/fs/dcache.c
index 5ed93cd..5c014a5 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1135,6 +1135,28 @@ static inline struct hlist_head *d_hash(struct dentry *parent,
return dentry_hashtable + (hash & D_HASHMASK);
}

+static struct dentry * __d_find_any_alias(struct inode *inode)
+{
+ struct dentry *alias;
+
+ if (list_empty(&inode->i_dentry))
+ return NULL;
+ alias = list_first_entry(&inode->i_dentry, struct dentry, d_alias);
+ __dget_locked(alias);
+ return alias;
+}
+
+static struct dentry * d_find_any_alias(struct inode *inode)
+{
+ struct dentry *de;
+
+ spin_lock(&dcache_lock);
+ de = __d_find_any_alias(inode);
+ spin_unlock(&dcache_lock);
+ return de;
+}
+
+
/**
* d_obtain_alias - find or allocate a dentry for a given inode
* @inode: inode to allocate the dentry for
@@ -1164,7 +1186,7 @@ struct dentry *d_obtain_alias(struct inode *inode)
if (IS_ERR(inode))
return ERR_CAST(inode);

- res = d_find_alias(inode);
+ res = d_find_any_alias(inode);
if (res)
goto out_iput;

@@ -1176,7 +1198,7 @@ struct dentry *d_obtain_alias(struct inode *inode)
tmp->d_parent = tmp; /* make sure dput doesn't croak */

spin_lock(&dcache_lock);
- res = __d_find_alias(inode, 0);
+ res = __d_find_any_alias(inode);
if (res) {
spin_unlock(&dcache_lock);
dput(tmp);
--
1.7.3.4


2010-12-17 17:53:59

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH] fs/dcache: use standard list macro for d_find_alias

From: J. Bruce Fields <[email protected]>

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/dcache.c | 9 +--------
1 files changed, 1 insertions(+), 8 deletions(-)

While I'm here.... Am I overlooking some reason we're not doing this
the easy way?

diff --git a/fs/dcache.c b/fs/dcache.c
index 23702a9..5ed93cd 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -358,16 +358,9 @@ EXPORT_SYMBOL(dget_locked);

static struct dentry * __d_find_alias(struct inode *inode, int want_discon)
{
- struct list_head *head, *next, *tmp;
struct dentry *alias, *discon_alias=NULL;

- head = &inode->i_dentry;
- next = inode->i_dentry.next;
- while (next != head) {
- tmp = next;
- next = tmp->next;
- prefetch(next);
- alias = list_entry(tmp, struct dentry, d_alias);
+ list_for_each_entry(alias, &inode->i_dentry, d_alias) {
if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) {
if (IS_ROOT(alias) &&
(alias->d_flags & DCACHE_DISCONNECTED))
--
1.7.1


2010-12-14 22:01:12

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] nfsd4: allow __d_obtain_alias() to return unhashed dentries

On Mon, Dec 13, 2010 at 04:19:44PM +1100, Nick Piggin wrote:
> On Fri, Dec 03, 2010 at 05:33:27PM -0500, J. Bruce Fields wrote:
> > From: J. Bruce Fields <[email protected]>
> >
> > Without this patch
> >
> >        client$ mount -tnfs4 server:/export/ /mnt/
> >        client$ tail -f /mnt/FOO
> >        ...
> >        server$ df -i /export
> >        server$ rm /export/FOO
> >        (^C the tail -f)
> >        server$ df -i /export
> >        server$ echo 2 >/proc/sys/vm/drop_caches
> >        server$ df -i /export
> >
> > the df's will show that the inode is not freed on the filesystem until
> > the last step, when it could have been freed after killing the client's
> > tail -f.  On-disk data won't be deallocated either, leading to possible
> > spurious ENOSPC.
> >
> > This occurs because when the client does the close, it arrives in a
> > compound with a putfh and a close, processed like:
> >
> >        - putfh: look up the filehandle.  The only alias found for the
> >          inode will be DCACHE_UNHASHED alias referenced by the filp
> >          associated with the nfsd open.  d_obtain_alias() doesn't like
> >          this, so it creates a new DCACHE_DISCONECTED dentry and
> >          returns that instead.
> >
> > Nick Piggin suggested fixing this by allowing d_obtain_alias to return
> > the unhashed dentry that is referenced by the filp, instead of making it
> > create a new dentry.
> >
> > Cc: Nick Piggin <[email protected]>
> > Signed-off-by: J. Bruce Fields <[email protected]>
> > ---
> > fs/dcache.c | 2 +-
> > 1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > On Tue, Nov 30, 2010 at 12:00:16PM +1100, Nick Piggin wrote:
> > > On Tue, Nov 30, 2010 at 6:32 AM, J. Bruce Fields <[email protected]> wrote:
> > > > On Mon, Nov 29, 2010 at 02:56:22PM +1100, Nick Piggin wrote:
> > > >> On Tue, Nov 16, 2010 at 5:45 PM, Nick Piggin <[email protected]> wrote:
> > > >> > On Tue, Nov 16, 2010 at 4:48 AM, J. Bruce Fields <[email protected]> wrote:
> > > >> >> On Sat, Nov 13, 2010 at 10:53:12PM +1100, Nick Piggin wrote:
> > > >> >>> Can you even put the link check into __d_find_alias?
> > > >> >>>
> > > >> >>> -               if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) {
> > > >> >>> +               if (S_ISDIR(inode->i_mode) || !inode->i_nlink ||
> > > >> >>> !d_unhashed(alias)) {
> > > >> >>>
> > > >> >>> Something like that?
> > > >> >>
> > > >> >> The immediate result of that would be for the close rpc (or any rpc's
> > > >> >> sent after the file was unlinked) to fail with ESTALE.
> > > >> >
> > > >> > Why is that? Seems like it would be a bug, because a hashed dentry may
> > > >> > be unhashed at any time concurrently to nfsd operation, so it should be
> > > >> > able to tolerate that so long as it has a ref on the inode?
> > > >>
> > > >> Ping? Did you work out why nfs fails with ESTALE in that case? It seems
> > > >> to work in my testing (and do the right thing with freeing the inode).
> > > >
> > > > Bah, sorry, I read too quickly, got the sense of the test backwards, and
> > > > thought you were suggesting __d_find_alias() shouldn't return an alias
> > > > in the i_nlink == 0 case!
> > > >
> > > > Yes, agreed, that should solve my problem.
> > >
> > > OK, good.
> > >
> > > > But what's the reason for the d_unhashed() check now?  Could we get rid
> > > > of it entirely?
> > >
> > > Well when the inode still has links I think we actually do want any new
> > > references to go to hashed dentries. Definitely for d_splice_alias.
> >
> > So here's a version with a changelog; objections?
>
> Not sure where Al's hiding...
>
> But I would like to update the comments, and perhaps even a new
> add a new function here (or new flag to __d_find_alias).
>
> AFAIKS, the callers are OK, however I suppose d_splice_alias and
> d_materialise_unique should not have unlinked inodes at this point,
> so at least a BUG_ON for them might be a good idea?

That does sound safer. I'm pretty confused by the various
__di_splice_alias callers. I'll go search through them and see if I can
understand better....

--b.

>
> >
> > --b.
> >
> > diff --git a/fs/dcache.c b/fs/dcache.c
> > index 23702a9..afa8a0d 100644
> > --- a/fs/dcache.c
> > +++ b/fs/dcache.c
> > @@ -368,7 +368,7 @@ static struct dentry * __d_find_alias(struct inode *inode, int want_discon)
> > next = tmp->next;
> > prefetch(next);
> > alias = list_entry(tmp, struct dentry, d_alias);
> > - if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) {
> > + if (S_ISDIR(inode->i_mode) || !inode->i_nlink || !d_unhashed(alias)) {
> > if (IS_ROOT(alias) &&
> > (alias->d_flags & DCACHE_DISCONNECTED))
> > discon_alias = alias;

2010-12-03 22:33:29

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH] nfsd4: allow __d_obtain_alias() to return unhashed dentries

From: J. Bruce Fields <[email protected]>

Without this patch

       client$ mount -tnfs4 server:/export/ /mnt/
       client$ tail -f /mnt/FOO
       ...
       server$ df -i /export
       server$ rm /export/FOO
       (^C the tail -f)
       server$ df -i /export
       server$ echo 2 >/proc/sys/vm/drop_caches
       server$ df -i /export

the df's will show that the inode is not freed on the filesystem until
the last step, when it could have been freed after killing the client's
tail -f.  On-disk data won't be deallocated either, leading to possible
spurious ENOSPC.

This occurs because when the client does the close, it arrives in a
compound with a putfh and a close, processed like:

       - putfh: look up the filehandle.  The only alias found for the
         inode will be DCACHE_UNHASHED alias referenced by the filp
         associated with the nfsd open.  d_obtain_alias() doesn't like
         this, so it creates a new DCACHE_DISCONECTED dentry and
         returns that instead.

Nick Piggin suggested fixing this by allowing d_obtain_alias to return
the unhashed dentry that is referenced by the filp, instead of making it
create a new dentry.

Cc: Nick Piggin <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/dcache.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

On Tue, Nov 30, 2010 at 12:00:16PM +1100, Nick Piggin wrote:
> On Tue, Nov 30, 2010 at 6:32 AM, J. Bruce Fields <[email protected]> wrote:
> > On Mon, Nov 29, 2010 at 02:56:22PM +1100, Nick Piggin wrote:
> >> On Tue, Nov 16, 2010 at 5:45 PM, Nick Piggin <[email protected]> wrote:
> >> > On Tue, Nov 16, 2010 at 4:48 AM, J. Bruce Fields <[email protected]> wrote:
> >> >> On Sat, Nov 13, 2010 at 10:53:12PM +1100, Nick Piggin wrote:
> >> >>> Can you even put the link check into __d_find_alias?
> >> >>>
> >> >>> -               if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) {
> >> >>> +               if (S_ISDIR(inode->i_mode) || !inode->i_nlink ||
> >> >>> !d_unhashed(alias)) {
> >> >>>
> >> >>> Something like that?
> >> >>
> >> >> The immediate result of that would be for the close rpc (or any rpc's
> >> >> sent after the file was unlinked) to fail with ESTALE.
> >> >
> >> > Why is that? Seems like it would be a bug, because a hashed dentry may
> >> > be unhashed at any time concurrently to nfsd operation, so it should be
> >> > able to tolerate that so long as it has a ref on the inode?
> >>
> >> Ping? Did you work out why nfs fails with ESTALE in that case? It seems
> >> to work in my testing (and do the right thing with freeing the inode).
> >
> > Bah, sorry, I read too quickly, got the sense of the test backwards, and
> > thought you were suggesting __d_find_alias() shouldn't return an alias
> > in the i_nlink == 0 case!
> >
> > Yes, agreed, that should solve my problem.
>
> OK, good.
>
> > But what's the reason for the d_unhashed() check now?  Could we get rid
> > of it entirely?
>
> Well when the inode still has links I think we actually do want any new
> references to go to hashed dentries. Definitely for d_splice_alias.

So here's a version with a changelog; objections?

--b.

diff --git a/fs/dcache.c b/fs/dcache.c
index 23702a9..afa8a0d 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -368,7 +368,7 @@ static struct dentry * __d_find_alias(struct inode *inode, int want_discon)
next = tmp->next;
prefetch(next);
alias = list_entry(tmp, struct dentry, d_alias);
- if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) {
+ if (S_ISDIR(inode->i_mode) || !inode->i_nlink || !d_unhashed(alias)) {
if (IS_ROOT(alias) &&
(alias->d_flags & DCACHE_DISCONNECTED))
discon_alias = alias;
--
1.7.1


2011-01-18 22:08:21

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries

On Wed, Jan 19, 2011 at 09:02:59AM +1100, Nick Piggin wrote:
> On Wed, Jan 19, 2011 at 7:45 AM, J. Bruce Fields <[email protected]> wrote:
> > @@ -1521,6 +1521,28 @@ struct dentry * d_alloc_root(struct inode * root_inode)
> >  }
> >  EXPORT_SYMBOL(d_alloc_root);
> >
> > +static struct dentry * __d_find_any_alias(struct inode *inode)
> > +{
> > +       struct dentry *alias;
> > +
> > +       if (list_empty(&inode->i_dentry))
> > +               return NULL;
> > +       alias = list_first_entry(&inode->i_dentry, struct dentry, d_alias);
> > +       __dget(alias);
> > +       return alias;
> > +}
> > +
> > +static struct dentry * d_find_any_alias(struct inode *inode)
> > +{
> > +       struct dentry *de;
> > +
> > +       spin_lock(&inode->i_lock);
>
> Yes, i_dentry/d_alias is protected by i_lock, so it looks fine.

OK, thanks; I'm assuming it's OK to re-add your ack in that case. Al,
could you apply? (Or if this should go in through an nfsd tree, or some
other way, let me know.)

--b.

From: J. Bruce Fields <[email protected]>
Date: Fri, 17 Dec 2010 09:05:04 -0500
Subject: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries

Without this patch, inodes are not promptly freed on last close of an
unlinked file by an nfs client:

       client$ mount -tnfs4 server:/export/ /mnt/
       client$ tail -f /mnt/FOO
       ...
       server$ df -i /export
       server$ rm /export/FOO
       (^C the tail -f)
       server$ df -i /export
       server$ echo 2 >/proc/sys/vm/drop_caches
       server$ df -i /export

the df's will show that the inode is not freed on the filesystem until
the last step, when it could have been freed after killing the client's
tail -f.  On-disk data won't be deallocated either, leading to possible
spurious ENOSPC.

This occurs because when the client does the close, it arrives in a
compound with a putfh and a close, processed like:

- putfh: look up the filehandle.  The only alias found for the
inode will be DCACHE_UNHASHED alias referenced by the filp
this, so it creates a new DCACHE_DISCONECTED dentry and
returns that instead.
- close: closes the existing filp, which is destroyed
immediately by dput() since it's DCACHE_UNHASHED.
- end of the compound: release the reference
to the current filehandle, and dput() the new
DCACHE_DISCONECTED dentry, which gets put on the
unused list instead of being destroyed immediately.

Nick Piggin suggested fixing this by allowing d_obtain_alias to return
the unhashed dentry that is referenced by the filp, instead of making it
create a new dentry.

Leave __d_find_alias() alone to avoid changing behavior of other
callers.

Also nfsd doesn't need all the checks of __d_find_alias(); any dentry,
hashed or unhashed, disconnected or not, should work.

Acked-by: Nick Piggin <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/dcache.c | 26 ++++++++++++++++++++++++--
1 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 9f493ee..2849258 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1521,6 +1521,28 @@ struct dentry * d_alloc_root(struct inode * root_inode)
}
EXPORT_SYMBOL(d_alloc_root);

+static struct dentry * __d_find_any_alias(struct inode *inode)
+{
+ struct dentry *alias;
+
+ if (list_empty(&inode->i_dentry))
+ return NULL;
+ alias = list_first_entry(&inode->i_dentry, struct dentry, d_alias);
+ __dget(alias);
+ return alias;
+}
+
+static struct dentry * d_find_any_alias(struct inode *inode)
+{
+ struct dentry *de;
+
+ spin_lock(&inode->i_lock);
+ de = __d_find_any_alias(inode);
+ spin_unlock(&inode->i_lock);
+ return de;
+}
+
+
/**
* d_obtain_alias - find or allocate a dentry for a given inode
* @inode: inode to allocate the dentry for
@@ -1550,7 +1572,7 @@ struct dentry *d_obtain_alias(struct inode *inode)
if (IS_ERR(inode))
return ERR_CAST(inode);

- res = d_find_alias(inode);
+ res = d_find_any_alias(inode);
if (res)
goto out_iput;

@@ -1563,7 +1585,7 @@ struct dentry *d_obtain_alias(struct inode *inode)


spin_lock(&inode->i_lock);
- res = __d_find_alias(inode, 0);
+ res = __d_find_any_alias(inode);
if (res) {
spin_unlock(&inode->i_lock);
dput(tmp);
--
1.7.1


2011-01-18 22:03:01

by Nicholas Piggin

[permalink] [raw]
Subject: Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries

On Wed, Jan 19, 2011 at 7:45 AM, J. Bruce Fields <[email protected]> wrote:

> Nick Piggin suggested fixing this by allowing d_obtain_alias to return
> the unhashed dentry that is referenced by the filp, instead of making it
> create a new dentry.
>
> Leave __d_find_alias() alone to avoid changing behavior of other
> callers.
>
> Also nfsd doesn't need all the checks of __d_find_alias(); any dentry,
> hashed or unhashed, disconnected or not, should work.
>
> Signed-off-by: J. Bruce Fields <[email protected]>
> ---
> ?fs/dcache.c | ? 26 ++++++++++++++++++++++++--
> ?1 files changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 9f493ee..2849258 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -1521,6 +1521,28 @@ struct dentry * d_alloc_root(struct inode * root_inode)
> ?}
> ?EXPORT_SYMBOL(d_alloc_root);
>
> +static struct dentry * __d_find_any_alias(struct inode *inode)
> +{
> + ? ? ? struct dentry *alias;
> +
> + ? ? ? if (list_empty(&inode->i_dentry))
> + ? ? ? ? ? ? ? return NULL;
> + ? ? ? alias = list_first_entry(&inode->i_dentry, struct dentry, d_alias);
> + ? ? ? __dget(alias);
> + ? ? ? return alias;
> +}
> +
> +static struct dentry * d_find_any_alias(struct inode *inode)
> +{
> + ? ? ? struct dentry *de;
> +
> + ? ? ? spin_lock(&inode->i_lock);

Yes, i_dentry/d_alias is protected by i_lock, so it looks fine.

2011-01-18 20:45:17

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries

An updated version attempting to take into account the new locking.
(Nick, did I get this right?)

--b.

From: J. Bruce Fields <[email protected]>
Date: Fri, 17 Dec 2010 09:05:04 -0500
Subject: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries

Without this patch, inodes are not promptly freed on last close of an
unlinked file by an nfs client:

       client$ mount -tnfs4 server:/export/ /mnt/
       client$ tail -f /mnt/FOO
       ...
       server$ df -i /export
       server$ rm /export/FOO
       (^C the tail -f)
       server$ df -i /export
       server$ echo 2 >/proc/sys/vm/drop_caches
       server$ df -i /export

the df's will show that the inode is not freed on the filesystem until
the last step, when it could have been freed after killing the client's
tail -f.  On-disk data won't be deallocated either, leading to possible
spurious ENOSPC.

This occurs because when the client does the close, it arrives in a
compound with a putfh and a close, processed like:

- putfh: look up the filehandle.  The only alias found for the
inode will be DCACHE_UNHASHED alias referenced by the filp
this, so it creates a new DCACHE_DISCONECTED dentry and
returns that instead.
- close: closes the existing filp, which is destroyed
immediately by dput() since it's DCACHE_UNHASHED.
- end of the compound: release the reference
to the current filehandle, and dput() the new
DCACHE_DISCONECTED dentry, which gets put on the
unused list instead of being destroyed immediately.

Nick Piggin suggested fixing this by allowing d_obtain_alias to return
the unhashed dentry that is referenced by the filp, instead of making it
create a new dentry.

Leave __d_find_alias() alone to avoid changing behavior of other
callers.

Also nfsd doesn't need all the checks of __d_find_alias(); any dentry,
hashed or unhashed, disconnected or not, should work.

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/dcache.c | 26 ++++++++++++++++++++++++--
1 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 9f493ee..2849258 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1521,6 +1521,28 @@ struct dentry * d_alloc_root(struct inode * root_inode)
}
EXPORT_SYMBOL(d_alloc_root);

+static struct dentry * __d_find_any_alias(struct inode *inode)
+{
+ struct dentry *alias;
+
+ if (list_empty(&inode->i_dentry))
+ return NULL;
+ alias = list_first_entry(&inode->i_dentry, struct dentry, d_alias);
+ __dget(alias);
+ return alias;
+}
+
+static struct dentry * d_find_any_alias(struct inode *inode)
+{
+ struct dentry *de;
+
+ spin_lock(&inode->i_lock);
+ de = __d_find_any_alias(inode);
+ spin_unlock(&inode->i_lock);
+ return de;
+}
+
+
/**
* d_obtain_alias - find or allocate a dentry for a given inode
* @inode: inode to allocate the dentry for
@@ -1550,7 +1572,7 @@ struct dentry *d_obtain_alias(struct inode *inode)
if (IS_ERR(inode))
return ERR_CAST(inode);

- res = d_find_alias(inode);
+ res = d_find_any_alias(inode);
if (res)
goto out_iput;

@@ -1563,7 +1585,7 @@ struct dentry *d_obtain_alias(struct inode *inode)


spin_lock(&inode->i_lock);
- res = __d_find_alias(inode, 0);
+ res = __d_find_any_alias(inode);
if (res) {
spin_unlock(&inode->i_lock);
dput(tmp);
--
1.7.1


2011-03-10 10:58:25

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries

On Tue, Mar 08, 2011 at 01:13:20PM -0500, J. Bruce Fields wrote:

> Al, do you have this in your queue to look at? Need me to resend? Or
> should it take some other route?

It's in queue, but I'd be a lot happier if I understood what's going on
with __d_find_alias() elsewhere. Namely, in d_splice_alias(). The thing
is, unless I'm missing something we ought to use __d_find_any_alias()
there as well. Directories really, _really_ should not have more than
one alias. And what we get is really weird:
* find (the only) alias
* if it doesn't exist, create one (OK, no problem)
* if it does exist and happens to be IS_ROOT and DCACHE_DISCONNECTED,
move it (also fine, modulo rather useless BUG_ON() in there)
* if it does exist and either isn't IS_ROOT or not DCACHE_DISCONNECTED,
add a new alias and say nothing.

The last part looks very strange. I'd been staring at the history of this
function and I _think_ it's a rudiment of period when we used that stuff for
non-directories as well. It used to be directory-only; then Neil had
switched it to non-directories as well (in "Fix disconnected dentries on NFS
exports" back in 2004), adding want_discon argument to __d_find_alias() in
process and using it instead of open-coded equivalent of __d_find_any_alias().
Then, two years later, in "knfsd: close a race-opportunity in d_splice_alias"
he'd made that code directory-only again, at which point we arrived to the
current situation.

Neil, is there some reason I'm missing that makes __d_find_alias() right in
there? And do we still need the second argument in __d_find_alias()?

Anyway, Bruce's patch goes in tonight's push, but I'd love to see that
mess cleaned up as well or at least explained.

2011-03-11 04:08:01

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries

On Thu, 10 Mar 2011 10:58:21 +0000 Al Viro <[email protected]> wrote:

> On Tue, Mar 08, 2011 at 01:13:20PM -0500, J. Bruce Fields wrote:
>
> > Al, do you have this in your queue to look at? Need me to resend? Or
> > should it take some other route?
>
> It's in queue, but I'd be a lot happier if I understood what's going on
> with __d_find_alias() elsewhere. Namely, in d_splice_alias(). The thing
> is, unless I'm missing something we ought to use __d_find_any_alias()
> there as well. Directories really, _really_ should not have more than
> one alias. And what we get is really weird:
> * find (the only) alias
> * if it doesn't exist, create one (OK, no problem)
> * if it does exist and happens to be IS_ROOT and DCACHE_DISCONNECTED,
> move it (also fine, modulo rather useless BUG_ON() in there)
> * if it does exist and either isn't IS_ROOT or not DCACHE_DISCONNECTED,
> add a new alias and say nothing.
>
> The last part looks very strange. I'd been staring at the history of this
> function and I _think_ it's a rudiment of period when we used that stuff for
> non-directories as well. It used to be directory-only; then Neil had
> switched it to non-directories as well (in "Fix disconnected dentries on NFS
> exports" back in 2004), adding want_discon argument to __d_find_alias() in
> process and using it instead of open-coded equivalent of __d_find_any_alias().
> Then, two years later, in "knfsd: close a race-opportunity in d_splice_alias"
> he'd made that code directory-only again, at which point we arrived to the
> current situation.
>
> Neil, is there some reason I'm missing that makes __d_find_alias() right in
> there? And do we still need the second argument in __d_find_alias()?
>
> Anyway, Bruce's patch goes in tonight's push, but I'd love to see that
> mess cleaned up as well or at least explained.


ahh.... that takes me back ... :-)

Thanks for identifying those patches Al - it saved me a lot of time.

What can I say.... the first patch is clearly wrong, for the reasons
mentioned in the second patch. And the second patch is wrong, partly
because of the reasons you give and partly because it re-introduces the
the bug that the first patch aimed to fix. All very sad really.

So what should it do.

1/ Originally DCACHE_DISCONNECTED didn't really mean much - it's presence
was only a hint, its absence was a strong statement.
If the flag is set, the dentry might not be linked to the root.
If it is clear, it definitely is link through to the root.
However I think it was used with stronger intent than that.

Now it seems to mean a little bit more: If it is set and the dentry
is hashed, then it must be on the sb->s_anon list. This is a significant
which I never noticed (I haven't been watching). Originally a
disconnected dentry would be attached (and hashed) to its parent. Then
that parent would get its own parent and so on until it was attached all
the way to the root. Only then would be start clearing
DCACHE_DISCONNECTED. It seems we must clear it sooner now... I wonder if
that is correct.

Anyway, the original idea was:
1/ when you create an anonymous dentry for nfsd, set DCACHE_DISCONNECTED.
2/ when nfsd sees DCACHE_DISCONNECTED it makes sure there is a
connection to the root, and only clears DCACHE_DISCONNECTED when
that connection is certain.
3/ no-one else should look at DCACHE_DISCONNECTED.

2/ directories always have at most one dentry. It might be IS_ROOT(), and
it might be DCACHE_DISCONNECTED.
non-directories can have multiple dentries, but (and here is where the
various patches changes things) if there is an IS_ROOT() dentry, it must
be the only hashed dentry. (IS_ROOT dentries are 'hashed' on the
s_anon chain).

3/ when nfsd finds an inode, it first tries to get a hashed dentry - any
hashed dentry. If necessary (and as a last resort) it will create an
anonymous dentry. For directories, an unhashed dentry will do too
(I think ... not 100% clear on that just now).
This is d_obtain_alias

4/ When nfsd wants the dentry to have a name - either because it is a
directory or because 'subtree_check' is set - and finds
DCACHE_DISCONNECTED is set is find the parent inode and makes sure it has
a non-DCACHE_DISCONNECTED entry (see 3 and 4). Once it finds that the
parent has a non-DCACHE_DISCONNECTED dentry, it clears DCACHE_DISCONNECTED
on the child.

So what do we have:

d_obtain_alias:
finds a hashed alias (or any alias for a directory) and if there
isn't one, creates an IS_ROOT alias, sets DCACHE_DISCONNECTED, and
adds to sb->s_anon.
This is used by various "get_parent" and "get_dentry" routines as you
would expect. ceph uses it in open_root_dentry which seems a bit
odd - maybe it is OK.

d_splice_alias
This is more complicated than it should be - with the complication in
__d_find_alias.
It's job is to see if the inode already has an IS_ROOT alias. If it
does, it should d_move that alias in place of the one it was given,
else instantiate the one it was given.
It doesn't have to look very hard. A directory will only have one
dentry, so that one either IS_ROOT or not.
If a non-directory has an IS_ROOT dentry, it will be the only hashed
dentry. So there is no question of preferring non-IS_ROOT dentries.
If there is one, it will be the only one.

I'm a bit uncomfortable that d_splice_alias drops all locks before
calling d_move. Normally when d_move is called, both directories are
locked in some way. In the case the source is not a directory bit
is the state of being anonymous. Theoretically two calls to
d_splice_alias on the same inode could try to d_move the same
anonymous dentry to two different places, which would be bad.
So I think some locking is needed there.

d_materialise_unique
Not sure what this is for... it looks a lot like d_splice_alias,
only it is a lot more complicated. The comments in the code and
in the original commit sound exactly like d_splice_alias.
One of these two should certainly go.

It clears DCACHE_DISCONNECTED which is probably the wrong thing to
do.

__d_drop assumes that if DCACHE_DISCONNECTED is set then the dentry
is hashed to s_anon. This is wrong - d_splice_alias can invalidate
this. It should be testing IS_ROOT(). And IS_ROOT will only ever
be hashed to s_anon.

So I'm suggesting the following patch as a start.
I'm still worried about the locking for d_move in d_splice_alias,
and I think we can discard d_materialise_unique, but I would
want to understand why it is different first.

NeilBrown

diff --git a/fs/dcache.c b/fs/dcache.c
index 2a6bd9a..a6f1884 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -327,7 +327,7 @@ static struct dentry *d_kill(struct dentry *dentry, struct dentry *parent)
void __d_drop(struct dentry *dentry)
{
if (!(dentry->d_flags & DCACHE_UNHASHED)) {
- if (unlikely(dentry->d_flags & DCACHE_DISCONNECTED)) {
+ if (unlikely(IS_ROOT(dentry))) {
bit_spin_lock(0,
(unsigned long *)&dentry->d_sb->s_anon.first);
dentry->d_flags |= DCACHE_UNHASHED;
@@ -565,52 +565,29 @@ EXPORT_SYMBOL(dget_parent);
/**
* d_find_alias - grab a hashed alias of inode
* @inode: inode in question
- * @want_discon: flag, used by d_splice_alias, to request
- * that only a DISCONNECTED alias be returned.
+ * @want_discon: flag, used by d_splice_alias to request
+ * that only an IS_ROOT alias be returned.
*
* If inode has a hashed alias, or is a directory and has any alias,
* acquire the reference to alias and return it. Otherwise return NULL.
* Notice that if inode is a directory there can be only one alias and
* it can be unhashed only if it has no children, or if it is the root
* of a filesystem.
- *
- * If the inode has an IS_ROOT, DCACHE_DISCONNECTED alias, then prefer
- * any other hashed alias over that one unless @want_discon is set,
- * in which case only return an IS_ROOT, DCACHE_DISCONNECTED alias.
+ * If want_discon is set, then only return an IS_ROOT alias.
*/
static struct dentry *__d_find_alias(struct inode *inode, int want_discon)
{
- struct dentry *alias, *discon_alias;
+ struct dentry *alias;

-again:
- discon_alias = NULL;
list_for_each_entry(alias, &inode->i_dentry, d_alias) {
spin_lock(&alias->d_lock);
- if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) {
- if (IS_ROOT(alias) &&
- (alias->d_flags & DCACHE_DISCONNECTED)) {
- discon_alias = alias;
- } else if (!want_discon) {
- __dget_dlock(alias);
- spin_unlock(&alias->d_lock);
- return alias;
- }
- }
- spin_unlock(&alias->d_lock);
- }
- if (discon_alias) {
- alias = discon_alias;
- spin_lock(&alias->d_lock);
- if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) {
- if (IS_ROOT(alias) &&
- (alias->d_flags & DCACHE_DISCONNECTED)) {
- __dget_dlock(alias);
- spin_unlock(&alias->d_lock);
- return alias;
- }
+ if ((S_ISDIR(inode->i_mode) || !d_unhashed(alias))
+ && (!want_discon || IS_ROOT(alias))) {
+ __dget_dlock(alias);
+ spin_unlock(&alias->d_lock);
+ return alias;
}
spin_unlock(&alias->d_lock);
- goto again;
}
return NULL;
}
@@ -1599,9 +1576,9 @@ EXPORT_SYMBOL(d_obtain_alias);
* @inode: the inode which may have a disconnected dentry
* @dentry: a negative dentry which we want to point to the inode.
*
- * If inode is a directory and has a 'disconnected' dentry (i.e. IS_ROOT and
- * DCACHE_DISCONNECTED), then d_move that in place of the given dentry
- * and return it, else simply d_add the inode to the dentry and return NULL.
+ * If inode and has a 'disconnected' dentry (i.e. IS_ROOT()) then
+ * d_move that in place of the given dentry and return it, else simply
+ * d_add the inode to the dentry and return NULL.
*
* This is needed in the lookup routine of any filesystem that is exportable
* (via knfsd) so that we can build dcache paths to directories effectively.
@@ -1614,11 +1591,10 @@ struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)
{
struct dentry *new = NULL;

- if (inode && S_ISDIR(inode->i_mode)) {
+ if (inode) {
spin_lock(&inode->i_lock);
new = __d_find_alias(inode, 1);
if (new) {
- BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED));
spin_unlock(&inode->i_lock);
security_d_instantiate(new, inode);
d_move(new, dentry);

2011-03-08 18:13:26

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries

On Tue, Jan 18, 2011 at 05:08:17PM -0500, J. Bruce Fields wrote:
> On Wed, Jan 19, 2011 at 09:02:59AM +1100, Nick Piggin wrote:
> > On Wed, Jan 19, 2011 at 7:45 AM, J. Bruce Fields <[email protected]> wrote:
> > > @@ -1521,6 +1521,28 @@ struct dentry * d_alloc_root(struct inode * root_inode)
> > >  }
> > >  EXPORT_SYMBOL(d_alloc_root);
> > >
> > > +static struct dentry * __d_find_any_alias(struct inode *inode)
> > > +{
> > > +       struct dentry *alias;
> > > +
> > > +       if (list_empty(&inode->i_dentry))
> > > +               return NULL;
> > > +       alias = list_first_entry(&inode->i_dentry, struct dentry, d_alias);
> > > +       __dget(alias);
> > > +       return alias;
> > > +}
> > > +
> > > +static struct dentry * d_find_any_alias(struct inode *inode)
> > > +{
> > > +       struct dentry *de;
> > > +
> > > +       spin_lock(&inode->i_lock);
> >
> > Yes, i_dentry/d_alias is protected by i_lock, so it looks fine.
>
> OK, thanks; I'm assuming it's OK to re-add your ack in that case. Al,
> could you apply? (Or if this should go in through an nfsd tree, or some
> other way, let me know.)

Al, do you have this in your queue to look at? Need me to resend? Or
should it take some other route?

--b.

>
> --b.
>
> From: J. Bruce Fields <[email protected]>
> Date: Fri, 17 Dec 2010 09:05:04 -0500
> Subject: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries
>
> Without this patch, inodes are not promptly freed on last close of an
> unlinked file by an nfs client:
>
>        client$ mount -tnfs4 server:/export/ /mnt/
>        client$ tail -f /mnt/FOO
>        ...
>        server$ df -i /export
>        server$ rm /export/FOO
>        (^C the tail -f)
>        server$ df -i /export
>        server$ echo 2 >/proc/sys/vm/drop_caches
>        server$ df -i /export
>
> the df's will show that the inode is not freed on the filesystem until
> the last step, when it could have been freed after killing the client's
> tail -f.  On-disk data won't be deallocated either, leading to possible
> spurious ENOSPC.
>
> This occurs because when the client does the close, it arrives in a
> compound with a putfh and a close, processed like:
>
> - putfh: look up the filehandle.  The only alias found for the
> inode will be DCACHE_UNHASHED alias referenced by the filp
> this, so it creates a new DCACHE_DISCONECTED dentry and
> returns that instead.
> - close: closes the existing filp, which is destroyed
> immediately by dput() since it's DCACHE_UNHASHED.
> - end of the compound: release the reference
> to the current filehandle, and dput() the new
> DCACHE_DISCONECTED dentry, which gets put on the
> unused list instead of being destroyed immediately.
>
> Nick Piggin suggested fixing this by allowing d_obtain_alias to return
> the unhashed dentry that is referenced by the filp, instead of making it
> create a new dentry.
>
> Leave __d_find_alias() alone to avoid changing behavior of other
> callers.
>
> Also nfsd doesn't need all the checks of __d_find_alias(); any dentry,
> hashed or unhashed, disconnected or not, should work.
>
> Acked-by: Nick Piggin <[email protected]>
> Signed-off-by: J. Bruce Fields <[email protected]>
> ---
> fs/dcache.c | 26 ++++++++++++++++++++++++--
> 1 files changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 9f493ee..2849258 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -1521,6 +1521,28 @@ struct dentry * d_alloc_root(struct inode * root_inode)
> }
> EXPORT_SYMBOL(d_alloc_root);
>
> +static struct dentry * __d_find_any_alias(struct inode *inode)
> +{
> + struct dentry *alias;
> +
> + if (list_empty(&inode->i_dentry))
> + return NULL;
> + alias = list_first_entry(&inode->i_dentry, struct dentry, d_alias);
> + __dget(alias);
> + return alias;
> +}
> +
> +static struct dentry * d_find_any_alias(struct inode *inode)
> +{
> + struct dentry *de;
> +
> + spin_lock(&inode->i_lock);
> + de = __d_find_any_alias(inode);
> + spin_unlock(&inode->i_lock);
> + return de;
> +}
> +
> +
> /**
> * d_obtain_alias - find or allocate a dentry for a given inode
> * @inode: inode to allocate the dentry for
> @@ -1550,7 +1572,7 @@ struct dentry *d_obtain_alias(struct inode *inode)
> if (IS_ERR(inode))
> return ERR_CAST(inode);
>
> - res = d_find_alias(inode);
> + res = d_find_any_alias(inode);
> if (res)
> goto out_iput;
>
> @@ -1563,7 +1585,7 @@ struct dentry *d_obtain_alias(struct inode *inode)
>
>
> spin_lock(&inode->i_lock);
> - res = __d_find_alias(inode, 0);
> + res = __d_find_any_alias(inode);
> if (res) {
> spin_unlock(&inode->i_lock);
> dput(tmp);
> --
> 1.7.1
>

2012-02-14 17:03:10

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries

On Fri, Mar 11, 2011 at 03:07:49PM +1100, NeilBrown wrote:
> On Thu, 10 Mar 2011 10:58:21 +0000 Al Viro <[email protected]> wrote:
>
> > On Tue, Mar 08, 2011 at 01:13:20PM -0500, J. Bruce Fields wrote:
> >
> > > Al, do you have this in your queue to look at? Need me to resend? Or
> > > should it take some other route?
> >
> > It's in queue, but I'd be a lot happier if I understood what's going on
> > with __d_find_alias() elsewhere. Namely, in d_splice_alias(). The thing
> > is, unless I'm missing something we ought to use __d_find_any_alias()
> > there as well. Directories really, _really_ should not have more than
> > one alias. And what we get is really weird:
> > * find (the only) alias
> > * if it doesn't exist, create one (OK, no problem)
> > * if it does exist and happens to be IS_ROOT and DCACHE_DISCONNECTED,
> > move it (also fine, modulo rather useless BUG_ON() in there)
> > * if it does exist and either isn't IS_ROOT or not DCACHE_DISCONNECTED,
> > add a new alias and say nothing.
> >
> > The last part looks very strange. I'd been staring at the history of this
> > function and I _think_ it's a rudiment of period when we used that stuff for
> > non-directories as well. It used to be directory-only; then Neil had
> > switched it to non-directories as well (in "Fix disconnected dentries on NFS
> > exports" back in 2004), adding want_discon argument to __d_find_alias() in
> > process and using it instead of open-coded equivalent of __d_find_any_alias().
> > Then, two years later, in "knfsd: close a race-opportunity in d_splice_alias"
> > he'd made that code directory-only again, at which point we arrived to the
> > current situation.
> >
> > Neil, is there some reason I'm missing that makes __d_find_alias() right in
> > there? And do we still need the second argument in __d_find_alias()?
> >
> > Anyway, Bruce's patch goes in tonight's push, but I'd love to see that
> > mess cleaned up as well or at least explained.

Nothing like seeing somebody actually run across a bug here to motivate
getting back to this....

So, like Al I'm wondering whether the __d_find_alias in d_splice_alias
should be a __d_find_any_alias.

Disclaimer: the bug manifested in rhel5, and I haven't looked closely at
upstream yet, though it seems like it would have the same problem.

In the particular case I'm seeing, the directory already has an alias
that is UNHASHED (but not DISCONNECTED). So d_splice_alias adds a
second alias.

(And we notice because a rename happens to pick up one of each, and the
rename locking then tries to take the parent's i_mutex twice (since the
rename code think it's dealing with a cross-directory rename)).

Using __d_find_any_alias seems to fix the problem.

Would rehashing an UNHASHED dentry here violate any important assumptions?

> I'm a bit uncomfortable that d_splice_alias drops all locks before
> calling d_move. Normally when d_move is called, both directories are
> locked in some way. In the case the source is not a directory bit
> is the state of being anonymous. Theoretically two calls to
> d_splice_alias on the same inode could try to d_move the same
> anonymous dentry to two different places, which would be bad.
> So I think some locking is needed there.

I think d_splice_alias is only ever called from lookup, under the
parent's i_mutex?

So if we have two tasks in the d_move here, then the two callers must be
doing lookups in two different directories, and holding the i_mutex on
both of them.

The directory can't have been renamed while either caller was holding
the i_mutex, so this is really a directory that's returned on lookup
from two different parents.

That sounds like a bug already. Could it happen on a corrupted
filesystem maybe?

--b.

2012-02-20 02:55:50

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries

On Thu, 16 Feb 2012 17:30:11 -0500 "J. Bruce Fields" <[email protected]>
wrote:

> On Thu, Feb 16, 2012 at 02:06:03PM +1100, NeilBrown wrote:
> > I was going ask how you managed to get an 'unhashed' dentry which was not
> > DISCONNECTED, and belonged to a directory that could be the subject of
> > d_splice_alias (that implies it has a name).
> >
> > The bug sounds like a race between lookup and rmdir, which should be
> > prevented by i_mutex.
> >
> > I think that using __d_find_any_alias would just be papering over the
> > problem, and would trigger a BUG_ON when it returned a non-DISCONNECTED alias.
>
> Looking through the latest upstream code, I can't come up with another
> obvious reproducer.
>
> But I also can't see the code making any particular effort to ensure
> that dentries are removed from inode's alias lists at the same time
> they're unhashed.
>
> E.g., trace up through the callers of d_drop/__d_drop and try to
> convince yourselves that they all end up removing the dentry from the
> alias list.
>
> Can you see any reason why the following would actually create a
> problem?

No, I don't think that would cause problems, so it is probably a good clean
up and as Peng says it means we can remove the want_discon arg as well.

However I cannot help thinking that something else must be going wrong before
we get to this point.
When you rmdir a directory it must be empty and it will never be linked again.
So how does 'lookup' find the inode and want to attach a dentry to it?

So I still think this is just papering over some other problem.

I think that looking at when aliases are removed is missing the point.

A directory can only have one name so it can only have one dentry.
If that dentry gets unhashed, that is because the directory was deleted. So
now it must have zero names. So there is no way that lookup can possibly
find it, so there is no way that d_splice_alias can be asked to attach an
alias to it.


NeilBrown


>
> --b.
>
> commit fcfef6b7319c5d19ea5064317528ff994343b011
> Author: J. Bruce Fields <[email protected]>
> Date: Mon Feb 13 13:38:33 2012 -0500
>
> exports: stop d_splice_alias creating directory aliases
>
> A directory should never have more than one dentry pointing to it.
>
> But d_splice_alias() does so in the case of a directory with an
> already-existing non-DISCONNECTED dentry.
>
> Prior to the removal of dentry_unhash() from vfs_rmdir(), around v3.0,
> this could cause an nfsd deadlock like this:
>
> - Somebody attempts to remove a non-empty directory.
> - The dentry_unhash() in vfs_rmdir() unhashes the dentry
> pointing to the non-empty directory.
> - ->rmdir() then fails with -ENOTEMPTY
> - Before the vfs_rmdir() caller reaches dput(), an nfsd process
> in rename looks up the directory by filehandle; at the end of
> that lookup, this dentry is found by d_alloc_anon(), and a
> reference is taken on it, preventing dput() from removing it.
> - A regular lookup of the directory calls d_splice_alias(),
> finds only an unhashed (not a DISCONNECTED) dentry, and
> insteads adds a new one, so the directory now has two
> dentries.
> - The nfsd process in rename, which was previously looking up
> the source directory of the rename, now looks up the target
> directory (which is the same), and gets the dentry newly
> created by the previous lookup.
> - The rename, seeing two different dentries, assumes this is a
> cross-directory rename and attempts to take the i_mutex on the
> directory twice.
>
> I don't see as obvious a reproducer now, but I also don't see the
> existing code taking care to remove dentries from the alias list
> whenever they're unhashed.
>
> It therefore seems safest to allow d_splice_alias to use any dentry it
> finds.
>
> Signed-off-by: J. Bruce Fields <[email protected]>
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index f68e193..1fd2256 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -1602,9 +1602,8 @@ struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)
>
> if (inode && S_ISDIR(inode->i_mode)) {
> spin_lock(&inode->i_lock);
> - new = __d_find_alias(inode, 1);
> + new = __d_find_any_alias(inode);
> if (new) {
> - BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED));
> spin_unlock(&inode->i_lock);
> security_d_instantiate(new, inode);
> d_move(new, dentry);


Attachments:
signature.asc (828.00 B)

2012-02-16 03:06:14

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries

On Wed, 15 Feb 2012 11:56:33 -0500 "J. Bruce Fields" <[email protected]>
wrote:

> On Tue, Feb 14, 2012 at 12:03:00PM -0500, J. Bruce Fields wrote:
> > On Fri, Mar 11, 2011 at 03:07:49PM +1100, NeilBrown wrote:
> > > On Thu, 10 Mar 2011 10:58:21 +0000 Al Viro <[email protected]> wrote:
> > >
> > > > On Tue, Mar 08, 2011 at 01:13:20PM -0500, J. Bruce Fields wrote:
> > > >
> > > > > Al, do you have this in your queue to look at? Need me to resend? Or
> > > > > should it take some other route?
> > > >
> > > > It's in queue, but I'd be a lot happier if I understood what's going on
> > > > with __d_find_alias() elsewhere. Namely, in d_splice_alias(). The thing
> > > > is, unless I'm missing something we ought to use __d_find_any_alias()
> > > > there as well. Directories really, _really_ should not have more than
> > > > one alias. And what we get is really weird:
> > > > * find (the only) alias
> > > > * if it doesn't exist, create one (OK, no problem)
> > > > * if it does exist and happens to be IS_ROOT and DCACHE_DISCONNECTED,
> > > > move it (also fine, modulo rather useless BUG_ON() in there)
> > > > * if it does exist and either isn't IS_ROOT or not DCACHE_DISCONNECTED,
> > > > add a new alias and say nothing.
> > > >
> > > > The last part looks very strange. I'd been staring at the history of this
> > > > function and I _think_ it's a rudiment of period when we used that stuff for
> > > > non-directories as well. It used to be directory-only; then Neil had
> > > > switched it to non-directories as well (in "Fix disconnected dentries on NFS
> > > > exports" back in 2004), adding want_discon argument to __d_find_alias() in
> > > > process and using it instead of open-coded equivalent of __d_find_any_alias().
> > > > Then, two years later, in "knfsd: close a race-opportunity in d_splice_alias"
> > > > he'd made that code directory-only again, at which point we arrived to the
> > > > current situation.
> > > >
> > > > Neil, is there some reason I'm missing that makes __d_find_alias() right in
> > > > there? And do we still need the second argument in __d_find_alias()?
> > > >
> > > > Anyway, Bruce's patch goes in tonight's push, but I'd love to see that
> > > > mess cleaned up as well or at least explained.
> >
> > Nothing like seeing somebody actually run across a bug here to motivate
> > getting back to this....
> >
> > So, like Al I'm wondering whether the __d_find_alias in d_splice_alias
> > should be a __d_find_any_alias.
> >
> > Disclaimer: the bug manifested in rhel5, and I haven't looked closely at
> > upstream yet, though it seems like it would have the same problem.
> >
> > In the particular case I'm seeing, the directory already has an alias
> > that is UNHASHED (but not DISCONNECTED). So d_splice_alias adds a
> > second alias.
>
> The reproducer I was using against rhel5 isn't going to work against
> upstream, because it depended on the dentry_unhash() call in vfs_rmdir()
> to create that UNHASHED dentry.
>
> Though I think there's still code that can leave UNHASHED dentries
> around still on the alias list. I'll look some more....

I was going ask how you managed to get an 'unhashed' dentry which was not
DISCONNECTED, and belonged to a directory that could be the subject of
d_splice_alias (that implies it has a name).

The bug sounds like a race between lookup and rmdir, which should be
prevented by i_mutex.

I think that using __d_find_any_alias would just be papering over the
problem, and would trigger a BUG_ON when it returned a non-DISCONNECTED alias.

NeilBrown


Attachments:
signature.asc (828.00 B)

2012-02-16 22:30:19

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries

On Thu, Feb 16, 2012 at 02:06:03PM +1100, NeilBrown wrote:
> I was going ask how you managed to get an 'unhashed' dentry which was not
> DISCONNECTED, and belonged to a directory that could be the subject of
> d_splice_alias (that implies it has a name).
>
> The bug sounds like a race between lookup and rmdir, which should be
> prevented by i_mutex.
>
> I think that using __d_find_any_alias would just be papering over the
> problem, and would trigger a BUG_ON when it returned a non-DISCONNECTED alias.

Looking through the latest upstream code, I can't come up with another
obvious reproducer.

But I also can't see the code making any particular effort to ensure
that dentries are removed from inode's alias lists at the same time
they're unhashed.

E.g., trace up through the callers of d_drop/__d_drop and try to
convince yourselves that they all end up removing the dentry from the
alias list.

Can you see any reason why the following would actually create a
problem?

--b.

commit fcfef6b7319c5d19ea5064317528ff994343b011
Author: J. Bruce Fields <[email protected]>
Date: Mon Feb 13 13:38:33 2012 -0500

exports: stop d_splice_alias creating directory aliases

A directory should never have more than one dentry pointing to it.

But d_splice_alias() does so in the case of a directory with an
already-existing non-DISCONNECTED dentry.

Prior to the removal of dentry_unhash() from vfs_rmdir(), around v3.0,
this could cause an nfsd deadlock like this:

- Somebody attempts to remove a non-empty directory.
- The dentry_unhash() in vfs_rmdir() unhashes the dentry
pointing to the non-empty directory.
- ->rmdir() then fails with -ENOTEMPTY
- Before the vfs_rmdir() caller reaches dput(), an nfsd process
in rename looks up the directory by filehandle; at the end of
that lookup, this dentry is found by d_alloc_anon(), and a
reference is taken on it, preventing dput() from removing it.
- A regular lookup of the directory calls d_splice_alias(),
finds only an unhashed (not a DISCONNECTED) dentry, and
insteads adds a new one, so the directory now has two
dentries.
- The nfsd process in rename, which was previously looking up
the source directory of the rename, now looks up the target
directory (which is the same), and gets the dentry newly
created by the previous lookup.
- The rename, seeing two different dentries, assumes this is a
cross-directory rename and attempts to take the i_mutex on the
directory twice.

I don't see as obvious a reproducer now, but I also don't see the
existing code taking care to remove dentries from the alias list
whenever they're unhashed.

It therefore seems safest to allow d_splice_alias to use any dentry it
finds.

Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/fs/dcache.c b/fs/dcache.c
index f68e193..1fd2256 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1602,9 +1602,8 @@ struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)

if (inode && S_ISDIR(inode->i_mode)) {
spin_lock(&inode->i_lock);
- new = __d_find_alias(inode, 1);
+ new = __d_find_any_alias(inode);
if (new) {
- BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED));
spin_unlock(&inode->i_lock);
security_d_instantiate(new, inode);
d_move(new, dentry);

2012-02-16 11:51:41

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries

On Thu, Feb 16, 2012 at 02:06:03PM +1100, NeilBrown wrote:
> On Wed, 15 Feb 2012 11:56:33 -0500 "J. Bruce Fields" <[email protected]>
> wrote:
>
> > On Tue, Feb 14, 2012 at 12:03:00PM -0500, J. Bruce Fields wrote:
> > > On Fri, Mar 11, 2011 at 03:07:49PM +1100, NeilBrown wrote:
> > > > On Thu, 10 Mar 2011 10:58:21 +0000 Al Viro <[email protected]> wrote:
> > > >
> > > > > On Tue, Mar 08, 2011 at 01:13:20PM -0500, J. Bruce Fields wrote:
> > > > >
> > > > > > Al, do you have this in your queue to look at? Need me to resend? Or
> > > > > > should it take some other route?
> > > > >
> > > > > It's in queue, but I'd be a lot happier if I understood what's going on
> > > > > with __d_find_alias() elsewhere. Namely, in d_splice_alias(). The thing
> > > > > is, unless I'm missing something we ought to use __d_find_any_alias()
> > > > > there as well. Directories really, _really_ should not have more than
> > > > > one alias. And what we get is really weird:
> > > > > * find (the only) alias
> > > > > * if it doesn't exist, create one (OK, no problem)
> > > > > * if it does exist and happens to be IS_ROOT and DCACHE_DISCONNECTED,
> > > > > move it (also fine, modulo rather useless BUG_ON() in there)
> > > > > * if it does exist and either isn't IS_ROOT or not DCACHE_DISCONNECTED,
> > > > > add a new alias and say nothing.
> > > > >
> > > > > The last part looks very strange. I'd been staring at the history of this
> > > > > function and I _think_ it's a rudiment of period when we used that stuff for
> > > > > non-directories as well. It used to be directory-only; then Neil had
> > > > > switched it to non-directories as well (in "Fix disconnected dentries on NFS
> > > > > exports" back in 2004), adding want_discon argument to __d_find_alias() in
> > > > > process and using it instead of open-coded equivalent of __d_find_any_alias().
> > > > > Then, two years later, in "knfsd: close a race-opportunity in d_splice_alias"
> > > > > he'd made that code directory-only again, at which point we arrived to the
> > > > > current situation.
> > > > >
> > > > > Neil, is there some reason I'm missing that makes __d_find_alias() right in
> > > > > there? And do we still need the second argument in __d_find_alias()?
> > > > >
> > > > > Anyway, Bruce's patch goes in tonight's push, but I'd love to see that
> > > > > mess cleaned up as well or at least explained.
> > >
> > > Nothing like seeing somebody actually run across a bug here to motivate
> > > getting back to this....
> > >
> > > So, like Al I'm wondering whether the __d_find_alias in d_splice_alias
> > > should be a __d_find_any_alias.
> > >
> > > Disclaimer: the bug manifested in rhel5, and I haven't looked closely at
> > > upstream yet, though it seems like it would have the same problem.
> > >
> > > In the particular case I'm seeing, the directory already has an alias
> > > that is UNHASHED (but not DISCONNECTED). So d_splice_alias adds a
> > > second alias.
> >
> > The reproducer I was using against rhel5 isn't going to work against
> > upstream, because it depended on the dentry_unhash() call in vfs_rmdir()
> > to create that UNHASHED dentry.
> >
> > Though I think there's still code that can leave UNHASHED dentries
> > around still on the alias list. I'll look some more....
>
> I was going ask how you managed to get an 'unhashed' dentry which was not
> DISCONNECTED, and belonged to a directory that could be the subject of
> d_splice_alias (that implies it has a name).
>
> The bug sounds like a race between lookup and rmdir, which should be
> prevented by i_mutex.

The rhel5 race is a bit more complicated than that: a filehandle lookup
has to come in to grab the unhashed dentry before rmdir drops the parent
i_mutex, and then the lookup comes along later.

I suspect that this is actually expected, and that it's not safe for us
to assume that directories will lose any UNHASHED dentries before
relevant locks are dropped.

I'm seeing something similar happening on an upstream kernel, so I'll
try to figure out where that's coming from now....

> I think that using __d_find_any_alias would just be papering over the
> problem, and would trigger a BUG_ON when it returned a non-DISCONNECTED alias.

Right, I throw away that BUG_ON as well. Tested on rhel5 only.

--b.

2012-02-15 16:56:41

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries

On Tue, Feb 14, 2012 at 12:03:00PM -0500, J. Bruce Fields wrote:
> On Fri, Mar 11, 2011 at 03:07:49PM +1100, NeilBrown wrote:
> > On Thu, 10 Mar 2011 10:58:21 +0000 Al Viro <[email protected]> wrote:
> >
> > > On Tue, Mar 08, 2011 at 01:13:20PM -0500, J. Bruce Fields wrote:
> > >
> > > > Al, do you have this in your queue to look at? Need me to resend? Or
> > > > should it take some other route?
> > >
> > > It's in queue, but I'd be a lot happier if I understood what's going on
> > > with __d_find_alias() elsewhere. Namely, in d_splice_alias(). The thing
> > > is, unless I'm missing something we ought to use __d_find_any_alias()
> > > there as well. Directories really, _really_ should not have more than
> > > one alias. And what we get is really weird:
> > > * find (the only) alias
> > > * if it doesn't exist, create one (OK, no problem)
> > > * if it does exist and happens to be IS_ROOT and DCACHE_DISCONNECTED,
> > > move it (also fine, modulo rather useless BUG_ON() in there)
> > > * if it does exist and either isn't IS_ROOT or not DCACHE_DISCONNECTED,
> > > add a new alias and say nothing.
> > >
> > > The last part looks very strange. I'd been staring at the history of this
> > > function and I _think_ it's a rudiment of period when we used that stuff for
> > > non-directories as well. It used to be directory-only; then Neil had
> > > switched it to non-directories as well (in "Fix disconnected dentries on NFS
> > > exports" back in 2004), adding want_discon argument to __d_find_alias() in
> > > process and using it instead of open-coded equivalent of __d_find_any_alias().
> > > Then, two years later, in "knfsd: close a race-opportunity in d_splice_alias"
> > > he'd made that code directory-only again, at which point we arrived to the
> > > current situation.
> > >
> > > Neil, is there some reason I'm missing that makes __d_find_alias() right in
> > > there? And do we still need the second argument in __d_find_alias()?
> > >
> > > Anyway, Bruce's patch goes in tonight's push, but I'd love to see that
> > > mess cleaned up as well or at least explained.
>
> Nothing like seeing somebody actually run across a bug here to motivate
> getting back to this....
>
> So, like Al I'm wondering whether the __d_find_alias in d_splice_alias
> should be a __d_find_any_alias.
>
> Disclaimer: the bug manifested in rhel5, and I haven't looked closely at
> upstream yet, though it seems like it would have the same problem.
>
> In the particular case I'm seeing, the directory already has an alias
> that is UNHASHED (but not DISCONNECTED). So d_splice_alias adds a
> second alias.

The reproducer I was using against rhel5 isn't going to work against
upstream, because it depended on the dentry_unhash() call in vfs_rmdir()
to create that UNHASHED dentry.

Though I think there's still code that can leave UNHASHED dentries
around still on the alias list. I'll look some more....

--b.


>
> (And we notice because a rename happens to pick up one of each, and the
> rename locking then tries to take the parent's i_mutex twice (since the
> rename code think it's dealing with a cross-directory rename)).
>
> Using __d_find_any_alias seems to fix the problem.
>
> Would rehashing an UNHASHED dentry here violate any important assumptions?
>
> > I'm a bit uncomfortable that d_splice_alias drops all locks before
> > calling d_move. Normally when d_move is called, both directories are
> > locked in some way. In the case the source is not a directory bit
> > is the state of being anonymous. Theoretically two calls to
> > d_splice_alias on the same inode could try to d_move the same
> > anonymous dentry to two different places, which would be bad.
> > So I think some locking is needed there.
>
> I think d_splice_alias is only ever called from lookup, under the
> parent's i_mutex?
>
> So if we have two tasks in the d_move here, then the two callers must be
> doing lookups in two different directories, and holding the i_mutex on
> both of them.
>
> The directory can't have been renamed while either caller was holding
> the i_mutex, so this is really a directory that's returned on lookup
> from two different parents.
>
> That sounds like a bug already. Could it happen on a corrupted
> filesystem maybe?
>
> --b.

2012-02-29 23:10:37

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries

On Mon, Feb 20, 2012 at 01:55:37PM +1100, NeilBrown wrote:
> On Thu, 16 Feb 2012 17:30:11 -0500 "J. Bruce Fields" <[email protected]>
> wrote:
>
> > On Thu, Feb 16, 2012 at 02:06:03PM +1100, NeilBrown wrote:
> > > I was going ask how you managed to get an 'unhashed' dentry which was not
> > > DISCONNECTED, and belonged to a directory that could be the subject of
> > > d_splice_alias (that implies it has a name).
> > >
> > > The bug sounds like a race between lookup and rmdir, which should be
> > > prevented by i_mutex.
> > >
> > > I think that using __d_find_any_alias would just be papering over the
> > > problem, and would trigger a BUG_ON when it returned a non-DISCONNECTED alias.
> >
> > Looking through the latest upstream code, I can't come up with another
> > obvious reproducer.
> >
> > But I also can't see the code making any particular effort to ensure
> > that dentries are removed from inode's alias lists at the same time
> > they're unhashed.
> >
> > E.g., trace up through the callers of d_drop/__d_drop and try to
> > convince yourselves that they all end up removing the dentry from the
> > alias list.
> >
> > Can you see any reason why the following would actually create a
> > problem?
>
> No, I don't think that would cause problems, so it is probably a good clean
> up and as Peng says it means we can remove the want_discon arg as well.
>
> However I cannot help thinking that something else must be going wrong before
> we get to this point.
> When you rmdir a directory it must be empty and it will never be linked again.
> So how does 'lookup' find the inode and want to attach a dentry to it?
>
> So I still think this is just papering over some other problem.
>
> I think that looking at when aliases are removed is missing the point.
>
> A directory can only have one name so it can only have one dentry.
> If that dentry gets unhashed, that is because the directory was deleted.

Wait, what other reasons could cause them to get unhashed?:

Just grepping for callers of d_drop....

On a distributed filesystem, we may unhash an in-use dentry if it no
longer seems to be valid. Can something like this happen?

- nfsd holds a filehandle for directory a/b
- a/b is renamed to c/d by some other host using this
filesystem.
- filesystem looks up a/b, finds it no longer there, unhashes
it.
- a lookup for c/d later adds a new dentry.

And then we have two dentries?

And of course we unhash dentries when they're not in use, just to free
memory. I think that case is OK.

--b.

> So
> now it must have zero names. So there is no way that lookup can possibly
> find it, so there is no way that d_splice_alias can be asked to attach an
> alias to it.

2012-02-16 16:08:46

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries

On Thu, Feb 16, 2012 at 06:51:33AM -0500, J. Bruce Fields wrote:
> I'm seeing something similar happening on an upstream kernel, so I'll
> try to figure out where that's coming from now....

After taking another look, actually this was something happening on
proc, not on an exported filesystem.

Running some tests on a kernel with this extra warning added to
__d_instantiate whenever it gives a directory a second alias:

Feb 15 14:43:07 pip1 kernel: __d_instantiate adding alias
ffff880037bf51c0(8) to dir ffff880035eaf048(4026532190) with existing alias(es) ffff880034037180(88)

The numbers after the dentry pointers are dentry flags in hex (8 ==
DCACHE_OP_DELETE, 88 == DCACHE_RCUACCESS|DCACHE_OP_DELETE), and after
the inode pointers is an inode number.

I don't know if it's a bug or not, though it seems suspicous.

--b.

Feb 15 14:43:07 pip1 kernel: ------------[ cut here ]------------
Feb 15 14:43:07 pip1 kernel: WARNING: at fs/dcache.c:1320 __d_instantiate+0x185/0x1d0()
Feb 15 14:43:07 pip1 kernel: Hardware name: Bochs
Feb 15 14:43:07 pip1 kernel: Modules linked in: nfsd lockd nfs_acl auth_rpcgss sunrpc [last unloaded: scsi_wait_scan]
Feb 15 14:43:07 pip1 kernel: Pid: 4810, comm: exportfs Not tainted 3.3.0-rc2-00517-g35ae270 #1070
Feb 15 14:43:07 pip1 kernel: Call Trace:
Feb 15 14:43:07 pip1 kernel: [<ffffffff8104040f>] warn_slowpath_common+0x7f/0xc0
Feb 15 14:43:07 pip1 kernel: [<ffffffff8104046a>] warn_slowpath_null+0x1a/0x20
Feb 15 14:43:07 pip1 kernel: [<ffffffff8113f885>] __d_instantiate+0x185/0x1d0
Feb 15 14:43:07 pip1 kernel: [<ffffffff8113face>] d_instantiate+0x4e/0x90
Feb 15 14:43:07 pip1 kernel: [<ffffffff81192d7b>] proc_lookup_de+0xab/0x110
Feb 15 14:43:07 pip1 kernel: [<ffffffff81196a90>] ? proc_sys_poll_notify+0x30/0x30
Feb 15 14:43:07 pip1 kernel: [<ffffffff81196c1f>] proc_tgid_net_lookup+0x3f/0x50
Feb 15 14:43:07 pip1 kernel: [<ffffffff81133f55>] d_alloc_and_lookup+0x45/0x90
Feb 15 14:43:07 pip1 kernel: [<ffffffff8113e455>] ? d_lookup+0x35/0x60
Feb 15 14:43:07 pip1 kernel: [<ffffffff8113453a>] do_lookup+0x28a/0x3b0
Feb 15 14:43:07 pip1 kernel: [<ffffffff814c792c>] ? security_inode_permission+0x1c/0x30
Feb 15 14:43:07 pip1 kernel: [<ffffffff81135337>] link_path_walk+0x157/0x940
Feb 15 14:43:07 pip1 kernel: [<ffffffff8113910c>] path_openat+0xbc/0x440
Feb 15 14:43:07 pip1 kernel: [<ffffffff81106243>] ? might_fault+0x53/0xb0
Feb 15 14:43:07 pip1 kernel: [<ffffffff81106243>] ? might_fault+0x53/0xb0
Feb 15 14:43:07 pip1 kernel: [<ffffffff81145aeb>] ? alloc_fd+0x3b/0x160
Feb 15 14:43:07 pip1 kernel: [<ffffffff811395a9>] do_filp_open+0x49/0xa0
Feb 15 14:43:07 pip1 kernel: [<ffffffff81145b5b>] ? alloc_fd+0xab/0x160
Feb 15 14:43:07 pip1 kernel: [<ffffffff81126328>] do_sys_open+0x108/0x1e0
Feb 15 14:43:07 pip1 kernel: [<ffffffff81126441>] sys_open+0x21/0x30
Feb 15 14:43:07 pip1 kernel: [<ffffffff81973252>] system_call_fastpath+0x16/0x1b
Feb 15 14:43:07 pip1 kernel: ---[ end trace 5714bb1463cb9b46 ]---



>
> > I think that using __d_find_any_alias would just be papering over the
> > problem, and would trigger a BUG_ON when it returned a non-DISCONNECTED alias.
>
> Right, I throw away that BUG_ON as well. Tested on rhel5 only.
>
> --b.

2012-02-17 16:34:34

by Peng Tao

[permalink] [raw]
Subject: Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries

On Fri, Feb 17, 2012 at 6:30 AM, J. Bruce Fields <[email protected]> wrote:
> On Thu, Feb 16, 2012 at 02:06:03PM +1100, NeilBrown wrote:
>> I was going ask how you managed to get an 'unhashed' dentry which was not
>> DISCONNECTED, and belonged to a directory that could be the subject of
>> d_splice_alias (that implies it has a name).
>>
>> The bug sounds like a race between lookup and rmdir, which should be
>> prevented by i_mutex.
>>
>> I think that using __d_find_any_alias would just be papering over the
>> problem, and would trigger a BUG_ON when it returned a non-DISCONNECTED alias.
>
> Looking through the latest upstream code, I can't come up with another
> obvious reproducer.
>
> But I also can't see the code making any particular effort to ensure
> that dentries are removed from inode's alias lists at the same time
> they're unhashed.
>
> E.g., trace up through the callers of d_drop/__d_drop and try to
> convince yourselves that they all end up removing the dentry from the
> alias list.
>
> Can you see any reason why the following would actually create a
> problem?
>
> --b.
>
> commit fcfef6b7319c5d19ea5064317528ff994343b011
> Author: J. Bruce Fields <[email protected]>
> Date:   Mon Feb 13 13:38:33 2012 -0500
>
>    exports: stop d_splice_alias creating directory aliases
>
>    A directory should never have more than one dentry pointing to it.
>
>    But d_splice_alias() does so in the case of a directory with an
>    already-existing non-DISCONNECTED dentry.
>
>    Prior to the removal of dentry_unhash() from vfs_rmdir(), around v3.0,
>    this could cause an nfsd deadlock like this:
>
>        - Somebody attempts to remove a non-empty directory.
>        - The dentry_unhash() in vfs_rmdir() unhashes the dentry
>          pointing to the non-empty directory.
>        - ->rmdir() then fails with -ENOTEMPTY
>        - Before the vfs_rmdir() caller reaches dput(), an nfsd process
>          in rename looks up the directory by filehandle; at the end of
>          that lookup, this dentry is found by d_alloc_anon(), and a
>          reference is taken on it, preventing dput() from removing it.
>        - A regular lookup of the directory calls d_splice_alias(),
>          finds only an unhashed (not a DISCONNECTED) dentry, and
>          insteads adds a new one, so the directory now has two
>          dentries.
>        - The nfsd process in rename, which was previously looking up
>          the source directory of the rename, now looks up the target
>          directory (which is the same), and gets the dentry newly
>          created by the previous lookup.
>        - The rename, seeing two different dentries, assumes this is a
>          cross-directory rename and attempts to take the i_mutex on the
>          directory twice.
>
>    I don't see as obvious a reproducer now, but I also don't see the
>    existing code taking care to remove dentries from the alias list
>    whenever they're unhashed.
>
>    It therefore seems safest to allow d_splice_alias to use any dentry it
>    finds.
>
>    Signed-off-by: J. Bruce Fields <[email protected]>
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index f68e193..1fd2256 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -1602,9 +1602,8 @@ struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)
>
>        if (inode && S_ISDIR(inode->i_mode)) {
>                spin_lock(&inode->i_lock);
> -               new = __d_find_alias(inode, 1);
by replacing this, the want_discon argument is no longer in use. care
to remove it as well?

--
Thanks,
Tao
> +               new = __d_find_any_alias(inode);
>                if (new) {
> -                       BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED));
>                        spin_unlock(&inode->i_lock);
>                        security_d_instantiate(new, inode);
>                        d_move(new, dentry);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

2012-03-13 20:55:36

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries

On Sat, Feb 18, 2012 at 12:34:14AM +0800, Peng Tao wrote:
> > diff --git a/fs/dcache.c b/fs/dcache.c
> > index f68e193..1fd2256 100644
> > --- a/fs/dcache.c
> > +++ b/fs/dcache.c
> > @@ -1602,9 +1602,8 @@ struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)
> >
> >        if (inode && S_ISDIR(inode->i_mode)) {
> >                spin_lock(&inode->i_lock);
> > -               new = __d_find_alias(inode, 1);
> by replacing this, the want_discon argument is no longer in use. care
> to remove it as well?

Yep, both patches follow; Al, could you take these?

--b.

2012-03-13 20:58:46

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 2/2] vfs: remove unused __d_splice_alias argument

From: "J. Bruce Fields" <[email protected]>

Nobody sets want_disconn any more.

Reported-by: Peng Tao <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/dcache.c | 13 +++++--------
1 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 30f4236..5eda891 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -606,8 +606,6 @@ EXPORT_SYMBOL(dget_parent);
/**
* d_find_alias - grab a hashed alias of inode
* @inode: inode in question
- * @want_discon: flag, used by d_splice_alias, to request
- * that only a DISCONNECTED alias be returned.
*
* If inode has a hashed alias, or is a directory and has any alias,
* acquire the reference to alias and return it. Otherwise return NULL.
@@ -616,10 +614,9 @@ EXPORT_SYMBOL(dget_parent);
* of a filesystem.
*
* If the inode has an IS_ROOT, DCACHE_DISCONNECTED alias, then prefer
- * any other hashed alias over that one unless @want_discon is set,
- * in which case only return an IS_ROOT, DCACHE_DISCONNECTED alias.
+ * any other hashed alias over that.
*/
-static struct dentry *__d_find_alias(struct inode *inode, int want_discon)
+static struct dentry *__d_find_alias(struct inode *inode)
{
struct dentry *alias, *discon_alias;

@@ -631,7 +628,7 @@ again:
if (IS_ROOT(alias) &&
(alias->d_flags & DCACHE_DISCONNECTED)) {
discon_alias = alias;
- } else if (!want_discon) {
+ } else {
__dget_dlock(alias);
spin_unlock(&alias->d_lock);
return alias;
@@ -662,7 +659,7 @@ struct dentry *d_find_alias(struct inode *inode)

if (!list_empty(&inode->i_dentry)) {
spin_lock(&inode->i_lock);
- de = __d_find_alias(inode, 0);
+ de = __d_find_alias(inode);
spin_unlock(&inode->i_lock);
}
return de;
@@ -2373,7 +2370,7 @@ struct dentry *d_materialise_unique(struct dentry *dentry, struct inode *inode)
struct dentry *alias;

/* Does an aliased dentry already exist? */
- alias = __d_find_alias(inode, 0);
+ alias = __d_find_alias(inode);
if (alias) {
actual = alias;
write_seqlock(&rename_lock);
--
1.7.5.4


2012-03-13 20:58:47

by J. Bruce Fields

[permalink] [raw]
Subject: [PATCH 1/2] vfs: stop d_splice_alias creating directory aliases

From: "J. Bruce Fields" <[email protected]>

A directory should never have more than one dentry pointing to it.

But d_splice_alias() will add one if it finds a directory with an
already-existing non-DISCONNECTED dentry.

I can't find an obvious reproducer, but I also can't see what prevents
d_splice_alias() from encountering such a case.

It therefore seems safest to allow d_splice_alias to use any dentry it
finds.

(Prior to the removal of dentry_unhash() from vfs_rmdir(), around v3.0,
this could cause an nfsd deadlock like this:

- Somebody attempts to remove a non-empty directory.
- The dentry_unhash() in vfs_rmdir() unhashes the dentry
pointing to the non-empty directory.
- ->rmdir() then fails with -ENOTEMPTY
- Before the vfs_rmdir() caller reaches dput(), an nfsd process
in rename looks up the directory by filehandle; at the end of
that lookup, this dentry is found by d_alloc_anon(), and a
reference is taken on it, preventing dput() from removing it.
- A regular lookup of the directory calls d_splice_alias(),
finds only an unhashed (not a DISCONNECTED) dentry, and
insteads adds a new one, so the directory now has two
dentries.
- The nfsd process in rename, which was previously looking up
the source directory of the rename, now looks up the target
directory (which is the same), and gets the dentry newly
created by the previous lookup.
- The rename, seeing two different dentries, assumes this is a
cross-directory rename and attempts to take the i_mutex on the
directory twice.

That reproducer no longer exists, but I don't think there was anything
fundamentally incorrect about the vfs_rmdir() behavior there, so I think
the real fault was here in d_splice_alias().)

Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/dcache.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 16a53cc..30f4236 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1587,9 +1587,8 @@ struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)

if (inode && S_ISDIR(inode->i_mode)) {
spin_lock(&inode->i_lock);
- new = __d_find_alias(inode, 1);
+ new = __d_find_any_alias(inode);
if (new) {
- BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED));
spin_unlock(&inode->i_lock);
security_d_instantiate(new, inode);
d_move(new, dentry);
--
1.7.5.4


2012-05-23 22:07:41

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 1/2] vfs: stop d_splice_alias creating directory aliases

These two patches are a resend. I think they're appropriate for 3.5.

--b.

On Wed, May 23, 2012 at 06:05:45PM -0400, J. Bruce Fields wrote:
> From: "J. Bruce Fields" <[email protected]>
>
> A directory should never have more than one dentry pointing to it.
>
> But d_splice_alias() will add one if it finds a directory with an
> already-existing non-DISCONNECTED dentry.
>
> I can't find an obvious reproducer, but I also can't see what prevents
> d_splice_alias() from encountering such a case.
>
> It therefore seems safest to allow d_splice_alias to use any dentry it
> finds.
>
> (Prior to the removal of dentry_unhash() from vfs_rmdir(), around v3.0,
> this could cause an nfsd deadlock like this:
>
> - Somebody attempts to remove a non-empty directory.
> - The dentry_unhash() in vfs_rmdir() unhashes the dentry
> pointing to the non-empty directory.
> - ->rmdir() then fails with -ENOTEMPTY
> - Before the vfs_rmdir() caller reaches dput(), an nfsd process
> in rename looks up the directory by filehandle; at the end of
> that lookup, this dentry is found by d_alloc_anon(), and a
> reference is taken on it, preventing dput() from removing it.
> - A regular lookup of the directory calls d_splice_alias(),
> finds only an unhashed (not a DISCONNECTED) dentry, and
> insteads adds a new one, so the directory now has two
> dentries.
> - The nfsd process in rename, which was previously looking up
> the source directory of the rename, now looks up the target
> directory (which is the same), and gets the dentry newly
> created by the previous lookup.
> - The rename, seeing two different dentries, assumes this is a
> cross-directory rename and attempts to take the i_mutex on the
> directory twice.
>
> That reproducer no longer exists, but I don't think there was anything
> fundamentally incorrect about the vfs_rmdir() behavior there, so I think
> the real fault was here in d_splice_alias().)
>
> Signed-off-by: J. Bruce Fields <[email protected]>
> ---
> fs/dcache.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index b60ddc4..2434c1e 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -1606,9 +1606,8 @@ struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)
>
> if (inode && S_ISDIR(inode->i_mode)) {
> spin_lock(&inode->i_lock);
> - new = __d_find_alias(inode, 1);
> + new = __d_find_any_alias(inode);
> if (new) {
> - BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED));
> spin_unlock(&inode->i_lock);
> security_d_instantiate(new, inode);
> d_move(new, dentry);
> --
> 1.7.9.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2012-06-29 20:29:11

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries

On Fri, Jun 29, 2012 at 04:10:34PM -0400, J. Bruce Fields wrote:
> On Thu, Jun 28, 2012 at 09:59:27AM -0400, J. Bruce Fields wrote:
> > Coming back to this now, just trying to review the
> > filehandle-lookup/dcache interactions:
> >
> > On Fri, Mar 11, 2011 at 03:07:49PM +1100, NeilBrown wrote:
> > > 1/ Originally DCACHE_DISCONNECTED didn't really mean much - it's presence
> > > was only a hint, its absence was a strong statement.
> > > If the flag is set, the dentry might not be linked to the root.
> > > If it is clear, it definitely is link through to the root.
> > > However I think it was used with stronger intent than that.
> > >
> > > Now it seems to mean a little bit more: If it is set and the dentry
> > > is hashed, then it must be on the sb->s_anon list.
> >
> > The code that makes that assumption is __d_shrink (which does the work
> > of d_drop)--it uses DCACHE_DISCONECTED to decide which hash chain to
> > lock.
> >
> > I can't find any basis for that assumption. The only code that clears
> > DCACHE_DISCONNECTED is in expfs.c, and it isn't done at the same time as
> > hashing. Am I missing something?
> >
> > > This is a significant
> > > which I never noticed (I haven't been watching). Originally a
> > > disconnected dentry would be attached (and hashed) to its parent. Then
> > > that parent would get its own parent and so on until it was attached all
> > > the way to the root. Only then would be start clearing
> > > DCACHE_DISCONNECTED. It seems we must clear it sooner now... I wonder if
> > > that is correct.
> >
> > It looks wrong to me:
> >
> > If we clear DCACHE_DISCONNECTED too early, then we risk a filehandle
> > lookup thinking the dentry is OK to use. That could mean for example
> > trying to rename across directories that don't have any ancestor
> > relationship to each other in the dcache yet.
> >
> > So we need to wait to clear DCACHE_DISCONNECTED until we *know* the
> > dentry's parents go all the way back to the root. As you say, that's
> > what the current code does.
> >
> > But that means DCACHE_DISCONNECTED dentries can be hashed to their
> > parents, and __d_shrink can be handed such dentries and then get the
> > locking wrong.
> >
> > It looks like this bug might originate with Nick Piggin's ceb5bdc2d246
> > "fs: dcache per-bucket dcache hash locking"? There's no discussion in
> > the changelog, so probably it was just based on an unexamined assumption
> > about DCACHE_DISCONNECTED.
> >
> > I wonder if an IS_ROOT() test could replace the DCACHE_DISCONNECTED test
> > in __d_shrink(), or if we need another flag, or ?
>
> Bah, sorry, and I only just noticed that you already said as much later
> and did the IS_ROOT() thing in your patch.
>
> Anyway, here's just that one change with a slightly more painstaking
> changelog.
>
> --b.
>
> commit b1fa644355122627424fe2240a9fc60cbef4c349
> Author: J. Bruce Fields <[email protected]>
> Date: Thu Jun 28 12:10:55 2012 -0400
>
> dcache: use IS_ROOT to decide where dentry is hashed
>
> Every hashed dentry is either hashed in the dentry_hashtable, or a
> superblock's s_anon list.
>
> __d_shrink assumes it can determine which is the case by checking
> DCACHE_DISCONNECTED; this is not true.
>
> It is true that when DCACHE_DISCONNECTED is cleared, the dentry is not
> only hashed on dentry_hashtable, but is fully connected to its parents
> back to the root.
>
> But the converse is *not* true: fs/exportfs/expfs.c:reconnect_path()
> attempts to connect a directory (found by filehandle lookup) back to
> root by ascending to parents and performing lookups one at a time. It
> does not clear DCACHE_DISCONNECTED until its done, and that is not at
> all an atomic process.
>
> In particular, it is possible for DCACHE_DISCONNECTED to be set on a
> dentry which is hashed on the dentry_hashtable.
>
> Instead, use IS_ROOT() to check which hash chain a dentry is on. This
> *does* work:
>
> Dentries are hashed only by:
>
> - d_obtain_alias, which adds an IS_ROOT() dentry to sb_anon.
>
> - __d_rehash, called by _d_rehash: hashes to the dentry's
> parent, and all callers of _d_rehash appear to have d_parent
> set to a "real" parent.
> - __d_rehash, called by __d_move: rehashes the moved dentry to
> hash chain determined by target, and assigns target's d_parent
> to its d_parent, before dropping the dentry's d_lock.
>
> Therefore I believe it's safe for a holder of a dentry's d_lock to
> assume that it is hashed on sb_anon if and only if IS_ROOT(dentry) is
> true.
>
> I believe the incorrect assumption about DCACHE_DISCONNECTED was
> originally introduced by ceb5bdc2d246 "fs: dcache per-bucket dcache hash
> locking".
>
> Cc: Neil Brown <[email protected]>
> Cc: Nick Piggin <[email protected]>
> Signed-off-by: J. Bruce Fields <[email protected]>
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 87c2da7..b2b382c 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -410,7 +410,7 @@ static void __d_shrink(struct dentry *dentry)
> {
> if (!d_unhashed(dentry)) {
> struct hlist_bl_head *b;
> - if (unlikely(dentry->d_flags & DCACHE_DISCONNECTED))
> + if (unlikely(IS_ROOT(dentry->d_flags)))

Um, right--I'll send an actual tested version along with some other
stuff later.

--b.

> b = &dentry->d_sb->s_anon;
> else
> b = d_hash(dentry->d_parent, dentry->d_name.hash);

2012-06-28 13:59:33

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries

Coming back to this now, just trying to review the
filehandle-lookup/dcache interactions:

On Fri, Mar 11, 2011 at 03:07:49PM +1100, NeilBrown wrote:
> 1/ Originally DCACHE_DISCONNECTED didn't really mean much - it's presence
> was only a hint, its absence was a strong statement.
> If the flag is set, the dentry might not be linked to the root.
> If it is clear, it definitely is link through to the root.
> However I think it was used with stronger intent than that.
>
> Now it seems to mean a little bit more: If it is set and the dentry
> is hashed, then it must be on the sb->s_anon list.

The code that makes that assumption is __d_shrink (which does the work
of d_drop)--it uses DCACHE_DISCONECTED to decide which hash chain to
lock.

I can't find any basis for that assumption. The only code that clears
DCACHE_DISCONNECTED is in expfs.c, and it isn't done at the same time as
hashing. Am I missing something?

> This is a significant
> which I never noticed (I haven't been watching). Originally a
> disconnected dentry would be attached (and hashed) to its parent. Then
> that parent would get its own parent and so on until it was attached all
> the way to the root. Only then would be start clearing
> DCACHE_DISCONNECTED. It seems we must clear it sooner now... I wonder if
> that is correct.

It looks wrong to me:

If we clear DCACHE_DISCONNECTED too early, then we risk a filehandle
lookup thinking the dentry is OK to use. That could mean for example
trying to rename across directories that don't have any ancestor
relationship to each other in the dcache yet.

So we need to wait to clear DCACHE_DISCONNECTED until we *know* the
dentry's parents go all the way back to the root. As you say, that's
what the current code does.

But that means DCACHE_DISCONNECTED dentries can be hashed to their
parents, and __d_shrink can be handed such dentries and then get the
locking wrong.

It looks like this bug might originate with Nick Piggin's ceb5bdc2d246
"fs: dcache per-bucket dcache hash locking"? There's no discussion in
the changelog, so probably it was just based on an unexamined assumption
about DCACHE_DISCONNECTED.

I wonder if an IS_ROOT() test could replace the DCACHE_DISCONNECTED test
in __d_shrink(), or if we need another flag, or ?

--b.

2012-06-29 20:10:41

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries

On Thu, Jun 28, 2012 at 09:59:27AM -0400, J. Bruce Fields wrote:
> Coming back to this now, just trying to review the
> filehandle-lookup/dcache interactions:
>
> On Fri, Mar 11, 2011 at 03:07:49PM +1100, NeilBrown wrote:
> > 1/ Originally DCACHE_DISCONNECTED didn't really mean much - it's presence
> > was only a hint, its absence was a strong statement.
> > If the flag is set, the dentry might not be linked to the root.
> > If it is clear, it definitely is link through to the root.
> > However I think it was used with stronger intent than that.
> >
> > Now it seems to mean a little bit more: If it is set and the dentry
> > is hashed, then it must be on the sb->s_anon list.
>
> The code that makes that assumption is __d_shrink (which does the work
> of d_drop)--it uses DCACHE_DISCONECTED to decide which hash chain to
> lock.
>
> I can't find any basis for that assumption. The only code that clears
> DCACHE_DISCONNECTED is in expfs.c, and it isn't done at the same time as
> hashing. Am I missing something?
>
> > This is a significant
> > which I never noticed (I haven't been watching). Originally a
> > disconnected dentry would be attached (and hashed) to its parent. Then
> > that parent would get its own parent and so on until it was attached all
> > the way to the root. Only then would be start clearing
> > DCACHE_DISCONNECTED. It seems we must clear it sooner now... I wonder if
> > that is correct.
>
> It looks wrong to me:
>
> If we clear DCACHE_DISCONNECTED too early, then we risk a filehandle
> lookup thinking the dentry is OK to use. That could mean for example
> trying to rename across directories that don't have any ancestor
> relationship to each other in the dcache yet.
>
> So we need to wait to clear DCACHE_DISCONNECTED until we *know* the
> dentry's parents go all the way back to the root. As you say, that's
> what the current code does.
>
> But that means DCACHE_DISCONNECTED dentries can be hashed to their
> parents, and __d_shrink can be handed such dentries and then get the
> locking wrong.
>
> It looks like this bug might originate with Nick Piggin's ceb5bdc2d246
> "fs: dcache per-bucket dcache hash locking"? There's no discussion in
> the changelog, so probably it was just based on an unexamined assumption
> about DCACHE_DISCONNECTED.
>
> I wonder if an IS_ROOT() test could replace the DCACHE_DISCONNECTED test
> in __d_shrink(), or if we need another flag, or ?

Bah, sorry, and I only just noticed that you already said as much later
and did the IS_ROOT() thing in your patch.

Anyway, here's just that one change with a slightly more painstaking
changelog.

--b.

commit b1fa644355122627424fe2240a9fc60cbef4c349
Author: J. Bruce Fields <[email protected]>
Date: Thu Jun 28 12:10:55 2012 -0400

dcache: use IS_ROOT to decide where dentry is hashed

Every hashed dentry is either hashed in the dentry_hashtable, or a
superblock's s_anon list.

__d_shrink assumes it can determine which is the case by checking
DCACHE_DISCONNECTED; this is not true.

It is true that when DCACHE_DISCONNECTED is cleared, the dentry is not
only hashed on dentry_hashtable, but is fully connected to its parents
back to the root.

But the converse is *not* true: fs/exportfs/expfs.c:reconnect_path()
attempts to connect a directory (found by filehandle lookup) back to
root by ascending to parents and performing lookups one at a time. It
does not clear DCACHE_DISCONNECTED until its done, and that is not at
all an atomic process.

In particular, it is possible for DCACHE_DISCONNECTED to be set on a
dentry which is hashed on the dentry_hashtable.

Instead, use IS_ROOT() to check which hash chain a dentry is on. This
*does* work:

Dentries are hashed only by:

- d_obtain_alias, which adds an IS_ROOT() dentry to sb_anon.

- __d_rehash, called by _d_rehash: hashes to the dentry's
parent, and all callers of _d_rehash appear to have d_parent
set to a "real" parent.
- __d_rehash, called by __d_move: rehashes the moved dentry to
hash chain determined by target, and assigns target's d_parent
to its d_parent, before dropping the dentry's d_lock.

Therefore I believe it's safe for a holder of a dentry's d_lock to
assume that it is hashed on sb_anon if and only if IS_ROOT(dentry) is
true.

I believe the incorrect assumption about DCACHE_DISCONNECTED was
originally introduced by ceb5bdc2d246 "fs: dcache per-bucket dcache hash
locking".

Cc: Neil Brown <[email protected]>
Cc: Nick Piggin <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/fs/dcache.c b/fs/dcache.c
index 87c2da7..b2b382c 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -410,7 +410,7 @@ static void __d_shrink(struct dentry *dentry)
{
if (!d_unhashed(dentry)) {
struct hlist_bl_head *b;
- if (unlikely(dentry->d_flags & DCACHE_DISCONNECTED))
+ if (unlikely(IS_ROOT(dentry->d_flags)))
b = &dentry->d_sb->s_anon;
else
b = d_hash(dentry->d_parent, dentry->d_name.hash);

2012-07-01 23:15:22

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] fs/dcache: allow __d_obtain_alias() to return unhashed dentries

On Fri, 29 Jun 2012 16:29:03 -0400 "J. Bruce Fields" <[email protected]>
wrote:

> On Fri, Jun 29, 2012 at 04:10:34PM -0400, J. Bruce Fields wrote:
> > On Thu, Jun 28, 2012 at 09:59:27AM -0400, J. Bruce Fields wrote:
> > > Coming back to this now, just trying to review the
> > > filehandle-lookup/dcache interactions:
> > >
> > > On Fri, Mar 11, 2011 at 03:07:49PM +1100, NeilBrown wrote:
> > > > 1/ Originally DCACHE_DISCONNECTED didn't really mean much - it's presence
> > > > was only a hint, its absence was a strong statement.
> > > > If the flag is set, the dentry might not be linked to the root.
> > > > If it is clear, it definitely is link through to the root.
> > > > However I think it was used with stronger intent than that.
> > > >
> > > > Now it seems to mean a little bit more: If it is set and the dentry
> > > > is hashed, then it must be on the sb->s_anon list.
> > >
> > > The code that makes that assumption is __d_shrink (which does the work
> > > of d_drop)--it uses DCACHE_DISCONECTED to decide which hash chain to
> > > lock.
> > >
> > > I can't find any basis for that assumption. The only code that clears
> > > DCACHE_DISCONNECTED is in expfs.c, and it isn't done at the same time as
> > > hashing. Am I missing something?
> > >
> > > > This is a significant
> > > > which I never noticed (I haven't been watching). Originally a
> > > > disconnected dentry would be attached (and hashed) to its parent. Then
> > > > that parent would get its own parent and so on until it was attached all
> > > > the way to the root. Only then would be start clearing
> > > > DCACHE_DISCONNECTED. It seems we must clear it sooner now... I wonder if
> > > > that is correct.
> > >
> > > It looks wrong to me:
> > >
> > > If we clear DCACHE_DISCONNECTED too early, then we risk a filehandle
> > > lookup thinking the dentry is OK to use. That could mean for example
> > > trying to rename across directories that don't have any ancestor
> > > relationship to each other in the dcache yet.
> > >
> > > So we need to wait to clear DCACHE_DISCONNECTED until we *know* the
> > > dentry's parents go all the way back to the root. As you say, that's
> > > what the current code does.
> > >
> > > But that means DCACHE_DISCONNECTED dentries can be hashed to their
> > > parents, and __d_shrink can be handed such dentries and then get the
> > > locking wrong.
> > >
> > > It looks like this bug might originate with Nick Piggin's ceb5bdc2d246
> > > "fs: dcache per-bucket dcache hash locking"? There's no discussion in
> > > the changelog, so probably it was just based on an unexamined assumption
> > > about DCACHE_DISCONNECTED.
> > >
> > > I wonder if an IS_ROOT() test could replace the DCACHE_DISCONNECTED test
> > > in __d_shrink(), or if we need another flag, or ?
> >
> > Bah, sorry, and I only just noticed that you already said as much later
> > and did the IS_ROOT() thing in your patch.
> >
> > Anyway, here's just that one change with a slightly more painstaking
> > changelog.
> >
> > --b.
> >
> > commit b1fa644355122627424fe2240a9fc60cbef4c349
> > Author: J. Bruce Fields <[email protected]>
> > Date: Thu Jun 28 12:10:55 2012 -0400
> >
> > dcache: use IS_ROOT to decide where dentry is hashed
> >
> > Every hashed dentry is either hashed in the dentry_hashtable, or a
> > superblock's s_anon list.
> >
> > __d_shrink assumes it can determine which is the case by checking
> > DCACHE_DISCONNECTED; this is not true.
> >
> > It is true that when DCACHE_DISCONNECTED is cleared, the dentry is not
> > only hashed on dentry_hashtable, but is fully connected to its parents
> > back to the root.
> >
> > But the converse is *not* true: fs/exportfs/expfs.c:reconnect_path()
> > attempts to connect a directory (found by filehandle lookup) back to
> > root by ascending to parents and performing lookups one at a time. It
> > does not clear DCACHE_DISCONNECTED until its done, and that is not at
> > all an atomic process.
> >
> > In particular, it is possible for DCACHE_DISCONNECTED to be set on a
> > dentry which is hashed on the dentry_hashtable.
> >
> > Instead, use IS_ROOT() to check which hash chain a dentry is on. This
> > *does* work:
> >
> > Dentries are hashed only by:
> >
> > - d_obtain_alias, which adds an IS_ROOT() dentry to sb_anon.
> >
> > - __d_rehash, called by _d_rehash: hashes to the dentry's
> > parent, and all callers of _d_rehash appear to have d_parent
> > set to a "real" parent.
> > - __d_rehash, called by __d_move: rehashes the moved dentry to
> > hash chain determined by target, and assigns target's d_parent
> > to its d_parent, before dropping the dentry's d_lock.
> >
> > Therefore I believe it's safe for a holder of a dentry's d_lock to
> > assume that it is hashed on sb_anon if and only if IS_ROOT(dentry) is
> > true.
> >
> > I believe the incorrect assumption about DCACHE_DISCONNECTED was
> > originally introduced by ceb5bdc2d246 "fs: dcache per-bucket dcache hash
> > locking".
> >
> > Cc: Neil Brown <[email protected]>
> > Cc: Nick Piggin <[email protected]>
> > Signed-off-by: J. Bruce Fields <[email protected]>
> >
> > diff --git a/fs/dcache.c b/fs/dcache.c
> > index 87c2da7..b2b382c 100644
> > --- a/fs/dcache.c
> > +++ b/fs/dcache.c
> > @@ -410,7 +410,7 @@ static void __d_shrink(struct dentry *dentry)
> > {
> > if (!d_unhashed(dentry)) {
> > struct hlist_bl_head *b;
> > - if (unlikely(dentry->d_flags & DCACHE_DISCONNECTED))
> > + if (unlikely(IS_ROOT(dentry->d_flags)))
>
> Um, right--I'll send an actual tested version along with some other
> stuff later.

:-)

If that tested version looks like:
if (unlikely(IS_ROOT(dentry)))
you can add a
Reviewed-by: NeilBrown <[email protected]>

Thanks,
NeilBrown


>
> --b.
>
> > b = &dentry->d_sb->s_anon;
> > else
> > b = d_hash(dentry->d_parent, dentry->d_name.hash);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


Attachments:
signature.asc (828.00 B)