2004-04-23 13:02:53

by Nikita Danilov

Subject: d_splice_alias() problem.

Hello,

For some time I have been observing that, during stress tests over NFS,

shrink_slab->...->prune_dcache()->prune_one_dentry()->...->iput()

is called on an inode with ->i_nlink == 0, which results in truncate and
file deletion. This is wrong in general (the file system is re-entered), and
deadlock-prone on some file systems.

After some debugging, I tracked the problem down to d_splice_alias()
failing to identify dentries when necessary.

Suppose we have an inode with ->i_nlink == 1. It's accessed over NFS and
a DCACHE_DISCONNECTED dentry D1 is created for it. Then an unlink request
comes in for this file. nfsd looks the name up in the parent directory
(nfsd_unlink()->lookup_one_len()). The file system back-end uses
d_splice_alias(), but it only works for directories, and we end up with
a second (this time connected) dentry D2.

D2 is successfully unlinked, file has ->i_nlink == 0, and ->i_count == 1
from D1, and when prune_dcache() hits D1 bad things happen.
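
A minimal user-space sketch of the sequence above, assuming a toy model in
which every cached dentry holds exactly one inode reference. All toy_* names
are hypothetical and are not kernel interfaces; the point is only to show
that after the unlink through D2 the inode has no names left but is still
pinned by D1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; these are NOT the kernel's struct inode / struct dentry. */
struct toy_inode {
    int nlink;   /* number of names the file still has */
    int count;   /* in-memory references, one per attached dentry here */
};

struct toy_dentry {
    struct toy_inode *inode;
    bool disconnected;   /* stands in for DCACHE_DISCONNECTED */
};

/* Attach a dentry to the inode; this pins the inode in memory. */
static void toy_attach(struct toy_dentry *d, struct toy_inode *i, bool discon)
{
    d->inode = i;
    d->disconnected = discon;
    i->count++;
}

/* Unlink through a connected dentry: drop the name, then the dentry. */
static void toy_unlink(struct toy_dentry *d)
{
    assert(!d->disconnected && d->inode->nlink > 0);
    d->inode->nlink--;
    d->inode->count--;
    d->inode = NULL;
}

/* True when the file has no names left but a dentry still pins it. */
static bool toy_orphan_pinned(const struct toy_inode *i)
{
    return i->nlink == 0 && i->count > 0;
}

/* Replays the reported sequence: D1 by file handle, D2 by name, unlink D2. */
static bool toy_replay(void)
{
    struct toy_inode ino = { .nlink = 1, .count = 0 };
    struct toy_dentry d1, d2;

    toy_attach(&d1, &ino, true);    /* file handle decode creates D1 */
    toy_attach(&d2, &ino, false);   /* lookup by name creates D2 */
    toy_unlink(&d2);                /* the unlink request goes through D2 */
    return toy_orphan_pinned(&ino); /* D1 still holds the inode */
}
```

In this model toy_replay() reports the orphaned-but-pinned state; whatever
later drops D1 is what ends up doing the final iput().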

It's hard to imagine how a new name can be identified with one among
multiple anonymous dentries, which is necessary for
NFSEXP_NOSUBTREECHECK exports to work reliably.

One possible work-around is to forcibly destroy all remaining
DCACHE_DISCONNECTED dentries when ->i_nlink drops to zero, but I am not
sure that this is possible, or that it solves all the problems of having
more dentries than there are nlinks.

Nikita.


2004-04-23 15:41:42

by Andreas Dilger

Subject: Re: d_splice_alias() problem.

On Apr 23, 2004 17:02 +0400, Nikita Danilov wrote:
> Suppose we have an inode with ->i_nlink == 1. It's accessed over NFS and
> DCACHE_DISCONNECTED dentry D1 is created for it. Then, unlink request
> comes for this file. nfsd looks name up in the parent directory
> (nfsd_unlink()->lookup_one_len()). File system back-end uses
> d_splice_alias(), but it only works for directories and we end up with
> second (this time connected) dentry D2.
>
> It's hard to imagine how new name can be identified with one among
> multiple anonymous dentries, which is necessary for
> NFSEXP_NOSUBTREECHECK export to work reliably.
>
> One possible work-around is to forcibly destroy all remaining
> DCACHE_DISCONNECTED dentries when ->i_nlink drops to zero, but I am not
> sure that this is possible and solves all problems of having more
> dentries than there are nlinks.

We use a patch for Lustre which solves this problem. When there is
a lookup-by-inum done on the server there is the possibility to get a
DISCONNECTED dentry as you say. However, if we ever do another lookup
on this inode we verify that either this is a disconnected dentry and
return the existing dentry, or if it is a connected dentry we essentially
"rename" the disconnected dentry and connect it to the tree and return
that. There can never be both connected and disconnected dentry aliases
on an inode at one time.
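
One way to read the scheme described above, as a user-space sketch. This is
not the Lustre/ext3 patch itself; the lk_* names and the fixed-size alias
array are assumptions made for illustration. A lookup by name connects an
existing disconnected alias instead of adding a second one, so the two kinds
never coexist on one inode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LK_MAX_ALIAS 4   /* arbitrary bound for the sketch */

struct lk_dentry {
    const char *name;    /* NULL while the dentry is anonymous */
    bool disconnected;
};

struct lk_inode {
    struct lk_dentry *alias[LK_MAX_ALIAS];
    int nalias;
};

/* Lookup by inode number: reuse any alias, else attach an anonymous one. */
static struct lk_dentry *lk_by_inum(struct lk_inode *i, struct lk_dentry *fresh)
{
    if (i->nalias > 0)
        return i->alias[0];
    fresh->name = NULL;
    fresh->disconnected = true;
    i->alias[i->nalias++] = fresh;
    return fresh;
}

/* Lookup by name: "rename" a disconnected alias into place if there is one. */
static struct lk_dentry *lk_by_name(struct lk_inode *i, const char *name,
                                    struct lk_dentry *fresh)
{
    assert(name != NULL);
    for (int k = 0; k < i->nalias; k++) {
        struct lk_dentry *d = i->alias[k];

        if (d->disconnected) {      /* connect it instead of adding an alias */
            d->name = name;
            d->disconnected = false;
            return d;
        }
        if (strcmp(d->name, name) == 0)
            return d;               /* already cached under this name */
    }
    assert(i->nalias < LK_MAX_ALIAS);
    fresh->name = name;
    fresh->disconnected = false;
    i->alias[i->nalias++] = fresh;
    return fresh;
}
```

With this rule, a by-inum lookup followed by a by-name lookup leaves the
inode with a single, connected alias.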

This is handled inside the ext3 lookup code, I'm not sure how easy/hard
it would be to make a generic VFS patch to do the same.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

2004-04-23 16:20:54

by Nikita Danilov

Subject: Re: d_splice_alias() problem.

Andreas Dilger writes:
> On Apr 23, 2004 17:02 +0400, Nikita Danilov wrote:
> > Suppose we have an inode with ->i_nlink == 1. It's accessed over NFS and
> > DCACHE_DISCONNECTED dentry D1 is created for it. Then, unlink request
> > comes for this file. nfsd looks name up in the parent directory
> > (nfsd_unlink()->lookup_one_len()). File system back-end uses
> > d_splice_alias(), but it only works for directories and we end up with
> > second (this time connected) dentry D2.
> >
> > It's hard to imagine how new name can be identified with one among
> > multiple anonymous dentries, which is necessary for
> > NFSEXP_NOSUBTREECHECK export to work reliably.
> >
> > One possible work-around is to forcibly destroy all remaining
> > DCACHE_DISCONNECTED dentries when ->i_nlink drops to zero, but I am not
> > sure that this is possible and solves all problems of having more
> > dentries than there are nlinks.
>
> We use a patch for Lustre which solves this problem. When there is
> a lookup-by-inum done on the server there is the possibility to get a
> DISCONNECTED dentry as you say. However, if we ever do another lookup
> on this inode we verify that either this is a disconnected dentry and
> return the existing dentry, or if it is a connected dentry we essentially
> "rename" the disconnected dentry and connect it to the tree and return
> that. There can never be both connected and disconnected dentry aliases
> on an inode at one time.

I am not sure I understand this description correctly, but it looks
pretty much like what d_splice_alias() is supposed to do according to
the comment on top of it.

What I missed is that an inode can have no more than one disconnected
dentry even if it has multiple names. Hence, when a lookup-by-name
happens, we can d_move the disconnected dentry in place of the named one.
This is no worse than what is done currently by d_find_alias(), which
identifies a disconnected dentry with an arbitrary connected one.

>
> This is handled inside the ext3 lookup code, I'm not sure how easy/hard
> it would be to make a generic VFS patch to do the same.
>
> Cheers, Andreas

Nikita.

2004-04-23 23:49:57

by Andrew Morton

Subject: Re: d_splice_alias() problem.

Nikita Danilov <[email protected]> wrote:
>
> for some time I am observing that during stress tests over NFS
>
> shrink_slab->...->prune_dcache()->prune_one_dentry()->...->iput()
>
> is called on inode with ->i_nlink == 0 which results in truncate and
> file deletion. This is wrong in general (file system is re-entered), and
> deadlock prone on some file systems.

The filesystem is only reentered if the caller of __alloc_pages() passed in
__GFP_FS, in which case the bug is in the caller, not in shrink_slab().
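
The rule can be pictured with a toy shrinker, sketched below under the
assumption of two made-up flag bits (TOY_GFP_IO, TOY_GFP_FS) that merely
stand in for the kernel's allocation flags. Caches whose reclaim may call
back into a filesystem are only touched when the allocating caller set the
FS bit.

```c
#include <assert.h>

/* Hypothetical flag values for this sketch; the kernel's values differ. */
#define TOY_GFP_IO 0x1u
#define TOY_GFP_FS 0x2u

/*
 * Returns how many objects the toy shrinker would reclaim.
 * fs_objects:    cached objects whose release may re-enter a filesystem
 * other_objects: cached objects that are always safe to release
 */
static int toy_shrink(unsigned int gfp_mask, int fs_objects, int other_objects)
{
    int freed;

    assert(fs_objects >= 0 && other_objects >= 0);
    freed = other_objects;
    if (gfp_mask & TOY_GFP_FS)   /* caller declared re-entry to be safe */
        freed += fs_objects;
    return freed;
}
```

A caller that must not re-enter the filesystem would pass a mask without
TOY_GFP_FS, and the filesystem-backed objects are then left alone.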

2004-04-26 12:45:11

by Nikita Danilov

Subject: Re: d_splice_alias() problem.

Andrew Morton writes:
> Nikita Danilov <[email protected]> wrote:
> >
> > for some time I am observing that during stress tests over NFS
> >
> > shrink_slab->...->prune_dcache()->prune_one_dentry()->...->iput()
> >
> > is called on inode with ->i_nlink == 0 which results in truncate and
> > file deletion. This is wrong in general (file system is re-entered), and
> > deadlock prone on some file systems.
>
> The filesystem is only reentered if the caller of __alloc_pages() passed in
> __GFP_FS, in which case the bug is in the caller, not in shrink_slab().

Well, I always thought that the only file system IO that GFP_FS is
expected to do is that issued by ->writepage. Doing a truncate from within
the VM scanner looks... wrong. But that's not the point, actually. The
current d_splice_alias leads to the following problems with NFS (and maybe
other remote file systems also):

* there are more dentries than nlinks for a given file, as a result

* file (not opened by user) is not truncated when its last name is
  removed. inode is pinned in the memory indefinitely by remaining
  disconnected dentries.

* sequence "touch x; rm x" always creates _two_ dentries for "x": one
  disconnected (by ->decode_fh) and one connected (by lookup_one_len
  from nfs unlink request).

I think that d_splice_alias() should be changed to scan the
inode->i_dentry list and d_move() any disconnected dentry found there
into the new one.

>

Nikita.

2004-04-30 04:55:29

by NeilBrown

Subject: Re: d_splice_alias() problem.

On Friday April 23, [email protected] wrote:
> Hello,
>
> for some time I am observing that during stress tests over NFS
>
> shrink_slab->...->prune_dcache()->prune_one_dentry()->...->iput()
>
> is called on inode with ->i_nlink == 0 which results in truncate and
> file deletion. This is wrong in general (file system is re-entered), and
> deadlock prone on some file systems.
>
> After some debugging, I tracked the problem down to d_splice_alias()
> failing to identify dentries when necessary.
>
> Suppose we have an inode with ->i_nlink == 1. It's accessed over NFS and
> DCACHE_DISCONNECTED dentry D1 is created for it. Then, unlink request
> comes for this file. nfsd looks name up in the parent directory
> (nfsd_unlink()->lookup_one_len()). File system back-end uses
> d_splice_alias(), but it only works for directories and we end up with
> second (this time connected) dentry D2.
>
> D2 is successfully unlinked, file has ->i_nlink == 0, and ->i_count == 1
> from D1, and when prune_dcache() hits D1 bad things happen.
>
> It's hard to imagine how new name can be identified with one among
> multiple anonymous dentries, which is necessary for
> NFSEXP_NOSUBTREECHECK export to work reliably.
>
> One possible work-around is to forcibly destroy all remaining
> DCACHE_DISCONNECTED dentries when ->i_nlink drops to zero, but I am not
> sure that this is possible and solves all problems of having more
> dentries than there are nlinks.
>
> Nikita.

If I understand you correctly, the main problem is that a disconnected
dentry can hold an inode active after the last link has been removed.
The file will not then be truncated and removed until memory pressure
flushes the disconnected dentry from the dcache.

This problem can be resolved by making sure that an inode never has
both a connected and a disconnected dentry.

This is already the case for directories (as they must only have one
dentry), but it is not the case for non-directories.

The following patch tries to address this. It is a "technology
preview" in that the only testing I have done is that it compiles OK.

Please consider reviewing it to see if it makes sense.

It:
- changes d_alloc_anon to make sure that a new disconnected dentry is
only allocated if there is currently no (hashed) dentry for the
inode. (Previously this would normally be true, but a race was
possible).
- changes d_splice_alias to re-use a disconnected dentry on
non-directories as well as directories.
- splits most of d_find_alias out into a separate function to make
the above easier.
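
The selection rule that the split-out helper applies can be sketched in user
space as follows. The fa_* names and the array of aliases are assumptions
for illustration (the kernel walks inode->i_dentry under dcache_lock and
takes a reference on the result); only the choice between connected and
disconnected aliases is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fa_dentry {
    bool hashed;
    bool disconnected;   /* stands in for DCACHE_DISCONNECTED */
};

/*
 * Models the selection rule of __d_find_alias() in the patch:
 *  - unhashed aliases are ignored;
 *  - with want_discon == 0 the first hashed, connected alias wins and a
 *    disconnected alias is only a fallback;
 *  - with want_discon != 0 only a disconnected alias is ever returned.
 */
static struct fa_dentry *fa_find_alias(struct fa_dentry *aliases, size_t n,
                                       int want_discon)
{
    struct fa_dentry *discon = NULL;

    assert(aliases != NULL || n == 0);
    for (size_t k = 0; k < n; k++) {
        struct fa_dentry *d = &aliases[k];

        if (!d->hashed)
            continue;
        if (d->disconnected)
            discon = d;          /* remember it, keep scanning */
        else if (!want_discon)
            return d;            /* connected alias preferred */
    }
    return discon;
}
```

In this model the d_find_alias() and d_alloc_anon() callers would pass 0,
and d_splice_alias() would pass 1 so that it only ever gets back a
disconnected dentry to move into place.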

I haven't fully thought through issues with unhashed, connected
dentries.
__d_find_alias won't return them so d_alloc_anon will never return
one, so it is possible to have an unhashed dentry and a disconnected
dentry at the same time, which probably isn't desirable.

Is it OK for d_alloc_anon to return an unaliased dentry, and then have
it possibly spliced back into the dentry tree??? I'm not sure.

Comments welcome.

NeilBrown


===========================================================

----------- Diffstat output ------------
./fs/dcache.c | 60 +++++++++++++++++++++++++++++-----------------------------
1 files changed, 30 insertions(+), 30 deletions(-)

diff ./fs/dcache.c~current~ ./fs/dcache.c
--- ./fs/dcache.c~current~ 2004-04-30 14:25:50.000000000 +1000
+++ ./fs/dcache.c 2004-04-30 14:39:20.000000000 +1000
@@ -281,12 +281,11 @@ struct dentry * dget_locked(struct dentr
  * any other hashed alias over that one.
  */
 
-struct dentry * d_find_alias(struct inode *inode)
+static struct dentry * __d_find_alias(struct inode *inode, int want_discon)
 {
 	struct list_head *head, *next, *tmp;
 	struct dentry *alias, *discon_alias=NULL;
 
-	spin_lock(&dcache_lock);
 	head = &inode->i_dentry;
 	next = inode->i_dentry.next;
 	while (next != head) {
@@ -297,19 +296,26 @@ struct dentry * d_find_alias(struct inod
 		if (!d_unhashed(alias)) {
 			if (alias->d_flags & DCACHE_DISCONNECTED)
 				discon_alias = alias;
-			else {
+			else if (!want_discon) {
 				__dget_locked(alias);
-				spin_unlock(&dcache_lock);
 				return alias;
 			}
 		}
 	}
 	if (discon_alias)
 		__dget_locked(discon_alias);
-	spin_unlock(&dcache_lock);
 	return discon_alias;
 }
 
+struct dentry * d_find_alias(struct inode *inode)
+{
+	struct dentry *de;
+	spin_lock(&dcache_lock);
+	de = __d_find_alias(inode, 0);
+	spin_unlock(&dcache_lock);
+	return de;
+}
+
 /*
  * Try to kill dentries associated with this inode.
  * WARNING: you must own a reference to inode.
@@ -835,28 +841,22 @@ struct dentry * d_alloc_anon(struct inod
 	tmp->d_parent = tmp; /* make sure dput doesn't croak */
 
 	spin_lock(&dcache_lock);
-	if (S_ISDIR(inode->i_mode) && !list_empty(&inode->i_dentry)) {
-		/* A directory can only have one dentry.
-		 * This (now) has one, so use it.
-		 */
-		res = list_entry(inode->i_dentry.next, struct dentry, d_alias);
-		__dget_locked(res);
-	} else {
-		/* attach a disconnected dentry */
+
+	res = __d_find_alias(inode, 0);
+	if (!res) {
 		res = tmp;
 		tmp = NULL;
-		if (res) {
-			spin_lock(&res->d_lock);
-			res->d_sb = inode->i_sb;
-			res->d_parent = res;
-			res->d_inode = inode;
-			res->d_bucket = d_hash(res, res->d_name.hash);
-			res->d_flags |= DCACHE_DISCONNECTED;
-			res->d_vfs_flags &= ~DCACHE_UNHASHED;
-			list_add(&res->d_alias, &inode->i_dentry);
-			hlist_add_head(&res->d_hash, &inode->i_sb->s_anon);
-			spin_unlock(&res->d_lock);
-		}
+		spin_lock(&res->d_lock);
+		res->d_sb = inode->i_sb;
+		res->d_parent = res;
+		res->d_inode = inode;
+		res->d_bucket = d_hash(res, res->d_name.hash);
+		res->d_flags |= DCACHE_DISCONNECTED;
+		res->d_vfs_flags &= ~DCACHE_UNHASHED;
+		list_add(&res->d_alias, &inode->i_dentry);
+		hlist_add_head(&res->d_hash, &inode->i_sb->s_anon);
+		spin_unlock(&res->d_lock);
+
 		inode = NULL; /* don't drop reference */
 	}
 	spin_unlock(&dcache_lock);
@@ -878,7 +878,7 @@ struct dentry * d_alloc_anon(struct inod
  * DCACHE_DISCONNECTED), then d_move that in place of the given dentry
  * and return it, else simply d_add the inode to the dentry and return NULL.
  *
- * This is (will be) needed in the lookup routine of any filesystem that is exportable
+ * This is needed in the lookup routine of any filesystem that is exportable
  * (via knfsd) so that we can build dcache paths to directories effectively.
  *
  * If a dentry was found and moved, then it is returned. Otherwise NULL
@@ -889,11 +889,11 @@ struct dentry *d_splice_alias(struct ino
 {
 	struct dentry *new = NULL;
 
-	if (inode && S_ISDIR(inode->i_mode)) {
+	if (inode) {
 		spin_lock(&dcache_lock);
-		if (!list_empty(&inode->i_dentry)) {
-			new = list_entry(inode->i_dentry.next, struct dentry, d_alias);
-			__dget_locked(new);
+		new = __d_find_alias(inode, 1);
+		if (new) {
+			BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED));
 			spin_unlock(&dcache_lock);
 			security_d_instantiate(new, inode);
 			d_rehash(dentry);

2004-04-30 07:54:31

by Greg Banks

Subject: Re: d_splice_alias() problem.

Neil Brown wrote:
>
> If I understand you correctly, the main problem is that a disconnected
> dentry can hold an inode active after the last link has been removed.
> The file will not then be truncated and removed until memory pressure
> flushes the disconnected dentry from the dcache.
>
> This problem can be resolved by making sure that an inode never has
> both a connected and a disconnected dentry.
>
> This is already the case for directories (as they must only have one
> dentry), but it is not the case for non-directories.
>
> The following patch tries to address this. It is a "technology
> preview" in that the only testing I have done is that it compiles OK.
>
> Please consider reviewing it to see if it makes sense.
>
> It:
> - changes d_alloc_anon to make sure that a new disconnected dentry is
> only allocated if there is currently no (hashed) dentry for the
> inode. (Previously this would normally be true, but a race was
> possible).
> - changes d_splice_alias to re-use a disconnected dentry on
> non-directories as well as directories.
> - splits most of d_find_alias out into a separate function to make
> the above easier.
>
> I haven't fully thought through issues with unhashed, connected
> dentries.
> __d_find_alias won't return them so d_alloc_anon will never return
> one, so it is possible to have an unhashed dentry and a disconnected
> dentry at the same time, which probably isn't desirable.
>
> Is it OK for d_alloc_anon to return an unaliased dentry, and then have
> it possibly spliced back into the dentry tree??? I'm not sure.
>
> Comments welcome.


I've been wrestling with a problem in this area for the last couple of
days. The symptoms are different but the ultimate cause appears to (also)
be a race condition somewhere under fh_verify() resulting in confused
dentry structures of some kind. Eventually this results in __dget_locked()
being called on a dentry which has a zero reference count but is hashed
and not on the unused list, which spuriously decrements dentry_stat.nr_unused.
When this happens enough times dentry_stat.nr_unused drops below zero
and kswapd starts spinning trying to prune a near-infinite number of
dentries.
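
The accounting failure can be reproduced in a toy model, sketched below. The
lru_dentry type, toy_dput(), toy_dget() and toy_nr_unused are hypothetical
stand-ins (toy_nr_unused plays the role of dentry_stat.nr_unused); the
sketch assumes, as described above, that taking the first reference on a
dentry decrements the unused counter without checking that the dentry was
really on the unused list.

```c
#include <assert.h>
#include <stdbool.h>

struct lru_dentry {
    int count;     /* reference count */
    bool on_lru;   /* true while parked on the unused list */
};

static int toy_nr_unused;   /* stand-in for dentry_stat.nr_unused */

/* Drop a reference; the last put parks the dentry on the unused list. */
static void toy_dput(struct lru_dentry *d)
{
    assert(d->count > 0);
    if (--d->count == 0) {
        d->on_lru = true;
        toy_nr_unused++;
    }
}

/* Take a reference; a 0 -> 1 transition assumes the dentry was on the list. */
static void toy_dget(struct lru_dentry *d)
{
    if (++d->count == 1) {
        toy_nr_unused--;    /* unconditional: this is where it goes wrong */
        d->on_lru = false;
    }
}

/* One well-behaved dentry, then one with count == 0 that is not on the list. */
static int toy_replay_underflow(void)
{
    struct lru_dentry ok = { .count = 1, .on_lru = false };
    struct lru_dentry confused = { .count = 0, .on_lru = false };

    toy_nr_unused = 0;
    toy_dput(&ok);         /* counter goes to 1 */
    toy_dget(&ok);         /* back to 0, balanced */
    toy_dget(&confused);   /* never counted as unused, counter underflows */
    return toy_nr_unused;
}
```

Each such confused dentry costs one unit, which is why the counter only
goes negative after the race has been hit often enough.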

I just tried your patch and the problem remains.

What I'm getting from my debug setup is:

<4>kernel BUG at fs/dcache.c:289!
<4>nfsd[4981]: bugcheck! 0 [1]


285 static inline struct dentry * __dget_locked(struct dentry *dentry)
286 {
287 atomic_inc(&dentry->d_count);
288 if (atomic_read(&dentry->d_count) == 1) {
289 BUG_ON(list_empty(&dentry->d_lru)); <-------------
290 dentry_stat.nr_unused--;
291 list_del_init(&dentry->d_lru);
292 BUG_ON(dentry_stat.nr_unused < 0);
293 }
294 return dentry;
295 }

[0]kdb> bt
Stack traceback for pid 4981
0xe00000300b1f0000 4981 1 1 0 R 0xe00000300b1f04f0 *nfsd
0xa0000001001a2250 __d_find_alias+0x250 <----- note: with your patch applied
args (0xa000000100936acc, 0xe000003009059ba8, 0xa0000001001a23f0,
0x206) kernel 0xa0000001001a2000 0xa0000001001a2380
0xa0000001001a23f0 d_find_alias+0x70
args (0xe00000b07aff67b0, 0xa0000001008ec900, 0xa0000001001a4120,
0x287) kernel 0xa0000001001a2380 0xa0000001001a2420
0xa0000001001a4120 d_alloc_anon+0x20
args (0xe00000b07aff67b0, 0xe00000300b1f7ad0, 0xe00000300b1f7ac0,
0xa0000001003c4ee0, 0x207)
kernel 0xa0000001001a4100 0xa0000001001a4440
0xa0000001003c4ee0 linvfs_get_dentry+0x120
args (0xe00000b07aff67b0, 0xe00000307b299f18, 0xa000000100292860,
0xd1d) kernel 0xa0000001003c4dc0 0xa0000001003c4f20
0xa000000100292860 find_exported_dentry+0xa0
args (0xe000003008a9fa00, 0xe00000307b299f18, 0xe00000300b1f7ce0,
0xa000000100861340, 0xe00000b07aa6f180)
kernel 0xa0000001002927c0 0xa000000100293920
0xa000000100294090 export_decode_fh+0xb0
args (0xe000003008a9fa00, 0xe00000307b299f24, 0x4, 0x2, 0xa000000100861340)
kernel 0xa000000100293fe0 0xa000000100294100
0xa0000001002993f0 fh_verify+0x910
args (0xe000003016c0d000, 0xe00000307b299f08, 0x11270000, 0x44,
0xe00000307b299f14)
kernel 0xa000000100298ae0 0xa000000100299960
0xa00000010029cb00 nfsd_open+0x40
args (0xe000003016c0d000, 0xe00000307b299f08, 0x8000, 0x4,
0xe00000300b1f7d00)
kernel 0xa00000010029cac0 0xa00000010029cea0
0xa00000010029d440 nfsd_read+0x40
args (0xe000003016c0d000, 0xe00000307b299f08, 0xe00000300b1f7d10,
0xe00000307b299c38, 0x2)
kernel 0xa00000010029d400 0xa00000010029dda0
0xa0000001002b02f0 nfsd3_proc_read+0x190
args (0xe00000307b299f90, 0xe00000307b299b00, 0xe00000307b299f00,
0xe00000307b29a030, 0xe00000307b299f08)
kernel 0xa0000001002b0160 0xa0000001002b0400
0xa000000100295120 nfsd_dispatch+0x280
args (0xe000003016c0d000, 0xe00000306b888014, 0xa000000100ce7520,
0xe000003016c0d490, 0xa00000010093e0d0)
kernel 0xa000000100294ea0 0xa000000100295320
0xa000000100721810 svc_process+0xff0
args (0xe00000b07aa6e928, 0xe000003016c0d000, 0xe000003016c0d240,
0xe000003016c0d068, 0xa000000100ce7520)
kernel 0xa000000100720820 0xa000000100721b60
0xa000000100294a60 nfsd+0x480
args (0xe000003016c0d000, 0xfffeba2f, 0xfffeba2f, 0xe00000b07aa6e900,
0xa000000100b008a0)
kernel 0xa0000001002945e0 0xa000000100294ea0
0xa00000010001ae60 kernel_thread_helper+0xe0
[...]

arg0 of the new d_find_alias() is the inode

[0]kdb> inode 0xe00000b07aff67b0
struct inode at 0xe00000b07aff67b0
i_ino = 137 i_count = 3 i_size 8589934592
i_mode = 0100777 i_nlink = 1 i_rdev = 0x0
i_hash.nxt = 0x0000000000000000 i_hash.pprev = 0xe00000307bcb6408
i_list.nxt = 0xe000003008a9fac8 i_list.prv = 0xe000003008a9fac8
i_dentry.nxt = 0xe000003009059b80 i_dentry.prv = 0xe000003009059b80
i_sb = 0xe000003008a9fa00 i_op = 0xa0000001009422b0 i_data = 0xe00000b07aff68a8
nrpages = 73612
i_fop= 0xa000000100941fe8 i_flock = 0x0000000000000000 i_mapping = 0xe00000b07aff68a8
i_flags 0x0 i_state 0x1 [I_DIRTY_SYNC] fs specific info @ 0xe00000b07aff6a10

Walk the dentry chain...only one entry

[0]kdb> dentry 0xe000003009059b80
Dentry at 0xe000003009059b80
d_name.len = 16 d_name.name = 0xe000003016ca7010 <read_load_test.0>  <----- not anonymous
d_count = 1 d_flags = 0x4 d_inode = 0xe00000b07aff67b0
          ^           ^^^ DCACHE_DISCONNECTED
          count=0 before line 287
d_parent = 0xe00000b0065d5480
d_hash.nxt = 0x0000000000000000 d_hash.prv = 0xe00000307bbccbf0  <---- hashed
d_lru.nxt = 0xe000003009059ba0 d_lru.prv = 0xe000003009059ba0  <---- not on unused list
d_child.nxt = 0xe00000b0065d54c0 d_child.prv = 0xe00000b0065d54c0
d_subdirs.nxt = 0xe000003009059bc0 d_subdirs.prv = 0xe000003009059bc0
d_alias.nxt = 0xe00000b07aff67d0 d_alias.prv = 0xe00000b07aff67d0
d_op = 0x0000000000000000 d_sb = 0xe000003008a9fa00


>  /*
>   * Try to kill dentries associated with this inode.
>   * WARNING: you must own a reference to inode.
> @@ -835,28 +841,22 @@ struct dentry * d_alloc_anon(struct inod
>  	tmp->d_parent = tmp; /* make sure dput doesn't croak */
>
>  	spin_lock(&dcache_lock);
> -	if (S_ISDIR(inode->i_mode) && !list_empty(&inode->i_dentry)) {
> -		/* A directory can only have one dentry.
> -		 * This (now) has one, so use it.
> -		 */
> -		res = list_entry(inode->i_dentry.next, struct dentry, d_alias);
> -		__dget_locked(res);
> -	} else {
> -		/* attach a disconnected dentry */
> +
> +	res = __d_find_alias(inode, 0);
> +	if (!res) {
>  		res = tmp;

Yes, this is an obvious (well, now) race condition, but apparently not the
whole story.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.

2004-04-30 13:28:28

by Nikita Danilov

Subject: Re: d_splice_alias() problem.

Neil Brown writes:
> On Friday April 23, [email protected] wrote:
> > Hello,
> >
> > for some time I am observing that during stress tests over NFS
> >
> > shrink_slab->...->prune_dcache()->prune_one_dentry()->...->iput()
> >
> > is called on inode with ->i_nlink == 0 which results in truncate and
> > file deletion. This is wrong in general (file system is re-entered), and
> > deadlock prone on some file systems.
> >
> > After some debugging, I tracked the problem down to d_splice_alias()
> > failing to identify dentries when necessary.
> >
> > Suppose we have an inode with ->i_nlink == 1. It's accessed over NFS and
> > DCACHE_DISCONNECTED dentry D1 is created for it. Then, unlink request
> > comes for this file. nfsd looks name up in the parent directory
> > (nfsd_unlink()->lookup_one_len()). File system back-end uses
> > d_splice_alias(), but it only works for directories and we end up with
> > second (this time connected) dentry D2.
> >
> > D2 is successfully unlinked, file has ->i_nlink == 0, and ->i_count == 1
> > from D1, and when prune_dcache() hits D1 bad things happen.
> >
> > It's hard to imagine how new name can be identified with one among
> > multiple anonymous dentries, which is necessary for
> > NFSEXP_NOSUBTREECHECK export to work reliably.
> >
> > One possible work-around is to forcibly destroy all remaining
> > DCACHE_DISCONNECTED dentries when ->i_nlink drops to zero, but I am not
> > sure that this is possible and solves all problems of having more
> > dentries than there are nlinks.
> >
> > Nikita.
>
> If I understand you correctly, the main problem is that a disconnected
> dentry can hold an inode active after the last link has been removed.
> The file will not then be truncated and removed until memory pressure
> flushes the disconnected dentry from the dcache.
>
> This problem can be resolved by making sure that an inode never has
> both a connected and a disconnected dentry.
>
> This is already the case for directories (as they must only have one
> dentry), but it is not the case for non-directories.
>
> The following patch tries to address this. It is a "technology
> preview" in that the only testing I have done is that it compiles OK.

I have a test where this situation is reproducible and will give the patch
a try.

Also, Al Viro pointed out to me that it's not clear why a
DCACHE_DISCONNECTED dentry is hashed at all. If it were unhashed, the last
dput (done by the nfsd thread) would destroy it, truncating the file if
necessary.
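
A small user-space sketch of that argument, assuming a simplified dput in
which a hashed dentry stays cached after its last reference is dropped while
an unhashed one is destroyed at once. The hv_* names are hypothetical and
the model ignores locking and the LRU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct hv_inode {
    int nlink;        /* names left on the file */
    int count;        /* in-memory references */
    bool truncated;   /* set once the file has been deleted */
};

struct hv_dentry {
    struct hv_inode *inode;
    int count;
    bool hashed;
    bool freed;
};

/* Final inode put: a file with no names left is truncated and removed. */
static void hv_iput(struct hv_inode *i)
{
    assert(i->count > 0);
    if (--i->count == 0 && i->nlink == 0)
        i->truncated = true;
}

/* Simplified dput: only an unhashed dentry dies with its last reference. */
static void hv_dput(struct hv_dentry *d)
{
    assert(d->count > 0 && !d->freed);
    if (--d->count > 0)
        return;
    if (d->hashed)
        return;          /* stays in the cache and keeps pinning the inode */
    d->freed = true;
    hv_iput(d->inode);   /* deletion happens in the caller's context */
}
```

In the hashed case the inode stays pinned until something prunes the
dentry; in the unhashed case the thread doing the last hv_dput() performs
the deletion itself.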

>

Nikita.