2003-06-11 23:20:50

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH] First casuality of hlist poisoning in 2.5.70


Hi,

This patch removes the Oops that occurs when either the source or
the target of a d_move() operation is unhashed. It is currently
triggered by the NFS sillyrename code.

Cheers,
Trond

--- linux-2.5.70-up/fs/dcache.c.orig 2003-06-07 10:17:01.000000000 -0700
+++ linux-2.5.70-up/fs/dcache.c 2003-06-11 16:11:56.000000000 -0700
@@ -1223,8 +1223,13 @@
/* Move the dentry to the target hash queue, if on different bucket */
if (dentry->d_bucket != target->d_bucket) {
dentry->d_bucket = target->d_bucket;
- hlist_del_rcu(&dentry->d_hash);
- hlist_add_head_rcu(&dentry->d_hash, target->d_bucket);
+ if (!hlist_unhashed(&dentry->d_hash))
+ hlist_del_rcu(&dentry->d_hash);
+ if (!hlist_unhashed(&target->d_hash)) {
+ hlist_add_head_rcu(&dentry->d_hash, target->d_bucket);
+ dentry->d_vfs_flags &= ~DCACHE_UNHASHED;
+ } else
+ dentry->d_vfs_flags |= DCACHE_UNHASHED;
}

/* Unhash the target: dput() will then get rid of it */


2003-06-11 23:36:57

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] First casuality of hlist poisoning in 2.5.70


On Wed, 11 Jun 2003, Trond Myklebust wrote:
>
> This patch removes the Oops that occurs when either the source or
> the target of a d_move() operation is unhashed. It is currently
> triggered by the NFS sillyrename code.

Cool. The thing found something!

However, I'm still a bit confused:

> - hlist_del_rcu(&dentry->d_hash);
> - hlist_add_head_rcu(&dentry->d_hash, target->d_bucket);
> + if (!hlist_unhashed(&dentry->d_hash))
> + hlist_del_rcu(&dentry->d_hash);
> + if (!hlist_unhashed(&target->d_hash)) {
> + hlist_add_head_rcu(&dentry->d_hash, target->d_bucket);
> + dentry->d_vfs_flags &= ~DCACHE_UNHASHED;
> + } else
> + dentry->d_vfs_flags |= DCACHE_UNHASHED;

Can source or target really be validly unhashed? That makes no sense,
since we just looked it up, and we've held the directory semaphores over
the whole thing.

In other words, I worry that the real bug is something else, and your
patch makes it not oops, but hides the real problem.

I'm sure you're right, but can you tell me what the sequence of events is
that validly leads to this?

Linus

2003-06-11 23:51:02

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] First casuality of hlist poisoning in 2.5.70

>>>>> " " == Linus Torvalds <[email protected]> writes:

>> - hlist_del_rcu(&dentry->d_hash);
>> - hlist_add_head_rcu(&dentry->d_hash, target->d_bucket);
>> + if (!hlist_unhashed(&dentry->d_hash))
>> + hlist_del_rcu(&dentry->d_hash);
>> + if (!hlist_unhashed(&target->d_hash)) {
>> + hlist_add_head_rcu(&dentry->d_hash, target->d_bucket);
>> + dentry->d_vfs_flags &= ~DCACHE_UNHASHED;
>> + } else
>> + dentry->d_vfs_flags |= DCACHE_UNHASHED;

> Can source or target really be validly unhashed? That makes no
> sense, since we just looked it up, and we've held the directory
> semaphores over the whole thing.

When renaming, we may want to unhash the dentry in order to stop
d_lookup()s from succeeding (Recall that cached_lookup() does not
attempt to take the directory semaphore - only real_lookup() does
that).

AFAICS one should not rehash the dentry until after the d_move(). Does
that make sense?

Cheers,
Trond

2003-06-12 00:19:03

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] First casuality of hlist poisoning in 2.5.70


On Wed, 11 Jun 2003, Trond Myklebust wrote:
>
> AFAICS one should not rehash the dentry until after the d_move(). Does
> that make sense?

Yeah, it does seem that rehashing before actually calling d_move() means
that there is a small window where another process might now come in, and
use the dcache (without getting the semaphore) to see the old value of the
target dentry, even though the low-level filesystem has already move the
new dentry value over the target. Ie the window would be between

i_op->rename(...)
d_rehash(new_dentry)
... race here ...
d_move(old_dentry, new_dentry)

That might confuse a filesystem that expected that the target was deleted
an no longer reachable by anybody.

Al? What do you think?

Linus

2003-06-12 05:09:15

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] First casuality of hlist poisoning in 2.5.70


On Wed, 11 Jun 2003, Trond Myklebust wrote:
>
> This patch removes the Oops that occurs when either the source or
> the target of a d_move() operation is unhashed. It is currently
> triggered by the NFS sillyrename code.

Btw, thinking more about this, the patch can't be right.

> + if (!hlist_unhashed(&dentry->d_hash))
> + hlist_del_rcu(&dentry->d_hash);

This leaves the d_hash pointers as crapola.

> + if (!hlist_unhashed(&target->d_hash)) {
> + hlist_add_head_rcu(&dentry->d_hash, target->d_bucket);
> + dentry->d_vfs_flags &= ~DCACHE_UNHASHED;
> + } else
> + dentry->d_vfs_flags |= DCACHE_UNHASHED;

And this will fix it only if target is hashed.

So either the patch depends on the target being hashed (and if so, it just
shouldn't have the test, or have a BUG_ON() instead), or the patch is
buggy.

I have this suspicion that the logical patch would be something like this:

if (hlist_unhashed(&target->d_hash)) {
/* Unhash the dentry too */
__d_drop(dentry);
} else {
/* Rehash the dentry onto the same hash as the target */
hlist_del_rcu(&dentry->d_hash);
hlist_add_head_rcu(&dentry->d_hash, target->d_bucket);
dentry->d_vfs_flags &= ~DCACHE_UNHASHED;
}

At least this would make slightly more sense than the patch.

HOWEVER, I actually suspect that the target really _cannot_ be unhashed,
and that the test makes no sense, and the sequence should just be

/* Rehash the dentry onto the same hash as the target */
hlist_del_rcu(&dentry->d_hash);
hlist_add_head_rcu(&dentry->d_hash, target->d_bucket);
dentry->d_vfs_flags &= ~DCACHE_UNHASHED;

this is actually identical to what we have right now, except for one small
detail: the current 2.5.x tree does not clear the DCACHE_UNHASHED bit in
d_move().

Also, if a unhashed dentry really is ok, then I think the test for

/* Move the dentry to the target hash queue, if on different bucket */
if (dentry->d_bucket != target->d_bucket) {
...

should really be something like

if (!d_hashed(dentry) || dentry->d_bucket != target->d_bucket) {
...

because otherwise the dentry may well share the same bucket as the target,
but if it wasn't hashed before, it won't be hashed after either.

But I suspect that neither dentry nor target should really ever be
unhashed by the time we call d_move(). That's reinforced by the fact that
it looks like a unhashed dentry in d_move() would have been a silent bug
previously - staying unhashed if it just shared the bucket.

Al, I'll be really happy having you go over this code too. And whatever we
decide is right (enforcing hashedness or whatever), we should assert it,
because clearly d_move() has been a bit too subtle for us so far.

Linus

2003-06-12 05:50:45

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] First casuality of hlist poisoning in 2.5.70

>>>>> " " == Linus Torvalds <[email protected]> writes:

> But I suspect that neither dentry nor target should really ever
> be unhashed by the time we call d_move(). That's reinforced by
> the fact that it looks like a unhashed dentry in d_move() would
> have been a silent bug previously - staying unhashed if it just
> shared the bucket.

It is not a bug.

Looking more carefully at the Oops, it seems that this problem is
occurring inside an nfs_rename() in which the target name belongs to
an open file - and thus needs to be sillyrenamed first.

In that case, we certainly do not want to rehash the dentry in order
to do the d_move() since that would give rise to a race: we want to do
a real rename into the same dentry after we're done with the
sillyrename.

I can agree that the patch was flawed, but I still believe that we do
need to allow d_move to work with unhashed dentries.

Cheers,
Trond

2003-06-12 07:17:20

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] First casuality of hlist poisoning in 2.5.70


Note: As far as NFS is concerned, d_move() should always assume that
the dentry is unhashed. Even if we ensure that we rehash, someone else
could in theory trigger a call to d_revalidate() that causes the
dentry to be dropped before we get to d_move().

Cheers,
Trond

2003-06-12 16:12:44

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] First casuality of hlist poisoning in 2.5.70

On Wed, Jun 11, 2003 at 10:22:25PM -0700, Linus Torvalds wrote:

> HOWEVER, I actually suspect that the target really _cannot_ be unhashed,
> and that the test makes no sense, and the sequence should just be
>
> /* Rehash the dentry onto the same hash as the target */
> hlist_del_rcu(&dentry->d_hash);
> hlist_add_head_rcu(&dentry->d_hash, target->d_bucket);
> dentry->d_vfs_flags &= ~DCACHE_UNHASHED;

> But I suspect that neither dentry nor target should really ever be
> unhashed by the time we call d_move(). That's reinforced by the fact that
> it looks like a unhashed dentry in d_move() would have been a silent bug
> previously - staying unhashed if it just shared the bucket.

> Al, I'll be really happy having you go over this code too. And whatever we
> decide is right (enforcing hashedness or whatever), we should assert it,
> because clearly d_move() has been a bit too subtle for us so far.

Sigh... The real problem is not in d_move(), but in the way NFS drops
dentries. That, and the fact that we are eating the consequences of
RCU use in dcache - it had predictably made the entire thing _far_ too
subtle.

We probably should accept that both d_move() arguments can be unhashed.
After the move hashed status of source should remain as it is and
victim^Wtarget should get unhashed.

We _do_ need to sort out the situation with unhashing stuff in NFS - in
particular, the way it deals with mountpoints and with directories is
a mess. I'm looking through that code, but it's bloody slow analysis
due to RCU. Premature optimizations and all such...

2003-06-19 23:48:31

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] First casuality of hlist poisoning in 2.5.70

On Thu, Jun 12, 2003 at 05:26:27PM +0100, [email protected] wrote:
> We _do_ need to sort out the situation with unhashing stuff in NFS - in
> particular, the way it deals with mountpoints and with directories is
> a mess. I'm looking through that code, but it's bloody slow analysis
> due to RCU. Premature optimizations and all such...

No arguments on NFS, and I have to agree that RCU has been used
primarily as an optimization thus far in Linux. However, RCU
can and should be considered quite early on. The situation is
similar to reader-writer locking -- you are better off working
out which of your locks are going to be reader-writer locks
earlier rather than later. If you use either reader-writer
locking or RCU as a late optimization, you will find that your
earlier design decisions make your life more complex than necessary.

On the other hand, if RCU is considered early, it can make life
simpler. Breaking up locks often introduces deadlocks. Some
of these deadlocks can be avoided entirely by thinking through
the use of RCU at design time. An extreme example of deadlock
avoidance may be found in IPMI (drivers/char/ipmi/ipmi_kcs_intf.c,
the synchronize_kernel() near line 1174). Really tough to do
this as a late optimization...

Thanx, Paul