2010-11-16 14:23:50

by Nick Piggin

[permalink] [raw]
Subject: [patch 01/28] fs: d_validate fixes

d_validate has been broken for a long time.

kmem_ptr_validate does not guarantee that a pointer can be dereferenced
if it can go away at any time. Even rcu_read_lock doesn't help, because
the pointer might be queued in RCU callbacks but not executed yet.

So the parent cannot be checked, nor the name hashed. The dentry pointer
can not be touched until it can be verified under lock. Hashing simply
cannot be used.

Instead, verify the parent/child relationship by traversing parent's
d_child list. It's slow, but only ncpfs and the destaged smbfs care
about it, at this point.

Signed-off-by: Nick Piggin <[email protected]>

---
fs/dcache.c | 25 +++++++------------------
1 file changed, 7 insertions(+), 18 deletions(-)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c 2010-11-17 00:11:48.000000000 +1100
+++ linux-2.6/fs/dcache.c 2010-11-17 01:05:52.000000000 +1100
@@ -1483,41 +1483,30 @@ struct dentry *d_hash_and_lookup(struct
}

/**
- * d_validate - verify dentry provided from insecure source
+ * d_validate - verify dentry provided from insecure source (deprecated)
* @dentry: The dentry alleged to be valid child of @dparent
* @dparent: The parent dentry (known to be valid)
*
* An insecure source has sent us a dentry, here we verify it and dget() it.
* This is used by ncpfs in its readdir implementation.
* Zero is returned in the dentry is invalid.
+ *
+ * This function is slow for big directories, and deprecated, do not use it.
*/
-
int d_validate(struct dentry *dentry, struct dentry *dparent)
{
- struct hlist_head *base;
- struct hlist_node *lhp;
-
- /* Check whether the ptr might be valid at all.. */
- if (!kmem_ptr_validate(dentry_cache, dentry))
- goto out;
-
- if (dentry->d_parent != dparent)
- goto out;
+ struct dentry *child;

spin_lock(&dcache_lock);
- base = d_hash(dparent, dentry->d_name.hash);
- hlist_for_each(lhp,base) {
- /* hlist_for_each_entry_rcu() not required for d_hash list
- * as it is parsed under dcache_lock
- */
- if (dentry == hlist_entry(lhp, struct dentry, d_hash)) {
+ list_for_each_entry(child, &dparent->d_subdirs, d_u.d_child) {
+ if (dentry == child) {
__dget_locked(dentry);
spin_unlock(&dcache_lock);
return 1;
}
}
spin_unlock(&dcache_lock);
-out:
+
return 0;
}
EXPORT_SYMBOL(d_validate);


2010-11-17 10:44:19

by Andi Kleen

[permalink] [raw]
Subject: Re: [patch 01/28] fs: d_validate fixes

Nick Piggin <[email protected]> writes:

> d_validate has been broken for a long time.
>
> kmem_ptr_validate does not guarantee that a pointer can be dereferenced
> if it can go away at any time. Even rcu_read_lock doesn't help, because
> the pointer might be queued in RCU callbacks but not executed yet.

I wonder if that is a problem for NFS ... (which I believe is the only
user). Could these races be used to break the NFS server?

-Andi
--
[email protected] -- Speaking for myself only.

2010-11-18 20:50:59

by David Miller

[permalink] [raw]
Subject: Re: [patch 01/28] fs: d_validate fixes

From: Nick Piggin <[email protected]>
Date: Wed, 17 Nov 2010 01:09:01 +1100

> d_validate has been broken for a long time.
>
> kmem_ptr_validate does not guarantee that a pointer can be dereferenced
> if it can go away at any time. Even rcu_read_lock doesn't help, because
> the pointer might be queued in RCU callbacks but not executed yet.
>
> So the parent cannot be checked, nor the name hashed. The dentry pointer
> can not be touched until it can be verified under lock. Hashing simply
> cannot be used.
>
> Instead, verify the parent/child relationship by traversing parent's
> d_child list. It's slow, but only ncpfs and the destaged smbfs care
> about it, at this point.
>
> Signed-off-by: Nick Piggin <[email protected]>

This won't apply because is conflicts with Christoph Hellwig's
RCU conversion of d_validate().

Which is a change that went in more than a month ago.

Thus I'd really appreciate if you mentioned what tree your patches are
against in your "0/N" posting, always.

Thanks.

2010-11-18 20:58:49

by David Miller

[permalink] [raw]
Subject: Re: [patch 01/28] fs: d_validate fixes

From: David Miller <[email protected]>
Date: Thu, 18 Nov 2010 12:51:23 -0800 (PST)

> From: Nick Piggin <[email protected]>
> Date: Wed, 17 Nov 2010 01:09:01 +1100
>
>> d_validate has been broken for a long time.
>>
>> kmem_ptr_validate does not guarantee that a pointer can be dereferenced
>> if it can go away at any time. Even rcu_read_lock doesn't help, because
>> the pointer might be queued in RCU callbacks but not executed yet.
>>
>> So the parent cannot be checked, nor the name hashed. The dentry pointer
>> can not be touched until it can be verified under lock. Hashing simply
>> cannot be used.
>>
>> Instead, verify the parent/child relationship by traversing parent's
>> d_child list. It's slow, but only ncpfs and the destaged smbfs care
>> about it, at this point.
>>
>> Signed-off-by: Nick Piggin <[email protected]>
>
> This won't apply because is conflicts with Christoph Hellwig's
> RCU conversion of d_validate().
>
> Which is a change that went in more than a month ago.

In fact the conflicts of your patch set are even more pervasive, since
all dcache hash traversals are essentially RCU protected instead of
dcache_lock protected right now.

This makes it very difficult to test or analyze your patches for those
of us on mainline or similar.

2010-11-19 05:01:13

by Nick Piggin

[permalink] [raw]
Subject: Re: [patch 01/28] fs: d_validate fixes

On Thu, Nov 18, 2010 at 12:51:23PM -0800, David Miller wrote:
> From: Nick Piggin <[email protected]>
> Date: Wed, 17 Nov 2010 01:09:01 +1100
>
> > d_validate has been broken for a long time.
> >
> > kmem_ptr_validate does not guarantee that a pointer can be dereferenced
> > if it can go away at any time. Even rcu_read_lock doesn't help, because
> > the pointer might be queued in RCU callbacks but not executed yet.
> >
> > So the parent cannot be checked, nor the name hashed. The dentry pointer
> > can not be touched until it can be verified under lock. Hashing simply
> > cannot be used.
> >
> > Instead, verify the parent/child relationship by traversing parent's
> > d_child list. It's slow, but only ncpfs and the destaged smbfs care
> > about it, at this point.
> >
> > Signed-off-by: Nick Piggin <[email protected]>
>
> This won't apply because is conflicts with Christoph Hellwig's
> RCU conversion of d_validate().
>
> Which is a change that went in more than a month ago.

Sorry yeah I had a local revert that I didn't send out, otherwise
it should be a vanilla upstream kernel.


> Thus I'd really appreciate if you mentioned what tree your patches are
> against in your "0/N" posting, always.

Try to remember. I'll put some time into polishing up a git tree
and submitting the more interesting patches as well, this weekend.

Thanks,
Nick

2010-11-19 05:05:15

by Nick Piggin

[permalink] [raw]
Subject: Re: [patch 01/28] fs: d_validate fixes

On Thu, Nov 18, 2010 at 12:59:13PM -0800, David Miller wrote:
> From: David Miller <[email protected]>
> Date: Thu, 18 Nov 2010 12:51:23 -0800 (PST)
>
> > From: Nick Piggin <[email protected]>
> > Date: Wed, 17 Nov 2010 01:09:01 +1100
> >
> >> d_validate has been broken for a long time.
> >>
> >> kmem_ptr_validate does not guarantee that a pointer can be dereferenced
> >> if it can go away at any time. Even rcu_read_lock doesn't help, because
> >> the pointer might be queued in RCU callbacks but not executed yet.
> >>
> >> So the parent cannot be checked, nor the name hashed. The dentry pointer
> >> can not be touched until it can be verified under lock. Hashing simply
> >> cannot be used.
> >>
> >> Instead, verify the parent/child relationship by traversing parent's
> >> d_child list. It's slow, but only ncpfs and the destaged smbfs care
> >> about it, at this point.
> >>
> >> Signed-off-by: Nick Piggin <[email protected]>
> >
> > This won't apply because is conflicts with Christoph Hellwig's
> > RCU conversion of d_validate().
> >
> > Which is a change that went in more than a month ago.
>
> In fact the conflicts of your patch set are even more pervasive, since
> all dcache hash traversals are essentially RCU protected instead of
> dcache_lock protected right now.

Not sure what you mean there. The patches are against upstream+revert of
the last d_validate patch.

dcache_lock splitup of this series is to split the lock out of all the
other paths, and importantly allow d_lock to protect the complete
dcache state of the dentry.

Next 2 steps (that depend on this series but not on each other) are
fine grained locking of the split locks, and rcu-walk. rcu-walk is what
I called store-free path walking, because we extend RCU not only to the
hash lookup but the entire path walk.

I'll get all that out when I get a bit of time to work on it again.

Thanks,
Nick