2010-06-24 03:16:41

by Nick Piggin

[permalink] [raw]
Subject: [patch 24/52] fs: dcache reduce d_parent locking

Use RCU property of dcache to simplify locking in some places where we
take d_parent and d_lock.

Comment: don't need rcu_deref because we take the spinlock and recheck it.

Signed-off-by: Nick Piggin <[email protected]>
--

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -311,23 +311,18 @@ struct dentry *dget_parent(struct dentry
struct dentry *ret;

repeat:
- spin_lock(&dentry->d_lock);
+ rcu_read_lock();
ret = dentry->d_parent;
- if (!ret)
- goto out;
- if (dentry == ret) {
- ret->d_count++;
- goto out;
- }
- if (!spin_trylock(&ret->d_lock)) {
- spin_unlock(&dentry->d_lock);
+ spin_lock(&ret->d_lock);
+ if (unlikely(ret != dentry->d_parent)) {
+ spin_unlock(&ret->d_lock);
+ rcu_read_unlock();
goto repeat;
}
+ rcu_read_unlock();
BUG_ON(!ret->d_count);
ret->d_count++;
spin_unlock(&ret->d_lock);
-out:
- spin_unlock(&dentry->d_lock);
return ret;
}
EXPORT_SYMBOL(dget_parent);
@@ -601,14 +596,22 @@ static void prune_one_dentry(struct dent
if (inode)
spin_lock(&inode->i_lock);
again:
- spin_lock(&dentry->d_lock);
- if (dentry->d_parent && dentry != dentry->d_parent) {
- if (!spin_trylock(&dentry->d_parent->d_lock)) {
- spin_unlock(&dentry->d_lock);
+ rcu_read_lock();
+ parent = dentry->d_parent;
+ if (parent) {
+ spin_lock(&parent->d_lock);
+ if (unlikely(parent != dentry->d_parent)) {
+ spin_unlock(&parent->d_lock);
+ rcu_read_unlock();
goto again;
}
- parent = dentry->d_parent;
- }
+ if (parent != dentry)
+ spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
+ else
+ parent = NULL;
+ } else
+ spin_lock(&dentry->d_lock);
+ rcu_read_unlock();
dentry->d_count--;
if (dentry->d_count) {
if (parent)


2010-06-24 08:44:30

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [patch 24/52] fs: dcache reduce d_parent locking

On Thu, 2010-06-24 at 13:02 +1000, [email protected] wrote:
> Use RCU property of dcache to simplify locking in some places where we
> take d_parent and d_lock.
>
> Comment: don't need rcu_deref because we take the spinlock and recheck it.

But does the LOCK barrier imply a DATA DEPENDENCY barrier? (It does on
x86, and the compiler barrier implied by spin_lock() suffices to replace
ACCESS_ONCE()).

2010-06-24 15:07:38

by Nick Piggin

[permalink] [raw]
Subject: Re: [patch 24/52] fs: dcache reduce d_parent locking

On Thu, Jun 24, 2010 at 10:44:22AM +0200, Peter Zijlstra wrote:
> On Thu, 2010-06-24 at 13:02 +1000, [email protected] wrote:
> > Use RCU property of dcache to simplify locking in some places where we
> > take d_parent and d_lock.
> >
> > Comment: don't need rcu_deref because we take the spinlock and recheck it.
>
> But does the LOCK barrier imply a DATA DEPENDENCY barrier? (It does on
> x86, and the compiler barrier implied by spin_lock() suffices to replace
> ACCESS_ONCE()).

Well the dependency we care about is from loading the parent pointer
to acquiring its spinlock. But we can't possibly have stale data given
to the spin lock operation itself because it is a RMW.

2010-06-24 15:32:28

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch 24/52] fs: dcache reduce d_parent locking

On Fri, Jun 25, 2010 at 01:07:06AM +1000, Nick Piggin wrote:
> On Thu, Jun 24, 2010 at 10:44:22AM +0200, Peter Zijlstra wrote:
> > On Thu, 2010-06-24 at 13:02 +1000, [email protected] wrote:
> > > Use RCU property of dcache to simplify locking in some places where we
> > > take d_parent and d_lock.
> > >
> > > Comment: don't need rcu_deref because we take the spinlock and recheck it.
> >
> > But does the LOCK barrier imply a DATA DEPENDENCY barrier? (It does on
> > x86, and the compiler barrier implied by spin_lock() suffices to replace
> > ACCESS_ONCE()).
>
> Well the dependency we care about is from loading the parent pointer
> to acquiring its spinlock. But we can't possibly have stale data given
> to the spin lock operation itself because it is a RMW.

As long as you check for the structure being valid after acquiring the
lock, I agree. Otherwise, I would be concerned about the following
sequence of events:

1. CPU 0 picks up a pointer to a given data element.

2. CPU 1 removes this element from the list, drops any locks that
it might have, and starts waiting for a grace period to
elapse.

3. CPU 0 acquires the lock, does some operation that would
be appropriate had the element not been removed, then
releases the lock.

4. After the grace period, CPU 1 frees the element, negating
CPU 0's hard work.

The usual approach is to have a "deleted" flag or some such in the
element that CPU 0 would set when removing the element and that CPU 1
would check after acquiring the lock. Which you might well already
be doing! ;-)

Thanx, Paul

2010-06-24 16:05:29

by Nick Piggin

[permalink] [raw]
Subject: Re: [patch 24/52] fs: dcache reduce d_parent locking

On Thu, Jun 24, 2010 at 08:32:18AM -0700, Paul E. McKenney wrote:
> On Fri, Jun 25, 2010 at 01:07:06AM +1000, Nick Piggin wrote:
> > On Thu, Jun 24, 2010 at 10:44:22AM +0200, Peter Zijlstra wrote:
> > > On Thu, 2010-06-24 at 13:02 +1000, [email protected] wrote:
> > > > Use RCU property of dcache to simplify locking in some places where we
> > > > take d_parent and d_lock.
> > > >
> > > > Comment: don't need rcu_deref because we take the spinlock and recheck it.
> > >
> > > But does the LOCK barrier imply a DATA DEPENDENCY barrier? (It does on
> > > x86, and the compiler barrier implied by spin_lock() suffices to replace
> > > ACCESS_ONCE()).
> >
> > Well the dependency we care about is from loading the parent pointer
> > to acquiring its spinlock. But we can't possibly have stale data given
> > to the spin lock operation itself because it is a RMW.
>
> As long as you check for the structure being valid after acquiring the
> lock, I agree. Otherwise, I would be concerned about the following
> sequence of events:
>
> 1. CPU 0 picks up a pointer to a given data element.
>
> 2. CPU 1 removes this element from the list, drops any locks that
> it might have, and starts waiting for a grace period to
> elapse.
>
> 3. CPU 0 acquires the lock, does some operation that would
> be appropriate had the element not been removed, then
> releases the lock.
>
> 4. After the grace period, CPU 1 frees the element, negating
> CPU 0's hard work.
>
> The usual approach is to have a "deleted" flag or some such in the
> element that CPU 0 would set when removing the element and that CPU 1
> would check after acquiring the lock. Which you might well already
> be doing! ;-)

Thanks, yep it's done under RCU, and after taking the lock it rechecks
to see that it is still reachable by the same pointer (and if not,
unlocks and retries) so it should be fine.

Thanks,
Nick

2010-06-24 16:41:41

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch 24/52] fs: dcache reduce d_parent locking

On Fri, Jun 25, 2010 at 02:05:24AM +1000, Nick Piggin wrote:
> On Thu, Jun 24, 2010 at 08:32:18AM -0700, Paul E. McKenney wrote:
> > On Fri, Jun 25, 2010 at 01:07:06AM +1000, Nick Piggin wrote:
> > > On Thu, Jun 24, 2010 at 10:44:22AM +0200, Peter Zijlstra wrote:
> > > > On Thu, 2010-06-24 at 13:02 +1000, [email protected] wrote:
> > > > > Use RCU property of dcache to simplify locking in some places where we
> > > > > take d_parent and d_lock.
> > > > >
> > > > > Comment: don't need rcu_deref because we take the spinlock and recheck it.
> > > >
> > > > But does the LOCK barrier imply a DATA DEPENDENCY barrier? (It does on
> > > > x86, and the compiler barrier implied by spin_lock() suffices to replace
> > > > ACCESS_ONCE()).
> > >
> > > Well the dependency we care about is from loading the parent pointer
> > > to acquiring its spinlock. But we can't possibly have stale data given
> > > to the spin lock operation itself because it is a RMW.
> >
> > As long as you check for the structure being valid after acquiring the
> > lock, I agree. Otherwise, I would be concerned about the following
> > sequence of events:
> >
> > 1. CPU 0 picks up a pointer to a given data element.
> >
> > 2. CPU 1 removes this element from the list, drops any locks that
> > it might have, and starts waiting for a grace period to
> > elapse.
> >
> > 3. CPU 0 acquires the lock, does some operation that would
> > be appropriate had the element not been removed, then
> > releases the lock.
> >
> > 4. After the grace period, CPU 1 frees the element, negating
> > CPU 0's hard work.
> >
> > The usual approach is to have a "deleted" flag or some such in the
> > element that CPU 0 would set when removing the element and that CPU 1
> > would check after acquiring the lock. Which you might well already
> > be doing! ;-)
>
> Thanks, yep it's done under RCU, and after taking the lock it rechecks
> to see that it is still reachable by the same pointer (and if not,
> unlocks and retries) so it should be fine.

Very good!!! ;-)

Thanx, Paul

2010-06-28 21:50:17

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [patch 24/52] fs: dcache reduce d_parent locking

On Thu, Jun 24, 2010 at 01:02:36PM +1000, [email protected] wrote:
> Use RCU property of dcache to simplify locking in some places where we
> take d_parent and d_lock.
>
> Comment: don't need rcu_deref because we take the spinlock and recheck it.

Looks good other than one question below.

Thanx, Paul

> Signed-off-by: Nick Piggin <[email protected]>
> --
>
> Index: linux-2.6/fs/dcache.c
> ===================================================================
> --- linux-2.6.orig/fs/dcache.c
> +++ linux-2.6/fs/dcache.c
> @@ -311,23 +311,18 @@ struct dentry *dget_parent(struct dentry
> struct dentry *ret;
>
> repeat:
> - spin_lock(&dentry->d_lock);
> + rcu_read_lock();
> ret = dentry->d_parent;

Doesn't this need to be as follows?

ret = rcu_dereference(dentry)->d_parent;

Otherwise, couldn't we end up seeing pre-initialization value for
->d_parent for a newly inserted dentry?

> - if (!ret)
> - goto out;
> - if (dentry == ret) {
> - ret->d_count++;
> - goto out;
> - }
> - if (!spin_trylock(&ret->d_lock)) {
> - spin_unlock(&dentry->d_lock);
> + spin_lock(&ret->d_lock);

Once we do this, however, we are golden, at least for all dentry
fields protected by ->lock. This does assume that the compiler does not
speculate the fetch that initialized the argument dentry into the critical
section, which I would sure hope would be a reasonable assumption.

> + if (unlikely(ret != dentry->d_parent)) {
> + spin_unlock(&ret->d_lock);
> + rcu_read_unlock();
> goto repeat;
> }
> + rcu_read_unlock();
> BUG_ON(!ret->d_count);
> ret->d_count++;
> spin_unlock(&ret->d_lock);
> -out:
> - spin_unlock(&dentry->d_lock);
> return ret;
> }
> EXPORT_SYMBOL(dget_parent);
> @@ -601,14 +596,22 @@ static void prune_one_dentry(struct dent
> if (inode)
> spin_lock(&inode->i_lock);
> again:
> - spin_lock(&dentry->d_lock);
> - if (dentry->d_parent && dentry != dentry->d_parent) {
> - if (!spin_trylock(&dentry->d_parent->d_lock)) {
> - spin_unlock(&dentry->d_lock);
> + rcu_read_lock();
> + parent = dentry->d_parent;
> + if (parent) {
> + spin_lock(&parent->d_lock);
> + if (unlikely(parent != dentry->d_parent)) {
> + spin_unlock(&parent->d_lock);
> + rcu_read_unlock();
> goto again;
> }
> - parent = dentry->d_parent;
> - }
> + if (parent != dentry)
> + spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
> + else
> + parent = NULL;
> + } else
> + spin_lock(&dentry->d_lock);
> + rcu_read_unlock();
> dentry->d_count--;
> if (dentry->d_count) {
> if (parent)
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2010-07-07 14:36:13

by Nick Piggin

[permalink] [raw]
Subject: Re: [patch 24/52] fs: dcache reduce d_parent locking

Hi Paul,

Sorry I had left this in my postponed folder while rechecking your
questions and forgot about it :P


On Mon, Jun 28, 2010 at 02:50:13PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 24, 2010 at 01:02:36PM +1000, [email protected] wrote:
> > Use RCU property of dcache to simplify locking in some places where we
> > take d_parent and d_lock.
> >
> > Comment: don't need rcu_deref because we take the spinlock and recheck it.
>
> Looks good other than one question below.
>
> Thanx, Paul
>
> > Signed-off-by: Nick Piggin <[email protected]>
> > --
> >
> > Index: linux-2.6/fs/dcache.c
> > ===================================================================
> > --- linux-2.6.orig/fs/dcache.c
> > +++ linux-2.6/fs/dcache.c
> > @@ -311,23 +311,18 @@ struct dentry *dget_parent(struct dentry
> > struct dentry *ret;
> >
> > repeat:
> > - spin_lock(&dentry->d_lock);
> > + rcu_read_lock();
> > ret = dentry->d_parent;
>
> Doesn't this need to be as follows?
>
> ret = rcu_dereference(dentry)->d_parent;
>
> Otherwise, couldn't we end up seeing pre-initialization value for
> ->d_parent for a newly inserted dentry?

I don't think so. The child's dentry memory should be guaranteed to be
post-initialized at the point it is passed to dget_parent, becase we've
to have a stable refcount on it at that point. Ie. if it was pulled from
an RCU list, it should already have been rcu dereferenced by now.

So ->d_parent should be a valid pointer with lifetime guarantee provided
by RCU -- so enough to take the spinlock and recheck.

>
> > - if (!ret)
> > - goto out;
> > - if (dentry == ret) {
> > - ret->d_count++;
> > - goto out;
> > - }
> > - if (!spin_trylock(&ret->d_lock)) {
> > - spin_unlock(&dentry->d_lock);
> > + spin_lock(&ret->d_lock);
>
> Once we do this, however, we are golden, at least for all dentry
> fields protected by ->lock. This does assume that the compiler does not
> speculate the fetch that initialized the argument dentry into the critical
> section, which I would sure hope would be a reasonable assumption.

Yes I think the above fact that we have a "good" ref on the dentry
should prevent this.

Thanks,
Nick