2005-11-28 13:38:07

by Andi Kleen

Subject: [PATCH] Allow lockless traversal of notifier lists


As discussed in the other thread.

Just needed an additional write barrier, so that a parallel
running lookup can never see inconsistent state. As long as there
is no unregistration, or the unregistration is done using
locking or RCU in the caller, the lookups should be safe now.

This only makes a difference on non-i386/x86-64 architectures.
x86 was already ok because it never reorders writes.
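
For illustration, the ordering the barrier enforces is roughly the
following (a sketch, not the patched code itself; my_handler is a
placeholder):

	/* Fully initialize the new entry first... */
	n->notifier_call = my_handler;
	n->next = *list;
	/* ...then make sure those stores are visible before the entry
	   is published, so a parallel lockless lookup never sees a
	   half-initialized node. */
	wmb();
	*list = n;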

This is the first step towards fixing all the callers. Callers that
never unregister, or that already unregister safely, are ok as is.

Also fixed up the kerneldoc description to document the various
locking restrictions.

Signed-off-by: Andi Kleen <[email protected]>

diff -u linux-2.6.15rc2-work/kernel/sys.c-o linux-2.6.15rc2-work/kernel/sys.c
--- linux-2.6.15rc2-work/kernel/sys.c-o 2005-11-16 00:34:33.000000000 +0100
+++ linux-2.6.15rc2-work/kernel/sys.c 2005-11-28 14:33:20.000000000 +0100
@@ -102,6 +102,9 @@
  * @n: New entry in notifier chain
  *
  * Adds a notifier to a notifier chain.
+ * As long as unregister is not used this is safe against parallel
+ * lockless notifier lookups. If unregister is used then unregister
+ * needs to do additional locking or use RCU.
  *
  * Currently always returns zero.
  */
@@ -116,6 +119,7 @@
 		list= &((*list)->next);
 	}
 	n->next = *list;
+	wmb();
 	*list=n;
 	write_unlock(&notifier_lock);
 	return 0;
@@ -129,6 +133,8 @@
  * @n: New entry in notifier chain
  *
  * Removes a notifier from a notifier chain.
+ * Note this needs additional locking or RCU in the caller to be safe
+ * against parallel traversals.
  *
  * Returns zero on success, or %-ENOENT on failure.
  */
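
For callers that do unregister, "locking or RCU in the caller" would
amount to something like this sketch (my_chain and my_block are
placeholders, and the synchronize_rcu() call is an assumption about
what the caller uses, not part of the patch):

	/* Unlink the entry; the chain update itself is protected by
	   notifier_lock inside notifier_chain_unregister(). */
	notifier_chain_unregister(&my_chain, &my_block);
	/* Wait for any lockless traversal that might still hold a
	   pointer to the entry to finish. */
	synchronize_rcu();
	/* Only now is it safe to free or reuse my_block. */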


2005-11-28 15:59:51

by Dipankar Sarma

Subject: Re: [PATCH] Allow lockless traversal of notifier lists

On Mon, Nov 28, 2005 at 02:37:57PM +0100, Andi Kleen wrote:
>
> As discussed in the other thread.
>
> Just needed an additional write barrier, so that a parallel
> running lookup can never see inconsistent state. As long as there
> is no unregistration, or the unregistration is done using
> locking or RCU in the caller, the lookups should be safe now.
>
> This only makes a difference on non-i386/x86-64 architectures.
> x86 was already ok because it never reorders writes.
>
> *
> * Currently always returns zero.
> */
> @@ -116,6 +119,7 @@
> list= &((*list)->next);
> }
> n->next = *list;
> + wmb();
> *list=n;
> write_unlock(&notifier_lock);

Shouldn't this be smp_wmb()?

Also, not all archs have strong ordering for data-dependent reads.
So you would probably need an smp_read_barrier_depends() between
the load of the pointer and the actual dereference.
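
On such architectures the read side then needs, roughly (a sketch;
head stands in for the chain head pointer, not actual kernel code):

	struct notifier_block *nb = *head;	/* load the pointer */
	smp_read_barrier_depends();	/* order dependent loads (e.g. on Alpha) */
	if (nb)
		ret = nb->notifier_call(nb, val, v);	/* now safe to dereference */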

Thanks
Dipankar

2005-11-28 16:05:51

by Andi Kleen

Subject: Re: [PATCH] Allow lockless traversal of notifier lists

On Mon, Nov 28, 2005 at 09:31:29PM +0530, Dipankar Sarma wrote:
> On Mon, Nov 28, 2005 at 02:37:57PM +0100, Andi Kleen wrote:
> >
> > *
> > * Currently always returns zero.
> > */
> > @@ -116,6 +119,7 @@
> > list= &((*list)->next);
> > }
> > n->next = *list;
> > + wmb();
> > *list=n;
> > write_unlock(&notifier_lock);
>
> Shouldn't this be smp_wmb()?

Yes.

>
> Also, not all archs have strong ordering for data-dependent reads.

Ah, you mean the Alpha exception? Right, Alpha would need that.

Thanks for the review.

-Andi

Revised patch with the extra special-casing for Alpha (the "Alphaextrawurst").

----
As discussed in the other thread.

Just needed an additional write barrier, so that a parallel
running lookup can never see inconsistent state. As long as there
is no unregistration, or the unregistration is done using
locking or RCU in the caller, the lookups should be safe now.

This only makes a difference on non-i386/x86-64 architectures.
x86 was already ok because it never reorders writes.


diff -u linux-2.6.15rc2-work/kernel/sys.c-o linux-2.6.15rc2-work/kernel/sys.c
--- linux-2.6.15rc2-work/kernel/sys.c-o 2005-11-16 00:34:33.000000000 +0100
+++ linux-2.6.15rc2-work/kernel/sys.c 2005-11-28 17:03:22.000000000 +0100
@@ -102,6 +102,9 @@
  * @n: New entry in notifier chain
  *
  * Adds a notifier to a notifier chain.
+ * As long as unregister is not used this is safe against parallel
+ * lockless notifier lookups. If unregister is used then unregister
+ * needs to do additional locking or use RCU.
  *
  * Currently always returns zero.
  */
@@ -116,6 +119,7 @@
 		list= &((*list)->next);
 	}
 	n->next = *list;
+	smp_wmb();
 	*list=n;
 	write_unlock(&notifier_lock);
 	return 0;
@@ -129,6 +133,8 @@
  * @n: New entry in notifier chain
  *
  * Removes a notifier from a notifier chain.
+ * Note this needs additional locking or RCU in the caller to be safe
+ * against parallel traversals.
  *
  * Returns zero on success, or %-ENOENT on failure.
  */
@@ -175,6 +181,7 @@
 
 	while(nb)
 	{
+		smp_read_barrier_depends();
 		ret=nb->notifier_call(nb,val,v);
 		if(ret&NOTIFY_STOP_MASK)
 		{

2005-11-28 16:16:07

by Dipankar Sarma

Subject: Re: [PATCH] Allow lockless traversal of notifier lists

On Mon, Nov 28, 2005 at 05:05:47PM +0100, Andi Kleen wrote:
> *
> * Returns zero on success, or %-ENOENT on failure.
> */
> @@ -175,6 +181,7 @@
>

There should be an smp_read_barrier_depends() here for the first
dereferencing of the notifier block head, I think.

> while(nb)
> {
> + smp_read_barrier_depends();
> ret=nb->notifier_call(nb,val,v);
> if(ret&NOTIFY_STOP_MASK)
> {

Thanks
Dipankar

2005-11-28 16:27:13

by Andi Kleen

Subject: Re: [PATCH] Allow lockless traversal of notifier lists

On Mon, Nov 28, 2005 at 09:47:47PM +0530, Dipankar Sarma wrote:
> On Mon, Nov 28, 2005 at 05:05:47PM +0100, Andi Kleen wrote:
> > *
> > * Returns zero on success, or %-ENOENT on failure.
> > */
> > @@ -175,6 +181,7 @@
> >
>
> There should be an smp_read_barrier_depends() here for the first
> dereferencing of the notifier block head, I think.

Why? The one at the top of the block should be enough, shouldn't it?

-Andi


>
> > while(nb)
> > {
> > + smp_read_barrier_depends();
> > ret=nb->notifier_call(nb,val,v);
> > if(ret&NOTIFY_STOP_MASK)
> > {
>
> Thanks
> Dipankar

2005-11-28 17:40:27

by Dipankar Sarma

Subject: Re: [PATCH] Allow lockless traversal of notifier lists

On Mon, Nov 28, 2005 at 05:27:09PM +0100, Andi Kleen wrote:
> On Mon, Nov 28, 2005 at 09:47:47PM +0530, Dipankar Sarma wrote:
> > On Mon, Nov 28, 2005 at 05:05:47PM +0100, Andi Kleen wrote:
> > > *
> > > * Returns zero on success, or %-ENOENT on failure.
> > > */
> > > @@ -175,6 +181,7 @@
> > >
> >
> > There should be an smp_read_barrier_depends() here for the first
> > dereferencing of the notifier block head, I think.
>
> Why? The one at the top of the block should be enough, shouldn't it?
>

Don't we insert at the front of the list? Shouldn't the read-side
on Alpha see the contents of the new notifier block before it sees
the pointer to the first notifier block in the list head?
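
Roughly, the interleaving that goes wrong without the read-side
ordering looks like this (head is a placeholder for the chain head
pointer):

	/*
	 * Writer (front insertion):	Reader (Alpha, no read barrier):
	 *	n->notifier_call = handler;
	 *	n->next = *head;
	 *	smp_wmb();
	 *	*head = n;		nb = *head;	// sees n
	 *				nb->notifier_call(...);
	 *				// may still observe stale,
	 *				// pre-initialization contents of *nb
	 */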

Thanks
Dipankar

2005-11-29 00:02:02

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] Allow lockless traversal of notifier lists

On Mon, Nov 28, 2005 at 11:12:03PM +0530, Dipankar Sarma wrote:
> Don't we insert at the front of the list? Shouldn't the read-side
> on Alpha see the contents of the new notifier block before it sees
> the pointer to the first notifier block in the list head?

Ok, third version, hopefully Dipankar-proof now.

Andrew, please consider applying.

-Andi

---

As discussed in the other thread.

Notifier lists could be traversed locklessly if there were never
any removals, except that the update order was wrong.

Just needed an additional write barrier, so that a parallel
running lookup can never see inconsistent state. As long as there
is no unregistration, or the unregistration is done using
locking or RCU in the caller, the lookups should be safe now.

This only makes a difference on non-i386/x86-64 architectures.
x86 was already ok because it never reorders writes.

Signed-off-by: Andi Kleen <[email protected]>


diff -u linux-2.6.15rc2-work/kernel/sys.c-o linux-2.6.15rc2-work/kernel/sys.c
--- linux-2.6.15rc2-work/kernel/sys.c-o 2005-11-16 00:34:33.000000000 +0100
+++ linux-2.6.15rc2-work/kernel/sys.c 2005-11-29 00:33:26.000000000 +0100
@@ -102,6 +102,9 @@
  * @n: New entry in notifier chain
  *
  * Adds a notifier to a notifier chain.
+ * As long as unregister is not used this is safe against parallel
+ * lockless notifier lookups. If unregister is used then unregister
+ * needs to do additional locking or use RCU.
  *
  * Currently always returns zero.
  */
@@ -116,6 +119,7 @@
 		list= &((*list)->next);
 	}
 	n->next = *list;
+	smp_wmb();
 	*list=n;
 	write_unlock(&notifier_lock);
 	return 0;
@@ -129,6 +133,8 @@
  * @n: New entry in notifier chain
  *
  * Removes a notifier from a notifier chain.
+ * Note this needs additional locking or RCU in the caller to be safe
+ * against parallel traversals.
  *
  * Returns zero on success, or %-ENOENT on failure.
  */
@@ -171,10 +177,12 @@
 int notifier_call_chain(struct notifier_block **n, unsigned long val, void *v)
 {
 	int ret=NOTIFY_DONE;
-	struct notifier_block *nb = *n;
-
+	struct notifier_block *nb;
+	smp_read_barrier_depends();
+	nb = *n;
 	while(nb)
 	{
+		smp_read_barrier_depends();
 		ret=nb->notifier_call(nb,val,v);
 		if(ret&NOTIFY_STOP_MASK)
 		{

2005-11-29 06:09:29

by Dipankar Sarma

Subject: Re: [PATCH] Allow lockless traversal of notifier lists

On Tue, Nov 29, 2005 at 01:01:58AM +0100, Andi Kleen wrote:
> On Mon, Nov 28, 2005 at 11:12:03PM +0530, Dipankar Sarma wrote:
>
> Ok, third version, hopefully Dipankar-proof now.
>

Not quite. I spoke without looking at the code of the whole
notifier_call_chain() function.

> + * against parallel traversals.
> *
> * Returns zero on success, or %-ENOENT on failure.
> */
> @@ -171,10 +177,12 @@
> int notifier_call_chain(struct notifier_block **n, unsigned long val, void *v)
> {
> int ret=NOTIFY_DONE;
> - struct notifier_block *nb = *n;
> -
> + struct notifier_block *nb;
> + smp_read_barrier_depends();
> + nb = *n;
> while(nb)
> {
> + smp_read_barrier_depends();
> ret=nb->notifier_call(nb,val,v);
> if(ret&NOTIFY_STOP_MASK)
> {

Looking at the full code, it seems to me that we dereference
the first notifier block only inside the while(nb) loop.
That means the smp_read_barrier_depends() in the while(nb)
loop should be sufficient - IOW, the previous version of
the patch with one smp_read_barrier_depends() was good.
Sorry about the confusion.
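
In sketch form the traversal then ends up as below, with the single
barrier covering both the first dereference and the later ->next
dereferences; this is essentially the pattern that rcu_dereference()
encapsulates in later kernels:

	struct notifier_block *nb = *n;
	while (nb) {
		/* Order the preceding pointer load (chain head or ->next)
		   against the dependent loads through nb. */
		smp_read_barrier_depends();
		ret = nb->notifier_call(nb, val, v);
		if (ret & NOTIFY_STOP_MASK)
			break;
		nb = nb->next;
	}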

Thanks
Dipankar