2005-10-24 20:49:04

by Alan Stern

[permalink] [raw]
Subject: Notifier chains are unsafe

Has anyone been bothered by the fact that notifier chains are not safe
with regard to registration and unregistration while the chain is in use?
The notifier_chain_register and notifier_chain_unregister routines have
writelock protections, but the corresponding readlock is never taken!

It shouldn't be hard to make this work safely, even allowing such things
as notifier routines unregistering themselves as they run. The patch
below contains an example implementation, showing one way to do it.

But doing this correctly requires knowing how notifier chains are used.

Are they always called in process context, with interrupts enabled?

Or do some get called in interrupt context?

Are there any notifier chains invoked on a critical fast path?
(I hope not...)

How many different threads are likely to call a particular
notifier chain at one time?

Feedback is requested.

Alan Stern



Index: usb-2.6/kernel/sys.c
===================================================================
--- usb-2.6.orig/kernel/sys.c
+++ usb-2.6/kernel/sys.c
@@ -92,31 +92,45 @@ int cad_pid = 1;
* and the like.
*/

-static struct notifier_block *reboot_notifier_list;
-static DEFINE_RWLOCK(notifier_lock);
+static DEFINE_NOTIFIER_HEAD(reboot_notifier_list);
+
+struct notifier_caller {
+ struct notifier_block *next;
+ struct list_head node;
+};

/**
* notifier_chain_register - Add notifier to a notifier chain
- * @list: Pointer to root list pointer
+ * @nh: Pointer to head of the notifier chain
* @n: New entry in notifier chain
*
* Adds a notifier to a notifier chain.
*
* Currently always returns zero.
*/
-
-int notifier_chain_register(struct notifier_block **list, struct notifier_block *n)
+
+int notifier_chain_register(struct notifier_head *nh,
+ struct notifier_block *n)
{
- write_lock(&notifier_lock);
- while(*list)
- {
- if(n->priority > (*list)->priority)
+ struct notifier_block **p, *nnext;
+ struct list_head *pos;
+ struct notifier_caller *caller;
+
+ spin_lock(&nh->lock);
+ for (p = &nh->first; *p; p = &((*p)->next)) {
+ if (n->priority > (*p)->priority)
break;
- list= &((*list)->next);
}
- n->next = *list;
- *list=n;
- write_unlock(&notifier_lock);
+ n->next = nnext = *p;
+ *p = n;
+
+ list_for_each(pos, &nh->callers) {
+ caller = list_entry(pos, struct notifier_caller, node);
+ if (caller->next == nnext)
+ caller->next = n;
+ }
+
+ spin_unlock(&nh->lock);
return 0;
}

@@ -124,36 +138,49 @@ EXPORT_SYMBOL(notifier_chain_register);

/**
* notifier_chain_unregister - Remove notifier from a notifier chain
- * @nl: Pointer to root list pointer
- * @n: New entry in notifier chain
+ * @nh: Pointer to head of the notifier chain
+ * @n: Entry to remove from the chain
*
* Removes a notifier from a notifier chain.
*
* Returns zero on success, or %-ENOENT on failure.
*/

-int notifier_chain_unregister(struct notifier_block **nl, struct notifier_block *n)
+int notifier_chain_unregister(struct notifier_head *nh,
+ struct notifier_block *n)
{
- write_lock(&notifier_lock);
- while((*nl)!=NULL)
- {
- if((*nl)==n)
- {
- *nl=n->next;
- write_unlock(&notifier_lock);
- return 0;
- }
- nl=&((*nl)->next);
+ int ret = 0;
+ struct notifier_block **p;
+ struct list_head *pos;
+ struct notifier_caller *caller;
+
+ spin_lock(&nh->lock);
+ for (p = &nh->first; *p; p = &((*p)->next)) {
+ if (*p == n)
+ break;
+ }
+ if (!*p) {
+ ret = -ENOENT;
+ goto done;
+ }
+ *p = n->next;
+
+ list_for_each(pos, &nh->callers) {
+ caller = list_entry(pos, struct notifier_caller, node);
+ if (caller->next == n)
+ caller->next = n->next;
}
- write_unlock(&notifier_lock);
- return -ENOENT;
+
+done:
+ spin_unlock(&nh->lock);
+ return ret;
}

EXPORT_SYMBOL(notifier_chain_unregister);

/**
* notifier_call_chain - Call functions in a notifier chain
- * @n: Pointer to root pointer of notifier chain
+ * @nh: Pointer to head of the notifier chain
* @val: Value passed unmodified to notifier function
* @v: Pointer passed unmodified to notifier function
*
@@ -167,20 +194,28 @@ EXPORT_SYMBOL(notifier_chain_unregister)
* of the last notifier function called.
*/

-int notifier_call_chain(struct notifier_block **n, unsigned long val, void *v)
+int notifier_call_chain(struct notifier_head *nh, unsigned long val, void *v)
{
- int ret=NOTIFY_DONE;
- struct notifier_block *nb = *n;
+ int ret = NOTIFY_DONE;
+ struct notifier_caller caller;
+ struct notifier_block *n;

- while(nb)
- {
- ret=nb->notifier_call(nb,val,v);
- if(ret&NOTIFY_STOP_MASK)
- {
- return ret;
- }
- nb=nb->next;
+ spin_lock(&nh->lock);
+ caller.next = nh->first;
+ list_add(&caller.node, &nh->callers);
+
+ while (caller.next) {
+ n = caller.next;
+ caller.next = n->next;
+ spin_unlock(&nh->lock);
+ ret = n->notifier_call(n, val, v);
+ spin_lock(&nh->lock);
+ if (ret & NOTIFY_STOP_MASK)
+ break;
}
+
+ list_del(&caller.node);
+ spin_unlock(&nh->lock);
return ret;
}

Index: usb-2.6/include/linux/notifier.h
===================================================================
--- usb-2.6.orig/include/linux/notifier.h
+++ usb-2.6/include/linux/notifier.h
@@ -6,10 +6,20 @@
*
* Alan Cox <[email protected]>
*/
-
+
+/*
+ * This implementation allows entries to be safely added to or removed
+ * from a notifier chain at any time (in a context that can sleep),
+ * including while they are running or another process is calling
+ * along the chain. The main assumption is that the number of
+ * simultaneous calls for a chain will be fairly small.
+ */
+
#ifndef _LINUX_NOTIFIER_H
#define _LINUX_NOTIFIER_H
#include <linux/errno.h>
+#include <linux/spinlock.h>
+#include <linux/list.h>

struct notifier_block
{
@@ -18,12 +28,27 @@ struct notifier_block
int priority;
};

+struct notifier_head {
+ spinlock_t lock;
+ struct notifier_block *first;
+ struct list_head callers;
+};
+
+#define NOTIFIER_HEAD_INIT(name) { .lock = SPIN_LOCK_UNLOCKED, \
+ .first = NULL, .callers = LIST_HEAD_INIT((name).callers) }
+
+#define DEFINE_NOTIFIER_HEAD(name) \
+ struct notifier_head name = NOTIFIER_HEAD_INIT(name)
+

#ifdef __KERNEL__

-extern int notifier_chain_register(struct notifier_block **list, struct notifier_block *n);
-extern int notifier_chain_unregister(struct notifier_block **nl, struct notifier_block *n);
-extern int notifier_call_chain(struct notifier_block **n, unsigned long val, void *v);
+extern int notifier_chain_register(struct notifier_head *nh,
+ struct notifier_block *n);
+extern int notifier_chain_unregister(struct notifier_head *nh,
+ struct notifier_block *n);
+extern int notifier_call_chain(struct notifier_head *nh,
+ unsigned long val, void *v);

#define NOTIFY_DONE 0x0000 /* Don't care */
#define NOTIFY_OK 0x0001 /* Suits me */


2005-10-25 17:06:34

by Joe Seigh

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

Alan Stern wrote:
> Has anyone been bothered by the fact that notifier chains are not safe
> with regard to registration and unregistration while the chain is in use?
> The notifier_chain_register and notifier_chain_unregister routines have
> writelock protections, but the corresponding readlock is never taken!
>
> It shouldn't be hard to make this work safely, even allowing such things
> as notifier routines unregistering themselves as they run. The patch
> below contains an example implementation, showing one way to do it.
>
> But doing this correctly requires knowing how notifier chains are used.
>
> Are they always called in process context, with interrupts enabled?
>
> Or do some get called in interrupt context?
>
> Are there any notifier chains invoked on a critical fast path?
> (I hope not...)
>
> How many different threads are likely to call a particular
> notifier chain at one time?
>
> Feedback is requested.
>
> Alan Stern
>
[...]

It's not clear how you are making this safe. You aren't using one
of the known solutions to this problem. For GC lock-free based solutions,
you can't use RCU since notify_call can sleep. You could use a
form of reference counting but you'd have to implement it yourself.
Ditto on RCU+SMR or some other form of proxy GC. Not implemented.

You could use COR (Copy On Read). Make a copy of the list while holding
a lock, release the lock, do the notifications, and then delete the copy
of the list.

The non-blocking schemes can do notify_calls after unregistration so you
need to take this into account. Whatever you're calling against still
has to be there and has to be in a meaningful state.

2005-10-25 23:30:38

by Chandra Seetharaman

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

On Mon, 2005-10-24 at 16:48 -0400, Alan Stern wrote:

Hi Alan,

I agree with your approach of having a notifier_head to have both the
head of the notifier_list and the corresponding lock (instead of having
a single global rwlock to protect all notifier lists).

But, I am confused about the need for three data structures and two next
pointers. I think we can achieve the same by 2 data structures:

notifier_head {
spinlock_t lock;
list_head head;
};
notifier_block {
int (*notifier_call)(struct notifier_block *self,
unsigned long, void *);
list_head lists;
int priority;
};

I think that having multiple data structures make the code hard to
follow.

No. of register/unregister would be a lot lesser than a
notifier_call_chain() calls, so IMHO, rwlock would be a better option.

<snip>
> /**
> * notifier_call_chain - Call functions in a notifier chain
> - * @n: Pointer to root pointer of notifier chain
> + * @nh: Pointer to head of the notifier chain
> * @val: Value passed unmodified to notifier function
> * @v: Pointer passed unmodified to notifier function
> *
> @@ -167,20 +194,28 @@ EXPORT_SYMBOL(notifier_chain_unregister)
> * of the last notifier function called.
> */
>
> -int notifier_call_chain(struct notifier_block **n, unsigned long val, void *v)
> +int notifier_call_chain(struct notifier_head *nh, unsigned long val, void *v)
> {
> - int ret=NOTIFY_DONE;
> - struct notifier_block *nb = *n;
> + int ret = NOTIFY_DONE;
> + struct notifier_caller caller;
> + struct notifier_block *n;
>
> - while(nb)
> - {
> - ret=nb->notifier_call(nb,val,v);
> - if(ret&NOTIFY_STOP_MASK)
> - {
> - return ret;
> - }
> - nb=nb->next;
> + spin_lock(&nh->lock);
> + caller.next = nh->first;
> + list_add(&caller.node, &nh->callers);
> +
> + while (caller.next) {
> + n = caller.next;
> + caller.next = n->next;
> + spin_unlock(&nh->lock);
> + ret = n->notifier_call(n, val, v);
> + spin_lock(&nh->lock);
> + if (ret & NOTIFY_STOP_MASK)
> + break;
> }

Since the lock is being dropped while calling notifier_call, how are we
guaranteed caller.next is valid ? It might have been unregistered.
> +
> + list_del(&caller.node);
> + spin_unlock(&nh->lock);
> return ret;
> }
>

--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- [email protected] | .......you may get it.
----------------------------------------------------------------------


2005-10-25 23:43:19

by Andi Kleen

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

Alan Stern <[email protected]> writes:

> Has anyone been bothered by the fact that notifier chains are not safe
> with regard to registration and unregistration while the chain is in use?
> The notifier_chain_register and notifier_chain_unregister routines have
> writelock protections, but the corresponding readlock is never taken!

If you add locks to the reader make sure it is only taken
if the list is non empty. Otherwise you will add unacceptable
overhead to some fast paths.

Better would be likely to use RCU.

-Andi

2005-10-26 00:01:54

by Chandra Seetharaman

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

On Wed, 2005-10-26 at 01:43 +0200, Andi Kleen wrote:
> Alan Stern <[email protected]> writes:
>
> > Has anyone been bothered by the fact that notifier chains are not safe
> > with regard to registration and unregistration while the chain is in use?
> > The notifier_chain_register and notifier_chain_unregister routines have
> > writelock protections, but the corresponding readlock is never taken!
>
> If you add locks to the reader make sure it is only taken
> if the list is non empty. Otherwise you will add unacceptable
> overhead to some fast paths.
>
> Better would be likely to use RCU.

RCU will be a problem if the registered notifiers need to block.

>
> -Andi
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>
--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- [email protected] | .......you may get it.
----------------------------------------------------------------------


2005-10-26 06:13:52

by Keith Owens

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

On Mon, 24 Oct 2005 16:48:58 -0400 (EDT),
Alan Stern <[email protected]> wrote:
>Has anyone been bothered by the fact that notifier chains are not safe
>with regard to registration and unregistration while the chain is in use?
>The notifier_chain_register and notifier_chain_unregister routines have
>writelock protections, but the corresponding readlock is never taken!
>
>It shouldn't be hard to make this work safely, even allowing such things
>as notifier routines unregistering themselves as they run. The patch
>below contains an example implementation, showing one way to do it.
>
>But doing this correctly requires knowing how notifier chains are used.
>
> Are they always called in process context, with interrupts enabled?
>
> Or do some get called in interrupt context?

Register and unregister should only be called from contexts that can
sleep, although that may not be documented. notifier_call_chain() can
be called in any context, it must not take any locks.

2005-10-26 17:11:35

by Andi Kleen

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

Am Mi 26.10.2005 02:01 schrieb Chandra Seetharaman
<[email protected]>:

> > Better would be likely to use RCU.
>
> RCU will be a problem if the registered notifiers need to block.
 
Actually blocking should be ok, as long as the blocking notifier doesn't
unregister
itself. The current next pointer will be always reloaded after the
blocking.
 
Still on preemptive kernels there might be problems, but they could
be likely solved by a few strategic preempt_disables in
notifier_call_chain().
 
Dipankar, what do you think?
 
-Andi
 

 

2005-10-26 18:46:13

by Alan Stern

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

On Tue, 25 Oct 2005, Chandra Seetharaman wrote:

> Hi Alan,
>
> I agree with your approach of having a notifier_head to have both the
> head of the notifier_list and the corresponding lock (instead of having
> a single global rwlock to protect all notifier lists).

It seems pretty clear that a separate notifier_head would be a good thing
to have, no matter what other changes are made.

> But, I am confused about the need for three data structures and two next
> pointers. I think we can achieve the same by 2 data structures:
>
> notifier_head {
> spinlock_t lock;
> list_head head;
> };
> notifier_block {
> int (*notifier_call)(struct notifier_block *self,
> unsigned long, void *);
> list_head lists;
> int priority;
> };
>
> I think that having multiple data structures make the code hard to
> follow.
>
> No. of register/unregister would be a lot lesser than a
> notifier_call_chain() calls, so IMHO, rwlock would be a better option.

But if you simply use an rwlock, and acquire the readlock during
notifier_call_chain, then you will hang when a notifier routine tries to
unregister itself while it is running.

See below for more options...

> <snip>

> Since the lock is being dropped while calling notifier_call, how are we
> guaranteed caller.next is valid ? It might have been unregistered.

You are right. I should have realized that; it's an embarassing mistake.


On Wed, 26 Oct 2005, Keith Owens wrote:

> Register and unregister should only be called from contexts that can
> sleep, although that may not be documented.

I'm willing to take your word for it. However, if it isn't documented
then how can you be sure that all callers are doing it correctly? Note
that the code doesn't even contain a might_sleep()!

> notifier_call_chain() can
> be called in any context, it must not take any locks.

It sounds like there really are two different types of notifier chains:

(1) Chains that always run in process context, where the routines
are allowed to sleep.

(2) Chains that run in an atomic context, where the routines are
not allowed to sleep.

Presumably only type 2 chains must not take any locks (although perhaps
you wouldn't object to them using a readlock?). Evidently there's nothing
wrong with a type 1 chain taking a lock.

It's a pity that there's no easy way to distinguish the two types of
chains. Solving these problems correctly will require using different
code for the two types.


On 26 Oct 2005, Andi Kleen wrote:

> If you add locks to the reader make sure it is only taken
> if the list is non empty. Otherwise you will add unacceptable
> overhead to some fast paths.

I don't understand this comment. What point is there in avoiding a lock
when the list is empty? Did you mean that a lock should be taken only if
the list _is_ empty?

> Better would be likely to use RCU.

Note that the RCU documentation says RCU critical sections are not allowed
to sleep.


The whole situation is a ridiculous mess. Here are my current thoughts:

For sleepable chaine (type 1 above), the chain can be
protected either by an rwsem or a mutex. An rwsem is more
friendly when several threads may want to use the chain
simultaneously; a mutex allows (in principle) a routine
to unregister itself as it runs.

For non-sleepable chains (type 2 above), it may be acceptable
to protect the chain by an rwlock. If performance is so
important that no locks can be tolerated in notifier_call_chain
then RCU could be used instead.

On the whole,?it's probably best to disallow routines unregistering
themselves as they run. If a routine does want to do this, it can simply
use a local "enabled" flag. Instead of unregistering itself, it should
just clear the flag -- and whenever it runs, it should simply return if
the flag isn't set. (Of course, the routine would then have to be
unregistered from somewhere else.)

I've listed four possible implementations. The rwsem and RCU approaches
seem to be the most widely applicable. Even so, it will be necessary to
go through and indicate, for each existing chain, which type it is.

Alan Stern

2005-10-26 19:05:33

by Andi Kleen

[permalink] [raw]
Subject: Re: Notifier chains are unsafe


>
> On 26 Oct 2005, Andi Kleen wrote:
>
> > If you add locks to the reader make sure it is only taken
> > if the list is non empty. Otherwise you will add unacceptable
> > overhead to some fast paths.
>
> I don't understand this comment. What point is there in avoiding a
> lock
> when the list is empty?
 
It doesn't add locking overhead for a common case.
 
> Did you mean that a lock should be taken only if
> the list _is_ empty?

No.

> > Better would be likely to use RCU.
>
> Note that the RCU documentation says RCU critical sections are not
> allowed
> to sleep.
 
In this case it would be ok.

 
-Andi

2005-10-26 20:40:06

by Alan Stern

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

On Wed, 26 Oct 2005, Andreas Kleen wrote:

> > Note that the RCU documentation says RCU critical sections are not
> > allowed
> > to sleep.
>
> In this case it would be ok.

I don't understand. If it's okay for an RCU critical section to sleep in
this case, why wouldn't it be okay always? What's special here?

Aren't there requirements about critical sections finishing on the same
CPU as they started on?

Can you please explain in more detail?

Alan Stern

2005-10-26 21:43:55

by Andi Kleen

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

On Wednesday 26 October 2005 22:40, Alan Stern wrote:
l> On Wed, 26 Oct 2005, Andreas Kleen wrote:
>
> > > Note that the RCU documentation says RCU critical sections are not
> > > allowed
> > > to sleep.
> >
> > In this case it would be ok.
>
> I don't understand. If it's okay for an RCU critical section to sleep in
> this case, why wouldn't it be okay always? What's special here?
>
> Aren't there requirements about critical sections finishing on the same
> CPU as they started on?


Like I wrote earlier: as long as the notifier doesn't unregister itself
the critical RCU section for the list walk is only a small part of notifier_call_chain.
It's basically a stable anchor in the list that won't change.

The only change needed would be to make these parts unpreemptable and of course
add a RCU step during unregistration.

-Andi

2005-10-26 22:40:40

by Chandra Seetharaman

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

On Wed, 2005-10-26 at 14:46 -0400, Alan Stern wrote:
> On Tue, 25 Oct 2005, Chandra Seetharaman wrote:
>
> > Hi Alan,
> >
> > I agree with your approach of having a notifier_head to have both the
> > head of the notifier_list and the corresponding lock (instead of having
> > a single global rwlock to protect all notifier lists).
>
> It seems pretty clear that a separate notifier_head would be a good thing
> to have, no matter what other changes are made.

Totally agree.

BTW, We have been discussing about "task_notifier" in lse-tech
(http://marc.theaimsgroup.com/?l=lse-tech&m=112908542104457&w=2). If we
have this basic notifier mechanism hardened enough, we could use this
notifier mechanism.

>
> > But, I am confused about the need for three data structures and two next
> > pointers. I think we can achieve the same by 2 data structures:
> >
> > notifier_head {
> > spinlock_t lock;
> > list_head head;
> > };
> > notifier_block {
> > int (*notifier_call)(struct notifier_block *self,
> > unsigned long, void *);
> > list_head lists;
> > int priority;
> > };
> >
> > I think that having multiple data structures make the code hard to
> > follow.
> >
> > No. of register/unregister would be a lot lesser than a
> > notifier_call_chain() calls, so IMHO, rwlock would be a better option.
>
> But if you simply use an rwlock, and acquire the readlock during
> notifier_call_chain, then you will hang when a notifier routine tries to
> unregister itself while it is running.

I meant to say that instead of the simple spinlock we should use rwlock.

> > notifier_call_chain() can
> > be called in any context, it must not take any locks.
>
> It sounds like there really are two different types of notifier chains:
>
> (1) Chains that always run in process context, where the routines
> are allowed to sleep.
>
> (2) Chains that run in an atomic context, where the routines are
> not allowed to sleep.

IMO having two notifiers which differs only in the way mentioned above
would be confusing. Also, going through all the existing usages and
changing it to proper one could be little painful.
>
> Presumably only type 2 chains must not take any locks (although perhaps
> you wouldn't object to them using a readlock?). Evidently there's nothing
> wrong with a type 1 chain taking a lock.

Not true, the registered function could block. So, we cannot protect the
list in notifier_call_chain() using a lock that surrounds notifier_call.

>
> It's a pity that there's no easy way to distinguish the two types of
> chains. Solving these problems correctly will require using different
> code for the two types.
>
>
> On 26 Oct 2005, Andi Kleen wrote:
>
> > If you add locks to the reader make sure it is only taken
> > if the list is non empty. Otherwise you will add unacceptable
> > overhead to some fast paths.
>
> I don't understand this comment. What point is there in avoiding a lock
> when the list is empty? Did you mean that a lock should be taken only if
> the list _is_ empty?
>
> > Better would be likely to use RCU.
>
> Note that the RCU documentation says RCU critical sections are not allowed
> to sleep.
>
>
> The whole situation is a ridiculous mess. Here are my current thoughts:
>
> For sleepable chaine (type 1 above), the chain can be
> protected either by an rwsem or a mutex. An rwsem is more
> friendly when several threads may want to use the chain
> simultaneously; a mutex allows (in principle) a routine
> to unregister itself as it runs.
>
> For non-sleepable chains (type 2 above), it may be acceptable
> to protect the chain by an rwlock. If performance is so
> important that no locks can be tolerated in notifier_call_chain
> then RCU could be used instead.
>
> On the whole, it's probably best to disallow routines unregistering
> themselves as they run. If a routine does want to do this, it can simply
> use a local "enabled" flag. Instead of unregistering itself, it should
> just clear the flag -- and whenever it runs, it should simply return if
> the flag isn't set. (Of course, the routine would then have to be
> unregistered from somewhere else.)
>
> I've listed four possible implementations. The rwsem and RCU approaches
> seem to be the most widely applicable. Even so, it will be necessary to
> go through and indicate, for each existing chain, which type it is.
>
> Alan Stern
>
>
--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- [email protected] | .......you may get it.
----------------------------------------------------------------------


2005-10-26 23:20:23

by Chandra Seetharaman

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

On Wed, 2005-10-26 at 23:44 +0200, Andi Kleen wrote:
> On Wednesday 26 October 2005 22:40, Alan Stern wrote:
> l> On Wed, 26 Oct 2005, Andreas Kleen wrote:
> >
> > > > Note that the RCU documentation says RCU critical sections are not
> > > > allowed
> > > > to sleep.
> > >
> > > In this case it would be ok.
> >
> > I don't understand. If it's okay for an RCU critical section to sleep in
> > this case, why wouldn't it be okay always? What's special here?
> >
> > Aren't there requirements about critical sections finishing on the same
> > CPU as they started on?
>
>
> Like I wrote earlier: as long as the notifier doesn't unregister itself
> the critical RCU section for the list walk is only a small part of notifier_call_chain.
> It's basically a stable anchor in the list that won't change.

Andy, comment above rcu_read_lock says, "It is illegal to block while in
an RCU read-side critical section."

As i mentioned in the other email we are discussing about "task
notifier" in lse-tech. We thought of using RCU, but one of the
requirements was that the registered function should be able to block,
so we are looking for alternatives.

>
> The only change needed would be to make these parts unpreemptable and of course
> add a RCU step during unregistration.
>
> -Andi
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>
--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- [email protected] | .......you may get it.
----------------------------------------------------------------------


2005-10-27 01:20:19

by Joe Seigh

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

Chandra Seetharaman wrote:
> Andy, comment above rcu_read_lock says, "It is illegal to block while in
> an RCU read-side critical section."
>
> As i mentioned in the other email we are discussing about "task
> notifier" in lse-tech. We thought of using RCU, but one of the
> requirements was that the registered function should be able to block,
> so we are looking for alternatives.
>

What are the requirements that preclude a conventional rwlock? If you
don't have any, then you should go with that.

The other solutions I've mentioned before.

Copy on read.

Various lock-free schemes:
SMR hazard pointers
RCU+SMR (probably overkill since you don't need the read side performance)
reference counting
proxy reference counting

The last would probably be the easiest to implement expecially if you used
a spinlock to safely increment the reference count without the more complicated
atomic thread-safety. It's also more self contained.

User land implementations of most of the above can be found at
http://sourceforge.net/projects/atomic-ptr-plus/

The proxy refcounting stuff is in the atomic-ptr-plus package. It's
in c++ but you should be able to figure it out.

RCU+SMR is in the fastsmr package.



--
Joe Seigh

2005-10-27 02:47:37

by Herbert Xu

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

Andreas Kleen <[email protected]> wrote:
> Am Mi 26.10.2005 02:01 schrieb Chandra Seetharaman
> <[email protected]>:
>
>> > Better would be likely to use RCU.
>>
>> RCU will be a problem if the registered notifiers need to block.
> ?
> Actually blocking should be ok, as long as the blocking notifier doesn't
> unregister
> itself. The current next pointer will be always reloaded after the
> blocking.

Blocking would be OK as long as you reference count the objects.
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2005-10-27 14:13:12

by Alan Stern

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

On Wed, 26 Oct 2005, Andi Kleen wrote:

> Like I wrote earlier: as long as the notifier doesn't unregister itself
> the critical RCU section for the list walk is only a small part of notifier_call_chain.
> It's basically a stable anchor in the list that won't change.

I have to disagree with you. The critical section is the entire dynamic
scope of notifier_call_chain. After all, what if a _different_ thread
unregisters a notifier routine while the routine is running? What if the
_following_ routine is unregistered also?

The desired behavior for notifier_unregister is that when it returns, the
notifier routine is not running on any processor and will not start
running. The only way to guarantee this is to put the entire routine into
the critical section. And that means putting the entire chain into the
critical section.

Alan Stern

2005-10-27 15:28:27

by Alan Stern

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

On Wed, 26 Oct 2005, Chandra Seetharaman wrote:

> > It seems pretty clear that a separate notifier_head would be a good thing
> > to have, no matter what other changes are made.
>
> Totally agree.

> > It sounds like there really are two different types of notifier chains:
> >
> > (1) Chains that always run in process context, where the routines
> > are allowed to sleep.
> >
> > (2) Chains that run in an atomic context, where the routines are
> > not allowed to sleep.
>
> IMO having two notifiers which differs only in the way mentioned above
> would be confusing. Also, going through all the existing usages and
> changing it to proper one could be little painful.

It would be less confusing than the state we're in now! The difference
between the two types of notifiers would be very analogous to the
difference between semaphores and spinlocks, which IMO isn't confusing at
all.

I agree that updating all the existing definitions would be a little
painful. However, adding a new notifier_head will require doing those
updates anyway.

> > Presumably only type 2 chains must not take any locks (although perhaps
> > you wouldn't object to them using a readlock?). Evidently there's nothing
> > wrong with a type 1 chain taking a lock.
>
> Not true, the registered function could block. So, we cannot protect the
> list in notifier_call_chain() using a lock that surrounds notifier_call.

Sorry, I meant to say there's nothing wrong with a type 1 chain taking a
semaphore.


What do you think of the untested patch below?

Alan Stern



Signed-off-by: Alan Stern <[email protected]>

Index: usb-2.6/kernel/sys.c
===================================================================
--- usb-2.6.orig/kernel/sys.c
+++ usb-2.6/kernel/sys.c
@@ -92,31 +92,41 @@ int cad_pid = 1;
* and the like.
*/

-static struct notifier_block *reboot_notifier_list;
-static DEFINE_RWLOCK(notifier_lock);
+static DEFINE_BLOCKING_NOTIFIER_HEAD(reboot_notifier_list);

/**
* notifier_chain_register - Add notifier to a notifier chain
- * @list: Pointer to root list pointer
- * @n: New entry in notifier chain
+ * @nh: Pointer to head of the notifier chain
+ * @n: New entry to add to notifier chain
*
* Adds a notifier to a notifier chain.
*
+ * Context: Must be able to sleep.
+ *
* Currently always returns zero.
*/

-int notifier_chain_register(struct notifier_block **list, struct notifier_block *n)
+int notifier_chain_register(struct notifier_head *nh,
+ struct notifier_block *n)
{
- write_lock(&notifier_lock);
- while(*list)
- {
- if(n->priority > (*list)->priority)
- break;
- list= &((*list)->next);
+ struct list_head *pos;
+ struct notifier_block *b;
+
+ down_write(&nh->rwsem);
+ if (nh->type == ATOMIC_NOTIFIER)
+ rcu_read_lock();
+ list_for_each_rcu(pos, &nh->chain) {
+ b = list_entry(pos, struct notifier_block, node);
+ if (n->priority > b->priority)
+ break;
+ }
+ list_add_tail_rcu(&n->node, pos);
+ up_write(&nh->rwsem);
+
+ if (nh->type == ATOMIC_NOTIFIER) {
+ rcu_read_unlock();
+ synchronize_rcu();
}
- n->next = *list;
- *list=n;
- write_unlock(&notifier_lock);
return 0;
}

@@ -124,36 +134,34 @@ EXPORT_SYMBOL(notifier_chain_register);

/**
* notifier_chain_unregister - Remove notifier from a notifier chain
- * @nl: Pointer to root list pointer
- * @n: New entry in notifier chain
++ * @nh: Pointer to head of the notifier chain
++ * @n: Entry to remove from the chain
*
- * Removes a notifier from a notifier chain.
+ * Removes a notifier from a notifier chain. The notifier must
+ * previously have been added to the chain.
*
- * Returns zero on success, or %-ENOENT on failure.
+ * Context: Must be able to sleep.
+ *
+ * Currently always returns zero.
*/

-int notifier_chain_unregister(struct notifier_block **nl, struct notifier_block *n)
+int notifier_chain_unregister(struct notifier_head *nh,
+ struct notifier_block *n)
{
- write_lock(&notifier_lock);
- while((*nl)!=NULL)
- {
- if((*nl)==n)
- {
- *nl=n->next;
- write_unlock(&notifier_lock);
- return 0;
- }
- nl=&((*nl)->next);
- }
- write_unlock(&notifier_lock);
- return -ENOENT;
+ down_write(&nh->rwsem);
+ list_del_rcu(&n->node);
+ up_write(&nh->rwsem);
+
+ if (nh->type == ATOMIC_NOTIFIER)
+ synchronize_rcu();
+ return 0;
}

EXPORT_SYMBOL(notifier_chain_unregister);

/**
* notifier_call_chain - Call functions in a notifier chain
- * @n: Pointer to root pointer of notifier chain
++ * @nh: Pointer to head of the notifier chain
* @val: Value passed unmodified to notifier function
* @v: Pointer passed unmodified to notifier function
*
@@ -167,20 +175,32 @@ EXPORT_SYMBOL(notifier_chain_unregister)
* of the last notifier function called.
*/

-int notifier_call_chain(struct notifier_block **n, unsigned long val, void *v)
+int notifier_call_chain(struct notifier_head *nh, unsigned long val, void *v)
{
- int ret=NOTIFY_DONE;
- struct notifier_block *nb = *n;
+ int ret = NOTIFY_DONE;
+ struct list_head *pos;
+ struct notifier_block *b;

- while(nb)
- {
- ret=nb->notifier_call(nb,val,v);
- if(ret&NOTIFY_STOP_MASK)
- {
- return ret;
- }
- nb=nb->next;
+ if (list_empty(&nh->chain)) /* Optimize for common case */
+ return ret;
+
+ if (nh->type == ATOMIC_NOTIFIER)
+ rcu_read_lock();
+ else /* BLOCKING_NOTIFIER */
+ down_read(&nh->rwsem);
+
+ list_for_each_rcu(pos, &nh->chain) {
+ b = list_entry(pos, struct notifier_block, node);
+ ret = b->notifier_call(b, val, v);
+ if (ret & NOTIFY_STOP_MASK)
+ break;
}
+
+ if (nh->type == ATOMIC_NOTIFIER)
+ rcu_read_unlock();
+ else /* BLOCKING_NOTIFIER */
+ up_read(&nh->rwsem);
+
return ret;
}

Index: usb-2.6/include/linux/notifier.h
===================================================================
--- usb-2.6.orig/include/linux/notifier.h
+++ usb-2.6/include/linux/notifier.h
@@ -10,25 +10,74 @@
#ifndef _LINUX_NOTIFIER_H
#define _LINUX_NOTIFIER_H
#include <linux/errno.h>
+#include <linux/rwsem.h>
+#include <linux/list.h>
+
+/*
+ * There are two types of notifier chains:
+ *
+ * Atomic notifier chains, which may run in an atomic context
+ * and whose entries are not allowed to block, and
+ *
+ * Blocking notifier chains, which always run in process
+ * context and whose entries are allowed to block.
+ *
+ * The type of a chain is determined by the definition of its head.
+ * Atomic notifier chains use RCU for registration/unregistration,
+ * so they can run with no locking overhead.
+ */
+
+enum notifier_type {
+ ATOMIC_NOTIFIER,
+ BLOCKING_NOTIFIER,
+};
+
+struct notifier_head {
+ enum notifier_type type;
+ struct rw_semaphore rwsem;
+ struct list_head chain;
+};
+
+#define ATOMIC_NOTIFIER_HEAD_INIT(name) { \
+ .type = ATOMIC_NOTIFIER, \
+ .rwsem = __RWSEM_INITIALIZER((name).rwsem), \
+ .chain = LIST_HEAD_INIT((name).chain) }
+
+#define BLOCKING_NOTIFIER_HEAD_INIT(name) { \
+ .type = BLOCKING_NOTIFIER, \
+ .rwsem = __RWSEM_INITIALIZER((name).rwsem), \
+ .chain = LIST_HEAD_INIT((name).chain) }
+
+#define DEFINE_ATOMIC_NOTIFIER_HEAD(name) \
+ struct notifier_head name = ATOMIC_NOTIFIER_HEAD_INIT(name)
+
+#define DEFINE_BLOCKING_NOTIFIER_HEAD(name) \
+ struct notifier_head name = BLOCKING_NOTIFIER_HEAD_INIT(name)
+

struct notifier_block
{
- int (*notifier_call)(struct notifier_block *self, unsigned long, void *);
- struct notifier_block *next;
+ int (*notifier_call)(struct notifier_block *self, unsigned long,
+ void *);
+ struct list_head node;
int priority;
};


#ifdef __KERNEL__

-extern int notifier_chain_register(struct notifier_block **list, struct notifier_block *n);
-extern int notifier_chain_unregister(struct notifier_block **nl, struct notifier_block *n);
-extern int notifier_call_chain(struct notifier_block **n, unsigned long val, void *v);
+extern int notifier_chain_register(struct notifier_head *nh,
+ struct notifier_block *n);
+extern int notifier_chain_unregister(struct notifier_head *nh,
+ struct notifier_block *n);
+extern int notifier_call_chain(struct notifier_head *nh,
+ unsigned long val, void *v);

#define NOTIFY_DONE 0x0000 /* Don't care */
#define NOTIFY_OK 0x0001 /* Suits me */
#define NOTIFY_STOP_MASK 0x8000 /* Don't call further */
-#define NOTIFY_BAD (NOTIFY_STOP_MASK|0x0002) /* Bad/Veto action */
+#define NOTIFY_BAD (NOTIFY_STOP_MASK|0x0002)
+ /* Bad/Veto action */
/*
* Clean way to return from the notifier and stop further calls.
*/

2005-10-27 20:43:46

by Chandra Seetharaman

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

On Thu, 2005-10-27 at 11:28 -0400, Alan Stern wrote:
> On Wed, 26 Oct 2005, Chandra Seetharaman wrote:
>
> > > It seems pretty clear that a separate notifier_head would be a good thing
> > > to have, no matter what other changes are made.
> >
> > Totally agree.
>
> > > It sounds like there really are two different types of notifier chains:
> > >
> > > (1) Chains that always run in process context, where the routines
> > > are allowed to sleep.
> > >
> > > (2) Chains that run in an atomic context, where the routines are
> > > not allowed to sleep.
> >
> > IMO having two notifiers which differs only in the way mentioned above
> > would be confusing. Also, going through all the existing usages and
> > changing it to proper one could be little painful.
>
> It would be less confusing than the state we're in now! The difference
> between the two types of notifiers would be very analogous to the
> difference between semaphores and spinlocks, which IMO isn't confusing at
> all.
>
> I agree that updating all the existing definitions would be a little
> painful. However, adding a new notifier_head will require doing those
> updates anyway.

That change would be minimal, one have only change the place from which
the notify_register is called.

Whereas if we change it to become two types one has to go through the
corresponding callouts (and the functions they call and so on) to see
which (BLOCKING or ATOMIC) notifier mechanism to use.

>
> > > Presumably only type 2 chains must not take any locks (although perhaps
> > > you wouldn't object to them using a readlock?). Evidently there's nothing
> > > wrong with a type 1 chain taking a lock.
> >
> > Not true, the registered function could block. So, we cannot protect the
> > list in notifier_call_chain() using a lock that surrounds notifier_call.
>
> Sorry, I meant to say there's nothing wrong with a type 1 chain taking a
> semaphore.
>
>
> What do you think of the untested patch below?

Functionally looks good. But there are two problems w.r.t interface
changes:
1. above mentioned problem of making sure of all the places to use
the proper one.
2. Requiring register and unregister to be able to sleep. (there are
few usages that are called with a lock held)

How does the following code look (only change w.r.t the existing usage
model is that unregister can now return -EAGAIN, if the list is busy).

One assumption the following code makes is that the store of a pointer
(next in the list) is atomic. If that assumption is unacceptable, we can
do one of two things:
1. change notify_register to return -EAGAIN if list is busy.
2. move the chain list in call_chain under lock and use that
list instead of using the chain in the head, and restore it back
before returning.

==================================================
struct notifier_head {
spinlock_t lock; /* protects both chain and readers below */
struct list_head chain; /* chain of notifier blocks */
int readers; /* current no. of readers of the chain */
};

struct notifier_block {
int (*notifier_call)(struct notifier_block *self, unsigned long,
void *);
struct list_head node;
int priority;
};

int notifier_chain_register(struct notifier_head *nh,
struct notifier_block *n)
{
spin_lock(nh->lock);
list_for_each_rcu(pos, &nh->chain) {
b = list_entry(pos, struct notifier_block, node);
if (n->priority > b->priority)
break;
}
list_add_tail(&n->node, pos);
spin_unlock(nh->lock);
return 0;
}

int notifier_chain_unregister(struct notifier_head *nh,
struct notifier_block *n)
{
int rc = 0;
spin_lock(nh->lock);
if (nh->readers == 0)
list_del(&n->node);
else
rc = -EAGAIN;
spin_unlock(nh->lock);
return rc;
}

int notifier_call_chain(struct notifier_head *nh, unsigned long val,
void *v)
{
int ret = 0;
list_head *cur ;
notifier_block *b, *tmp;

spin_lock(nh->lock);
nh->readers++;
spin_unlock(nh->lock);

if (list_empty(&nh->chain)) /* Optimize for common case */
return ret;

list_for_each_entry_safe(b, tmp, &nh->chain, node) {
ret = b->notifier_call(b, val, v);
if (ret & NOTIFY_STOP_MASK)
goto done;
} while (cur != &nh->chain);

done:
spin_lock(nh->lock);
nh->readers--; /* is locking really needed ? */
spin_unlock(nh->lock);
return ret;
}
=================================================================

--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- [email protected] | .......you may get it.
----------------------------------------------------------------------


2005-10-27 21:21:45

by Alan Stern

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

On Thu, 27 Oct 2005, Chandra Seetharaman wrote:

> > I agree that updating all the existing definitions would be a little
> > painful. However, adding a new notifier_head will require doing those
> > updates anyway.
>
> That change would be minimal, one have only change the place from which
> the notify_register is called.
>
> Whereas if we change it to become two types one has to go through the
> corresponding callouts (and the functions they call and so on) to see
> which (BLOCKING or ATOMIC) notifier mechanism to use.

True, the amount of study required would be greater -- but the actual
change is still minimal! :-)

In many cases the type is obvious. For instance, there are lots of
callouts that occur in ioctl handlers; they clearly have to be BLOCKING.

In fact there's another sort of problem I neglected to mention: Callouts
that unregister themselves will have to be changed. I think that's going
to be unavoidable. Fortunately there probably aren't very many, but
finding them might not be easy.

> Functionally looks good. But there are two problems w.r.t interface
> changes:
> 1. above mentioned problem of making sure of all the places to use
> the proper one.

This at least is a one-time difficulty, and it doesn't require changing
many sections of code. Last time I checked, there were 28 notifier chain
definitions in the kernel.

> 2. Requiring register and unregister to be able to sleep. (there are
> few usages that are called with a lock held)

Are there? I haven't looked at all of them closely. According to Keith
Owens, such usages are wrong.

> How does the following code look (only change w.r.t the existing usage
> model is that unregister can now return -EAGAIN, if the list is busy).
>
> One assumption the following code makes is that the store of a pointer
> (next in the list) is atomic. If that assumption is unacceptable, we can
> do one of two things:
> 1. change notify_register to return -EAGAIN if list is busy.
> 2. move the chain list in call_chain under lock and use that
> list instead of using the chain in the head, and restore it back
> before returning.

I see a couple of problems (aside from the trivial one where you increment
nh->readers before the early exit).

The biggest problem is allowing unregister to return an error. None of
its callers will expect that, and they all will have to be changed. There
are a lot more calls to unregister than there are chain definitions.

The other problem is that you violated Keith's statement that
notifier_call_chain shouldn't take any locks. On the other hand, if we
put together all the requirements people have listed for notifier chains,
the resulting set is inconsistent! That's part of the reason why I
suggested implementing two different kinds of chains.

Alan Stern

2005-10-27 23:02:11

by Chandra Seetharaman

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

On Thu, 2005-10-27 at 17:21 -0400, Alan Stern wrote:

> > How does the following code look (only change w.r.t the existing usage
> > model is that unregister can now return -EAGAIN, if the list is busy).
> >
> > One assumption the following code makes is that the store of a pointer
> > (next in the list) is atomic. If that assumption is unacceptable, we can
> > do one of two things:
> > 1. change notify_register to return -EAGAIN if list is busy.
> > 2. move the chain list in call_chain under lock and use that
> > list instead of using the chain in the head, and restore it back
> > before returning.
>
> I see a couple of problems (aside from the trivial one where you increment
> nh->readers before the early exit).

Just a programmatic error. shouldn't be a problem.

>
> The biggest problem is allowing unregister to return an error. None of
> its callers will expect that, and they all will have to be changed. There
> are a lot more calls to unregister than there are chain definitions.

IMO, we will not be changing the interface so it should be fine.

> The other problem is that you violated Keith's statement that
> notifier_call_chain shouldn't take any locks. On the other hand, if we

I would interpret Keith's comment like this: callout should not be
called with any locks held (because that would limit the callouts from
blocking).

Keith, can you please clarify
>
> put together all the requirements people have listed for notifier chains,
> the resulting set is inconsistent! That's part of the reason why I
> suggested implementing two different kinds of chains.
>
> Alan Stern
>
>
--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- [email protected] | .......you may get it.
----------------------------------------------------------------------


2005-10-28 00:48:15

by Keith Owens

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

On Thu, 27 Oct 2005 16:02:08 -0700,
Chandra Seetharaman <[email protected]> wrote:
>On Thu, 2005-10-27 at 17:21 -0400, Alan Stern wrote:
>> The other problem is that you violated Keith's statement that
>> notifier_call_chain shouldn't take any locks. On the other hand, if we
>
>I would interpret Keith's comment like this: callout should not be
>called with any locks held (because that would limit the callouts from
>blocking).

We should be able to call notifier_call_chain() from any context. That
includes oops, panic, NMI and other unmaskable machine check events.
If you can call notifier_call_chain() from an unmaskable context then
it follows that the callbacks cannot take any locks. Locks are not
safe in NMI context.

2005-10-28 01:34:23

by Chandra Seetharaman

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

On Fri, 2005-10-28 at 10:48 +1000, Keith Owens wrote:
> On Thu, 27 Oct 2005 16:02:08 -0700,
> Chandra Seetharaman <[email protected]> wrote:
> >On Thu, 2005-10-27 at 17:21 -0400, Alan Stern wrote:
> >> The other problem is that you violated Keith's statement that
> >> notifier_call_chain shouldn't take any locks. On the other hand, if we
> >
> >I would interpret Keith's comment like this: callout should not be
> >called with any locks held (because that would limit the callouts from
> >blocking).
>
> We should be able to call notifier_call_chain() from any context. That
> includes oops, panic, NMI and other unmaskable machine check events.
> If you can call notifier_call_chain() from an unmaskable context then
> it follows that the callbacks cannot take any locks. Locks are not
> safe in NMI context.

Thanks Keith.

That really ties our hands down.

So, requirements to fix the bug are:
- no sleeping in register/unregister(if we want to keep the
current way of use. We can change it and make the relevant
changes in the kernel code, if it is agreeable)
- notifier_call_chain could be called from any context
- callout function could sleep
- no acquiring locks in notifier_call_chain
- make sure the list is consistent :) (which is problem Alan
started to fix)
- anything else ?

Alan, i think your solution of maintaining two notifier chains satisfies
all the needs mentioned. Will look at it more closely and respond
tomorrow.

Thanks & Regards,

chandra

>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>
--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- [email protected] | .......you may get it.
----------------------------------------------------------------------


2005-10-28 01:36:38

by Chandra Seetharaman

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

On Wed, 2005-10-26 at 21:17 -0400, Joe Seigh wrote:
> Chandra Seetharaman wrote:
> > Andy, comment above rcu_read_lock says, "It is illegal to block while in
> > an RCU read-side critical section."
> >
> > As i mentioned in the other email we are discussing about "task
> > notifier" in lse-tech. We thought of using RCU, but one of the
> > requirements was that the registered function should be able to block,
> > so we are looking for alternatives.
> >
>
> What are the requirements that preclude a conventional rwlock? If you
> don't have any, then you should go with that.

I was thinking the problem is that we cannot hold any locks while
calling the callouts.

But, As Keith mentioned, we cannot even acquire a lock in the
notifier_call_chain.

Thanks,

chandra
>
> The other solutions I've mentioned before.
>
> Copy on read.
>
> Various lock-free schemes:
> SMR hazard pointers
> RCU+SMR (probably overkill since you don't need the read side performance)
> reference counting
> proxy reference counting
>
> The last would probably be the easiest to implement expecially if you used
> a spinlock to safely increment the reference count without the more complicated
> atomic thread-safety. It's also more self contained.
>
> User land implementations of most of the above can be found at
> http://sourceforge.net/projects/atomic-ptr-plus/
>
> The proxy refcounting stuff is in the atomic-ptr-plus package. It's
> in c++ but you should be able to figure it out.
>
> RCU+SMR is in the fastsmr package.
>
>
>
> --
> Joe Seigh
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>
--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- [email protected] | .......you may get it.
----------------------------------------------------------------------


2005-10-28 04:35:48

by Pete Zaitcev

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

On Fri, 28 Oct 2005 10:48:00 +1000, Keith Owens <[email protected]> wrote:

> We should be able to call notifier_call_chain() from any context. That
> includes oops, panic, NMI and other unmaskable machine check events.
> If you can call notifier_call_chain() from an unmaskable context then
> it follows that the callbacks cannot take any locks. Locks are not
> safe in NMI context.

I understand your need, Keith, but this is impossible, as long as we
meet demands of people who want to unregister notifiers from their
own call-out functions (or other notifiers). Remember that these
functions might sleep. Obviously, those functions which your code calls
from an NMI context are written with that in mind and do not try to
allocate memory with GFP_KERNEL. However, this is not the case for
all notifier call-out functions.

I am inclined to think that we have to split the notifier interface
in two: with locked chain walk and with unlocked chain walk.
The "unlocked" version can be returned to its Linux 2.2 state,
where no locks of any sort were taken at all, even during the
registration. Keith would be using that one. The locked version
would be in use by USB and others.

The claim that unregister_reboot_notifier has not be called from the
notifier call-out was made by Leonard, who is, unfortunately, dead,
so we cannot ask if he changed his mind:
http://www.ussg.iu.edu/hypermail/linux/kernel/0007.3/0548.html
Perhaps it's not true anymore. Also ANK appeared to observe that the
practice was unsafe anyway.

-- Pete

2005-10-28 14:23:05

by Alan Stern

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

On Thu, 27 Oct 2005, Chandra Seetharaman wrote:

> So, requirements to fix the bug are:
> - no sleeping in register/unregister(if we want to keep the
> current way of use. We can change it and make the relevant
> changes in the kernel code, if it is agreeable)

I think we will have to make these changes. In principal it shouldn't be
hard to add a simple "enabled" flag to each callout which currently is
registered/unregistered atomically or while running. We could even put
such a flag into the notifier_block structure and add routines to set or
clear it, using appropriate barriers.

> - notifier_call_chain could be called from any context
> - callout function could sleep
> - no acquiring locks in notifier_call_chain
> - make sure the list is consistent :) (which is problem Alan
> started to fix)
> - anything else ?

Let's clarify the "list is consistent" statement. Obviously it implies
that no more than one thread can modify the list pointers at any time.
Beyond that, there should be a guarantee that when unregister returns, the
routine being removed is not in use and will not be called by any thread.
Likewise, after register returns, any invocation of notifier_call_chain
should see the new routine.

Alan Stern

2005-10-28 22:15:55

by Chandra Seetharaman

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

On Fri, 2005-10-28 at 10:23 -0400, Alan Stern wrote:
> On Thu, 27 Oct 2005, Chandra Seetharaman wrote:
>
> > So, requirements to fix the bug are:
> > - no sleeping in register/unregister(if we want to keep the
> > current way of use. We can change it and make the relevant
> > changes in the kernel code, if it is agreeable)
>
> I think we will have to make these changes. In principal it shouldn't be
> hard to add a simple "enabled" flag to each callout which currently is
> registered/unregistered atomically or while running. We could even put
> such a flag into the notifier_block structure and add routines to set or
> clear it, using appropriate barriers.

I do not understand the purpose of enabled flag. Can you clarify
>
> > - notifier_call_chain could be called from any context
> > - callout function could sleep
> > - no acquiring locks in notifier_call_chain
> > - make sure the list is consistent :) (which is problem Alan
> > started to fix)
> > - anything else ?
>
> Let's clarify the "list is consistent" statement. Obviously it implies
> that no more than one thread can modify the list pointers at any time.
> Beyond that, there should be a guarantee that when unregister returns, the
> routine being removed is not in use and will not be called by any thread.
> Likewise, after register returns, any invocation of notifier_call_chain
> should see the new routine.

true
>
> Alan Stern
>
>
--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- [email protected] | .......you may get it.
----------------------------------------------------------------------


2005-10-29 12:28:04

by Joe Seigh

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

Herbert Xu wrote:
> Andreas Kleen <[email protected]> wrote:
>
>>Am Mi 26.10.2005 02:01 schrieb Chandra Seetharaman
>><[email protected]>:
>>
>>
>>>>Better would be likely to use RCU.
>>>
>>>RCU will be a problem if the registered notifiers need to block.
>>
>>?
>>Actually blocking should be ok, as long as the blocking notifier doesn't
>>unregister
>>itself. The current next pointer will be always reloaded after the
>>blocking.
>
>
> Blocking would be OK as long as you reference count the objects.


With or without RCU? I.e., it's just a straighforward reference counting
solution and RCU is being used to allow incrementing the reference counts
safely. Otherwise, without RCU you'd need a lock to safely increment the
reference counts. Also note that with reference counting, the deletes of
the objects can occur any time a reference count is decremented. So that
would include the notify_call threads as well.

--
Joe Seigh

2005-10-29 14:51:44

by Alan Stern

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

On Fri, 28 Oct 2005, Chandra Seetharaman wrote:

> On Fri, 2005-10-28 at 10:23 -0400, Alan Stern wrote:
> > On Thu, 27 Oct 2005, Chandra Seetharaman wrote:
> >
> > > So, requirements to fix the bug are:
> > > - no sleeping in register/unregister(if we want to keep the
> > > current way of use. We can change it and make the relevant
> > > changes in the kernel code, if it is agreeable)
> >
> > I think we will have to make these changes. In principal it shouldn't be
> > hard to add a simple "enabled" flag to each callout which currently is
> > registered/unregistered atomically or while running. We could even put
> > such a flag into the notifier_block structure and add routines to set or
> > clear it, using appropriate barriers.
>
> I do not understand the purpose of enabled flag. Can you clarify

Something like this:

struct notifier_block {
int (*notifier_call)(struct notifier_block *self, unsigned long,
void *);
struct list_head node;
int priority;
int enabled;
};

int notifier_call_chain(struct notifier_head *nh, unsigned long val,
void *v)
{
int ret = 0;
notifier_block *b;

if (list_empty(&nh->chain)) /* Optimize for common case */
return ret;

smp_rmb();
list_for_each_entry(b, &nh->chain, node) {
if (b->enabled) {
ret = b->notifier_call(b, val, v);
if (ret & NOTIFY_STOP_MASK)
break;
}
}

return ret;
}

#define notifier_block_enable(b) set_wmb((b)->enabled, 1)
#define notifier_block_disable(b) set_wmb((b)->enabled, 0)


It occurred to me that there _is_ a way to do unregister for atomic chains
without blocking. Add to struct notifier_head

atomic_t num_callers;

Then in notifier_call_chain, do atomic_inc(&nh->num_callers) at the start
and atomic_dec(&nh->num_callers) at the end. Finally, make unregister do
this:

int notifier_chain_unregister(struct notifier_head *nh,
struct notifier_block *n)
{
if (nh->type == ATOMIC_NOTIFIER) {
spin_lock(nh->lock);
list_del(&n->node);
smp_mb();
while (atomic_read(&nh->num_callers) > 0)
cpu_relax();
spin_unlock(nh->lock);
} else {
...
}
return 0;
}

I don't mean to suggest that this is better than using RCU, and with
notifier_block_disable it probably isn't needed. However it is worth
thinking about.

Alan Stern

2005-10-31 22:23:05

by Chandra Seetharaman

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

On Sat, 2005-10-29 at 10:51 -0400, Alan Stern wrote:
> On Fri, 28 Oct 2005, Chandra Seetharaman wrote:
>
> > On Fri, 2005-10-28 at 10:23 -0400, Alan Stern wrote:
> > > On Thu, 27 Oct 2005, Chandra Seetharaman wrote:
> > >
> > > > So, requirements to fix the bug are:
> > > > - no sleeping in register/unregister(if we want to keep the
> > > > current way of use. We can change it and make the relevant
> > > > changes in the kernel code, if it is agreeable)
> > >
> > > I think we will have to make these changes. In principal it shouldn't be
> > > hard to add a simple "enabled" flag to each callout which currently is
> > > registered/unregistered atomically or while running. We could even put
> > > such a flag into the notifier_block structure and add routines to set or
> > > clear it, using appropriate barriers.
> >
> > I do not understand the purpose of enabled flag. Can you clarify
>
> Something like this:
>
> struct notifier_block {
> int (*notifier_call)(struct notifier_block *self, unsigned long,
> void *);
> struct list_head node;
> int priority;
> int enabled;
> };
>
> int notifier_call_chain(struct notifier_head *nh, unsigned long val,
> void *v)
> {
> int ret = 0;
> notifier_block *b;
>
> if (list_empty(&nh->chain)) /* Optimize for common case */
> return ret;
>
> smp_rmb();
> list_for_each_entry(b, &nh->chain, node) {
> if (b->enabled) {
> ret = b->notifier_call(b, val, v);
> if (ret & NOTIFY_STOP_MASK)
> break;
> }
> }
>
> return ret;
> }
>
> #define notifier_block_enable(b) set_wmb((b)->enabled, 1)
> #define notifier_block_disable(b) set_wmb((b)->enabled, 0)
>

I am not getting the complete picture. So, in unregister we would just
disable and never delete the notifier_block ? Or
notifier_block_enable/disable will be used by external entities
directly ?

>
> It occurred to me that there _is_ a way to do unregister for atomic chains
> without blocking. Add to struct notifier_head
>
> atomic_t num_callers;
>
> Then in notifier_call_chain, do atomic_inc(&nh->num_callers) at the start
> and atomic_dec(&nh->num_callers) at the end. Finally, make unregister do
> this:
>
> int notifier_chain_unregister(struct notifier_head *nh,
> struct notifier_block *n)
> {
> if (nh->type == ATOMIC_NOTIFIER) {
> spin_lock(nh->lock);
> list_del(&n->node);
> smp_mb();
> while (atomic_read(&nh->num_callers) > 0)
> cpu_relax();
> spin_unlock(nh->lock);
> } else {
> ...
> }
> return 0;
> }

But, how is the list protected in call_chain (will you be holding the
lock in call_chain() while incrementing the atomic variable).

>
> I don't mean to suggest that this is better than using RCU, and with
> notifier_block_disable it probably isn't needed. However it is worth
> thinking about.
>
> Alan Stern
>
>
--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- [email protected] | .......you may get it.
----------------------------------------------------------------------


2005-11-01 15:24:40

by Alan Stern

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

On Mon, 31 Oct 2005, Chandra Seetharaman wrote:

> > #define notifier_block_enable(b) set_wmb((b)->enabled, 1)
> > #define notifier_block_disable(b) set_wmb((b)->enabled, 0)
>
> I am not getting the complete picture. So, in unregister we would just
> disable and never delete the notifier_block ? Or
> notifier_block_enable/disable will be used by external entities
> directly ?

Register and unregister will continue to work as before, requiring a
process context and the ability to sleep. notifier_block_enable/disable
should be used when:

a callout wants to disable itself as it is running, or

someone running in an atomic context wants to enable or disable
a callout.

In the first case, unregister can't be used because it would hang. In the
second case, register/unregister can't be used because they need to be
able to sleep.

In both cases the notifier block would have to be registered beforehand
and unregistered later.


> > It occurred to me that there _is_ a way to do unregister for atomic chains
> > without blocking. Add to struct notifier_head
> >
> > atomic_t num_callers;
> >
> > Then in notifier_call_chain, do atomic_inc(&nh->num_callers) at the start
> > and atomic_dec(&nh->num_callers) at the end. Finally, make unregister do
> > this:
> >
> > int notifier_chain_unregister(struct notifier_head *nh,
> > struct notifier_block *n)
> > {
> > if (nh->type == ATOMIC_NOTIFIER) {
> > spin_lock(nh->lock);
> > list_del(&n->node);
> > smp_mb();
> > while (atomic_read(&nh->num_callers) > 0)
> > cpu_relax();
> > spin_unlock(nh->lock);
> > } else {
> > ...
> > }
> > return 0;
> > }
>
> But, how is the list protected in call_chain (will you be holding the
> lock in call_chain() while incrementing the atomic variable).

No; the list _won't_ be protected in call_chain. It will be possible to
unregister a callout while the chain is in use. That's how the RCU
approach works -- it uses no read locks, only write locks.

Deleting an entry while the list is in use is safe, because readers will
encounter either the old or the new value of the .next pointer, and either
one will be valid. The important thing is to make sure that no one will
ever encounter the old pointer after unregister returns; that's what the
"while" loop is for.

Alan Stern

2005-11-01 20:20:40

by Chandra Seetharaman

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

On Tue, 2005-11-01 at 10:24 -0500, Alan Stern wrote:
> On Mon, 31 Oct 2005, Chandra Seetharaman wrote:
>
> > > #define notifier_block_enable(b) set_wmb((b)->enabled, 1)
> > > #define notifier_block_disable(b) set_wmb((b)->enabled, 0)
> >
> > I am not getting the complete picture. So, in unregister we would just
> > disable and never delete the notifier_block ? Or
> > notifier_block_enable/disable will be used by external entities
> > directly ?
>
> Register and unregister will continue to work as before, requiring a
> process context and the ability to sleep. notifier_block_enable/disable
> should be used when:
>
> a callout wants to disable itself as it is running, or
>
> someone running in an atomic context wants to enable or disable
> a callout.
>
> In the first case, unregister can't be used because it would hang. In the
> second case, register/unregister can't be used because they need to be
> able to sleep.
>
> In both cases the notifier block would have to be registered beforehand
> and unregistered later.

I understand. Thanks for the explanation. I like the option below better
(no new interface).
>
>
> > > It occurred to me that there _is_ a way to do unregister for atomic chains
> > > without blocking. Add to struct notifier_head
> > >
> > > atomic_t num_callers;
> > >
> > > Then in notifier_call_chain, do atomic_inc(&nh->num_callers) at the start
> > > and atomic_dec(&nh->num_callers) at the end. Finally, make unregister do
> > > this:
> > >
> > > int notifier_chain_unregister(struct notifier_head *nh,
> > > struct notifier_block *n)
> > > {
> > > if (nh->type == ATOMIC_NOTIFIER) {
> > > spin_lock(nh->lock);
> > > list_del(&n->node);
> > > smp_mb();
> > > while (atomic_read(&nh->num_callers) > 0)
> > > cpu_relax();
> > > spin_unlock(nh->lock);
> > > } else {
> > > ...
> > > }
> > > return 0;
> > > }
> >
> > But, how is the list protected in call_chain (will you be holding the
> > lock in call_chain() while incrementing the atomic variable).
>
> No; the list _won't_ be protected in call_chain. It will be possible to
> unregister a callout while the chain is in use. That's how the RCU
> approach works -- it uses no read locks, only write locks.

but, list_del poisons the next pointer which is not good for a reader
that is walking through the list, we have to use list_del_rcu instead.

Also, do you think we have to use _rcu versions of list traversal
functions in call_chain ?
>
> Deleting an entry while the list is in use is safe, because readers will
> encounter either the old or the new value of the .next pointer, and either
> one will be valid. The important thing is to make sure that no one will
> ever encounter the old pointer after unregister returns; that's what the
> "while" loop is for.
>
> Alan Stern
>
>
--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- [email protected] | .......you may get it.
----------------------------------------------------------------------


2005-11-01 21:20:46

by Alan Stern

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

On Tue, 1 Nov 2005, Chandra Seetharaman wrote:

> > Register and unregister will continue to work as before, requiring a
> > process context and the ability to sleep. notifier_block_enable/disable
> > should be used when:
> >
> > a callout wants to disable itself as it is running, or
> >
> > someone running in an atomic context wants to enable or disable
> > a callout.
> >
> > In the first case, unregister can't be used because it would hang. In the
> > second case, register/unregister can't be used because they need to be
> > able to sleep.
> >
> > In both cases the notifier block would have to be registered beforehand
> > and unregistered later.
>
> I understand. Thanks for the explanation. I like the option below better
> (no new interface).

You mean the RCU-style update? It will hang when a callout routine tries
to deregister itself as it is running, although we could add a new
unregister_self API to handle that. Just check for num_callers equal to 1
instead of 0.

> > No; the list _won't_ be protected in call_chain. It will be possible to
> > unregister a callout while the chain is in use. That's how the RCU
> > approach works -- it uses no read locks, only write locks.
>
> but, list_del poisons the next pointer which is not good for a reader
> that is walking through the list, we have to use list_del_rcu instead.

Agreed.

> Also, do you think we have to use _rcu versions of list traversal
> functions in call_chain ?

Yes.

Note that on an SMP system you run the risk of starvation (the chain gets
called so frequently that it's _always_ running on some CPU). We can
probably ignore that possibility.

Would you like to code up these ideas?

Alan Stern

2005-11-02 09:50:23

by Keith Owens

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

On Tue, 1 Nov 2005 16:20:43 -0500 (EST),
Alan Stern <[email protected]> wrote:
>You mean the RCU-style update? It will hang when a callout routine tries
>to deregister itself as it is running, although we could add a new
>unregister_self API to handle that. Just check for num_callers equal to 1
>instead of 0.

A callout on an atomic notifer chain has no business calling the
register/unregister functions. It makes no sense for an atomic context
to call a routine that can sleep or block.

2005-11-02 16:03:25

by Alan Stern

[permalink] [raw]
Subject: Re: Notifier chains are unsafe

On Wed, 2 Nov 2005, Keith Owens wrote:

> On Tue, 1 Nov 2005 16:20:43 -0500 (EST),
> Alan Stern <[email protected]> wrote:
> >You mean the RCU-style update? It will hang when a callout routine tries
> >to deregister itself as it is running, although we could add a new
> >unregister_self API to handle that. Just check for num_callers equal to 1
> >instead of 0.
>
> A callout on an atomic notifer chain has no business calling the
> register/unregister functions. It makes no sense for an atomic context
> to call a routine that can sleep or block.

Ah, but what if the unregister function for atomic chains is implemented
in such a way that it doesn't sleep or block? That's what Chandra and I
have been discussing.

On the other hand, it's still true that for blocking chains, unregister
will have to acquire a write semaphore. We won't want callouts on
blocking chains (which already own the read semaphore) trying to
unregister themselves.

And in any case, it's cleaner for callouts never to unregister themselves.
That's why I tend to prefer the block_enable/disable solution.

Alan Stern