2006-01-18 16:34:19

by Alan Stern

[permalink] [raw]
Subject: [PATCH 1/8] Notifier chain update

The kernel's implementation of notifier chains is unsafe. There is no
protection against entries being added to or removed from a chain
while the chain is in use. The issues were discussed in this thread:

http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2

We noticed that notifier chains in the kernel fall into two basic
usage classes:

"Blocking" chains are always called from a process context
and the callout routines are allowed to sleep;

"Atomic" chains can be called from an atomic context and
the callout routines are not allowed to sleep.

We decided to codify this distinction and make it part of the API.
Therefore this set of patches introduces three new, parallel APIs: one
for blocking notifiers, one for atomic notifiers, and one for "raw"
notifiers (which is really just the old API under a new name). New
kinds of data structures are used for the heads of the chains, and new
routines are defined for registration, unregistration, and calling a
chain. The three APIs are explained in include/linux/notifier.h and
their implementation is in kernel/sys.c.

With atomic and blocking chains, the implementation guarantees that
the chain links will not be corrupted and that chain callers will not
get messed up by entries being added or removed. For raw chains the
implementation provides no guarantees at all; users of this API must
provide their own protections. (The idea was that situations may come
up where the assumptions of the atomic and blocking APIs are not
appropriate, so it should be possible for users to handle these things
in their own way.)

There are some limitations, which should not be too hard to live with.
For atomic/blocking chains, registration and unregistration must
always be done in a process context since the chain is protected by a
mutex/rwsem. Also, a callout routine for a non-raw chain must not try
to register or unregister entries on its own chain. (This did happen
in a couple of places and the code had to be changed to avoid it.)

Since atomic chains may be called from within an NMI handler, they
cannot use spinlocks for synchronization. Instead we use RCU. The
overhead falls almost entirely in the unregister routine, which is
okay since unregistration is much less frequent that calling a chain.


Here is the list of chains that we adjusted and their classifications.
None of them use the raw API, so for the moment it is only a
placeholder.

ATOMIC CHAINS
-------------
arch/i386/kernel/traps.c: i386die_chain
arch/ia64/kernel/traps.c: ia64die_chain
arch/powerpc/kernel/traps.c: powerpc_die_chain
arch/sparc64/kernel/traps.c: sparc64die_chain
arch/x86_64/kernel/traps.c: die_chain
drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list
kernel/panic.c: panic_notifier_list
kernel/profile.c: task_free_notifier
net/bluetooth/hci_core.c: hci_notifier
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain
net/ipv6/addrconf.c: inet6addr_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain
net/netlink/af_netlink.c: netlink_chain

BLOCKING CHAINS
---------------
arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain
arch/s390/kernel/process.c: idle_chain
arch/x86_64/kernel/process.c idle_notifier
drivers/base/memory.c: memory_chain
drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list
drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list
drivers/macintosh/adb.c: adb_client_list
drivers/macintosh/via-pmu.c sleep_notifier_list
drivers/macintosh/via-pmu68k.c sleep_notifier_list
drivers/macintosh/windfarm_core.c wf_client_list
drivers/usb/core/notify.c usb_notifier_list
drivers/video/fbmem.c fb_notifier_list
kernel/cpu.c cpu_chain
kernel/module.c module_notify_list
kernel/profile.c munmap_notifier
kernel/profile.c task_exit_notifier
kernel/sys.c reboot_notifier_list
net/core/dev.c netdev_chain
net/decnet/dn_dev.c: dnaddr_chain
net/ipv4/devinet.c: inetaddr_chain


The patch set is organized as follows:

1 of 8 (this file (as632)): Introduce the new API and its implementation.
2 of 8: Simple changes that involve only the definition or declaration
of chain heads.
3 of 8: Changes in which we removed some local protection (it's no
longer needed since the core now provides the protection).
4 of 8: Changes for diechain in various architectures (it's now safe
to unregister entries in this chain).
5 of 8: Changes to routines that try to unregister themselves.
6 of 8: Changes to dcdbas.c (it requires special handling).
7 of 8: Changes to make usb_notify use the new API instead of its own.
8 of 8: Changes from the old API names to the new names.


The patch set was written by Alan Stern and Chandra Seetharaman,
incorporating material written by Keith Owens and suggestions from
Paul McKenney and Andrew Morton.

Signed-off-by: Alan Stern <[email protected]>
Signed-off-by: Chandra Seetharaman <[email protected]>

---

These patches are against 2.6.16-rc1.

Index: l2616/include/linux/notifier.h
===================================================================
--- l2616.orig/include/linux/notifier.h
+++ l2616/include/linux/notifier.h
@@ -10,25 +10,100 @@
#ifndef _LINUX_NOTIFIER_H
#define _LINUX_NOTIFIER_H
#include <linux/errno.h>
+#include <linux/mutex.h>
+#include <linux/rwsem.h>

-struct notifier_block
-{
- int (*notifier_call)(struct notifier_block *self, unsigned long, void *);
+/*
+ * Notifier chains are of three types:
+ *
+ * Atomic notifier chains: Chain callbacks run in interrupt/atomic
+ * context. Callouts are not allowed to block.
+ * Blocking notifier chains: Chain callbacks run in process context.
+ * Callouts are allowed to block.
+ * Raw notifier chains: There are no restrictions on callbacks,
+ * registration, or unregistration. All locking and protection
+ * must be provided by the caller.
+ *
+ * atomic_notifier_chain_register() and blocking_notifier_chain_register()
+ * may be called only from process context, and likewise for the
+ * corresponding _unregister() routines.
+ *
+ * atomic_notifier_chain_unregister() and blocking_notifier_chain_unregister()
+ * _must not_ be called from within the call chain.
+ */
+
+struct notifier_block {
+ int (*notifier_call)(struct notifier_block *, unsigned long, void *);
struct notifier_block *next;
int priority;
};

+struct atomic_notifier_head {
+ struct mutex mutex;
+ struct notifier_block *head;
+};
+
+struct blocking_notifier_head {
+ struct rw_semaphore rwsem;
+ struct notifier_block *head;
+};
+
+struct raw_notifier_head {
+ struct notifier_block *head;
+};
+
+#define ATOMIC_INIT_NOTIFIER_HEAD(name) do { \
+ mutex_init(&(name)->mutex); \
+ (name)->head = NULL; \
+ } while (0)
+#define BLOCKING_INIT_NOTIFIER_HEAD(name) do { \
+ init_rwsem(&(name)->rwsem); \
+ (name)->head = NULL; \
+ } while (0)
+#define RAW_INIT_NOTIFIER_HEAD(name) do { \
+ (name)->head = NULL; \
+ } while (0)
+
+#define ATOMIC_NOTIFIER_HEAD(name) \
+ struct atomic_notifier_head name = { \
+ .mutex = __MUTEX_INITIALIZER((name).mutex), \
+ .head = NULL }
+#define BLOCKING_NOTIFIER_HEAD(name) \
+ struct blocking_notifier_head name = { \
+ .rwsem = __RWSEM_INITIALIZER((name).rwsem), \
+ .head = NULL }
+#define RAW_NOTIFIER_HEAD(name) \
+ struct raw_notifier_head name = { \
+ .head = NULL }

#ifdef __KERNEL__

-extern int notifier_chain_register(struct notifier_block **list, struct notifier_block *n);
-extern int notifier_chain_unregister(struct notifier_block **nl, struct notifier_block *n);
-extern int notifier_call_chain(struct notifier_block **n, unsigned long val, void *v);
+extern int atomic_notifier_chain_register(struct atomic_notifier_head *,
+ struct notifier_block *);
+extern int blocking_notifier_chain_register(struct blocking_notifier_head *,
+ struct notifier_block *);
+extern int raw_notifier_chain_register(struct raw_notifier_head *,
+ struct notifier_block *);
+
+extern int atomic_notifier_chain_unregister(struct atomic_notifier_head *,
+ struct notifier_block *);
+extern int blocking_notifier_chain_unregister(struct blocking_notifier_head *,
+ struct notifier_block *);
+extern int raw_notifier_chain_unregister(struct raw_notifier_head *,
+ struct notifier_block *);
+
+extern int atomic_notifier_call_chain(struct atomic_notifier_head *,
+ unsigned long val, void *v);
+extern int blocking_notifier_call_chain(struct blocking_notifier_head *,
+ unsigned long val, void *v);
+extern int raw_notifier_call_chain(struct raw_notifier_head *,
+ unsigned long val, void *v);

#define NOTIFY_DONE 0x0000 /* Don't care */
#define NOTIFY_OK 0x0001 /* Suits me */
#define NOTIFY_STOP_MASK 0x8000 /* Don't call further */
-#define NOTIFY_BAD (NOTIFY_STOP_MASK|0x0002) /* Bad/Veto action */
+#define NOTIFY_BAD (NOTIFY_STOP_MASK|0x0002)
+ /* Bad/Veto action */
/*
* Clean way to return from the notifier and stop further calls.
*/
Index: l2616/kernel/sys.c
===================================================================
--- l2616.orig/kernel/sys.c
+++ l2616/kernel/sys.c
@@ -96,98 +96,287 @@ int cad_pid = 1;
*/

static struct notifier_block *reboot_notifier_list;
-static DEFINE_RWLOCK(notifier_lock);
+
+/*
+ * Notifier chain core routines. The exported routines below
+ * are layered on top of these, with appropriate locking added.
+ */
+
+static int notifier_chain_register(struct notifier_block **nl,
+ struct notifier_block *n)
+{
+ while ((*nl) != NULL) {
+ if (n->priority > (*nl)->priority)
+ break;
+ nl = &((*nl)->next);
+ }
+ n->next = *nl;
+ rcu_assign_pointer(*nl, n);
+ return 0;
+}
+
+static int notifier_chain_unregister(struct notifier_block **nl,
+ struct notifier_block *n)
+{
+ while ((*nl) != NULL) {
+ if ((*nl) == n) {
+ rcu_assign_pointer(*nl, n->next);
+ return 0;
+ }
+ nl = &((*nl)->next);
+ }
+ return -ENOENT;
+}
+
+static int __kprobes notifier_call_chain(struct notifier_block **nl,
+ unsigned long val, void *v)
+{
+ int ret = NOTIFY_DONE;
+ struct notifier_block *nb;
+
+ nb = rcu_dereference(*nl);
+ while (nb) {
+ ret = nb->notifier_call(nb, val, v);
+ if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK)
+ break;
+ nb = rcu_dereference(nb->next);
+ }
+ return ret;
+}
+
+/*
+ * Atomic notifier chain routines. Registration and unregistration
+ * use a mutex, and call_chain is synchronized by RCU (no locks).
+ */

/**
- * notifier_chain_register - Add notifier to a notifier chain
- * @list: Pointer to root list pointer
+ * atomic_notifier_chain_register - Add notifier to an atomic notifier chain
+ * @nh: Pointer to head of the atomic notifier chain
* @n: New entry in notifier chain
*
- * Adds a notifier to a notifier chain.
+ * Adds a notifier to an atomic notifier chain.
+ * Must be called in process context.
*
* Currently always returns zero.
*/
+
+int atomic_notifier_chain_register(struct atomic_notifier_head *nh,
+ struct notifier_block *n)
+{
+ int ret;
+
+ mutex_lock(&nh->mutex);
+ ret = notifier_chain_register(&nh->head, n);
+ mutex_unlock(&nh->mutex);
+ return ret;
+}
+
+EXPORT_SYMBOL(atomic_notifier_chain_register);
+
+/**
+ * atomic_notifier_chain_unregister - Remove notifier from an atomic notifier chain
+ * @nh: Pointer to head of the atomic notifier chain
+ * @n: Entry to remove from notifier chain
+ *
+ * Removes a notifier from an atomic notifier chain.
+ * Must be called from process context.
+ *
+ * Returns zero on success or %-ENOENT on failure.
+ */
+int atomic_notifier_chain_unregister(struct atomic_notifier_head *nh,
+ struct notifier_block *n)
+{
+ int ret;
+
+ mutex_lock(&nh->mutex);
+ ret = notifier_chain_unregister(&nh->head, n);
+ mutex_unlock(&nh->mutex);
+ synchronize_rcu();
+ return ret;
+}
+
+EXPORT_SYMBOL(atomic_notifier_chain_unregister);
+
+/**
+ * atomic_notifier_call_chain - Call functions in an atomic notifier chain
+ * @nh: Pointer to head of the atomic notifier chain
+ * @val: Value passed unmodified to notifier function
+ * @v: Pointer passed unmodified to notifier function
+ *
+ * Calls each function in a notifier chain in turn. The functions
+ * run in an atomic context, so they must not block.
+ * This routine uses RCU to synchronize with changes to the chain.
+ *
+ * If the return value of the notifier can be and'ed
+ * with %NOTIFY_STOP_MASK then atomic_notifier_call_chain
+ * will return immediately, with the return value of
+ * the notifier function which halted execution.
+ * Otherwise the return value is the return value
+ * of the last notifier function called.
+ */

-int notifier_chain_register(struct notifier_block **list, struct notifier_block *n)
+int atomic_notifier_call_chain(struct atomic_notifier_head *nh,
+ unsigned long val, void *v)
{
- write_lock(&notifier_lock);
- while(*list)
- {
- if(n->priority > (*list)->priority)
- break;
- list= &((*list)->next);
- }
- n->next = *list;
- *list=n;
- write_unlock(&notifier_lock);
- return 0;
+ int ret;
+
+ rcu_read_lock();
+ ret = notifier_call_chain(&nh->head, val, v);
+ rcu_read_unlock();
+ return ret;
}

-EXPORT_SYMBOL(notifier_chain_register);
+EXPORT_SYMBOL(atomic_notifier_call_chain);
+
+/*
+ * Blocking notifier chain routines. All access to the chain is
+ * synchronized by an rwsem.
+ */

/**
- * notifier_chain_unregister - Remove notifier from a notifier chain
- * @nl: Pointer to root list pointer
+ * blocking_notifier_chain_register - Add notifier to a blocking notifier chain
+ * @nh: Pointer to head of the blocking notifier chain
* @n: New entry in notifier chain
*
- * Removes a notifier from a notifier chain.
+ * Adds a notifier to a blocking notifier chain.
+ * Must be called in process context.
*
- * Returns zero on success, or %-ENOENT on failure.
+ * Currently always returns zero.
*/

-int notifier_chain_unregister(struct notifier_block **nl, struct notifier_block *n)
+int blocking_notifier_chain_register(struct blocking_notifier_head *nh,
+ struct notifier_block *n)
{
- write_lock(&notifier_lock);
- while((*nl)!=NULL)
- {
- if((*nl)==n)
- {
- *nl=n->next;
- write_unlock(&notifier_lock);
- return 0;
- }
- nl=&((*nl)->next);
- }
- write_unlock(&notifier_lock);
- return -ENOENT;
+ int ret;
+
+ down_write(&nh->rwsem);
+ ret = notifier_chain_register(&nh->head, n);
+ up_write(&nh->rwsem);
+ return ret;
}

-EXPORT_SYMBOL(notifier_chain_unregister);
+EXPORT_SYMBOL(blocking_notifier_chain_register);

/**
- * notifier_call_chain - Call functions in a notifier chain
- * @n: Pointer to root pointer of notifier chain
+ * blocking_notifier_chain_unregister - Remove notifier from a blocking notifier chain
+ * @nh: Pointer to head of the blocking notifier chain
+ * @n: Entry to remove from notifier chain
+ *
+ * Removes a notifier from a blocking notifier chain.
+ * Must be called from process context.
+ *
+ * Returns zero on success or %-ENOENT on failure.
+ */
+int blocking_notifier_chain_unregister(struct blocking_notifier_head *nh,
+ struct notifier_block *n)
+{
+ int ret;
+
+ down_write(&nh->rwsem);
+ ret = notifier_chain_unregister(&nh->head, n);
+ up_write(&nh->rwsem);
+ return ret;
+}
+
+EXPORT_SYMBOL(blocking_notifier_chain_unregister);
+
+/**
+ * blocking_notifier_call_chain - Call functions in a blocking notifier chain
+ * @nh: Pointer to head of the blocking notifier chain
* @val: Value passed unmodified to notifier function
* @v: Pointer passed unmodified to notifier function
*
- * Calls each function in a notifier chain in turn.
+ * Calls each function in a notifier chain in turn. The functions
+ * run in a process context, so they are allowed to block.
*
- * If the return value of the notifier can be and'd
- * with %NOTIFY_STOP_MASK, then notifier_call_chain
+ * If the return value of the notifier can be and'ed
+ * with %NOTIFY_STOP_MASK then blocking_notifier_call_chain
* will return immediately, with the return value of
* the notifier function which halted execution.
- * Otherwise, the return value is the return value
+ * Otherwise the return value is the return value
* of the last notifier function called.
*/

-int __kprobes notifier_call_chain(struct notifier_block **n, unsigned long val, void *v)
+int blocking_notifier_call_chain(struct blocking_notifier_head *nh,
+ unsigned long val, void *v)
{
- int ret=NOTIFY_DONE;
- struct notifier_block *nb = *n;
+ int ret;

- while(nb)
- {
- ret=nb->notifier_call(nb,val,v);
- if(ret&NOTIFY_STOP_MASK)
- {
- return ret;
- }
- nb=nb->next;
- }
+ down_read(&nh->rwsem);
+ ret = notifier_call_chain(&nh->head, val, v);
+ up_read(&nh->rwsem);
return ret;
}

-EXPORT_SYMBOL(notifier_call_chain);
+EXPORT_SYMBOL(blocking_notifier_call_chain);
+
+/*
+ * Raw notifier chain routines. There is no protection;
+ * the caller must provide it. Use at your own risk!
+ */
+
+/**
+ * raw_notifier_chain_register - Add notifier to a raw notifier chain
+ * @nh: Pointer to head of the raw notifier chain
+ * @n: New entry in notifier chain
+ *
+ * Adds a notifier to a raw notifier chain.
+ * All locking must be provided by the caller.
+ *
+ * Currently always returns zero.
+ */
+
+int raw_notifier_chain_register(struct raw_notifier_head *nh,
+ struct notifier_block *n)
+{
+ return notifier_chain_register(&nh->head, n);
+}
+
+EXPORT_SYMBOL(raw_notifier_chain_register);
+
+/**
+ * raw_notifier_chain_unregister - Remove notifier from a raw notifier chain
+ * @nh: Pointer to head of the raw notifier chain
+ * @n: Entry to remove from notifier chain
+ *
+ * Removes a notifier from a raw notifier chain.
+ * All locking must be provided by the caller.
+ *
+ * Returns zero on success or %-ENOENT on failure.
+ */
+int raw_notifier_chain_unregister(struct raw_notifier_head *nh,
+ struct notifier_block *n)
+{
+ return notifier_chain_unregister(&nh->head, n);
+}
+
+EXPORT_SYMBOL(raw_notifier_chain_unregister);
+
+/**
+ * raw_notifier_call_chain - Call functions in a raw notifier chain
+ * @nh: Pointer to head of the raw notifier chain
+ * @val: Value passed unmodified to notifier function
+ * @v: Pointer passed unmodified to notifier function
+ *
+ * Calls each function in a notifier chain in turn. The functions
+ * run in an undefined context.
+ * All locking must be provided by the caller.
+ *
+ * If the return value of the notifier can be and'ed
+ * with %NOTIFY_STOP_MASK then raw_notifier_call_chain
+ * will return immediately, with the return value of
+ * the notifier function which halted execution.
+ * Otherwise the return value is the return value
+ * of the last notifier function called.
+ */
+
+int raw_notifier_call_chain(struct raw_notifier_head *nh,
+ unsigned long val, void *v)
+{
+ return notifier_call_chain(&nh->head, val, v);
+}
+
+EXPORT_SYMBOL(raw_notifier_call_chain);

/**
* register_reboot_notifier - Register function to be called at reboot time



2006-01-18 18:24:09

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: [PATCH 1/8] Notifier chain update

On Wed, Jan 18, 2006 at 11:34:12AM -0500, Alan Stern wrote:
> There are some limitations, which should not be too hard to live with.
> For atomic/blocking chains, registration and unregistration must
> always be done in a process context since the chain is protected by a
> mutex/rwsem. Also, a callout routine for a non-raw chain must not try
> to register or unregister entries on its own chain. (This did happen
> in a couple of places and the code had to be changed to avoid it.)

This is bad, as rwsems are pretty much guaranteed to be a cache miss on
smp systems, so their addition makes these code paths scale much more
poorly than is needed. Given the current approach to modules, would it
not make sense to simply require that any code that the notifier paths
touch simply remain loaded in the kernel? In that case rcu protection
of the pointers would suffice for the hooks.

-ben

2006-01-18 20:23:23

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH 1/8] Notifier chain update

On Wed, 18 Jan 2006, Benjamin LaHaise wrote:

> On Wed, Jan 18, 2006 at 11:34:12AM -0500, Alan Stern wrote:
> > There are some limitations, which should not be too hard to live with.
> > For atomic/blocking chains, registration and unregistration must
> > always be done in a process context since the chain is protected by a
> > mutex/rwsem. Also, a callout routine for a non-raw chain must not try
> > to register or unregister entries on its own chain. (This did happen
> > in a couple of places and the code had to be changed to avoid it.)
>
> This is bad, as rwsems are pretty much guaranteed to be a cache miss on
> smp systems, so their addition makes these code paths scale much more
> poorly than is needed. Given the current approach to modules, would it
> not make sense to simply require that any code that the notifier paths
> touch simply remain loaded in the kernel? In that case rcu protection
> of the pointers would suffice for the hooks.

You can't use RCU protection around code that may sleep. Whether the code
remains loaded in the kernel or is part of a removable module doesn't
enter into it.

Alan Stern

2006-01-18 21:46:14

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: [PATCH 1/8] Notifier chain update

On Wed, Jan 18, 2006 at 03:23:20PM -0500, Alan Stern wrote:
> You can't use RCU protection around code that may sleep. Whether the code
> remains loaded in the kernel or is part of a removable module doesn't
> enter into it.

A notifier callee should not be sleeping, if anything it should be putting
its work onto a workqueue and completing it when it gets scheduled if it
has to do something that blocks.

-ben
--
"You know, I've seen some crystals do some pretty trippy shit, man."
Don't Email: <[email protected]>.

2006-01-18 21:57:33

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH 1/8] Notifier chain update

On Wed, 18 Jan 2006, Benjamin LaHaise wrote:

> A notifier callee should not be sleeping, if anything it should be putting
> its work onto a workqueue and completing it when it gets scheduled if it
> has to do something that blocks.

Sez who? If it's not documented in the kernel source, I don't believe
it.

Alan Stern

2006-01-18 22:01:04

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 1/8] Notifier chain update

From: Alan Stern <[email protected]>
Date: Wed, 18 Jan 2006 16:57:30 -0500 (EST)

> On Wed, 18 Jan 2006, Benjamin LaHaise wrote:
>
> > A notifier callee should not be sleeping, if anything it should be putting
> > its work onto a workqueue and completing it when it gets scheduled if it
> > has to do something that blocks.
>
> Sez who? If it's not documented in the kernel source, I don't believe
> it.

Many notifiers even get run from software interrupt context,
making sleeping illegal.

For example, IPV6 addresses can get added/removed from a device
in response to packets, and these operations trigger the
inet6addr_chain notifier in net/ipv6/addrconf.c

So sleeping in a notifier is indeed illegal.

2006-01-18 22:04:06

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH 1/8] Notifier chain update

On Wed, 18 Jan 2006, David S. Miller wrote:

> From: Alan Stern <[email protected]>
> Date: Wed, 18 Jan 2006 16:57:30 -0500 (EST)
>
> > On Wed, 18 Jan 2006, Benjamin LaHaise wrote:
> >
> > > A notifier callee should not be sleeping, if anything it should be putting
> > > its work onto a workqueue and completing it when it gets scheduled if it
> > > has to do something that blocks.
> >
> > Sez who? If it's not documented in the kernel source, I don't believe
> > it.
>
> Many notifiers even get run from software interrupt context,
> making sleeping illegal.
>
> For example, IPV6 addresses can get added/removed from a device
> in response to packets, and these operations trigger the
> inet6addr_chain notifier in net/ipv6/addrconf.c
>
> So sleeping in a notifier is indeed illegal.

Correction: sleeping in an atomic notifier (like inet6addr_chain) callout
is illegal.

But there are plenty of notifier chains that are always invoked in process
context and where the callout routines may indeed block.

Alan Stern

2006-01-18 22:05:28

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: [PATCH 1/8] Notifier chain update

On Wed, Jan 18, 2006 at 04:57:30PM -0500, Alan Stern wrote:
> Sez who? If it's not documented in the kernel source, I don't believe
> it.

The notifier interface is supposed to be *light weight*. Adding locks
that get taken on every call basically changes the concept entirely. The
cache misses notifiers add are measurable overhead, with locks being far
worse.

-ben
--
"You know, I've seen some crystals do some pretty trippy shit, man."
Don't Email: <[email protected]>.

2006-01-18 22:09:14

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH 1/8] Notifier chain update

On Wed, 18 Jan 2006, Benjamin LaHaise wrote:

> The notifier interface is supposed to be *light weight*.

Again, where is that documented?

> Adding locks
> that get taken on every call basically changes the concept entirely. The
> cache misses notifiers add are measurable overhead, with locks being far
> worse.

Which is worse: overhead due to cache misses or an oops caused by code
being called after it was unloaded?

Do you have a better proposal for a way to prevent blocking notifier
chains from being modified while in use? Or would you prefer to rewrite
all the callout routines that currently block, so that all the notifier
chains can be made atomic and we don't need the blocking notifier API?

Alan Stern

2006-01-18 22:18:44

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 1/8] Notifier chain update

From: Alan Stern <[email protected]>
Date: Wed, 18 Jan 2006 17:09:10 -0500 (EST)

> Do you have a better proposal for a way to prevent blocking notifier
> chains from being modified while in use? Or would you prefer to rewrite
> all the callout routines that currently block, so that all the notifier
> chains can be made atomic and we don't need the blocking notifier API?

I think if the user needs special locking, they should implement
it.

If you need to block, take a mutex around the notifier calls.
(I nearly typed semaphore there, sorry Ingo! :-)

2006-01-18 22:19:25

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: [PATCH 1/8] Notifier chain update

On Wed, Jan 18, 2006 at 05:09:10PM -0500, Alan Stern wrote:
> On Wed, 18 Jan 2006, Benjamin LaHaise wrote:
>
> > The notifier interface is supposed to be *light weight*.
>
> Again, where is that documented?

Read the kernel. Notifiers are called from all sorts of hot paths, so they
damned well better be light.

> Which is worse: overhead due to cache misses or an oops caused by code
> being called after it was unloaded?

Given that the overhead need not be present at all, neither.

> Do you have a better proposal for a way to prevent blocking notifier
> chains from being modified while in use? Or would you prefer to rewrite
> all the callout routines that currently block, so that all the notifier
> chains can be made atomic and we don't need the blocking notifier API?

Easy: in register_notifier stuff a serial number for each entry put on
a notifier chain. Remember the serial number of the entry before performing
->notifier_call in notifier_call_chain. Upon return, if the chain has been
modified (easy to detect by nature of the serial number changing), walk
the chain looking for the entry following the last serial number run. Voila,
rcu can be used to protect the chain's contents.

-ben
--
"You know, I've seen some crystals do some pretty trippy shit, man."
Don't Email: <[email protected]>.

2006-01-18 22:22:26

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: [PATCH 1/8] Notifier chain update

On Wed, Jan 18, 2006 at 02:18:41PM -0800, David S. Miller wrote:
> If you need to block, take a mutex around the notifier calls.
> (I nearly typed semaphore there, sorry Ingo! :-)

I'm arguing against adding the mutex. We don't need things like munmap()
on every cpu doing a down_read() on system wide semaphore.

-ben
--
"You know, I've seen some crystals do some pretty trippy shit, man."
Don't Email: <[email protected]>.

2006-01-18 23:07:12

by Alan

[permalink] [raw]
Subject: Re: [PATCH 1/8] Notifier chain update

On Mer, 2006-01-18 at 14:00 -0800, David S. Miller wrote:
> For example, IPV6 addresses can get added/removed from a device
> in response to packets, and these operations trigger the
> inet6addr_chain notifier in net/ipv6/addrconf.c
>
> So sleeping in a notifier is indeed illegal.

On the specific example yet. Notifiers get used for many things and
there has never been a rule about them not sleeping. There are lots of
cases where notifiers sleeping make sense including its early use in
power manglement.

Notifiers should not have locks. That was intentional in the original
implementation. You want locks, you implement them in the API *using*
the notifier, because its odds on you actually need to hold that lock
for other things too.

Alan

2006-01-18 23:08:05

by Alan

[permalink] [raw]
Subject: Re: [PATCH 1/8] Notifier chain update

On Mer, 2006-01-18 at 16:42 -0500, Benjamin LaHaise wrote:
> A notifier callee should not be sleeping, if anything it should be putting
> its work onto a workqueue and completing it when it gets scheduled if it
> has to do something that blocks.

Notifiers impose sequential ordering and the ability to abort or
complete a sequence early. Workqueues do something different.

2006-01-19 03:33:43

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH 1/8] Notifier chain update

On Wed, 18 Jan 2006, Benjamin LaHaise wrote:

> On Wed, Jan 18, 2006 at 05:09:10PM -0500, Alan Stern wrote:
> > On Wed, 18 Jan 2006, Benjamin LaHaise wrote:
> >
> > > The notifier interface is supposed to be *light weight*.
> >
> > Again, where is that documented?
>
> Read the kernel. Notifiers are called from all sorts of hot paths, so they
> damned well better be light.

_Some_ notifiers better be light. Others can be heavier. And it sure
would be a good thing to indicate in the code which are which. That's a
lot better than trying to read and understand the entire kernel in order
to gather impressions about how a certain class of routines is used.

In the patch we classified chains as blocking or atomic based on how they
were used, not on how often they get called. The patch includes a
provision specifically for lightweight notifiers: the raw notifier type.
If you want to identify which chains should be switched over to the raw
type, then fine. But _you_ will have to provide protection for them.

> > Which is worse: overhead due to cache misses or an oops caused by code
> > being called after it was unloaded?
>
> Given that the overhead need not be present at all, neither.
>
> > Do you have a better proposal for a way to prevent blocking notifier
> > chains from being modified while in use? Or would you prefer to rewrite
> > all the callout routines that currently block, so that all the notifier
> > chains can be made atomic and we don't need the blocking notifier API?
>
> Easy: in register_notifier stuff a serial number for each entry put on
> a notifier chain. Remember the serial number of the entry before performing
> ->notifier_call in notifier_call_chain. Upon return, if the chain has been
> modified (easy to detect by nature of the serial number changing), walk
> the chain looking for the entry following the last serial number run. Voila,
> rcu can be used to protect the chain's contents.

What happens if the last serial number run is no longer on the chain?
You would never find it, and so would never know where to pick up from.
And what happens if the chain is changed while you are checking (or just
after you have checked) the serial number? There's still a race.

You see? Doing this correctly is not so easy after all...

If you think about it a little more carefully, you'll see why. To be
safe, unregistration has to guarantee when it returns that the entry being
removed is not in use and will not be called in the future. If other CPUs
are traversing the chain in a lightweight fashion, the only way you can
fulfill this guarantee is to wait until all the current traversers have
finished, a la RCU. And if the traversers can sleep, the simplest way to
wait for them is to use an rwsem.


On Wed, 18 Jan 2006, Alan Cox wrote:

> On Mer, 2006-01-18 at 14:00 -0800, David S. Miller wrote:
> > For example, IPV6 addresses can get added/removed from a device
> > in response to packets, and these operations trigger the
> > inet6addr_chain notifier in net/ipv6/addrconf.c
> >
> > So sleeping in a notifier is indeed illegal.
>
> On the specific example yet. Notifiers get used for many things and
> there has never been a rule about them not sleeping. There are lots of
> cases where notifiers sleeping make sense including its early use in
> power manglement.
>
> Notifiers should not have locks. That was intentional in the original
> implementation.

But not documented.

> You want locks, you implement them in the API *using*
> the notifier, because its odds on you actually need to hold that lock
> for other things too.

In going through the kernel, looking for places where notifier chains were
locked locally, there were surprisingly few instances (no more than one, I
think) where that lock was used for other things too.

There are a lot of notifier chains that _can_ use locks. That's what the
patch is for, to provide the locking for them, all in one place, so they
don't have to do it themselves in many different places.

For other notifier chains that don't want locking (or don't want the kind
of locking provided by the new API), there's always the raw notifier type.

Alan Stern

2006-01-19 09:55:59

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH 1/8] Notifier chain update

>>>>> "Benjamin" == Benjamin LaHaise <[email protected]> writes:

Benjamin> On Wed, Jan 18, 2006 at 03:23:20PM -0500, Alan Stern wrote:
>> You can't use RCU protection around code that may sleep. Whether
>> the code remains loaded in the kernel or is part of a removable
>> module doesn't enter into it.

Benjamin> A notifier callee should not be sleeping, if anything it
Benjamin> should be putting its work onto a workqueue and completing
Benjamin> it when it gets scheduled if it has to do something that
Benjamin> blocks.

There are cases where all the notifier client would want to do is
kmalloc(); sum_up_some_data(); copy_to_user(); kfree();
very often in the exit() path - doing that via a workqueue seems
overly complex.

Cheers,
Jes

2006-01-19 10:01:33

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH 1/8] Notifier chain update

>>>>> "Benjamin" == Benjamin LaHaise <[email protected]> writes:

Benjamin> On Wed, Jan 18, 2006 at 05:09:10PM -0500, Alan Stern wrote:
>> On Wed, 18 Jan 2006, Benjamin LaHaise wrote:
>>
>> > The notifier interface is supposed to be *light weight*.
>>
>> Again, where is that documented?

Benjamin> Read the kernel. Notifiers are called from all sorts of hot
Benjamin> paths, so they damned well better be light.

Ben,

There' plenty of cases where adding a new notifier chain will be
preferred to adding a long list of if (foo) call() items. One place is
the fork()/exit() path for more system accounting. Being able to sleep
in those places is needed. Fortunately one can do them lock free as
they are in task context, but you need to be able to kmalloc() there.

Cheers,
Jes