2009-06-23 15:03:53

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: [PATCH 00/10] We must use rcu_barrier() on module unload

This patch series is an attempt to cleanup the entire tree, for
potential oops'es during module unload, due to outstanding RCU
callbacks. (My last rcu_barrier patch series only addressed net/).

If an unloadable module uses RCU callbacks, it need to use
rcu_barrier() so that the module may be safely unloaded.

For documentation see:

Paul E. McKenney's Blog
http://paulmck.livejournal.com/7314.html

http://lwn.net/Articles/217484/

Documentation/RCU/rcubarrier.txt


Looking through the Linux kernel for call_rcu() users and unloadable
modules I found 10 modules that didn't behave correctly.

Please: MAINTAINERS needs to verify that the module exit code prevent
any new RCU callbacks from being posted (before rcu_barrier() is
called). (I have tried to do this verification, but most of these
module are simply too large and complex for me to verify this within
reasonable time)

[Overview description, following patch ordering]

The modules ext4, bridge, mac80211, sunrpc, nfs and ipv6 are fairly
straight forward (maintainers still needs to check for prevent of new
RCU callbacks).

The module decnet, has disabled its module_exit() (since ^1da177e) but
it still seems relevant to keep the code updated.

The modules edac_core and cfq-iosched, has implemented their own
open-coded wait_for_completion() scheme, in order to wait for
call_rcu() calls. Maintainers needs to look into removing this code
and using rcu_barrier() instead.

The module nf_conntrack, has embedded some comments that I would like
Patrick McHardy to look at. As I'm not sure which is are most optimal
place to call rcu_barrier(). The patch probably calls rcu_barrier()
too much, but its a better safe than sorry approach.


I have made a patch for each individual module, so objections can be
made on a per module basis. I have Cc'ed all of the patches to the
maintainers of each module (according to the MAINTAINERS file).


The patchset is made on top of Linus Torvalds tree (starting on top of
commit f234012f52a3).

Who wants to pickup these patches? (I usually go through DaveM, but
this also touches subsystems that are not (yet?) under DaveM's
maintainer ship)


---
Jesper Dangaard Brouer (10):
nf_conntrack: Use rcu_barrier().
cfq-iosched: Uses its own open-coded rcu_barrier.
edac_core: Uses call_rcu() and its own wait_for_completion scheme.
decnet: Use rcu_barrier() on module unload.
ipv6: Use rcu_barrier() on module unload.
nfs: Use rcu_barrier() on module unload.
sunrpc: Use rcu_barrier() on unload.
mac80211: Use rcu_barrier() on unload.
bridge: Use rcu_barrier() instead of syncronize_net() on unload.
ext4: Use rcu_barrier() on module unload.


block/cfq-iosched.c | 6 ++++++
drivers/edac/edac_device.c | 5 +++++
drivers/edac/edac_mc.c | 5 +++++
drivers/edac/edac_pci.c | 5 +++++
fs/ext4/mballoc.c | 4 +++-
fs/nfs/inode.c | 1 +
net/bridge/br.c | 2 +-
net/decnet/af_decnet.c | 6 ++++++
net/ipv6/af_inet6.c | 2 ++
net/mac80211/main.c | 2 ++
net/netfilter/nf_conntrack_core.c | 5 +++++
net/netfilter/nf_conntrack_standalone.c | 2 ++
net/sunrpc/sunrpc_syms.c | 1 +
13 files changed, 44 insertions(+), 2 deletions(-)


--
Best regards,
Jesper Brouer
ComX Networks A/S
Linux Network developer
Cand. Scient Datalog / MSc.
Author of http://adsl-optimizer.dk
LinkedIn: http://www.linkedin.com/in/brouer



2009-06-23 15:03:59

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: [PATCH 01/10] ext4: Use rcu_barrier() on module unload.

The ext4 module uses rcu_call() thus it should use rcu_barrier()on
module unload.

The kmem cache ext4_pspace_cachep is sometimes free'ed using
call_rcu() callbacks. Thus, we must wait for completion of call_rcu()
before doing kmem_cache_destroy().

I have difficult determining if no new call_rcu() callbacks can be envoked.
Would the maintainer please verify this?

Signed-off-by: Jesper Dangaard Brouer <[email protected]>
---

fs/ext4/mballoc.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 519a0a6..e271a9e 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2902,8 +2902,10 @@ int __init init_ext4_mballoc(void)

void exit_ext4_mballoc(void)
{
- /* XXX: synchronize_rcu(); */
+ /* Wait for completion of call_rcu()'s on ext4_pspace_cachep */
+ rcu_barrier();
kmem_cache_destroy(ext4_pspace_cachep);
+
kmem_cache_destroy(ext4_ac_cachep);
kmem_cache_destroy(ext4_free_ext_cachep);
}


2009-06-23 15:04:14

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: [PATCH 04/10] sunrpc: Use rcu_barrier() on unload.

The sunrpc module uses rcu_call() thus it should use rcu_barrier() on
module unload.

Have not verified that the possibility for new call_rcu() callbacks
has been disabled. As a hint for checking, the functions calling
call_rcu() (unx_destroy_cred and generic_destroy_cred) are
registered as crdestroy function pointer in struct rpc_credops.

Signed-off-by: Jesper Dangaard Brouer <[email protected]>
---

net/sunrpc/sunrpc_syms.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/sunrpc/sunrpc_syms.c b/net/sunrpc/sunrpc_syms.c
index 843629f..adaa819 100644
--- a/net/sunrpc/sunrpc_syms.c
+++ b/net/sunrpc/sunrpc_syms.c
@@ -66,6 +66,7 @@ cleanup_sunrpc(void)
#ifdef CONFIG_PROC_FS
rpc_proc_exit();
#endif
+ rcu_barrier(); /* Wait for completion of call_rcu()'s */
}
MODULE_LICENSE("GPL");
module_init(init_sunrpc);


2009-06-23 15:04:04

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: [PATCH 02/10] bridge: Use rcu_barrier() instead of syncronize_net() on unload.

When unloading modules that uses call_rcu() callbacks, then we must
use rcu_barrier(). This module uses syncronize_net() which is not
enough to be sure that all callback has been completed.

Signed-off-by: Jesper Dangaard Brouer <[email protected]>
---

net/bridge/br.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/bridge/br.c b/net/bridge/br.c
index 9aac521..e1241c7 100644
--- a/net/bridge/br.c
+++ b/net/bridge/br.c
@@ -93,7 +93,7 @@ static void __exit br_deinit(void)

unregister_pernet_subsys(&br_net_ops);

- synchronize_net();
+ rcu_barrier(); /* Wait for completion of call_rcu()'s */

br_netfilter_fini();
#if defined(CONFIG_ATM_LANE) || defined(CONFIG_ATM_LANE_MODULE)


2009-06-23 15:04:09

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: [PATCH 03/10] mac80211: Use rcu_barrier() on unload.

The mac80211 module uses rcu_call() thus it should use rcu_barrier()
on module unload.

I'm having a hardtime verifying that no more call_rcu() callbacks can
be schedules in the module unload path. Could a maintainer please
look into this?

Signed-off-by: Jesper Dangaard Brouer <[email protected]>
---

net/mac80211/main.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index 092a017..e9f70ce 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -1100,6 +1100,8 @@ static void __exit ieee80211_exit(void)
ieee80211s_stop();

ieee80211_debugfs_netdev_exit();
+
+ rcu_barrier(); /* Wait for completion of call_rcu()'s */
}




2009-06-23 15:15:55

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH 03/10] mac80211: Use rcu_barrier() on unload.

On Tue, 2009-06-23 at 17:04 +0200, Jesper Dangaard Brouer wrote:
> The mac80211 module uses rcu_call() thus it should use rcu_barrier()
> on module unload.
>
> I'm having a hardtime verifying that no more call_rcu() callbacks can
> be schedules in the module unload path. Could a maintainer please
> look into this?
>
> Signed-off-by: Jesper Dangaard Brouer <[email protected]>
> ---
>
> net/mac80211/main.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/net/mac80211/main.c b/net/mac80211/main.c
> index 092a017..e9f70ce 100644
> --- a/net/mac80211/main.c
> +++ b/net/mac80211/main.c
> @@ -1100,6 +1100,8 @@ static void __exit ieee80211_exit(void)
> ieee80211s_stop();
>
> ieee80211_debugfs_netdev_exit();
> +
> + rcu_barrier(); /* Wait for completion of call_rcu()'s */
> }

I don't think this is correct at all -- note that call_rcu() is done in
some of the mesh code, so I would think you need to do this in
ieee80211_stop() since the call_rcu() code requires the interface to
still be around. And when it's stopped everything should be idle.

I can fix it later, but I'm deep in some other stuff right now.

johannes


Attachments:
signature.asc (801.00 B)
This is a digitally signed message part

2009-06-23 15:04:19

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: [PATCH 05/10] nfs: Use rcu_barrier() on module unload.

The exit path unregister and destroys a lot of stuff, but I have not
verified that nfs_free_delegation() cannot start new call_rcu() callbacks.

Signed-off-by: Jesper Dangaard Brouer <[email protected]>
---

fs/nfs/inode.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 64f8719..40bf2b7 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1518,6 +1518,7 @@ static void __exit exit_nfs_fs(void)
unregister_nfs_fs();
nfs_fs_proc_exit();
nfsiod_stop();
+ rcu_barrier(); /* Wait for completion of call_rcu()'s */
}

/* Not quite true; I just maintain it */

2009-06-23 15:04:24

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: [PATCH 06/10] ipv6: Use rcu_barrier() on module unload.

The ipv6 module uses rcu_call() thus it should use rcu_barrier() on
module unload.
---

net/ipv6/af_inet6.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 85b3d00..caa0278 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -1284,6 +1284,8 @@ static void __exit inet6_exit(void)
proto_unregister(&udplitev6_prot);
proto_unregister(&udpv6_prot);
proto_unregister(&tcpv6_prot);
+
+ rcu_barrier(); /* Wait for completion of call_rcu()'s */
}
module_exit(inet6_exit);

2009-06-23 15:04:44

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: [PATCH 10/10] nf_conntrack: Use rcu_barrier().

I'm not sure which is are most optimal place to call rcu_barrier().
The patch probably calls rcu_barrier() too much, but its a better
safe than sorry approach.

There is embedded some comments that I would like Patrick McHardy
to look at.

Signed-off-by: Jesper Dangaard Brouer <[email protected]>
---

net/netfilter/nf_conntrack_core.c | 5 +++++
net/netfilter/nf_conntrack_standalone.c | 2 ++
2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 5f72b94..cea4537 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1084,6 +1084,8 @@ static void nf_conntrack_cleanup_init_net(void)
{
nf_conntrack_helper_fini();
nf_conntrack_proto_fini();
+ rcu_barrier();
+ /* Need to wait for call_rcu() before dealloc the kmem_cache */
kmem_cache_destroy(nf_conntrack_cachep);
}

@@ -1118,6 +1120,9 @@ void nf_conntrack_cleanup(struct net *net)
/* This makes sure all current packets have passed through
netfilter framework. Roll on, two-stage module
delete... */
+ /* [email protected] 2009-06-20: Think this should be replaced by a
+ rcu_barrier() ???
+ */
synchronize_net();

nf_conntrack_cleanup_net(net);
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index 1935153..29c6cd0 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -500,6 +500,8 @@ static void nf_conntrack_net_exit(struct net *net)
nf_conntrack_standalone_fini_sysctl(net);
nf_conntrack_standalone_fini_proc(net);
nf_conntrack_cleanup(net);
+ /* [email protected]: Think rcu_barrier() should to be called earlier? */
+ rcu_barrier(); /* Wait for completion of call_rcu()'s */
}

static struct pernet_operations nf_conntrack_net_ops = {

2009-06-23 15:04:34

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: [PATCH 08/10] edac_core: Uses call_rcu() and its own wait_for_completion scheme.

Module edac_core.ko uses call_rcu() callbacks in edac_device.c, edac_mc.c
and edac_pci.c.

They all uses a wait_for_completion scheme, but this scheme it not 100%
safe on multiple CPUs. See the _rcu_barrier() implementation which explains
why extra precausion is needed.

The patch adds a comment about rcu_barrier() and as a precausion calls
rcu_barrier(). A maintainer needs to look at removing the wait_for_completion
code.

Signed-off-by: Jesper Dangaard Brouer <[email protected]>
---

drivers/edac/edac_device.c | 5 +++++
drivers/edac/edac_mc.c | 5 +++++
drivers/edac/edac_pci.c | 5 +++++
3 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/drivers/edac/edac_device.c b/drivers/edac/edac_device.c
index b02a6a6..5e831c9 100644
--- a/drivers/edac/edac_device.c
+++ b/drivers/edac/edac_device.c
@@ -373,6 +373,11 @@ static void del_edac_device_from_global_list(struct edac_device_ctl_info
init_completion(&edac_device->removal_complete);
call_rcu(&edac_device->rcu, complete_edac_device_list_del);
wait_for_completion(&edac_device->removal_complete);
+
+ /* [email protected] 2009-06-22: I think that rcu_barrier() should
+ * be used instead of wait_for_completion, because
+ * rcu_barrier() take multiple CPUs into account */
+ rcu_barrier();
}

/*
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 335b7eb..edcce41 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -428,6 +428,11 @@ static void del_mc_from_global_list(struct mem_ctl_info *mci)
init_completion(&mci->complete);
call_rcu(&mci->rcu, complete_mc_list_del);
wait_for_completion(&mci->complete);
+
+ /* [email protected] 2009-06-22: I think that rcu_barrier() should
+ * be used instead of wait_for_completion, because
+ * rcu_barrier() take multiple CPUs into account */
+ rcu_barrier();
}

/**
diff --git a/drivers/edac/edac_pci.c b/drivers/edac/edac_pci.c
index 30b585b..d0eb8c9 100644
--- a/drivers/edac/edac_pci.c
+++ b/drivers/edac/edac_pci.c
@@ -188,6 +188,11 @@ static void del_edac_pci_from_global_list(struct edac_pci_ctl_info *pci)
init_completion(&pci->complete);
call_rcu(&pci->rcu, complete_edac_pci_list_del);
wait_for_completion(&pci->complete);
+
+ /* [email protected] 2009-06-22: I think that rcu_barrier() should
+ * be used instead of wait_for_completion, because
+ * rcu_barrier() take multiple CPUs into account */
+ rcu_barrier();
}

#if 0

2009-06-23 15:04:39

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: [PATCH 09/10] cfq-iosched: Uses its own open-coded rcu_barrier.

This module cfq-iosched, has discovered the value of waiting for
call_rcu() completion, but its has its own open-coded implementation
of rcu_barrier(), which I don't think is 'strong' enough.

This patch only leaves a comment for the maintainers to consider.

Signed-off-by: Jesper Dangaard Brouer <[email protected]>
---

block/cfq-iosched.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 833ec18..c15555b 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -2657,6 +2657,12 @@ static void __exit cfq_exit(void)
/*
* this also protects us from entering cfq_slab_kill() with
* pending RCU callbacks
+ *
+ * [email protected] 2009-06-18: Maintainer please consider using
+ * rcu_barrier() instead of this open-coded wait for
+ * completion implementation. I think it provides a better
+ * guarantee that all CPUs are finished, although
+ * elv_ioc_count_read() do consider all CPUs.
*/
if (elv_ioc_count_read(ioc_count))
wait_for_completion(&all_gone);

2009-06-23 15:04:29

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: [PATCH 07/10] decnet: Use rcu_barrier() on module unload.

The decnet module unloading as been disabled with a '#if 0' statement,
because it have had issues. Perhaps using rcu_barrier() will fix
these issues?

Any maintainers with input?

Signed-off-by: Jesper Dangaard Brouer <[email protected]>
---

net/decnet/af_decnet.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
index d351b8d..bff12da 100644
--- a/net/decnet/af_decnet.c
+++ b/net/decnet/af_decnet.c
@@ -2393,6 +2393,10 @@ module_init(decnet_init);
* Prevent DECnet module unloading until its fixed properly.
* Requires an audit of the code to check for memory leaks and
* initialisation problems etc.
+ *
+ * [email protected] 2009-06-19:
+ * I have added a rcu_barrier() which should plug some of your
+ * module unload issues. Maintainers please try it out...
*/
#if 0
static void __exit decnet_exit(void)
@@ -2413,6 +2417,8 @@ static void __exit decnet_exit(void)
proc_net_remove(&init_net, "decnet");

proto_unregister(&dn_proto);
+
+ rcu_barrier_bh(); /* Wait for completion of call_rcu_bh()'s */
}
module_exit(decnet_exit);
#endif


2009-06-23 16:23:33

by Patrick McHardy

[permalink] [raw]
Subject: Re: [PATCH 10/10] nf_conntrack: Use rcu_barrier().

Jesper Dangaard Brouer wrote:
> I'm not sure which is are most optimal place to call rcu_barrier().
> The patch probably calls rcu_barrier() too much, but its a better
> safe than sorry approach.
>
> There is embedded some comments that I would like Patrick McHardy
> to look at.
>
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index 5f72b94..cea4537 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -1084,6 +1084,8 @@ static void nf_conntrack_cleanup_init_net(void)
> {
> nf_conntrack_helper_fini();
> nf_conntrack_proto_fini();
> + rcu_barrier();
> + /* Need to wait for call_rcu() before dealloc the kmem_cache */
> kmem_cache_destroy(nf_conntrack_cachep);

Which call_rcu() is this referring to? If its the conntrack destruction,
that one is gone in the current kernel and I think destruction is
handled properly by the sl*b-allocators (SLAB_DESTROY_BY_RCU).

> @@ -1118,6 +1120,9 @@ void nf_conntrack_cleanup(struct net *net)
> /* This makes sure all current packets have passed through
> netfilter framework. Roll on, two-stage module
> delete... */
> + /* [email protected] 2009-06-20: Think this should be replaced by a
> + rcu_barrier() ???
> + */
> synchronize_net();

AFAICT this one is used to make sure the old value of the ip_ct_attach
hook is not visible anymore before beginning cleanup and is not needed
for anything else.

> nf_conntrack_cleanup_net(net);
> diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
> index 1935153..29c6cd0 100644
> --- a/net/netfilter/nf_conntrack_standalone.c
> +++ b/net/netfilter/nf_conntrack_standalone.c
> @@ -500,6 +500,8 @@ static void nf_conntrack_net_exit(struct net *net)
> nf_conntrack_standalone_fini_sysctl(net);
> nf_conntrack_standalone_fini_proc(net);
> nf_conntrack_cleanup(net);
> + /* [email protected]: Think rcu_barrier() should to be called earlier? */
> + rcu_barrier(); /* Wait for completion of call_rcu()'s */
> }

Which call_rcu() is this referring to? We should place them in
the cleanup sub-functions to make this clearly visible.

2009-06-23 16:59:13

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH 04/10] sunrpc: Use rcu_barrier() on unload.

On Tue, 2009-06-23 at 17:04 +0200, Jesper Dangaard Brouer wrote:
> The sunrpc module uses rcu_call() thus it should use rcu_barrier() on
> module unload.
>
> Have not verified that the possibility for new call_rcu() callbacks
> has been disabled. As a hint for checking, the functions calling
> call_rcu() (unx_destroy_cred and generic_destroy_cred) are
> registered as crdestroy function pointer in struct rpc_credops.
>
> Signed-off-by: Jesper Dangaard Brouer <[email protected]>
> ---
>
> net/sunrpc/sunrpc_syms.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/net/sunrpc/sunrpc_syms.c b/net/sunrpc/sunrpc_syms.c
> index 843629f..adaa819 100644
> --- a/net/sunrpc/sunrpc_syms.c
> +++ b/net/sunrpc/sunrpc_syms.c
> @@ -66,6 +66,7 @@ cleanup_sunrpc(void)
> #ifdef CONFIG_PROC_FS
> rpc_proc_exit();
> #endif
> + rcu_barrier(); /* Wait for completion of call_rcu()'s */
> }
> MODULE_LICENSE("GPL");
> module_init(init_sunrpc);
>

Acked-by: Trond Myklebust <[email protected]>
--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2009-06-24 01:44:21

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH 00/10] We must use rcu_barrier() on module unload

On Tue, Jun 23, 2009 at 05:03:53PM +0200, Jesper Dangaard Brouer wrote:
> This patch series is an attempt to cleanup the entire tree, for
> potential oops'es during module unload, due to outstanding RCU
> callbacks. (My last rcu_barrier patch series only addressed net/).
>
> If an unloadable module uses RCU callbacks, it need to use
> rcu_barrier() so that the module may be safely unloaded.
>
> For documentation see:
>
> Paul E. McKenney's Blog
> http://paulmck.livejournal.com/7314.html
>
> http://lwn.net/Articles/217484/
>
> Documentation/RCU/rcubarrier.txt
>
>
> Looking through the Linux kernel for call_rcu() users and unloadable
> modules I found 10 modules that didn't behave correctly.

These look good from an RCU viewpoint, but I am in no better position
than is Jesper to analyze the individual modules. From an RCU
viewpoint:

Acked-by: Paul E. McKenney <[email protected]>

> Please: MAINTAINERS needs to verify that the module exit code prevent
> any new RCU callbacks from being posted (before rcu_barrier() is
> called). (I have tried to do this verification, but most of these
> module are simply too large and complex for me to verify this within
> reasonable time)
>
> [Overview description, following patch ordering]
>
> The modules ext4, bridge, mac80211, sunrpc, nfs and ipv6 are fairly
> straight forward (maintainers still needs to check for prevent of new
> RCU callbacks).
>
> The module decnet, has disabled its module_exit() (since ^1da177e) but
> it still seems relevant to keep the code updated.
>
> The modules edac_core and cfq-iosched, has implemented their own
> open-coded wait_for_completion() scheme, in order to wait for
> call_rcu() calls. Maintainers needs to look into removing this code
> and using rcu_barrier() instead.
>
> The module nf_conntrack, has embedded some comments that I would like
> Patrick McHardy to look at. As I'm not sure which is are most optimal
> place to call rcu_barrier(). The patch probably calls rcu_barrier()
> too much, but its a better safe than sorry approach.
>
>
> I have made a patch for each individual module, so objections can be
> made on a per module basis. I have Cc'ed all of the patches to the
> maintainers of each module (according to the MAINTAINERS file).
>
>
> The patchset is made on top of Linus Torvalds tree (starting on top of
> commit f234012f52a3).
>
> Who wants to pickup these patches? (I usually go through DaveM, but
> this also touches subsystems that are not (yet?) under DaveM's
> maintainer ship)
>
>
> ---
> Jesper Dangaard Brouer (10):
> nf_conntrack: Use rcu_barrier().
> cfq-iosched: Uses its own open-coded rcu_barrier.
> edac_core: Uses call_rcu() and its own wait_for_completion scheme.
> decnet: Use rcu_barrier() on module unload.
> ipv6: Use rcu_barrier() on module unload.
> nfs: Use rcu_barrier() on module unload.
> sunrpc: Use rcu_barrier() on unload.
> mac80211: Use rcu_barrier() on unload.
> bridge: Use rcu_barrier() instead of syncronize_net() on unload.
> ext4: Use rcu_barrier() on module unload.
>
>
> block/cfq-iosched.c | 6 ++++++
> drivers/edac/edac_device.c | 5 +++++
> drivers/edac/edac_mc.c | 5 +++++
> drivers/edac/edac_pci.c | 5 +++++
> fs/ext4/mballoc.c | 4 +++-
> fs/nfs/inode.c | 1 +
> net/bridge/br.c | 2 +-
> net/decnet/af_decnet.c | 6 ++++++
> net/ipv6/af_inet6.c | 2 ++
> net/mac80211/main.c | 2 ++
> net/netfilter/nf_conntrack_core.c | 5 +++++
> net/netfilter/nf_conntrack_standalone.c | 2 ++
> net/sunrpc/sunrpc_syms.c | 1 +
> 13 files changed, 44 insertions(+), 2 deletions(-)
>
>
> --
> Best regards,
> Jesper Brouer
> ComX Networks A/S
> Linux Network developer
> Cand. Scient Datalog / MSc.
> Author of http://adsl-optimizer.dk
> LinkedIn: http://www.linkedin.com/in/brouer
>

2009-06-24 06:23:58

by Chrissie Caulfield

[permalink] [raw]
Subject: Re: [PATCH 07/10] decnet: Use rcu_barrier() on module unload.

Jesper Dangaard Brouer wrote:
> The decnet module unloading as been disabled with a '#if 0' statement,
> because it have had issues. Perhaps using rcu_barrier() will fix
> these issues?
>
> Any maintainers with input?
>
> Signed-off-by: Jesper Dangaard Brouer <[email protected]>
> ---
>
> net/decnet/af_decnet.c | 6 ++++++
> 1 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
> index d351b8d..bff12da 100644
> --- a/net/decnet/af_decnet.c
> +++ b/net/decnet/af_decnet.c
> @@ -2393,6 +2393,10 @@ module_init(decnet_init);
> * Prevent DECnet module unloading until its fixed properly.
> * Requires an audit of the code to check for memory leaks and
> * initialisation problems etc.
> + *
> + * [email protected] 2009-06-19:
> + * I have added a rcu_barrier() which should plug some of your
> + * module unload issues. Maintainers please try it out...
> */
> #if 0
> static void __exit decnet_exit(void)
> @@ -2413,6 +2417,8 @@ static void __exit decnet_exit(void)
> proc_net_remove(&init_net, "decnet");
>
> proto_unregister(&dn_proto);
> +
> + rcu_barrier_bh(); /* Wait for completion of call_rcu_bh()'s */
> }
> module_exit(decnet_exit);
> #endif
>

The issues with DECnet module unloading are a little more than just an
RCU leak I think!

Though that area does need reviewing ... when I get some time.

--

Chrissie

2009-06-24 06:42:37

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH 09/10] cfq-iosched: Uses its own open-coded rcu_barrier.

On Tue, Jun 23 2009, Jesper Dangaard Brouer wrote:
> This module cfq-iosched, has discovered the value of waiting for
> call_rcu() completion, but its has its own open-coded implementation
> of rcu_barrier(), which I don't think is 'strong' enough.
>
> This patch only leaves a comment for the maintainers to consider.

We need a stronger primitive and rcu_barrier(), since we also need to
wait for the rcu calls to even be scheduled. So I don't think the below
can be improved, it's already fine.

>
> Signed-off-by: Jesper Dangaard Brouer <[email protected]>
> ---
>
> block/cfq-iosched.c | 6 ++++++
> 1 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index 833ec18..c15555b 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -2657,6 +2657,12 @@ static void __exit cfq_exit(void)
> /*
> * this also protects us from entering cfq_slab_kill() with
> * pending RCU callbacks
> + *
> + * [email protected] 2009-06-18: Maintainer please consider using
> + * rcu_barrier() instead of this open-coded wait for
> + * completion implementation. I think it provides a better
> + * guarantee that all CPUs are finished, although
> + * elv_ioc_count_read() do consider all CPUs.
> */
> if (elv_ioc_count_read(ioc_count))
> wait_for_completion(&all_gone);
>

--
Jens Axboe

2009-06-24 07:02:11

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 00/10] We must use rcu_barrier() on module unload

From: Jesper Dangaard Brouer <[email protected]>
Date: Tue, 23 Jun 2009 17:03:53 +0200

> This patch series is an attempt to cleanup the entire tree, for
> potential oops'es during module unload, due to outstanding RCU
> callbacks. (My last rcu_barrier patch series only addressed net/).

This series has net/ stuff too :-)

I'll let the networking cases sit for a while and get review before
I apply them.

2009-06-24 09:02:19

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: Re: [PATCH 10/10] nf_conntrack: Use rcu_barrier().

On Tue, 2009-06-23 at 18:23 +0200, Patrick McHardy wrote:
> Jesper Dangaard Brouer wrote:
> > I'm not sure which is are most optimal place to call rcu_barrier().
> > The patch probably calls rcu_barrier() too much, but its a better
> > safe than sorry approach.
> >
> > There is embedded some comments that I would like Patrick McHardy
> > to look at.
> >
> > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> > index 5f72b94..cea4537 100644
> > --- a/net/netfilter/nf_conntrack_core.c
> > +++ b/net/netfilter/nf_conntrack_core.c
> > @@ -1084,6 +1084,8 @@ static void nf_conntrack_cleanup_init_net(void)
> > {
> > nf_conntrack_helper_fini();
> > nf_conntrack_proto_fini();
> > + rcu_barrier();
> > + /* Need to wait for call_rcu() before dealloc the kmem_cache */
> > kmem_cache_destroy(nf_conntrack_cachep);
>
> Which call_rcu() is this referring to?

It is the call_rcu() in nf_conntrack_expect.c (which is linked into
nf_conntrack.ko). But that also means that it should have been the slab
cache called "nf_ct_expect_cachep" we should have waited for... (and I
also notice that "nf_ct_expect_cachep" is missing the
SLAB_DESTROY_BY_RCU flag, and the SLAB_DESTROY_BY_RCU flag should be
removed from "nf_conntrack_cachep")

> If its the conntrack destruction,
> that one is gone in the current kernel and I think destruction is
> handled properly by the sl*b-allocators (SLAB_DESTROY_BY_RCU).

Just dived into the slab.c code and noticed that it also is flawed,
ARGH!. When the SLAB_DESTROY_BY_RCU flags is set, it only calls
synchronize_rcu() and not rcu_barrier() as it should!

I'll fix that up in another patch series...

Looking into slub and slob at the moment, and it seems that they
schedule another call_rcu callback for freeing when the
SLAB_DESTROY_BY_RCU flags is set. That seems racy to me... Paul?


> > @@ -1118,6 +1120,9 @@ void nf_conntrack_cleanup(struct net *net)
> > /* This makes sure all current packets have passed through
> > netfilter framework. Roll on, two-stage module
> > delete... */
> > + /* [email protected] 2009-06-20: Think this should be replaced by a
> > + rcu_barrier() ???
> > + */
> > synchronize_net();
>
> AFAICT this one is used to make sure the old value of the ip_ct_attach
> hook is not visible anymore before beginning cleanup and is not needed
> for anything else.

Fine!

> > nf_conntrack_cleanup_net(net);
> > diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
> > index 1935153..29c6cd0 100644
> > --- a/net/netfilter/nf_conntrack_standalone.c
> > +++ b/net/netfilter/nf_conntrack_standalone.c
> > @@ -500,6 +500,8 @@ static void nf_conntrack_net_exit(struct net *net)
> > nf_conntrack_standalone_fini_sysctl(net);
> > nf_conntrack_standalone_fini_proc(net);
> > nf_conntrack_cleanup(net);
> > + /* [email protected]: Think rcu_barrier() should to be called earlier? */
> > + rcu_barrier(); /* Wait for completion of call_rcu()'s */
> > }
>
> Which call_rcu() is this referring to? We should place them in
> the cleanup sub-functions to make this clearly visible.

This call_rcu() is the one done in nf_conntrack_extend.c:114 (notice
"_extend" NOT "_expect"), which calls __nf_ct_ext_free_rcu().

Guess this rcu_barrier() should then be move to
nf_ct_extend_unregister() right? (it already invokes a
synchronize_rcu() that should be replaced by rcu_barrier()).
Although this means the nf_ct_extend_unregister() will be called three
times in nf_conntrack_cleanup_net() when unregistering ecache, acct and
expect.


Thank you for your feedback :-) ... I'll post a new v2 patch...
--
Med venlig hilsen / Best regards
Jesper Brouer
ComX Networks A/S
Linux Network developer
Cand. Scient Datalog / MSc.
Author of http://adsl-optimizer.dk
LinkedIn: http://www.linkedin.com/in/brouer


2009-06-24 09:40:08

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: [PATCH v2 10/10] nf_conntrack: Use rcu_barrier() and fix kmem_cache_create flags


Adjusting SLAB_DESTROY_BY_RCU flags.

kmem_cache_create("nf_conntrack", ...) does not need the
SLAB_DESTROY_BY_RCU flag. But the
kmem_cache_create("nf_conntrack_expect", ...) should use the
SLAB_DESTROY_BY_RCU flag, because it uses a call_rcu() callback to
invoke kmem_cache_free().

RCU barriers, rcu_barrier(), is inserted two places.

In nf_conntrack_expect.c nf_conntrack_expect_fini() before the
kmem_cache_destroy(), even though the use of the SLAB_DESTROY_BY_RCU
flag, because slub does not (currently) handle rcu sync correctly.

And in nf_conntrack_extend.c nf_ct_extend_unregister(), inorder to
wait for completion of callbacks to __nf_ct_ext_free_rcu(), which is
invoked by __nf_ct_ext_add(). It might be more efficient to call
rcu_barrier() in nf_conntrack_core.c nf_conntrack_cleanup_net(), but
thats make it more difficult to read the code (as the callback code
in located in nf_conntrack_extend.c).

Signed-off-by: Jesper Dangaard Brouer <[email protected]>
---

net/netfilter/nf_conntrack_core.c | 2 +-
net/netfilter/nf_conntrack_expect.c | 11 +++++++++--
net/netfilter/nf_conntrack_extend.c | 2 +-
3 files changed, 11 insertions(+), 4 deletions(-)


diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 5f72b94..438ce84 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1242,7 +1242,7 @@ static int nf_conntrack_init_init_net(void)

nf_conntrack_cachep = kmem_cache_create("nf_conntrack",
sizeof(struct nf_conn),
- 0, SLAB_DESTROY_BY_RCU, NULL);
+ 0, 0, NULL);
if (!nf_conntrack_cachep) {
printk(KERN_ERR "Unable to create nf_conn slab cache\n");
ret = -ENOMEM;
diff --git a/net/netfilter/nf_conntrack_expect.c b/net/netfilter/nf_conntrack_expect.c
index afde8f9..56227c2 100644
--- a/net/netfilter/nf_conntrack_expect.c
+++ b/net/netfilter/nf_conntrack_expect.c
@@ -593,7 +593,7 @@ int nf_conntrack_expect_init(struct net *net)
if (net_eq(net, &init_net)) {
nf_ct_expect_cachep = kmem_cache_create("nf_conntrack_expect",
sizeof(struct nf_conntrack_expect),
- 0, 0, NULL);
+ 0, SLAB_DESTROY_BY_RCU, NULL);
if (!nf_ct_expect_cachep)
goto err2;
}
@@ -617,8 +617,15 @@ err1:
void nf_conntrack_expect_fini(struct net *net)
{
exp_proc_remove(net);
- if (net_eq(net, &init_net))
+ if (net_eq(net, &init_net)) {
+ /* [email protected] 2009-06-24: The rcu_barrier() can be
+ * removed once the sl*b allocators has been fixed
+ * regarding handling the SLAB_DESTROY_BY_RCU flag
+ * correctly.
+ */
+ rcu_barrier(); /* Wait for call_rcu() before destroy */
kmem_cache_destroy(nf_ct_expect_cachep);
+ }
nf_ct_free_hashtable(net->ct.expect_hash, net->ct.expect_vmalloc,
nf_ct_expect_hsize);
}
diff --git a/net/netfilter/nf_conntrack_extend.c b/net/netfilter/nf_conntrack_extend.c
index 4b2c769..fef95be 100644
--- a/net/netfilter/nf_conntrack_extend.c
+++ b/net/netfilter/nf_conntrack_extend.c
@@ -186,6 +186,6 @@ void nf_ct_extend_unregister(struct nf_ct_ext_type *type)
rcu_assign_pointer(nf_ct_ext_types[type->id], NULL);
update_alloc_size(type);
mutex_unlock(&nf_ct_ext_type_mutex);
- synchronize_rcu();
+ rcu_barrier(); /* Wait for completion of call_rcu()'s */
}
EXPORT_SYMBOL_GPL(nf_ct_extend_unregister);


2009-06-24 10:06:05

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: Re: [PATCH 03/10] mac80211: Use rcu_barrier() on unload.

On Tue, 2009-06-23 at 17:15 +0200, Johannes Berg wrote:
> On Tue, 2009-06-23 at 17:04 +0200, Jesper Dangaard Brouer wrote:
> > The mac80211 module uses rcu_call() thus it should use rcu_barrier()
> > on module unload.
> >
> > I'm having a hardtime verifying that no more call_rcu() callbacks can
> > be schedules in the module unload path. Could a maintainer please
> > look into this?
> >
> > Signed-off-by: Jesper Dangaard Brouer <[email protected]>
> > ---
> >
> > net/mac80211/main.c | 2 ++
> > 1 files changed, 2 insertions(+), 0 deletions(-)
> >
> > diff --git a/net/mac80211/main.c b/net/mac80211/main.c
> > index 092a017..e9f70ce 100644
> > --- a/net/mac80211/main.c
> > +++ b/net/mac80211/main.c
> > @@ -1100,6 +1100,8 @@ static void __exit ieee80211_exit(void)
> > ieee80211s_stop();
> >
> > ieee80211_debugfs_netdev_exit();
> > +
> > + rcu_barrier(); /* Wait for completion of call_rcu()'s */
> > }
>
> I don't think this is correct at all -- note that call_rcu() is done in
> some of the mesh code, so I would think you need to do this in
> ieee80211_stop() since the call_rcu() code requires the interface to
> still be around. And when it's stopped everything should be idle.

Should it then not be in mesh.c ieee80211_stop_mesh(). We can replace
the synchronize_rcu() in this function with a rcu_barrier().

> I can fix it later, but I'm deep in some other stuff right now.

Yes, I noticed you seem quite active :-)
I can also do a repost... what about the patch below?

--
Med venlig hilsen / Best regards
Jesper Brouer
ComX Networks A/S
Linux Network developer
Cand. Scient Datalog / MSc.
Author of http://adsl-optimizer.dk
LinkedIn: http://www.linkedin.com/in/brouer


[PATCH v2 03/10] mac80211: Use rcu_barrier() on unload.

From: Jesper Dangaard Brouer <[email protected]>

The mac80211 module uses rcu_call() thus it should use rcu_barrier()
on module unload.

The rcu_barrier() is placed in mech.c ieee80211_stop_mesh() which is
invoked from ieee80211_stop() in case vif.type == NL80211_IFTYPE_MESH_POINT.

Signed-off-by: Jesper Dangaard Brouer <[email protected]>
---

net/mac80211/mesh.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)


diff --git a/net/mac80211/mesh.c b/net/mac80211/mesh.c
index fc712e6..11cf45b 100644
--- a/net/mac80211/mesh.c
+++ b/net/mac80211/mesh.c
@@ -494,7 +494,7 @@ void ieee80211_stop_mesh(struct ieee80211_sub_if_data *sdata)
* should it be using the interface and enqueuing
* frames at this very time on another CPU.
*/
- synchronize_rcu();
+ rcu_barrier(); /* Wait for RX path and call_rcu()'s */
skb_queue_purge(&sdata->u.mesh.skb_queue);
}


2009-06-24 10:22:23

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH 03/10] mac80211: Use rcu_barrier() on unload.

On Wed, 2009-06-24 at 12:06 +0200, Jesper Dangaard Brouer wrote:
> On Tue, 2009-06-23 at 17:15 +0200, Johannes Berg wrote:
> > On Tue, 2009-06-23 at 17:04 +0200, Jesper Dangaard Brouer wrote:
> > > The mac80211 module uses rcu_call() thus it should use rcu_barrier()
> > > on module unload.
> > >
> > > I'm having a hardtime verifying that no more call_rcu() callbacks can
> > > be schedules in the module unload path. Could a maintainer please
> > > look into this?
> > >
> > > Signed-off-by: Jesper Dangaard Brouer <[email protected]>
> > > ---
> > >
> > > net/mac80211/main.c | 2 ++
> > > 1 files changed, 2 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/net/mac80211/main.c b/net/mac80211/main.c
> > > index 092a017..e9f70ce 100644
> > > --- a/net/mac80211/main.c
> > > +++ b/net/mac80211/main.c
> > > @@ -1100,6 +1100,8 @@ static void __exit ieee80211_exit(void)
> > > ieee80211s_stop();
> > >
> > > ieee80211_debugfs_netdev_exit();
> > > +
> > > + rcu_barrier(); /* Wait for completion of call_rcu()'s */
> > > }
> >
> > I don't think this is correct at all -- note that call_rcu() is done in
> > some of the mesh code, so I would think you need to do this in
> > ieee80211_stop() since the call_rcu() code requires the interface to
> > still be around. And when it's stopped everything should be idle.
>
> Should it then not be in mesh.c ieee80211_stop_mesh(). We can replace
> the synchronize_rcu() in this function with a rcu_barrier().

Yes, this seems correct.

johannes


Attachments:
signature.asc (801.00 B)
This is a digitally signed message part

2009-06-24 11:32:17

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: Re: [PATCH 03/10] mac80211: Use rcu_barrier() on unload.

On Wed, 2009-06-24 at 12:21 +0200, Johannes Berg wrote:
> On Wed, 2009-06-24 at 12:06 +0200, Jesper Dangaard Brouer wrote:
> > On Tue, 2009-06-23 at 17:15 +0200, Johannes Berg wrote:
> > > On Tue, 2009-06-23 at 17:04 +0200, Jesper Dangaard Brouer wrote:
> > > > The mac80211 module uses rcu_call() thus it should use rcu_barrier()
> > > > on module unload.
> > > >
> > > > I'm having a hardtime verifying that no more call_rcu() callbacks can
> > > > be schedules in the module unload path. Could a maintainer please
> > > > look into this?
> > > >
> > > > Signed-off-by: Jesper Dangaard Brouer <[email protected]>
> > > > ---
> > > >
> > > > net/mac80211/main.c | 2 ++
> > > > 1 files changed, 2 insertions(+), 0 deletions(-)
> > > >
> > > > diff --git a/net/mac80211/main.c b/net/mac80211/main.c
> > > > index 092a017..e9f70ce 100644
> > > > --- a/net/mac80211/main.c
> > > > +++ b/net/mac80211/main.c
> > > > @@ -1100,6 +1100,8 @@ static void __exit ieee80211_exit(void)
> > > > ieee80211s_stop();
> > > >
> > > > ieee80211_debugfs_netdev_exit();
> > > > +
> > > > + rcu_barrier(); /* Wait for completion of call_rcu()'s */
> > > > }
> > >
> > > I don't think this is correct at all -- note that call_rcu() is done in
> > > some of the mesh code, so I would think you need to do this in
> > > ieee80211_stop() since the call_rcu() code requires the interface to
> > > still be around. And when it's stopped everything should be idle.
> >
> > Should it then not be in mesh.c ieee80211_stop_mesh(). We can replace
> > the synchronize_rcu() in this function with a rcu_barrier().
>
> Yes, this seems correct.
>
> johannes

Can I consider this a:

Acked-by: Johannes Berg <[email protected]>

???

DaveM seems to like this as patchwork.ozlabs.org picks up this
automatically...

--
Med venlig hilsen / Best regards
Jesper Brouer
ComX Networks A/S
Linux Network developer
Cand. Scient Datalog / MSc.
Author of http://adsl-optimizer.dk
LinkedIn: http://www.linkedin.com/in/brouer

2009-06-24 11:39:00

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH 03/10] mac80211: Use rcu_barrier() on unload.

On Wed, 2009-06-24 at 13:32 +0200, Jesper Dangaard Brouer wrote:

> > > Should it then not be in mesh.c ieee80211_stop_mesh(). We can replace
> > > the synchronize_rcu() in this function with a rcu_barrier().
> >
> > Yes, this seems correct.
> >
> > johannes
>
> Can I consider this a:
>
> Acked-by: Johannes Berg <[email protected]>

Yeah,
Acked-by: Johannes Berg <[email protected]>

johannes


Attachments:
signature.asc (801.00 B)
This is a digitally signed message part

2009-06-24 11:44:43

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: Re: [PATCH 07/10] decnet: Use rcu_barrier() on module unload.

On Wed, 2009-06-24 at 07:23 +0100, Chrissie Caulfield wrote:
> The issues with DECnet module unloading are a little more than just an
> RCU leak I think!
>
> Though that area does need reviewing ... when I get some time.

Fine. Now you have read my comment in the code, then there is a updated
patch below. Will you ack-that?

--
Med venlig hilsen / Best regards
Jesper Brouer
ComX Networks A/S
Linux Network developer
Cand. Scient Datalog / MSc.
Author of http://adsl-optimizer.dk
LinkedIn: http://www.linkedin.com/in/brouer


[PATCH 07/10] decnet: Use rcu_barrier() on module unload.

From: Jesper Dangaard Brouer <[email protected]>

The decnet module unloading as been disabled with a '#if 0' statement,
because it have had issues.

We add a rcu_barrier() anyhow for correctness.

The maintainer (Chrissie Caulfield) will look into the unload issue
when time permits.

Signed-off-by: Jesper Dangaard Brouer <[email protected]>
---

net/decnet/af_decnet.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)


diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
index d351b8d..bff12da 100644
--- a/net/decnet/af_decnet.c
+++ b/net/decnet/af_decnet.c
@@ -2393,6 +2393,10 @@ module_init(decnet_init);
* Prevent DECnet module unloading until its fixed properly.
* Requires an audit of the code to check for memory leaks and
* initialisation problems etc.
+ *
+ * [email protected] 2009-06-19:
+ * I have added a rcu_barrier() which should plug some of your
+ * module unload issues. Maintainers please try it out...
*/
#if 0
static void __exit decnet_exit(void)
@@ -2413,6 +2417,8 @@ static void __exit decnet_exit(void)
proc_net_remove(&init_net, "decnet");

proto_unregister(&dn_proto);
+
+ rcu_barrier_bh(); /* Wait for completion of call_rcu_bh()'s */
}
module_exit(decnet_exit);
#endif



2009-06-24 12:09:27

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: Re: [PATCH 07/10] decnet: Use rcu_barrier() on module unload.

On Wed, 2009-06-24 at 13:44 +0200, Jesper Dangaard Brouer wrote:
> On Wed, 2009-06-24 at 07:23 +0100, Chrissie Caulfield wrote:
> > The issues with DECnet module unloading are a little more than just an
> > RCU leak I think!
> >
> > Though that area does need reviewing ... when I get some time.
>
> Fine. Now you have read my comment in the code, then there is a updated
> patch below. Will you ack-that?

Sorry wrong patch... forgot save the code and 'stg refresh'...

[PATCH 07/10] decnet: Use rcu_barrier() on module unload.

From: Jesper Dangaard Brouer <[email protected]>

The decnet module unloading as been disabled with a '#if 0' statement,
because it have had issues.

We add a rcu_barrier() anyhow for correctness.

The maintainer (Chrissie Caulfield) will look into the unload issue
when time permits.

Signed-off-by: Jesper Dangaard Brouer <[email protected]>
---

net/decnet/af_decnet.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)


diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
index d351b8d..77d4028 100644
--- a/net/decnet/af_decnet.c
+++ b/net/decnet/af_decnet.c
@@ -2413,6 +2413,8 @@ static void __exit decnet_exit(void)
proc_net_remove(&init_net, "decnet");

proto_unregister(&dn_proto);
+
+ rcu_barrier_bh(); /* Wait for completion of call_rcu_bh()'s */
}
module_exit(decnet_exit);
#endif

2009-06-24 13:50:42

by Chrissie Caulfield

[permalink] [raw]
Subject: Re: [PATCH 07/10] decnet: Use rcu_barrier() on module unload.


On 24 Jun 2009, at 13:09, Jesper Dangaard Brouer wrote:

> On Wed, 2009-06-24 at 13:44 +0200, Jesper Dangaard Brouer wrote:
>> On Wed, 2009-06-24 at 07:23 +0100, Chrissie Caulfield wrote:
>>> The issues with DECnet module unloading are a little more than
>>> just an
>>> RCU leak I think!
>>>
>>> Though that area does need reviewing ... when I get some time.
>>
>> Fine. Now you have read my comment in the code, then there is a
>> updated
>> patch below. Will you ack-that?
>

I don't have any objection to the patch at all, it just seemed a
little odd to deliberately add code inside #if 0 ;-)

Chrissie


> Sorry wrong patch... forgot save the code and 'stg refresh'...
>
> [PATCH 07/10] decnet: Use rcu_barrier() on module unload.
>
> From: Jesper Dangaard Brouer <[email protected]>
>
> The decnet module unloading as been disabled with a '#if 0' statement,
> because it have had issues.
>
> We add a rcu_barrier() anyhow for correctness.
>
> The maintainer (Chrissie Caulfield) will look into the unload issue
> when time permits.
>
> Signed-off-by: Jesper Dangaard Brouer <[email protected]>
> ---
>
> net/decnet/af_decnet.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
>
> diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
> index d351b8d..77d4028 100644
> --- a/net/decnet/af_decnet.c
> +++ b/net/decnet/af_decnet.c
> @@ -2413,6 +2413,8 @@ static void __exit decnet_exit(void)
> proc_net_remove(&init_net, "decnet");
>
> proto_unregister(&dn_proto);
> +
> + rcu_barrier_bh(); /* Wait for completion of call_rcu_bh()'s */
> }
> module_exit(decnet_exit);
> #endif
>
>

2009-06-24 13:58:34

by Patrick McHardy

[permalink] [raw]
Subject: Re: [PATCH v2 10/10] nf_conntrack: Use rcu_barrier() and fix kmem_cache_create flags

Jesper Dangaard Brouer wrote:
> Adjusting SLAB_DESTROY_BY_RCU flags.
>
> kmem_cache_create("nf_conntrack", ...) does not need the
> SLAB_DESTROY_BY_RCU flag.

It does need it. We're using it instead of call_rcu() for conntracks.

> But the
> kmem_cache_create("nf_conntrack_expect", ...) should use the
> SLAB_DESTROY_BY_RCU flag, because it uses a call_rcu() callback to
> invoke kmem_cache_free().

No, using call_rcu() means we don't need SLAB_DESTROY_BY_RCU.
Please see the note in include/linux/slab.h.

> RCU barriers, rcu_barrier(), is inserted two places.
>
> In nf_conntrack_expect.c nf_conntrack_expect_fini() before the
> kmem_cache_destroy(), even though the use of the SLAB_DESTROY_BY_RCU
> flag, because slub does not (currently) handle rcu sync correctly.

I think that should be fixed in slub then.

> And in nf_conntrack_extend.c nf_ct_extend_unregister(), inorder to
> wait for completion of callbacks to __nf_ct_ext_free_rcu(), which is
> invoked by __nf_ct_ext_add(). It might be more efficient to call
> rcu_barrier() in nf_conntrack_core.c nf_conntrack_cleanup_net(), but
> thats make it more difficult to read the code (as the callback code
> in located in nf_conntrack_extend.c).

This one looks fine.

> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index 5f72b94..438ce84 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -1242,7 +1242,7 @@ static int nf_conntrack_init_init_net(void)
>
> nf_conntrack_cachep = kmem_cache_create("nf_conntrack",
> sizeof(struct nf_conn),
> - 0, SLAB_DESTROY_BY_RCU, NULL);
> + 0, 0, NULL);
> if (!nf_conntrack_cachep) {
> printk(KERN_ERR "Unable to create nf_conn slab cache\n");
> ret = -ENOMEM;
> diff --git a/net/netfilter/nf_conntrack_expect.c b/net/netfilter/nf_conntrack_expect.c
> index afde8f9..56227c2 100644
> --- a/net/netfilter/nf_conntrack_expect.c
> +++ b/net/netfilter/nf_conntrack_expect.c
> @@ -593,7 +593,7 @@ int nf_conntrack_expect_init(struct net *net)
> if (net_eq(net, &init_net)) {
> nf_ct_expect_cachep = kmem_cache_create("nf_conntrack_expect",
> sizeof(struct nf_conntrack_expect),
> - 0, 0, NULL);
> + 0, SLAB_DESTROY_BY_RCU, NULL);
> if (!nf_ct_expect_cachep)
> goto err2;
> }
> @@ -617,8 +617,15 @@ err1:
> void nf_conntrack_expect_fini(struct net *net)
> {
> exp_proc_remove(net);
> - if (net_eq(net, &init_net))
> + if (net_eq(net, &init_net)) {
> + /* [email protected] 2009-06-24: The rcu_barrier() can be
> + * removed once the sl*b allocators has been fixed
> + * regarding handling the SLAB_DESTROY_BY_RCU flag
> + * correctly.
> + */
> + rcu_barrier(); /* Wait for call_rcu() before destroy */
> kmem_cache_destroy(nf_ct_expect_cachep);
> + }
> nf_ct_free_hashtable(net->ct.expect_hash, net->ct.expect_vmalloc,
> nf_ct_expect_hsize);
> }
> diff --git a/net/netfilter/nf_conntrack_extend.c b/net/netfilter/nf_conntrack_extend.c
> index 4b2c769..fef95be 100644
> --- a/net/netfilter/nf_conntrack_extend.c
> +++ b/net/netfilter/nf_conntrack_extend.c
> @@ -186,6 +186,6 @@ void nf_ct_extend_unregister(struct nf_ct_ext_type *type)
> rcu_assign_pointer(nf_ct_ext_types[type->id], NULL);
> update_alloc_size(type);
> mutex_unlock(&nf_ct_ext_type_mutex);
> - synchronize_rcu();
> + rcu_barrier(); /* Wait for completion of call_rcu()'s */
> }
> EXPORT_SYMBOL_GPL(nf_ct_extend_unregister);
>


2009-06-24 14:05:05

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH 09/10] cfq-iosched: Uses its own open-coded rcu_barrier.

On Wed, Jun 24, 2009 at 08:42:37AM +0200, Jens Axboe wrote:
> On Tue, Jun 23 2009, Jesper Dangaard Brouer wrote:
> > This module cfq-iosched, has discovered the value of waiting for
> > call_rcu() completion, but its has its own open-coded implementation
> > of rcu_barrier(), which I don't think is 'strong' enough.
> >
> > This patch only leaves a comment for the maintainers to consider.
>
> We need a stronger primitive and rcu_barrier(), since we also need to
> wait for the rcu calls to even be scheduled. So I don't think the below
> can be improved, it's already fine.

It is indeed important to first prevent new call_rcu() instances from
being invoked, and only then invoke rcu_barrier().

Thanx, Paul

> > Signed-off-by: Jesper Dangaard Brouer <[email protected]>
> > ---
> >
> > block/cfq-iosched.c | 6 ++++++
> > 1 files changed, 6 insertions(+), 0 deletions(-)
> >
> > diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> > index 833ec18..c15555b 100644
> > --- a/block/cfq-iosched.c
> > +++ b/block/cfq-iosched.c
> > @@ -2657,6 +2657,12 @@ static void __exit cfq_exit(void)
> > /*
> > * this also protects us from entering cfq_slab_kill() with
> > * pending RCU callbacks
> > + *
> > + * [email protected] 2009-06-18: Maintainer please consider using
> > + * rcu_barrier() instead of this open-coded wait for
> > + * completion implementation. I think it provides a better
> > + * guarantee that all CPUs are finished, although
> > + * elv_ioc_count_read() do consider all CPUs.
> > */
> > if (elv_ioc_count_read(ioc_count))
> > wait_for_completion(&all_gone);
> >
>
> --
> Jens Axboe
>

2009-06-25 09:29:14

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: Re: [PATCH v2 10/10] nf_conntrack: Use rcu_barrier() and fix kmem_cache_create flags


On Wed, 2009-06-24 at 15:58 +0200, Patrick McHardy wrote:
> Jesper Dangaard Brouer wrote:
> > Adjusting SLAB_DESTROY_BY_RCU flags.
> >
> > kmem_cache_create("nf_conntrack", ...) does not need the
> > SLAB_DESTROY_BY_RCU flag.
>
> It does need it. We're using it instead of call_rcu() for conntracks.
>
> > But the
> > kmem_cache_create("nf_conntrack_expect", ...) should use the
> > SLAB_DESTROY_BY_RCU flag, because it uses a call_rcu() callback to
> > invoke kmem_cache_free().
>
> No, using call_rcu() means we don't need SLAB_DESTROY_BY_RCU.
> Please see the note in include/linux/slab.h.

Oh, I see. The description is some what cryptic, but I think I got it,
after reading through the code.

BUT this still means that we need to do rcu_barrier() if the
SLAB_DESTROY_BY_RCU is NOT set and we do call_rcu() our self.

Look at: slab.c kmem_cache_destroy()

void kmem_cache_destroy(struct kmem_cache *cachep)
{
...<cut>...
if (__cache_shrink(cachep)) {
slab_error(cachep, "Can't free all objects");
...<cut>...
return;
}

if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
synchronize_rcu();

__kmem_cache_destroy(cachep);
...<cut>...
}

My understanding for the code is (please feel free to correct me): that
if SLAB_DESTROY_BY_RCU _is_ set, then the __cache_shrink() call will
call drain_freelist(), which calls slab_destroy().

If SLAB_DESTROY_BY_RCU _is_ set, then slab_destroy() will then start a
call_rcu() callback to kmem_rcu_free() which calls kmem_cache_free().
Given that the callback code kmem_rcu_free() is not removed, we are not
worried about unloading the module at this point.

I'm a bit worried about what happens if __kmem_cache_destroy() is
invoked and there is still callbacks for kmem_rcu_free() in flight?
The synchronize_rcu() between __cache_shrink() and
__kmem_cache_destroy() should perhaps be changed to rcu_barrier()?

But I'm sure that the SLAB/MM guys will tell me that this case is
handled (and something about its unlinked from the appropiate
lists)??? ;-)


> > RCU barriers, rcu_barrier(), is inserted two places.
> >
> > In nf_conntrack_expect.c nf_conntrack_expect_fini() before the
> > kmem_cache_destroy(), even though the use of the SLAB_DESTROY_BY_RCU
> > flag, because slub does not (currently) handle rcu sync correctly.
>
> I think that should be fixed in slub then.

I don't think so, we/I'm are talking about "nf_conntrack_expect" and not
"nf_conntrack" slab. Clearly the slab "nf_conntrack" is handled
correcly (according to description above).

We still need to make sure the callbacks for "nf_conntrack_expect", are
done before unloading/removing the code they are about to call.


> > And in nf_conntrack_extend.c nf_ct_extend_unregister(), inorder to
> > wait for completion of callbacks to __nf_ct_ext_free_rcu(), which is
> > invoked by __nf_ct_ext_add(). It might be more efficient to call
> > rcu_barrier() in nf_conntrack_core.c nf_conntrack_cleanup_net(), but
> > thats make it more difficult to read the code (as the callback code
> > in located in nf_conntrack_extend.c).
>
> This one looks fine.

Should I make two different patchs?


> > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> > index 5f72b94..438ce84 100644
> > --- a/net/netfilter/nf_conntrack_core.c
> > +++ b/net/netfilter/nf_conntrack_core.c
> > @@ -1242,7 +1242,7 @@ static int nf_conntrack_init_init_net(void)
> >
> > nf_conntrack_cachep = kmem_cache_create("nf_conntrack",
> > sizeof(struct nf_conn),
> > - 0, SLAB_DESTROY_BY_RCU, NULL);
> > + 0, 0, NULL);
> > if (!nf_conntrack_cachep) {
> > printk(KERN_ERR "Unable to create nf_conn slab cache\n");
> > ret = -ENOMEM;
> > diff --git a/net/netfilter/nf_conntrack_expect.c b/net/netfilter/nf_conntrack_expect.c
> > index afde8f9..56227c2 100644
> > --- a/net/netfilter/nf_conntrack_expect.c
> > +++ b/net/netfilter/nf_conntrack_expect.c
> > @@ -593,7 +593,7 @@ int nf_conntrack_expect_init(struct net *net)
> > if (net_eq(net, &init_net)) {
> > nf_ct_expect_cachep = kmem_cache_create("nf_conntrack_expect",
> > sizeof(struct nf_conntrack_expect),
> > - 0, 0, NULL);
> > + 0, SLAB_DESTROY_BY_RCU, NULL);
> > if (!nf_ct_expect_cachep)
> > goto err2;
> > }
> > @@ -617,8 +617,15 @@ err1:
> > void nf_conntrack_expect_fini(struct net *net)
> > {
> > exp_proc_remove(net);
> > - if (net_eq(net, &init_net))
> > + if (net_eq(net, &init_net)) {
> > + /* [email protected] 2009-06-24: The rcu_barrier() can be
> > + * removed once the sl*b allocators has been fixed
> > + * regarding handling the SLAB_DESTROY_BY_RCU flag
> > + * correctly.
> > + */
> > + rcu_barrier(); /* Wait for call_rcu() before destroy */
> > kmem_cache_destroy(nf_ct_expect_cachep);
> > + }
> > nf_ct_free_hashtable(net->ct.expect_hash, net->ct.expect_vmalloc,
> > nf_ct_expect_hsize);
> > }
> > diff --git a/net/netfilter/nf_conntrack_extend.c b/net/netfilter/nf_conntrack_extend.c
> > index 4b2c769..fef95be 100644
> > --- a/net/netfilter/nf_conntrack_extend.c
> > +++ b/net/netfilter/nf_conntrack_extend.c
> > @@ -186,6 +186,6 @@ void nf_ct_extend_unregister(struct nf_ct_ext_type *type)
> > rcu_assign_pointer(nf_ct_ext_types[type->id], NULL);
> > update_alloc_size(type);
> > mutex_unlock(&nf_ct_ext_type_mutex);
> > - synchronize_rcu();
> > + rcu_barrier(); /* Wait for completion of call_rcu()'s */
> > }
> > EXPORT_SYMBOL_GPL(nf_ct_extend_unregister);
> >
>
--
Med venlig hilsen / Best regards
Jesper Brouer
ComX Networks A/S
Linux Network developer
Cand. Scient Datalog / MSc.
Author of http://adsl-optimizer.dk
LinkedIn: http://www.linkedin.com/in/brouer

2009-06-25 10:02:57

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: [PATCH v3 10/10] nf_conntrack: Use rcu_barrier()

On Thu, 2009-06-25 at 11:29 +0200, Jesper Dangaard Brouer wrote:
> Should I make two different patchs?

Here is the single patch... Patrick tell me if you want it split up?


[PATCH v3 10/10] nf_conntrack: Use rcu_barrier()

From: Jesper Dangaard Brouer <[email protected]>

RCU barriers, rcu_barrier(), is inserted two places.

In nf_conntrack_expect.c nf_conntrack_expect_fini() before the
kmem_cache_destroy(). Firstly to make sure the callback to the
nf_ct_expect_free_rcu() code is still around. Secondly because I'm
unsure about the consequence of having in flight
nf_ct_expect_free_rcu/kmem_cache_free() calls while doing a
kmem_cache_destroy() slab destroy.

And in nf_conntrack_extend.c nf_ct_extend_unregister(), inorder to
wait for completion of callbacks to __nf_ct_ext_free_rcu(), which is
invoked by __nf_ct_ext_add(). It might be more efficient to call
rcu_barrier() in nf_conntrack_core.c nf_conntrack_cleanup_net(), but
thats make it more difficult to read the code (as the callback code
in located in nf_conntrack_extend.c).

Signed-off-by: Jesper Dangaard Brouer <[email protected]>
---

net/netfilter/nf_conntrack_expect.c | 4 +++-
net/netfilter/nf_conntrack_extend.c | 2 +-
2 files changed, 4 insertions(+), 2 deletions(-)


diff --git a/net/netfilter/nf_conntrack_expect.c b/net/netfilter/nf_conntrack_expect.c
index afde8f9..2032dfe 100644
--- a/net/netfilter/nf_conntrack_expect.c
+++ b/net/netfilter/nf_conntrack_expect.c
@@ -617,8 +617,10 @@ err1:
void nf_conntrack_expect_fini(struct net *net)
{
exp_proc_remove(net);
- if (net_eq(net, &init_net))
+ if (net_eq(net, &init_net)) {
+ rcu_barrier(); /* Wait for call_rcu() before destroy */
kmem_cache_destroy(nf_ct_expect_cachep);
+ }
nf_ct_free_hashtable(net->ct.expect_hash, net->ct.expect_vmalloc,
nf_ct_expect_hsize);
}
diff --git a/net/netfilter/nf_conntrack_extend.c b/net/netfilter/nf_conntrack_extend.c
index 4b2c769..fef95be 100644
--- a/net/netfilter/nf_conntrack_extend.c
+++ b/net/netfilter/nf_conntrack_extend.c
@@ -186,6 +186,6 @@ void nf_ct_extend_unregister(struct nf_ct_ext_type *type)
rcu_assign_pointer(nf_ct_ext_types[type->id], NULL);
update_alloc_size(type);
mutex_unlock(&nf_ct_ext_type_mutex);
- synchronize_rcu();
+ rcu_barrier(); /* Wait for completion of call_rcu()'s */
}
EXPORT_SYMBOL_GPL(nf_ct_extend_unregister);


2009-06-25 11:52:09

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: Re: [PATCH 07/10] decnet: Use rcu_barrier() on module unload.

On Wed, 2009-06-24 at 14:50 +0100, Chrissie Caulfield wrote:
> I don't have any objection to the patch at all,

Thus assuming an:

Acked-by: Chrissie Caulfield <christine.caulfield-gM/[email protected]>

(wondering if patchworks picks this up...)

> it just seemed a
> little odd to deliberately add code inside #if 0 ;-)

Yes, but it makes sense if you want to fix that code path later on.

And I'm not the only one who have added code here... git blame:

fa34ddd7 (Thomas Graf 2007-03-22 11:57:46 -0700 2401)
457c4cbc (Eric W. Biederman 2007-09-12 12:01:34 +0200 2413)

Cheers,
--
Med venlig hilsen / Best regards
Jesper Brouer
ComX Networks A/S
Linux Network developer
Cand. Scient Datalog / MSc.
Author of http://adsl-optimizer.dk
LinkedIn: http://www.linkedin.com/in/brouer

2009-06-25 13:59:05

by Patrick McHardy

[permalink] [raw]
Subject: Re: [PATCH v2 10/10] nf_conntrack: Use rcu_barrier() and fix kmem_cache_create flags

Jesper Dangaard Brouer wrote:
> On Wed, 2009-06-24 at 15:58 +0200, Patrick McHardy wrote:
>> Jesper Dangaard Brouer wrote:
>>> Adjusting SLAB_DESTROY_BY_RCU flags.
>>>
>>> kmem_cache_create("nf_conntrack", ...) does not need the
>>> SLAB_DESTROY_BY_RCU flag.
>> It does need it. We're using it instead of call_rcu() for conntracks.
>>
>>> But the
>>> kmem_cache_create("nf_conntrack_expect", ...) should use the
>>> SLAB_DESTROY_BY_RCU flag, because it uses a call_rcu() callback to
>>> invoke kmem_cache_free().
>> No, using call_rcu() means we don't need SLAB_DESTROY_BY_RCU.
>> Please see the note in include/linux/slab.h.
>
> Oh, I see. The description is some what cryptic, but I think I got it,
> after reading through the code.
>
> BUT this still means that we need to do rcu_barrier() if the
> SLAB_DESTROY_BY_RCU is NOT set and we do call_rcu() our self.

Correct, in that case its necessary.

> My understanding for the code is (please feel free to correct me): that
> if SLAB_DESTROY_BY_RCU _is_ set, then the __cache_shrink() call will
> call drain_freelist(), which calls slab_destroy().
>
> If SLAB_DESTROY_BY_RCU _is_ set, then slab_destroy() will then start a
> call_rcu() callback to kmem_rcu_free() which calls kmem_cache_free().
> Given that the callback code kmem_rcu_free() is not removed, we are not
> worried about unloading the module at this point.

Yep, thats my understanding as well.

> I'm a bit worried about what happens if __kmem_cache_destroy() is
> invoked and there is still callbacks for kmem_rcu_free() in flight?
> The synchronize_rcu() between __cache_shrink() and
> __kmem_cache_destroy() should perhaps be changed to rcu_barrier()?
>
> But I'm sure that the SLAB/MM guys will tell me that this case is
> handled (and something about its unlinked from the appropiate
> lists)??? ;-)

I'll leave that question to the MM guys :)

>>> RCU barriers, rcu_barrier(), is inserted two places.
>>>
>>> In nf_conntrack_expect.c nf_conntrack_expect_fini() before the
>>> kmem_cache_destroy(), even though the use of the SLAB_DESTROY_BY_RCU
>>> flag, because slub does not (currently) handle rcu sync correctly.
>> I think that should be fixed in slub then.
>
> I don't think so, we/I'm are talking about "nf_conntrack_expect" and not
> "nf_conntrack" slab. Clearly the slab "nf_conntrack" is handled
> correcly (according to description above).
>
> We still need to make sure the callbacks for "nf_conntrack_expect", are
> done before unloading/removing the code they are about to call.

Yes, my response was referring to potential sl*b bugs, but
you're correct, we do need rcu_barrier() for expectations.

>>> And in nf_conntrack_extend.c nf_ct_extend_unregister(), inorder to
>>> wait for completion of callbacks to __nf_ct_ext_free_rcu(), which is
>>> invoked by __nf_ct_ext_add(). It might be more efficient to call
>>> rcu_barrier() in nf_conntrack_core.c nf_conntrack_cleanup_net(), but
>>> thats make it more difficult to read the code (as the callback code
>>> in located in nf_conntrack_extend.c).
>> This one looks fine.
>
> Should I make two different patchs?

Either way is fine.

2009-06-25 14:33:44

by Patrick McHardy

[permalink] [raw]
Subject: Re: [PATCH v3 10/10] nf_conntrack: Use rcu_barrier()

Jesper Dangaard Brouer wrote:
> RCU barriers, rcu_barrier(), is inserted two places.
>
> In nf_conntrack_expect.c nf_conntrack_expect_fini() before the
> kmem_cache_destroy(). Firstly to make sure the callback to the
> nf_ct_expect_free_rcu() code is still around. Secondly because I'm
> unsure about the consequence of having in flight
> nf_ct_expect_free_rcu/kmem_cache_free() calls while doing a
> kmem_cache_destroy() slab destroy.
>
> And in nf_conntrack_extend.c nf_ct_extend_unregister(), inorder to
> wait for completion of callbacks to __nf_ct_ext_free_rcu(), which is
> invoked by __nf_ct_ext_add(). It might be more efficient to call
> rcu_barrier() in nf_conntrack_core.c nf_conntrack_cleanup_net(), but
> thats make it more difficult to read the code (as the callback code
> in located in nf_conntrack_extend.c).

Applied, thanks Jesper.

2009-06-25 19:32:24

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v2 10/10] nf_conntrack: Use rcu_barrier() and fix kmem_cache_create flags

On Thu, Jun 25, 2009 at 11:29:13AM +0200, Jesper Dangaard Brouer wrote:
>
> On Wed, 2009-06-24 at 15:58 +0200, Patrick McHardy wrote:
> > Jesper Dangaard Brouer wrote:
> > > Adjusting SLAB_DESTROY_BY_RCU flags.
> > >
> > > kmem_cache_create("nf_conntrack", ...) does not need the
> > > SLAB_DESTROY_BY_RCU flag.
> >
> > It does need it. We're using it instead of call_rcu() for conntracks.
> >
> > > But the
> > > kmem_cache_create("nf_conntrack_expect", ...) should use the
> > > SLAB_DESTROY_BY_RCU flag, because it uses a call_rcu() callback to
> > > invoke kmem_cache_free().
> >
> > No, using call_rcu() means we don't need SLAB_DESTROY_BY_RCU.
> > Please see the note in include/linux/slab.h.
>
> Oh, I see. The description is some what cryptic, but I think I got it,
> after reading through the code.
>
> BUT this still means that we need to do rcu_barrier() if the
> SLAB_DESTROY_BY_RCU is NOT set and we do call_rcu() our self.
>
> Look at: slab.c kmem_cache_destroy()
>
> void kmem_cache_destroy(struct kmem_cache *cachep)
> {
> ...<cut>...
> if (__cache_shrink(cachep)) {
> slab_error(cachep, "Can't free all objects");
> ...<cut>...
> return;
> }
>
> if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
> synchronize_rcu();
>
> __kmem_cache_destroy(cachep);
> ...<cut>...
> }
>
> My understanding for the code is (please feel free to correct me): that
> if SLAB_DESTROY_BY_RCU _is_ set, then the __cache_shrink() call will
> call drain_freelist(), which calls slab_destroy().
>
> If SLAB_DESTROY_BY_RCU _is_ set, then slab_destroy() will then start a
> call_rcu() callback to kmem_rcu_free() which calls kmem_cache_free().
> Given that the callback code kmem_rcu_free() is not removed, we are not
> worried about unloading the module at this point.
>
> I'm a bit worried about what happens if __kmem_cache_destroy() is
> invoked and there is still callbacks for kmem_rcu_free() in flight?
> The synchronize_rcu() between __cache_shrink() and
> __kmem_cache_destroy() should perhaps be changed to rcu_barrier()?

It looks to me like it should, good catch!!! I sent a proposed patch
to the maintainers.

Thanx, Paul

> But I'm sure that the SLAB/MM guys will tell me that this case is
> handled (and something about its unlinked from the appropiate
> lists)??? ;-)
>
>
> > > RCU barriers, rcu_barrier(), is inserted two places.
> > >
> > > In nf_conntrack_expect.c nf_conntrack_expect_fini() before the
> > > kmem_cache_destroy(), even though the use of the SLAB_DESTROY_BY_RCU
> > > flag, because slub does not (currently) handle rcu sync correctly.
> >
> > I think that should be fixed in slub then.
>
> I don't think so, we/I'm are talking about "nf_conntrack_expect" and not
> "nf_conntrack" slab. Clearly the slab "nf_conntrack" is handled
> correcly (according to description above).
>
> We still need to make sure the callbacks for "nf_conntrack_expect", are
> done before unloading/removing the code they are about to call.
>
>
> > > And in nf_conntrack_extend.c nf_ct_extend_unregister(), inorder to
> > > wait for completion of callbacks to __nf_ct_ext_free_rcu(), which is
> > > invoked by __nf_ct_ext_add(). It might be more efficient to call
> > > rcu_barrier() in nf_conntrack_core.c nf_conntrack_cleanup_net(), but
> > > thats make it more difficult to read the code (as the callback code
> > > in located in nf_conntrack_extend.c).
> >
> > This one looks fine.
>
> Should I make two different patchs?
>
>
> > > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> > > index 5f72b94..438ce84 100644
> > > --- a/net/netfilter/nf_conntrack_core.c
> > > +++ b/net/netfilter/nf_conntrack_core.c
> > > @@ -1242,7 +1242,7 @@ static int nf_conntrack_init_init_net(void)
> > >
> > > nf_conntrack_cachep = kmem_cache_create("nf_conntrack",
> > > sizeof(struct nf_conn),
> > > - 0, SLAB_DESTROY_BY_RCU, NULL);
> > > + 0, 0, NULL);
> > > if (!nf_conntrack_cachep) {
> > > printk(KERN_ERR "Unable to create nf_conn slab cache\n");
> > > ret = -ENOMEM;
> > > diff --git a/net/netfilter/nf_conntrack_expect.c b/net/netfilter/nf_conntrack_expect.c
> > > index afde8f9..56227c2 100644
> > > --- a/net/netfilter/nf_conntrack_expect.c
> > > +++ b/net/netfilter/nf_conntrack_expect.c
> > > @@ -593,7 +593,7 @@ int nf_conntrack_expect_init(struct net *net)
> > > if (net_eq(net, &init_net)) {
> > > nf_ct_expect_cachep = kmem_cache_create("nf_conntrack_expect",
> > > sizeof(struct nf_conntrack_expect),
> > > - 0, 0, NULL);
> > > + 0, SLAB_DESTROY_BY_RCU, NULL);
> > > if (!nf_ct_expect_cachep)
> > > goto err2;
> > > }
> > > @@ -617,8 +617,15 @@ err1:
> > > void nf_conntrack_expect_fini(struct net *net)
> > > {
> > > exp_proc_remove(net);
> > > - if (net_eq(net, &init_net))
> > > + if (net_eq(net, &init_net)) {
> > > + /* [email protected] 2009-06-24: The rcu_barrier() can be
> > > + * removed once the sl*b allocators has been fixed
> > > + * regarding handling the SLAB_DESTROY_BY_RCU flag
> > > + * correctly.
> > > + */
> > > + rcu_barrier(); /* Wait for call_rcu() before destroy */
> > > kmem_cache_destroy(nf_ct_expect_cachep);
> > > + }
> > > nf_ct_free_hashtable(net->ct.expect_hash, net->ct.expect_vmalloc,
> > > nf_ct_expect_hsize);
> > > }
> > > diff --git a/net/netfilter/nf_conntrack_extend.c b/net/netfilter/nf_conntrack_extend.c
> > > index 4b2c769..fef95be 100644
> > > --- a/net/netfilter/nf_conntrack_extend.c
> > > +++ b/net/netfilter/nf_conntrack_extend.c
> > > @@ -186,6 +186,6 @@ void nf_ct_extend_unregister(struct nf_ct_ext_type *type)
> > > rcu_assign_pointer(nf_ct_ext_types[type->id], NULL);
> > > update_alloc_size(type);
> > > mutex_unlock(&nf_ct_ext_type_mutex);
> > > - synchronize_rcu();
> > > + rcu_barrier(); /* Wait for completion of call_rcu()'s */
> > > }
> > > EXPORT_SYMBOL_GPL(nf_ct_extend_unregister);
> > >
> >
> --
> Med venlig hilsen / Best regards
> Jesper Brouer
> ComX Networks A/S
> Linux Network developer
> Cand. Scient Datalog / MSc.
> Author of http://adsl-optimizer.dk
> LinkedIn: http://www.linkedin.com/in/brouer

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to [email protected]. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2009-06-25 23:10:46

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 07/10] decnet: Use rcu_barrier() on module unload.

From: Jesper Dangaard Brouer <[email protected]>
Date: Thu, 25 Jun 2009 13:52:09 +0200

> On Wed, 2009-06-24 at 14:50 +0100, Chrissie Caulfield wrote:
>> I don't have any objection to the patch at all,
>
> Thus assuming an:
>
> Acked-by: Chrissie Caulfield <christine.caulfield-gM/[email protected]>
>
> (wondering if patchworks picks this up...)

It usually does.

However, could you formally resubmit just the networking bits
as that would make life a lot easier for me.

Thanks a lot Jesper!

2009-06-26 19:18:24

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: Re: [PATCH 07/10] decnet: Use rcu_barrier() on module unload.

On Thu, 25 Jun 2009, David Miller wrote:

> However, could you formally resubmit just the networking bits
> as that would make life a lot easier for me.

As maintainer of now three kernel subsystems, you are allowed to push work
my way... And thanks to StGit (http://www.procode.org/stgit/) picking the
patchset a part is easy :-) (kudos to Catalin Marinas)

I'll resubmit the patches to you and netdev, to limit the spam effect...

Thus, you are getting 5 of the patches 02, 03, 04, 06 and 07. And I have
added the Acked-by's. And Patrick has already picked up the netfilter
patch.

> Thanks a lot Jesper!
You are welcome :-)

Cheers,
Jesper Brouer

--
-------------------------------------------------------------------
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
-------------------------------------------------------------------

2009-06-27 03:16:15

by Christian Kujau

[permalink] [raw]
Subject: Re: [PATCH 07/10] decnet: Use rcu_barrier() on module unload.

On Fri, 26 Jun 2009, Jesper Dangaard Brouer wrote:
> I'll resubmit the patches to you and netdev, to limit the spam effect...

Out of curiosity: why was linux-ext4 Cc'ed on these rcu_barrier patches
(but not other fs-lists but linux-nfs)? I did not see any ../fs/ext4/
changes.

Christian.
--
BOFH excuse #6:

global warming

2009-06-27 07:35:04

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: Re: [PATCH 07/10] decnet: Use rcu_barrier() on module unload.

On Fri, 2009-06-26 at 20:15 -0700, Christian Kujau wrote:
> On Fri, 26 Jun 2009, Jesper Dangaard Brouer wrote:
> > I'll resubmit the patches to you and netdev, to limit the spam effect...
>
> Out of curiosity: why was linux-ext4 Cc'ed on these rcu_barrier patches
> (but not other fs-lists but linux-nfs)? I did not see any ../fs/ext4/
> changes.

There was a ../fs/ext4/ change in patch [01/10].

Titled: "ext4: Use rcu_barrier() on module unload"

git show --stat d6a4ea73b7e8779607dd48735d9a9c521c890857
commit d6a4ea73b7e8779607dd48735d9a9c521c890857
Author: Jesper Dangaard Brouer <[email protected]>
Date: Tue Jun 23 15:40:54 2009 +0200

ext4: Use rcu_barrier() on module unload.

The ext4 module uses rcu_call() thus it should use rcu_barrier()on
module unload.

The kmem cache ext4_pspace_cachep is sometimes free'ed using
call_rcu() callbacks. Thus, we must wait for completion of call_rcu()
before doing kmem_cache_destroy().

I have difficult determining if no new call_rcu() callbacks can be envoked.
Would the maintainer please verify this?

Signed-off-by: Jesper Dangaard Brouer <[email protected]>

fs/ext4/mballoc.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

--
Med venlig hilsen / Best regards
Jesper Brouer
ComX Networks A/S
Linux Network developer
Cand. Scient Datalog / MSc.
Author of http://adsl-optimizer.dk
LinkedIn: http://www.linkedin.com/in/brouer


2009-07-06 02:31:13

by Theodore Y. Ts'o

[permalink] [raw]
Subject: Re: [PATCH 01/10] ext4: Use rcu_barrier() on module unload.

On Tue, Jun 23, 2009 at 05:03:59PM +0200, Jesper Dangaard Brouer wrote:
> The ext4 module uses rcu_call() thus it should use rcu_barrier()on
> module unload.
>
> The kmem cache ext4_pspace_cachep is sometimes free'ed using
> call_rcu() callbacks. Thus, we must wait for completion of call_rcu()
> before doing kmem_cache_destroy().
>
> I have difficult determining if no new call_rcu() callbacks can be envoked.
> Would the maintainer please verify this?

Yes, that's correct. Thanks; I've included this in the ext4 patch
queue.

- Ted