2009-06-08 13:11:23

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: [PATCH 0/5] We must use rcu_barrier() on module unload

If an unloadable module uses RCU callbacks, it need to use
rcu_barrier() so that the module may be safely unloaded.

While hacking on a netfilter module of my own, I learned the
importance of calling rcu_barrier() instead of only a
synchronize_rcu() on module unload (iif using any call_rcu()
callbacks). synchronize_rcu() does wait for a grace period to
elapse, but it does not wait for the callbacks to complete.

For documentation see:
http://lwn.net/Articles/217484/
Documentation/RCU/rcubarrier.txt


Looking through the net/ directory for call_rcu() users and unloadable
modules I found 5 modules that didn't behave correctly:

net/8021q/vlan.c

net/can/af_can.c

net/netfilter/nfnetlink_queue.c

net/sctp/protocol.c

net/sunrpc/auth_gss/auth_gss.c

I have made a patch for each individual module, so objections can be
made on a per module basis. I have Cc'ed all of the patches to the
maintainers of each module (according to the MAINTAINERS file).

The patchset is made on top of DaveM's net-next-2.6 tree (starting on
top of commit a1c1db392090). As this is my usual development tree,
even though this patchset is bugfixes. (Just tested it applied cleanly
on Linus'es tree ...)

---
Jesper Dangaard Brouer (5):
sunrpc/auth_gss: Call rcu_barrier() on module unload.
sctp: protocol.c call rcu_barrier() on unload.
can: af_can.c use rcu_barrier() on module unload.
nfnetlink_queue: Use rcu_barrier() on module unload.
8021q: Vlan driver should use rcu_barrier() on unload instead of syncronize_net()


net/8021q/vlan.c | 2 +-
net/can/af_can.c | 2 ++
net/netfilter/nfnetlink_queue.c | 4 +++-
net/sctp/protocol.c | 2 ++
net/sunrpc/auth_gss/auth_gss.c | 1 +
5 files changed, 9 insertions(+), 2 deletions(-)

--
Best regards,
Jesper Brouer
ComX Networks A/S
Linux Network developer
Cand. Scient Datalog / MSc.
Author of http://adsl-optimizer.dk
LinkedIn: http://www.linkedin.com/in/brouer



2009-06-09 16:26:25

by Vlad Yasevich

[permalink] [raw]
Subject: Re: [PATCH 4/5] sctp: protocol.c call rcu_barrier() on unload.

Paul E. McKenney wrote:
> On Tue, Jun 09, 2009 at 11:44:23AM -0400, Vlad Yasevich wrote:
>> Paul E. McKenney wrote:
>>> On Mon, Jun 08, 2009 at 03:11:43PM +0200, Jesper Dangaard Brouer wrote:
>>>> On module unload call rcu_barrier(), this is needed as synchronize_rcu()
>>>> is not strong enough. The kmem_cache_destroy() does invoke
>>>> synchronize_rcu() but it does not provide same protection.
>>> Good, looks like sctp_v4_del_protocol() invokes call_rcu(), which the
>>> rcu_barrier() would then wait for. And it looks like sctp_v6_del_protocol()
>>> does the same for IPv6.
>>>
>>> Reviewed_by: Paul E. McKenney <[email protected]>
>>>
>>>> Signed-off-by: Jesper Dangaard Brouer <[email protected]>
>>>> ---
>>>>
>>>> net/sctp/protocol.c | 2 ++
>>>> 1 files changed, 2 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
>>>> index cb2c50d..79cbd47 100644
>>>> --- a/net/sctp/protocol.c
>>>> +++ b/net/sctp/protocol.c
>>>> @@ -1370,6 +1370,8 @@ SCTP_STATIC __exit void sctp_exit(void)
>>>> sctp_proc_exit();
>>>> cleanup_sctp_mibs();
>>>>
>>>> + rcu_barrier(); /* Wait for completion of call_rcu()'s */
>>>> +
>>>> kmem_cache_destroy(sctp_chunk_cachep);
>>>> kmem_cache_destroy(sctp_bucket_cachep);
>>>> }
>> Shouldn't the rcu_barrier call be before sctp_free_local_addr_list()?
>
> Hmmm... What sequence of events would lead to a failure if
> rcu_barrier() is after sctp_free_local_addr_list()?
>
> Thanx, Paul
>

I thought that the notifier could could potentially execute at the
same time as the unregister(), but I see that's protected. So, I guess
it doesn't really matter then where the barrier is.

Acked-by: Vlad Yasevich <[email protected]>

-vlad

2009-06-10 05:38:16

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 2/5] nfnetlink_queue: Use rcu_barrier() on module unload.

On Mon, 08 Jun 2009 15:11:33 +0200 Jesper Dangaard Brouer <[email protected]> wrote:

> This module uses rcu_call() thus it should use rcu_barrier() on module unload.
>
> Also fixed a trivial typo 'nfetlink' -> 'nfnetlink' in comment.
>
> Signed-off-by: Jesper Dangaard Brouer <[email protected]>
> ---
>
> net/netfilter/nfnetlink_queue.c | 4 +++-
> 1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
> index 8c86011..71daa09 100644
> --- a/net/netfilter/nfnetlink_queue.c
> +++ b/net/netfilter/nfnetlink_queue.c
> @@ -1,6 +1,6 @@
> /*
> * This is a module which is used for queueing packets and communicating with
> - * userspace via nfetlink.
> + * userspace via nfnetlink.
> *
> * (C) 2005 by Harald Welte <[email protected]>
> * (C) 2007 by Patrick McHardy <[email protected]>
> @@ -932,6 +932,8 @@ static void __exit nfnetlink_queue_fini(void)
> #endif
> nfnetlink_subsys_unregister(&nfqnl_subsys);
> netlink_unregister_notifier(&nfqnl_rtnl_notifier);
> +
> + rcu_barrier(); /* Wait for completion of call_rcu()'s */
> }
>
> MODULE_DESCRIPTION("netfilter packet queue handler");

Possibly you've fixed the bug which the module_put(THIS_MODULE) in
instance_destroy_rcu() is addressing.

Do we still need to take a ref against the module for each instance
once the above fix is in place?

<goes git mining>

Nope, the THIS_MODULE games have been there since day one, and I can't
work out why they're there. net/netfilter/nfnetlink_log.c has them
too.


2009-06-10 08:10:27

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 3/5] can: af_can.c use rcu_barrier() on module unload.

From: Oliver Hartkopp <[email protected]>
Date: Mon, 08 Jun 2009 15:24:58 +0200

> Acked-By: Oliver Hartkopp <[email protected]>

Please don't capitalize the "By" in "Acked-By". Otherwise
patchwork doesn't pick them up automatically.

It's just like "Signed-off-by", only the first word is capitalized.

Thank you.

2009-06-10 08:11:53

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 0/5] We must use rcu_barrier() on module unload

From: Jesper Dangaard Brouer <[email protected]>
Date: Mon, 08 Jun 2009 15:11:23 +0200

> If an unloadable module uses RCU callbacks, it need to use
> rcu_barrier() so that the module may be safely unloaded.
...
> The patchset is made on top of DaveM's net-next-2.6 tree (starting on
> top of commit a1c1db392090). As this is my usual development tree,
> even though this patchset is bugfixes. (Just tested it applied cleanly
> on Linus'es tree ...)

All 5 patches applied, thanks!

2009-06-10 10:02:50

by Alan

[permalink] [raw]
Subject: Re: [PATCH 3/5] can: af_can.c use rcu_barrier() on module unload.

On Wed, 10 Jun 2009 01:10:27 -0700 (PDT)
David Miller <[email protected]> wrote:

> From: Oliver Hartkopp <[email protected]>
> Date: Mon, 08 Jun 2009 15:24:58 +0200
>
> > Acked-By: Oliver Hartkopp <[email protected]>
>
> Please don't capitalize the "By" in "Acked-By". Otherwise
> patchwork doesn't pick them up automatically.

Dave. We have computers to do all the silly tedious pointless jobs in
life, please just fix the scripts.

Alan

2009-06-10 11:33:15

by Jan Engelhardt

[permalink] [raw]
Subject: Re: [PATCH 3/5] can: af_can.c use rcu_barrier() on module unload.


On Wednesday 2009-06-10 10:10, David Miller wrote:
>> Acked-By: Oliver Hartkopp <[email protected]>
>
>Please don't capitalize the "By" in "Acked-By". Otherwise
>patchwork doesn't pick them up automatically.

Is not that a patchwork bug then that should be reported? Or a
missing feature to checkpatch? Do we need a CodingStyle for
SOBs now too, just for that?

(Personally I prefer lowercase after the dash too, but I wanted to
illuminate all sides of the box.)

2009-06-08 13:24:58

by Oliver Hartkopp

[permalink] [raw]
Subject: Re: [PATCH 3/5] can: af_can.c use rcu_barrier() on module unload.

Jesper Dangaard Brouer wrote:
> This module uses rcu_call() thus it should use rcu_barrier()
> on module unload.
>
> Signed-off-by: Jesper Dangaard Brouer <[email protected]>

Thanks Jesper for pointing at this issue!

Acked-By: Oliver Hartkopp <[email protected]>

Btw. i do agree with theses patches to be a bug fix that should go into
2.6.30-rc8 as well as into the stable series.

Best regards,
Oliver


> ---
>
> net/can/af_can.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/net/can/af_can.c b/net/can/af_can.c
> index 10f0528..e733725 100644
> --- a/net/can/af_can.c
> +++ b/net/can/af_can.c
> @@ -903,6 +903,8 @@ static __exit void can_exit(void)
> }
> spin_unlock(&can_rcvlists_lock);
>
> + rcu_barrier(); /* Wait for completion of call_rcu()'s */
> +
> kmem_cache_destroy(rcv_cache);
> }
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


2009-06-08 13:11:48

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: [PATCH 5/5] sunrpc/auth_gss: Call rcu_barrier() on module unload.

As the module uses rcu_call() we should make sure that all
rcu callback has been completed before removing the code.

Signed-off-by: Jesper Dangaard Brouer <[email protected]>
---

net/sunrpc/auth_gss/auth_gss.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index e630b38..66d458f 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -1548,6 +1548,7 @@ static void __exit exit_rpcsec_gss(void)
{
gss_svc_shutdown();
rpcauth_unregister(&authgss_ops);
+ rcu_barrier(); /* Wait for completion of call_rcu()'s */
}

MODULE_LICENSE("GPL");


2009-06-08 14:00:29

by Patrick McHardy

[permalink] [raw]
Subject: Re: [PATCH 0/5] We must use rcu_barrier() on module unload

Jesper Dangaard Brouer wrote:
> If an unloadable module uses RCU callbacks, it need to use
> rcu_barrier() so that the module may be safely unloaded.
>
> While hacking on a netfilter module of my own, I learned the
> importance of calling rcu_barrier() instead of only a
> synchronize_rcu() on module unload (iif using any call_rcu()
> callbacks). synchronize_rcu() does wait for a grace period to
> elapse, but it does not wait for the callbacks to complete.
>
> ...
> I have made a patch for each individual module, so objections can be
> made on a per module basis. I have Cc'ed all of the patches to the
> maintainers of each module (according to the MAINTAINERS file).

Acked-by: Patrick McHardy <[email protected]> for patches 1 and 2, good
catch.

2009-06-08 15:54:46

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH 1/5] 8021q: Vlan driver should use rcu_barrier() on unload instead of syncronize_net()

On Mon, Jun 08, 2009 at 03:11:28PM +0200, Jesper Dangaard Brouer wrote:
> The VLAN 8021q driver needs to call rcu_barrier() when unloading the module,
> instead of syncronize_net(). This is needed to make sure that outstanding
> call_rcu() callbacks have completed, before the callback function code is
> removed on module unload.

Looks good! And thank you for checking up on this!!!

Reviewed-by: Paul E. McKenney <[email protected]>

> Signed-off-by: Jesper Dangaard Brouer <[email protected]>
> ---
>
> net/8021q/vlan.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> index 714e1c3..fe64908 100644
> --- a/net/8021q/vlan.c
> +++ b/net/8021q/vlan.c
> @@ -758,7 +758,7 @@ static void __exit vlan_cleanup_module(void)
> BUG_ON(!hlist_empty(&vlan_group_hash[i]));
>
> unregister_pernet_gen_device(vlan_net_id, &vlan_net_ops);
> - synchronize_net();
> + rcu_barrier(); /* Wait for completion of call_rcu()'s */
>
> vlan_gvrp_uninit();
> }
>

2009-06-08 16:05:26

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH 2/5] nfnetlink_queue: Use rcu_barrier() on module unload.

On Mon, Jun 08, 2009 at 03:11:33PM +0200, Jesper Dangaard Brouer wrote:
> This module uses rcu_call() thus it should use rcu_barrier() on module unload.
>
> Also fixed a trivial typo 'nfetlink' -> 'nfnetlink' in comment.

Assuming that netlink_unregister_notifier(), nfnetlink_subsys_unregister(),
and so on prevent any subsequent calls to call_rcu(), looks good!!!

Acked-by: Paul E. McKenney <[email protected]>

> Signed-off-by: Jesper Dangaard Brouer <[email protected]>
> ---
>
> net/netfilter/nfnetlink_queue.c | 4 +++-
> 1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
> index 8c86011..71daa09 100644
> --- a/net/netfilter/nfnetlink_queue.c
> +++ b/net/netfilter/nfnetlink_queue.c
> @@ -1,6 +1,6 @@
> /*
> * This is a module which is used for queueing packets and communicating with
> - * userspace via nfetlink.
> + * userspace via nfnetlink.
> *
> * (C) 2005 by Harald Welte <[email protected]>
> * (C) 2007 by Patrick McHardy <[email protected]>
> @@ -932,6 +932,8 @@ static void __exit nfnetlink_queue_fini(void)
> #endif
> nfnetlink_subsys_unregister(&nfqnl_subsys);
> netlink_unregister_notifier(&nfqnl_rtnl_notifier);
> +
> + rcu_barrier(); /* Wait for completion of call_rcu()'s */
> }
>
> MODULE_DESCRIPTION("netfilter packet queue handler");
>

2009-06-08 16:13:29

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH 3/5] can: af_can.c use rcu_barrier() on module unload.

On Mon, Jun 08, 2009 at 03:11:38PM +0200, Jesper Dangaard Brouer wrote:
> This module uses rcu_call() thus it should use rcu_barrier()
> on module unload.

This does appear to make things better!!!

However, I don't understand why it is safe to do the following in
can_exit():

hlist_for_each_entry_safe(d, n, next, &can_rx_dev_list, list) {
hlist_del(&d->list);
kfree(d);
}

Given that this list is scanned by RCU readers, shouldn't this kfree()
be something like "call_rcu(&d->rcu, can_rx_delete_device);"?

Also, what frees up the "struct receiver" structures?

Thanx, Paul

> Signed-off-by: Jesper Dangaard Brouer <[email protected]>
> ---
>
> net/can/af_can.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/net/can/af_can.c b/net/can/af_can.c
> index 10f0528..e733725 100644
> --- a/net/can/af_can.c
> +++ b/net/can/af_can.c
> @@ -903,6 +903,8 @@ static __exit void can_exit(void)
> }
> spin_unlock(&can_rcvlists_lock);
>
> + rcu_barrier(); /* Wait for completion of call_rcu()'s */
> +
> kmem_cache_destroy(rcv_cache);
> }
>
>

2009-06-08 16:22:27

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH 4/5] sctp: protocol.c call rcu_barrier() on unload.

On Mon, Jun 08, 2009 at 03:11:43PM +0200, Jesper Dangaard Brouer wrote:
> On module unload call rcu_barrier(), this is needed as synchronize_rcu()
> is not strong enough. The kmem_cache_destroy() does invoke
> synchronize_rcu() but it does not provide same protection.

Good, looks like sctp_v4_del_protocol() invokes call_rcu(), which the
rcu_barrier() would then wait for. And it looks like sctp_v6_del_protocol()
does the same for IPv6.

Reviewed_by: Paul E. McKenney <[email protected]>

> Signed-off-by: Jesper Dangaard Brouer <[email protected]>
> ---
>
> net/sctp/protocol.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
> index cb2c50d..79cbd47 100644
> --- a/net/sctp/protocol.c
> +++ b/net/sctp/protocol.c
> @@ -1370,6 +1370,8 @@ SCTP_STATIC __exit void sctp_exit(void)
> sctp_proc_exit();
> cleanup_sctp_mibs();
>
> + rcu_barrier(); /* Wait for completion of call_rcu()'s */
> +
> kmem_cache_destroy(sctp_chunk_cachep);
> kmem_cache_destroy(sctp_bucket_cachep);
> }
>

2009-06-08 16:26:56

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH 5/5] sunrpc/auth_gss: Call rcu_barrier() on module unload.

On Mon, Jun 08, 2009 at 03:11:48PM +0200, Jesper Dangaard Brouer wrote:
> As the module uses rcu_call() we should make sure that all
> rcu callback has been completed before removing the code.

Good improvement!!! Assuming that gss_svc_shutdown() and
rpcauth_unregister() prevent any future RCU callbacks to be registered,
this patch will fix things up.

Acked-by: Paul E. McKenney <[email protected]>

> Signed-off-by: Jesper Dangaard Brouer <[email protected]>
> ---
>
> net/sunrpc/auth_gss/auth_gss.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
> index e630b38..66d458f 100644
> --- a/net/sunrpc/auth_gss/auth_gss.c
> +++ b/net/sunrpc/auth_gss/auth_gss.c
> @@ -1548,6 +1548,7 @@ static void __exit exit_rpcsec_gss(void)
> {
> gss_svc_shutdown();
> rpcauth_unregister(&authgss_ops);
> + rcu_barrier(); /* Wait for completion of call_rcu()'s */
> }
>
> MODULE_LICENSE("GPL");
>

2009-06-08 17:00:10

by Oliver Hartkopp

[permalink] [raw]
Subject: Re: [PATCH 3/5] can: af_can.c use rcu_barrier() on module unload.

Paul E. McKenney wrote:
> On Mon, Jun 08, 2009 at 03:11:38PM +0200, Jesper Dangaard Brouer wrote:
>> This module uses rcu_call() thus it should use rcu_barrier()
>> on module unload.
>
> This does appear to make things better!!!
>
> However, I don't understand why it is safe to do the following in
> can_exit():
>
> hlist_for_each_entry_safe(d, n, next, &can_rx_dev_list, list) {
> hlist_del(&d->list);
> kfree(d);
> }
>
> Given that this list is scanned by RCU readers, shouldn't this kfree()
> be something like "call_rcu(&d->rcu, can_rx_delete_device);"?
>
> Also, what frees up the "struct receiver" structures?

Hi Paul,

af_can.c only provides an infrastructure for PF_CAN modules like can-raw.ko,
can-bcm.ko or can-isotp.ko.

Please take a look into can_notifier() in net/can/af_can.c and raw_notifier()
in net/can/raw.c:

The receivers are removed when the appropriate socket is closed that created
the belonging receivers. And you can not remove can.ko (af_can.c) when another
PF_CAN protocol like can-raw.ko is using it.

So when a netdev notifier removes the interface both the PF_CAN protocol (e.g.
can-raw.ko) and the PF_CAN core (af_can.c) cleans up all receivers and finally
removes the per-interface structure dev_rcv_lists (e.g. for can0).

In can_exit() all the dev_rcv_lists for ARPHRD_CAN interfaces are removed that
had been created by NETDEV_REGISTER notifier and are unused by any of the
PF_CAN protocols and therefore without any receivers attached to them.

The list is protected by spin_lock(&can_rcvlists_lock) - which is probably not
even needed in this particular case - and there is no PF_CAN protocol
registered at this time. So it's really save to remove the empty dev_rcv_lists
structs here that do not link to any receivers.

Puh - much text. But i hope it clarifies it.

Thinking about the rcu stuff again, rcu_barrier() still makes sense when you
are unloading the module chain of can-raw.ko and can.ko very fast.

Regards,
Oliver


>> Signed-off-by: Jesper Dangaard Brouer <[email protected]>
>> ---
>>
>> net/can/af_can.c | 2 ++
>> 1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/net/can/af_can.c b/net/can/af_can.c
>> index 10f0528..e733725 100644
>> --- a/net/can/af_can.c
>> +++ b/net/can/af_can.c
>> @@ -903,6 +903,8 @@ static __exit void can_exit(void)
>> }
>> spin_unlock(&can_rcvlists_lock);
>>
>> + rcu_barrier(); /* Wait for completion of call_rcu()'s */
>> +
>> kmem_cache_destroy(rcv_cache);
>> }

2009-06-08 17:01:16

by Myklebust, Trond

[permalink] [raw]
Subject: Re: [PATCH 5/5] sunrpc/auth_gss: Call rcu_barrier() on module unload.

On Mon, 2009-06-08 at 09:26 -0700, Paul E. McKenney wrote:
> On Mon, Jun 08, 2009 at 03:11:48PM +0200, Jesper Dangaard Brouer wrote:
> > As the module uses rcu_call() we should make sure that all
> > rcu callback has been completed before removing the code.
>
> Good improvement!!! Assuming that gss_svc_shutdown() and
> rpcauth_unregister() prevent any future RCU callbacks to be registered,
> this patch will fix things up.
>
> Acked-by: Paul E. McKenney <[email protected]>
>
> > Signed-off-by: Jesper Dangaard Brouer <[email protected]>
> > ---
> >
> > net/sunrpc/auth_gss/auth_gss.c | 1 +
> > 1 files changed, 1 insertions(+), 0 deletions(-)
> >
> > diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
> > index e630b38..66d458f 100644
> > --- a/net/sunrpc/auth_gss/auth_gss.c
> > +++ b/net/sunrpc/auth_gss/auth_gss.c
> > @@ -1548,6 +1548,7 @@ static void __exit exit_rpcsec_gss(void)
> > {
> > gss_svc_shutdown();
> > rpcauth_unregister(&authgss_ops);
> > + rcu_barrier(); /* Wait for completion of call_rcu()'s */
> > }
> >
> > MODULE_LICENSE("GPL");
> >

Hmm... If this is about ensuring that all the call_rcu() callbacks
complete before we remove the module, then don't we also need similar
barriers in net/sunrpc/sunrpc_syms.c:cleanup_sunrpc() and in
fs/nfs/inode.c:exit_nfs_fs()?

Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2009-06-08 19:49:41

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: Re: [PATCH 5/5] sunrpc/auth_gss: Call rcu_barrier() on module unload.


(trimmed Cc)

On Mon, 8 Jun 2009, Trond Myklebust wrote:

> Hmm... If this is about ensuring that all the call_rcu() callbacks
> complete before we remove the module, then don't we also need similar
> barriers in

Well, Trond that was a hard question, as I don't know this code... but if
it uses call_rcu() callbacks its most likely.

I have not done a complete sweep of the tree, I have only looked at the
netdev parts, as this is where I usually snoop around. So there is
probably plenty of work for some kernel-janitors (Cc'ing the list ;-))

> net/sunrpc/sunrpc_syms.c:cleanup_sunrpc()

Looking at the code that end up in sunrpc.ko, I see that both
net/sunrpc/auth_unix.c and net/sunrpc/auth_generic.c uses call_rcu()
callbacks.

> and in fs/nfs/inode.c:exit_nfs_fs()?

Looking at the code which end up into nfs.ko (which includes
fs/nfs/inode.c) I see that the fs/nfs/delegation.c uses call_rcu()
callbacks.

But its hard for me to figure out how this code interacts. Don't know if
we need to document a code path that can occur fast enough, before we add
this safe guard? It should be Paul's saying...

Cheers,
Jesper Brouer

--
-------------------------------------------------------------------
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
-------------------------------------------------------------------

2009-06-08 21:13:48

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH 5/5] sunrpc/auth_gss: Call rcu_barrier() on module unload.

On Mon, Jun 08, 2009 at 09:48:54PM +0200, Jesper Dangaard Brouer wrote:
>
> (trimmed Cc)
>
> On Mon, 8 Jun 2009, Trond Myklebust wrote:
>
>> Hmm... If this is about ensuring that all the call_rcu() callbacks
>> complete before we remove the module, then don't we also need similar
>> barriers in
>
> Well, Trond that was a hard question, as I don't know this code... but if
> it uses call_rcu() callbacks its most likely.
>
> I have not done a complete sweep of the tree, I have only looked at the
> netdev parts, as this is where I usually snoop around. So there is
> probably plenty of work for some kernel-janitors (Cc'ing the list ;-))
>
>> net/sunrpc/sunrpc_syms.c:cleanup_sunrpc()
>
> Looking at the code that end up in sunrpc.ko, I see that both
> net/sunrpc/auth_unix.c and net/sunrpc/auth_generic.c uses call_rcu()
> callbacks.
>
>> and in fs/nfs/inode.c:exit_nfs_fs()?
>
> Looking at the code which end up into nfs.ko (which includes
> fs/nfs/inode.c) I see that the fs/nfs/delegation.c uses call_rcu()
> callbacks.
>
> But its hard for me to figure out how this code interacts. Don't know if
> we need to document a code path that can occur fast enough, before we add
> this safe guard? It should be Paul's saying...

Unless there is some other mechanism to ensure that all the RCU
callbacks have been invoked before the module exit, there needs to be
code in the module-exit function that does the following:

o Prevent any new RCU callbacks from being posted. In other
words, make sure that no future call_rcu() invocations happen
from this module -unless- those call_rcu() invocations touch
only functions and data that outlive this module.

o Invoke rcu_barrier().

o Of course, if the module uses call_rcu_sched() instead of
call_rcu(), then it should invoke rcu_barrier_sched() instead
of rcu_barrier(). Similarly, if it uses call_rcu_bh() instead
of call_rcu(), then it should invoke rcu_barrier_bh() instead of
rcu_barrier(). If the module uses more than one of call_rcu(),
call_rcu_sched(), and call_rcu_bh(), then it must invoke more than
one of rcu_barrier(), rcu_barrier_sched(), and rcu_barrier_bh().

What other mechanism could be used? I cannot think of one that it safe.
For example, a module that tried to count the number of RCU callbacks in
flight would be vulnerable to races as follows:

1. CPU 0: RCU callback decrements the counter.

2. CPU 1: module-exit function notices that the counter is zero,
so removes the module.

3. CPU 0: attempts to execute the code returning from the RCU
callback, and dies horribly due to that code no longer being
in memory.

If there was an easy solution (or even a hard solution) to this problem,
then I do not believe that Nikita Danilov would have asked Dipankar Sarma
for rcu_barrier(). Therefore, I do not expect anyone to be able to come
up with an alternative to rcu_barrier() and friends. Always happy to
learn something by being proven wrong, of course!!!

So unless someone can show me some other safe mechanism, every unloadable
module that uses call_rcu(), call_rcu_sched(), or call_rcu_bh() must use
rcu_barrier(), rcu_barrier_sched(), and/or rcu_barrier_bh() in its
module-exit function.

Thanx, Paul

2009-06-09 15:44:23

by Vlad Yasevich

[permalink] [raw]
Subject: Re: [PATCH 4/5] sctp: protocol.c call rcu_barrier() on unload.

Paul E. McKenney wrote:
> On Mon, Jun 08, 2009 at 03:11:43PM +0200, Jesper Dangaard Brouer wrote:
>> On module unload call rcu_barrier(), this is needed as synchronize_rcu()
>> is not strong enough. The kmem_cache_destroy() does invoke
>> synchronize_rcu() but it does not provide same protection.
>
> Good, looks like sctp_v4_del_protocol() invokes call_rcu(), which the
> rcu_barrier() would then wait for. And it looks like sctp_v6_del_protocol()
> does the same for IPv6.
>
> Reviewed_by: Paul E. McKenney <[email protected]>
>
>> Signed-off-by: Jesper Dangaard Brouer <[email protected]>
>> ---
>>
>> net/sctp/protocol.c | 2 ++
>> 1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
>> index cb2c50d..79cbd47 100644
>> --- a/net/sctp/protocol.c
>> +++ b/net/sctp/protocol.c
>> @@ -1370,6 +1370,8 @@ SCTP_STATIC __exit void sctp_exit(void)
>> sctp_proc_exit();
>> cleanup_sctp_mibs();
>>
>> + rcu_barrier(); /* Wait for completion of call_rcu()'s */
>> +
>> kmem_cache_destroy(sctp_chunk_cachep);
>> kmem_cache_destroy(sctp_bucket_cachep);
>> }
>>

Shouldn't the rcu_barrier call be before sctp_free_local_addr_list()?

-vlad

2009-06-09 15:50:11

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH 4/5] sctp: protocol.c call rcu_barrier() on unload.

On Tue, Jun 09, 2009 at 11:44:23AM -0400, Vlad Yasevich wrote:
> Paul E. McKenney wrote:
> > On Mon, Jun 08, 2009 at 03:11:43PM +0200, Jesper Dangaard Brouer wrote:
> >> On module unload call rcu_barrier(), this is needed as synchronize_rcu()
> >> is not strong enough. The kmem_cache_destroy() does invoke
> >> synchronize_rcu() but it does not provide same protection.
> >
> > Good, looks like sctp_v4_del_protocol() invokes call_rcu(), which the
> > rcu_barrier() would then wait for. And it looks like sctp_v6_del_protocol()
> > does the same for IPv6.
> >
> > Reviewed_by: Paul E. McKenney <[email protected]>
> >
> >> Signed-off-by: Jesper Dangaard Brouer <[email protected]>
> >> ---
> >>
> >> net/sctp/protocol.c | 2 ++
> >> 1 files changed, 2 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
> >> index cb2c50d..79cbd47 100644
> >> --- a/net/sctp/protocol.c
> >> +++ b/net/sctp/protocol.c
> >> @@ -1370,6 +1370,8 @@ SCTP_STATIC __exit void sctp_exit(void)
> >> sctp_proc_exit();
> >> cleanup_sctp_mibs();
> >>
> >> + rcu_barrier(); /* Wait for completion of call_rcu()'s */
> >> +
> >> kmem_cache_destroy(sctp_chunk_cachep);
> >> kmem_cache_destroy(sctp_bucket_cachep);
> >> }
>
> Shouldn't the rcu_barrier call be before sctp_free_local_addr_list()?

Hmmm... What sequence of events would lead to a failure if
rcu_barrier() is after sctp_free_local_addr_list()?

Thanx, Paul

2009-06-08 13:11:28

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: [PATCH 1/5] 8021q: Vlan driver should use rcu_barrier() on unload instead of syncronize_net()

The VLAN 8021q driver needs to call rcu_barrier() when unloading the module,
instead of syncronize_net(). This is needed to make sure that outstanding
call_rcu() callbacks have completed, before the callback function code is
removed on module unload.

Signed-off-by: Jesper Dangaard Brouer <[email protected]>
---

net/8021q/vlan.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 714e1c3..fe64908 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -758,7 +758,7 @@ static void __exit vlan_cleanup_module(void)
BUG_ON(!hlist_empty(&vlan_group_hash[i]));

unregister_pernet_gen_device(vlan_net_id, &vlan_net_ops);
- synchronize_net();
+ rcu_barrier(); /* Wait for completion of call_rcu()'s */

vlan_gvrp_uninit();
}


2009-06-08 13:11:33

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: [PATCH 2/5] nfnetlink_queue: Use rcu_barrier() on module unload.

This module uses rcu_call() thus it should use rcu_barrier() on module unload.

Also fixed a trivial typo 'nfetlink' -> 'nfnetlink' in comment.

Signed-off-by: Jesper Dangaard Brouer <[email protected]>
---

net/netfilter/nfnetlink_queue.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index 8c86011..71daa09 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -1,6 +1,6 @@
/*
* This is a module which is used for queueing packets and communicating with
- * userspace via nfetlink.
+ * userspace via nfnetlink.
*
* (C) 2005 by Harald Welte <[email protected]>
* (C) 2007 by Patrick McHardy <[email protected]>
@@ -932,6 +932,8 @@ static void __exit nfnetlink_queue_fini(void)
#endif
nfnetlink_subsys_unregister(&nfqnl_subsys);
netlink_unregister_notifier(&nfqnl_rtnl_notifier);
+
+ rcu_barrier(); /* Wait for completion of call_rcu()'s */
}

MODULE_DESCRIPTION("netfilter packet queue handler");


2009-06-08 13:11:38

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: [PATCH 3/5] can: af_can.c use rcu_barrier() on module unload.

This module uses rcu_call() thus it should use rcu_barrier()
on module unload.

Signed-off-by: Jesper Dangaard Brouer <[email protected]>
---

net/can/af_can.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/can/af_can.c b/net/can/af_can.c
index 10f0528..e733725 100644
--- a/net/can/af_can.c
+++ b/net/can/af_can.c
@@ -903,6 +903,8 @@ static __exit void can_exit(void)
}
spin_unlock(&can_rcvlists_lock);

+ rcu_barrier(); /* Wait for completion of call_rcu()'s */
+
kmem_cache_destroy(rcv_cache);
}



2009-06-08 13:11:43

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: [PATCH 4/5] sctp: protocol.c call rcu_barrier() on unload.

On module unload call rcu_barrier(), this is needed as synchronize_rcu()
is not strong enough. The kmem_cache_destroy() does invoke
synchronize_rcu() but it does not provide same protection.

Signed-off-by: Jesper Dangaard Brouer <[email protected]>
---

net/sctp/protocol.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index cb2c50d..79cbd47 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1370,6 +1370,8 @@ SCTP_STATIC __exit void sctp_exit(void)
sctp_proc_exit();
cleanup_sctp_mibs();

+ rcu_barrier(); /* Wait for completion of call_rcu()'s */
+
kmem_cache_destroy(sctp_chunk_cachep);
kmem_cache_destroy(sctp_bucket_cachep);
}