2007-05-02 11:50:48

by David Howells

[permalink] [raw]
Subject: Race between RCU and rmmod


Hi Dipankar, Rusty,

I seem to have found a race between RCU and rmmod. What I see appears to be
an RCU destructor function that has a call pending but lives in a module, gets
deleted before the RCU callback is processed:

RIP: 0010:[<ffffffff880329b7>] [<ffffffff880329b7>]
RSP: 0018:ffffffff805a4f08 EFLAGS: 00010296
RAX: 0000000000000026 RBX: ffff81002fe53a38 RCX: 0000000000000026
RDX: ffff810001f730c0 RSI: 0000000000000001 RDI: 0000000000000002
RBP: ffffffff805a4f18 R08: 0000000000000002 R09: ffffffff80228eab
R10: ffffffff805a4c98 R11: ffffffff805a4e28 R12: ffff81003979e140
R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffffffff80552000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: ffffffff880329b7 CR3: 0000000000201000 CR4: 00000000000006e0
Process ksoftirqd/0 (pid: 3, threadinfo ffff81003e626000, task ffff810001f730c0)
Stack: ffff81002fe53a38 ffff810001dc5040 ffffffff805a4f48 ffffffff8023946a
ffff810001dc5140 0000000000000000 000000000000000a 0000000000000000
ffffffff805a4f58 ffffffff8023952c ffffffff805a4f78 ffffffff8022dca3
Call Trace:
<IRQ> [<ffffffff8023946a>] __rcu_process_callbacks+0x15d/0x1fc
[<ffffffff8023952c>] rcu_process_callbacks+0x23/0x44
[<ffffffff8022dca3>] tasklet_action+0x5e/0xb2
[<ffffffff8022db99>] __do_softirq+0x5f/0xe3
[<ffffffff8022daa9>] ksoftirqd+0x0/0x91
[<ffffffff8020a94c>] call_softirq+0x1c/0x28
<EOI> [<ffffffff8020c562>] do_softirq+0x39/0x9f
[<ffffffff8022dafb>] ksoftirqd+0x52/0x91
[<ffffffff8023b79e>] kthread+0xd8/0x10e
[<ffffffff8020a5d8>] child_rip+0xa/0x12
[<ffffffff8041b100>] _spin_unlock_irq+0x2b/0x31
[<ffffffff80209cec>] restore_args+0x0/0x30
[<ffffffff8023b6c6>] kthread+0x0/0x10e
[<ffffffff8020a5ce>] child_rip+0x0/0x12

I think that rmmod needs to clear the RCU destructor queue, probably inside of
__try_stop_module().

David


2007-05-02 12:01:38

by Dipankar Sarma

[permalink] [raw]
Subject: Re: Race between RCU and rmmod

On Wed, May 02, 2007 at 12:50:24PM +0100, David Howells wrote:
>
> Hi Dipankar, Rusty,
>
> I seem to have found a race between RCU and rmmod. What I see appears to be
> an RCU destructor function that has a call pending but lives in a module, gets
> deleted before the RCU callback is processed:
>
> RIP: 0010:[<ffffffff880329b7>] [<ffffffff880329b7>]
>
> I think that rmmod needs to clear the RCU destructor queue, probably inside of
> __try_stop_module().

This is why we have rcu_barrier() although the corresponding documentation
patch seems to have got dropped. Modules that use RCU must call
rcu_barrier() in their cleanup routine.

Thanks
Dipankar

2007-05-02 12:34:17

by Rusty Russell

[permalink] [raw]
Subject: Re: Race between RCU and rmmod

On Wed, 2007-05-02 at 17:30 +0530, Dipankar Sarma wrote:
> On Wed, May 02, 2007 at 12:50:24PM +0100, David Howells wrote:
> >
> > Hi Dipankar, Rusty,
> >
> > I seem to have found a race between RCU and rmmod. What I see appears to be
> > an RCU destructor function that has a call pending but lives in a module, gets
> > deleted before the RCU callback is processed:
> >
> > RIP: 0010:[<ffffffff880329b7>] [<ffffffff880329b7>]
> >
> > I think that rmmod needs to clear the RCU destructor queue, probably inside of
> > __try_stop_module().
>
> This is why we have rcu_barrier() although the corresponding documentation
> patch seems to have got dropped. Modules that use RCU must call
> rcu_barrier() in their cleanup routine.

Hi David, Dipankar,

My first thought was wondering if doing it for them isn't a bad thing,
but they need to do it if they've got other teardown which would mess up
RCU callbacks anyway...

Cheers,
Rusty.


2007-05-02 18:48:19

by Andrew Morton

[permalink] [raw]
Subject: Re: Race between RCU and rmmod

On Wed, 2 May 2007 17:30:51 +0530
Dipankar Sarma <[email protected]> wrote:

> On Wed, May 02, 2007 at 12:50:24PM +0100, David Howells wrote:
> >
> > Hi Dipankar, Rusty,
> >
> > I seem to have found a race between RCU and rmmod. What I see appears to be
> > an RCU destructor function that has a call pending but lives in a module, gets
> > deleted before the RCU callback is processed:
> >
> > RIP: 0010:[<ffffffff880329b7>] [<ffffffff880329b7>]
> >
> > I think that rmmod needs to clear the RCU destructor queue, probably inside of
> > __try_stop_module().
>
> This is why we have rcu_barrier() although the corresponding documentation
> patch seems to have got dropped. Modules that use RCU must call
> rcu_barrier() in their cleanup routine.
>

hm, I never knew that.

akpm:/usr/src/linux-2.6.21> grep -r rcu_ drivers | wc -l
182
akpm:/usr/src/linux-2.6.21> grep -r rcu_barrier drivers | wc -l
0

For a start we should undrop that documentation patch, please.