On Fri, Aug 20, 2004 at 10:19:26PM +0200, Manfred Spraul wrote:
| Jesse Barnes wrote:
|
| >Looks like a bit more context has changed. Manfred, care to respin
| >against -mm3 so I can test?
| >
| >
| >
| The patches are attached. Just boot-tested on a single-cpu system.
|
| Three changes:
| - I've placed the per-group structure into rcu_state. That's simpler but
| wrong: the state should be allocated from node-local memory, not a big
| global array.
| - I found a bug/race in the cpu_offline path: when the last cpu of a
| group goes offline, the group must be forced into quiescent state.
| The "&& (!forced)" check was missing (rough sketch below).
| - I've removed the spin_unlock_wait(). It was intended to synchronize
| cpu_online_mask changes with the calculation of ->outstanding. Paul
| convinced me that this is not necessary.
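|
| A rough sketch of the offline path change (simplified; the structure and
| helper names here are illustrative, not the literal patch code):
|
|         /* per-group state - illustrative only */
|         struct rcu_group {
|                 cpumask_t cpus_online;  /* cpus of this group still online */
|         };
|
|         static void rcu_group_offline_cpu(struct rcu_group *grp, int cpu,
|                                           int forced)
|         {
|                 cpu_clear(cpu, grp->cpus_online);
|                 /*
|                  * If that was the last online cpu of the group and the
|                  * group was not already forced, nobody is left to report
|                  * a quiescent state for it - force the group through now.
|                  * The "&& (!forced)" part of this check was the missing bit.
|                  */
|                 if (cpus_empty(grp->cpus_online) && !forced)
|                         rcu_force_group_quiescent(grp); /* hypothetical helper */
|         }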
Manfred,
I just tried these patches against 2.6.9-rc1-mm3, on a 2p box, and I'm
getting intermittent hangs, either during bootup or shortly after reaching
multi-user. In all cases, one of the processes is stuck on rcu_bh_state:
[0]kdb> bt
Stack traceback for pid 0
0xa000000100970000 0 0 1 0 I 0xa000000100970430 *swapper
0xa0000001000090b0 ia64_spinlock_contention_pre3_4+0x30
args (0xa0000001009f7000, 0xa0000001001017e0, 0x288)
r31 (spinlock address) 0xa0000001009f7000 rcu_bh_state
0xa00000010010e100 _metered_spin_lock+0x40
0xa0000001001017e0 __rcu_process_callbacks+0x260
args (0xa0000001009eef00, 0xa0000001009f7000, 0xe000003004a07f70, 0xa0000001000debe0, 0x50d)
0xa0000001000debe0 tasklet_action+0x1c0
args (0xe000003004a07fe8, 0x0, 0xffffffffffff5e10, 0xffffffffffff0020, 0xffffffffffff0028)
0xa0000001000de290 __do_softirq+0x250
args (0xa000000100977bc0, 0x0, 0x1, 0xa000000100d8e2d0, 0x20000001)
0xa0000001000de360 do_softirq+0x80
args (0xef, 0xa000000100977bc0, 0xa000000100d8e2d0, 0xa00000010098b780, 0xa0000001009b5958)
0xa000000100019030 ia64_handle_irq+0x190
args (0xf, 0xa000000100977bc0, 0x0, 0x0, 0xfd)
0xa000000100012300 ia64_leave_kernel
args (0xf, 0xa000000100977bc0)
0xa000000100019840 ia64_save_scratch_fpregs+0x20
args (0xa000000100977dd0)
0xa00000010001ad20 default_idle+0xc0
args (0x1, 0xa000000100ba5fe4, 0xa000000100977db0, 0xa000000100977e28, 0xa000000100977e20)
0xa00000010001afe0 cpu_idle+0x1e0
args (0xa000000100d8e2d0, 0xa0000001009024e0, 0xa0000001009024e8, 0xa0000001009059b0, 0xa0000001009059b8)
0xa0000001000095c0 rest_init+0xa0
args (0xa0000001009190b0, 0x590)
0xa0000001009190b0 start_kernel+0x510
args (0x307bd8ac08, 0xd13, 0x3004a3e240, 0x300485da80, 0x307bd31b30)
0xa000000100008590 _start+0x270
args (0x307bd8ac08, 0x307bd5b398, 0x3004a3e270, 0x307bd1f008, 0x30045b2e60)
If I'm reading this right, this corresponds to (kernel/rcupdate.c):
507         if (!next_pending) {
508                 /* and start it/schedule start if it's a new batch */
509                 spin_lock(&rsp->lock);          <---- here
510                 rcu_start_batch(rcp, rsp, 1);
511                 spin_unlock(&rsp->lock);
512         }
Whenever this happens, there is another thread waiting to sync buffers
(sometimes with several backed up behind it). It was kjournald in this
case. Have you ever seen this with these patches?
Regards,
Greg Edwards
Greg Edwards wrote:
>Whenever this happens, there is another thread waiting to sync buffers
>(sometimes with several backed up behind it). It was kjournald in this
>case. Have you ever seen this with these patches?
>
>
>
No - as I wrote, I tested only with qemu, without networking, so the bh
queue was never used. Just booting a complete uniprocessor system with
spinlock debugging enabled immediately showed a bug at line 315: it
unlocked rcu_state.lock instead of rsp->lock :-(
pseudo-patch:
out_unlock_all:
- spin_unlock(&rcu_state.lock);
+ spin_unlock(&rsp->lock);
}
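Whichever path hits that out_unlock_all with the wrong unlock leaves
rsp->lock held, and for the bh queue rsp is &rcu_bh_state - so the next
        spin_lock(&rsp->lock);
in __rcu_process_callbacks() spins forever, which would explain the
rcu_bh_state hang in your backtrace.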
--
Manfred
Greg Edwards wrote:
>Manfred,
>
>Lockstat output attached from 2.6.9-rc2 at 512p and 2.6.9-rc2 + your two
>remaining RCU patches at 512p. kernbench was run without any arguments.
>
>The difference for RCU looks great.
>
>
>
Which value did you use for RCU_GROUP_SIZE? I'm not sure what the
optimal value is for 512p - probably 32 or so.
--
Manfred
On Tue, Sep 14, 2004 at 08:16:24PM +0200, Manfred Spraul wrote:
| Greg Edwards wrote:
|
| >Manfred,
| >
| >Lockstat output attached from 2.6.9-rc2 at 512p and 2.6.9-rc2 + your two
| >remaining RCU patches at 512p. kernbench was run without any arguments.
| >
| >The difference for RCU looks great.
| >
| >
| >
| Which value did you use for RCU_GROUP_SIZE? I'm not sure what the
| optimal value is for 512p - probably 32 or so.
I used what was defined in the patch
+#define RCU_GROUP_SIZE 2
I can try running with a couple different values and see how it looks.
Greg
Greg Edwards wrote:
>| >
>| Which value did you use for RCU_GROUP_SIZE? I'm not sure what the
>| optimal value is for 512p - probably 32 or so.
>
>I used what was defined in the patch
>
>+#define RCU_GROUP_SIZE 2
>
>I can try running with a couple different values and see how it looks.
>
>
>
Ok, that explains the lockstat: I was missing data on rcu_group_state.lock.
I'd just use 32. The code uses a two-level bitmap, and the group size
specifies the size of the lower level.
Thus right now you have a 2/256 split, which means you still thrash the
spinlock that protects the upper level. A group size of 32 would give a
32/16 split.
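As a rough illustration of the arithmetic (RCU_NR_GROUPS is just a name
for this sketch, not an identifier from the patch):
        #define NR_CPUS         512
        #define RCU_GROUP_SIZE  32      /* the posted patch had 2 */
        /* sketch: number of groups == size of the upper-level bitmap */
        #define RCU_NR_GROUPS \
                ((NR_CPUS + RCU_GROUP_SIZE - 1) / RCU_GROUP_SIZE)
With RCU_GROUP_SIZE 2 that gives 256 groups, so the single lock protecting
the upper level is still taken roughly 256 times per grace period; with
RCU_GROUP_SIZE 32 it drops to 16 groups, i.e. about 16 acquisitions.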
--
Manfred
Manfred,
Lockstat output attached from 2.6.9-rc2 at 512p and 2.6.9-rc2 + your two
remaining RCU patches at 512p. kernbench was run without any arguments.
The difference for RCU looks great.
Greg