Hi Andrew,
The kernel enters the xmon state while running the file system
stress on nfs v4 mounted partition.
0:mon> e
cpu 0x0: Vector: 300 (Data Access) at [c0000000dbd4f820]
pc: c000000000065be4: .__wake_up_common+0x44/0xe8
lr: c000000000069768: .__wake_up+0x54/0x88
sp: c0000000dbd4faa0
msr: 8000000000001032
dar: 0
dsisr: 40010000
current = 0xc0000000dfb6f680
paca = 0xc000000000574580
pid = 1865, comm = rpciod/0
0:mon> t
[c0000000dbd4fb50] c000000000069768 .__wake_up+0x54/0x88
[c0000000dbd4fc00] d00000000086b890 .nfs_sb_deactive+0x44/0x58 [nfs]
[c0000000dbd4fc80] d000000000872658 .nfs_free_unlinkdata+0x2c/0x74 [nfs]
[c0000000dbd4fd10] d000000000598510 .rpc_release_calldata+0x50/0x74 [sunrpc]
[c0000000dbd4fda0] c00000000008d960 .run_workqueue+0x10c/0x1f4
[c0000000dbd4fe50] c00000000008ec70 .worker_thread+0x118/0x138
[c0000000dbd4ff00] c0000000000939f4 .kthread+0x78/0xc4
[c0000000dbd4ff90] c00000000002b060 .kernel_thread+0x4c/0x68
0:mon> r
R00 = c000000000069768 R16 = 4000000001c00000
R01 = c0000000dbd4faa0 R17 = c0000000004410c0
R02 = c0000000006752d8 R18 = 0000000000000000
R03 = c0000000ace4ffc0 R19 = 000000000019c000
R04 = 0000000000000003 R20 = c00000000050af08
R05 = 0000000000000001 R21 = 000000000210af08
R06 = 0000000000000000 R22 = 000000000210b178
R07 = 0000000000000000 R23 = c00000000050b178
R08 = 0000000000000000 R24 = 0000000000000003
R09 = 0000000000000000 R25 = 0000000000000000
R10 = 0000000000000001 R26 = 0000000000000000
R11 = ffffffffffffffe8 R27 = c0000000ace4ffc0
R12 = 0000000000004000 R28 = 0000000000000001
R13 = c000000000574580 R29 = 0000000000000003
R14 = 0000000000000000 R30 = c00000000061bad8
R15 = c000000000442888 R31 = d0000000008baa50
pc = c000000000065be4 .__wake_up_common+0x44/0xe8
lr = c000000000069768 .__wake_up+0x54/0x88
msr = 8000000000001032 cr = 24000022
ctr = 80000000001af404 xer = 0000000000000002 trap = 300
dar = 0000000000000000 dsisr = 40010000
0:mon>
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
On Nov 16, 2007 3:15 PM, Kamalesh Babulal <[email protected]> wrote:
> Hi Andrew,
>
> The kernel enters the xmon state while running the file system
> stress on nfs v4 mounted partition.
[snip]
> 0:mon> t
> [c0000000dbd4fb50] c000000000069768 .__wake_up+0x54/0x88
> [c0000000dbd4fc00] d00000000086b890 .nfs_sb_deactive+0x44/0x58 [nfs]
> [c0000000dbd4fc80] d000000000872658 .nfs_free_unlinkdata+0x2c/0x74 [nfs]
> [c0000000dbd4fd10] d000000000598510 .rpc_release_calldata+0x50/0x74 [sunrpc]
> [c0000000dbd4fda0] c00000000008d960 .run_workqueue+0x10c/0x1f4
> [c0000000dbd4fe50] c00000000008ec70 .worker_thread+0x118/0x138
> [c0000000dbd4ff00] c0000000000939f4 .kthread+0x78/0xc4
> [c0000000dbd4ff90] c00000000002b060 .kernel_thread+0x4c/0x68
Definitely not a ppc problem.
Got nearly the same backtrace on 64bit x86:
[ 966.712167] BUG: soft lockup - CPU#3 stuck for 11s! [rpciod/3:605]
[ 966.718522] CPU 3:
[ 966.720589] Modules linked in: radeon drm nfsd exportfs ipv6
w83792d tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx
tea5761 tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
videobuf_core btcx_risc tveeprom videodev usbhid v4l2_common
v4l1_compat hid sg i2c_nforce2 pata_amd
[ 966.748306] Pid: 605, comm: rpciod/3 Not tainted 2.6.24-rc2-mm1 #4
[ 966.754653] RIP: 0010:[<ffffffff805b0542>] [<ffffffff805b0542>]
_spin_lock_irqsave+0x12/0x30
[ 966.763424] RSP: 0018:ffff81007ef33e28 EFLAGS: 00000286
[ 966.768879] RAX: 0000000000000286 RBX: ffff81007ef33e60 RCX: 0000000000000000
[ 966.776204] RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff81011e107960
[ 966.783511] RBP: ffff81011cc6c588 R08: ffff8100db918130 R09: ffff81011cc6c540
[ 966.790837] R10: 0000000000000000 R11: ffffffff80266390 R12: ffff8100d2d693a8
[ 966.798170] R13: ffff81011cc6c588 R14: ffff8100d2d693a8 R15: ffffffff80302726
[ 966.805505] FS: 00007f9e739d96f0(0000) GS:ffff81011ff12700(0000)
knlGS:0000000000000000
[ 966.813805] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 966.819703] CR2: 0000000001b691d0 CR3: 0000000069861000 CR4: 00000000000006e0
[ 966.827039] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 966.834362] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 966.841687]
[ 966.841687] Call Trace:
[ 966.845728] [<ffffffff8022cf4d>] __wake_up+0x2d/0x70
[ 966.850900] [<ffffffff802f5e6e>] nfs_free_unlinkdata+0x1e/0x50
[ 966.857004] [<ffffffff80593f66>] rpc_release_calldata+0x26/0x50
[ 966.863161] [<ffffffff80594930>] rpc_async_schedule+0x0/0x10
[ 966.869078] [<ffffffff80245cec>] run_workqueue+0xcc/0x170
[ 966.874705] [<ffffffff802467a0>] worker_thread+0x0/0xb0
[ 966.880163] [<ffffffff802467a0>] worker_thread+0x0/0xb0
[ 966.885610] [<ffffffff8024680d>] worker_thread+0x6d/0xb0
[ 966.891148] [<ffffffff8024a140>] autoremove_wake_function+0x0/0x30
[ 966.897606] [<ffffffff802467a0>] worker_thread+0x0/0xb0
[ 966.903045] [<ffffffff802467a0>] worker_thread+0x0/0xb0
[ 966.908485] [<ffffffff80249d5b>] kthread+0x4b/0x80
[ 966.913484] [<ffffffff8020ca28>] child_rip+0xa/0x12
[ 966.918579] [<ffffffff80249d10>] kthread+0x0/0x80
[ 966.923498] [<ffffffff8020ca1e>] child_rip+0x0/0x12
[ 966.928584]
Sadly lockdep does not work for me, as it gets turned off early:
[ 39.851594] ---------------------------------
[ 39.855963] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
[ 39.861981] swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
[ 39.866963] (&n->list_lock){-+..}, at: [<ffffffff802935c1>]
add_partial+0x31/0xa0
[ 39.874712] {softirq-on-W} state was registered at:
[ 39.879788] [<ffffffff80259fb8>] __lock_acquire+0x3e8/0x1140
[ 39.885763] [<ffffffff80259838>] debug_check_no_locks_freed+0x188/0x1a0
[ 39.892682] [<ffffffff8025ad65>] lock_acquire+0x55/0x70
[ 39.898840] [<ffffffff802935c1>] add_partial+0x31/0xa0
[ 39.904288] [<ffffffff805c76de>] _spin_lock+0x1e/0x30
[ 39.909650] [<ffffffff802935c1>] add_partial+0x31/0xa0
[ 39.915097] [<ffffffff80296f9c>] kmem_cache_open+0x1cc/0x330
[ 39.921066] [<ffffffff805c7984>] _spin_unlock_irq+0x24/0x30
[ 39.926946] [<ffffffff802974f4>] create_kmalloc_cache+0x64/0xf0
[ 39.933172] [<ffffffff80295640>] init_alloc_cpu_cpu+0x70/0x90
[ 39.939226] [<ffffffff8080ada5>] kmem_cache_init+0x65/0x1d0
[ 39.945289] [<ffffffff807f1b4e>] start_kernel+0x23e/0x350
[ 39.950996] [<ffffffff807f112d>] _sinittext+0x12d/0x140
[ 39.956529] [<ffffffffffffffff>] 0xffffffffffffffff
[ 39.961720] irq event stamp: 1207
[ 39.965048] hardirqs last enabled at (1206): [<ffffffff80259838>]
debug_check_no_locks_freed+0x188/0x1a0
[ 39.974701] hardirqs last disabled at (1207): [<ffffffff802952eb>]
__slab_free+0x3b/0x190
[ 39.982968] softirqs last enabled at (570): [<ffffffff8020cf0c>]
call_softirq+0x1c/0x30
[ 39.991148] softirqs last disabled at (1197): [<ffffffff8020cf0c>]
call_softirq+0x1c/0x30
[ 39.999415]
[ 39.999416] other info that might help us debug this:
[ 40.005990] no locks held by swapper/0.
[ 40.010018]
[ 40.010018] stack backtrace:
[ 40.014429]
[ 40.014429] Call Trace:
[ 40.018407] <IRQ> [<ffffffff8025847c>] print_usage_bug+0x18c/0x1a0
[ 40.024817] [<ffffffff802593ec>] mark_lock+0x64c/0x660
[ 40.030057] [<ffffffff80259f6e>] __lock_acquire+0x39e/0x1140
[ 40.035818] [<ffffffff80257717>] save_trace+0x37/0xa0
[ 40.040972] [<ffffffff802492cd>] __rcu_process_callbacks+0x8d/0x250
[ 40.047335] [<ffffffff8025ad65>] lock_acquire+0x55/0x70
[ 40.052663] [<ffffffff802935c1>] add_partial+0x31/0xa0
[ 40.057905] [<ffffffff802595d3>] trace_hardirqs_on+0x83/0x160
[ 40.063750] [<ffffffff805c76de>] _spin_lock+0x1e/0x30
[ 40.068905] [<ffffffff802935c1>] add_partial+0x31/0xa0
[ 40.074311] [<ffffffff802953b0>] __slab_free+0x100/0x190
[ 40.079724] [<ffffffff802492cd>] __rcu_process_callbacks+0x8d/0x250
[ 40.086088] [<ffffffff8023b79c>] tasklet_action+0x2c/0xc0
[ 40.091588] [<ffffffff802494b3>] rcu_process_callbacks+0x23/0x50
[ 40.097694] [<ffffffff8023b7ba>] tasklet_action+0x4a/0xc0
[ 40.103194] [<ffffffff8023b67a>] __do_softirq+0x7a/0x100
[ 40.108607] [<ffffffff8020cf0c>] call_softirq+0x1c/0x30
[ 40.113935] [<ffffffff8020f125>] do_softirq+0x55/0xb0
[ 40.119089] [<ffffffff8023b5f7>] irq_exit+0x97/0xa0
[ 40.124073] [<ffffffff8021bf2c>] smp_apic_timer_interrupt+0x7c/0xc0
[ 40.130434] [<ffffffff8020ac70>] default_idle+0x0/0x60
[ 40.135840] [<ffffffff8020ac70>] default_idle+0x0/0x60
[ 40.141080] [<ffffffff8020c9bb>] apic_timer_interrupt+0x6b/0x70
[ 40.147100] <EOI> [<ffffffff8020aca7>] default_idle+0x37/0x60
[ 40.153066] [<ffffffff8020aca5>] default_idle+0x35/0x60
[ 40.158393] [<ffffffff8020ad2f>] cpu_idle+0x5f/0x90
[ 40.163374]
[ 40.164888] INFO: lockdep is turned off.
Don't know who to bug about that.
Torsten
On Sat, 17 Nov 2007 18:53:45 +0100 "Torsten Kaiser" <[email protected]> wrote:
> On Nov 16, 2007 3:15 PM, Kamalesh Babulal <[email protected]> wrote:
> > Hi Andrew,
> >
> > The kernel enters the xmon state while running the file system
> > stress on nfs v4 mounted partition.
> [snip]
> > 0:mon> t
> > [c0000000dbd4fb50] c000000000069768 .__wake_up+0x54/0x88
> > [c0000000dbd4fc00] d00000000086b890 .nfs_sb_deactive+0x44/0x58 [nfs]
> > [c0000000dbd4fc80] d000000000872658 .nfs_free_unlinkdata+0x2c/0x74 [nfs]
> > [c0000000dbd4fd10] d000000000598510 .rpc_release_calldata+0x50/0x74 [sunrpc]
> > [c0000000dbd4fda0] c00000000008d960 .run_workqueue+0x10c/0x1f4
> > [c0000000dbd4fe50] c00000000008ec70 .worker_thread+0x118/0x138
> > [c0000000dbd4ff00] c0000000000939f4 .kthread+0x78/0xc4
> > [c0000000dbd4ff90] c00000000002b060 .kernel_thread+0x4c/0x68
>
> Definitely not a ppc problem.
> Got nearly the same backtrace on 64bit x86:
> [ 966.712167] BUG: soft lockup - CPU#3 stuck for 11s! [rpciod/3:605]
> [ 966.718522] CPU 3:
> [ 966.720589] Modules linked in: radeon drm nfsd exportfs ipv6
> w83792d tuner tea5767 tda8290 tuner_xc2028 tda9887 tuner_simple mt20xx
> tea5761 tvaudio msp3400 bttv ir_common compat_ioctl32 videobuf_dma_sg
> videobuf_core btcx_risc tveeprom videodev usbhid v4l2_common
> v4l1_compat hid sg i2c_nforce2 pata_amd
> [ 966.748306] Pid: 605, comm: rpciod/3 Not tainted 2.6.24-rc2-mm1 #4
> [ 966.754653] RIP: 0010:[<ffffffff805b0542>] [<ffffffff805b0542>]
> _spin_lock_irqsave+0x12/0x30
> [ 966.763424] RSP: 0018:ffff81007ef33e28 EFLAGS: 00000286
> [ 966.768879] RAX: 0000000000000286 RBX: ffff81007ef33e60 RCX: 0000000000000000
> [ 966.776204] RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff81011e107960
> [ 966.783511] RBP: ffff81011cc6c588 R08: ffff8100db918130 R09: ffff81011cc6c540
> [ 966.790837] R10: 0000000000000000 R11: ffffffff80266390 R12: ffff8100d2d693a8
> [ 966.798170] R13: ffff81011cc6c588 R14: ffff8100d2d693a8 R15: ffffffff80302726
> [ 966.805505] FS: 00007f9e739d96f0(0000) GS:ffff81011ff12700(0000)
> knlGS:0000000000000000
> [ 966.813805] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [ 966.819703] CR2: 0000000001b691d0 CR3: 0000000069861000 CR4: 00000000000006e0
> [ 966.827039] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 966.834362] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 966.841687]
> [ 966.841687] Call Trace:
> [ 966.845728] [<ffffffff8022cf4d>] __wake_up+0x2d/0x70
> [ 966.850900] [<ffffffff802f5e6e>] nfs_free_unlinkdata+0x1e/0x50
> [ 966.857004] [<ffffffff80593f66>] rpc_release_calldata+0x26/0x50
> [ 966.863161] [<ffffffff80594930>] rpc_async_schedule+0x0/0x10
> [ 966.869078] [<ffffffff80245cec>] run_workqueue+0xcc/0x170
> [ 966.874705] [<ffffffff802467a0>] worker_thread+0x0/0xb0
> [ 966.880163] [<ffffffff802467a0>] worker_thread+0x0/0xb0
> [ 966.885610] [<ffffffff8024680d>] worker_thread+0x6d/0xb0
> [ 966.891148] [<ffffffff8024a140>] autoremove_wake_function+0x0/0x30
> [ 966.897606] [<ffffffff802467a0>] worker_thread+0x0/0xb0
> [ 966.903045] [<ffffffff802467a0>] worker_thread+0x0/0xb0
> [ 966.908485] [<ffffffff80249d5b>] kthread+0x4b/0x80
> [ 966.913484] [<ffffffff8020ca28>] child_rip+0xa/0x12
> [ 966.918579] [<ffffffff80249d10>] kthread+0x0/0x80
> [ 966.923498] [<ffffffff8020ca1e>] child_rip+0x0/0x12
> [ 966.928584]
I don't know what'a causing that. I spose I should set up nfs4.
> Sadly lockdep does not work for me, as it gets turned off early:
> [ 39.851594] ---------------------------------
> [ 39.855963] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
> [ 39.861981] swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
> [ 39.866963] (&n->list_lock){-+..}, at: [<ffffffff802935c1>]
> add_partial+0x31/0xa0
> [ 39.874712] {softirq-on-W} state was registered at:
> [ 39.879788] [<ffffffff80259fb8>] __lock_acquire+0x3e8/0x1140
> [ 39.885763] [<ffffffff80259838>] debug_check_no_locks_freed+0x188/0x1a0
> [ 39.892682] [<ffffffff8025ad65>] lock_acquire+0x55/0x70
> [ 39.898840] [<ffffffff802935c1>] add_partial+0x31/0xa0
> [ 39.904288] [<ffffffff805c76de>] _spin_lock+0x1e/0x30
> [ 39.909650] [<ffffffff802935c1>] add_partial+0x31/0xa0
> [ 39.915097] [<ffffffff80296f9c>] kmem_cache_open+0x1cc/0x330
> [ 39.921066] [<ffffffff805c7984>] _spin_unlock_irq+0x24/0x30
> [ 39.926946] [<ffffffff802974f4>] create_kmalloc_cache+0x64/0xf0
> [ 39.933172] [<ffffffff80295640>] init_alloc_cpu_cpu+0x70/0x90
> [ 39.939226] [<ffffffff8080ada5>] kmem_cache_init+0x65/0x1d0
> [ 39.945289] [<ffffffff807f1b4e>] start_kernel+0x23e/0x350
> [ 39.950996] [<ffffffff807f112d>] _sinittext+0x12d/0x140
> [ 39.956529] [<ffffffffffffffff>] 0xffffffffffffffff
> [ 39.961720] irq event stamp: 1207
> [ 39.965048] hardirqs last enabled at (1206): [<ffffffff80259838>]
> debug_check_no_locks_freed+0x188/0x1a0
> [ 39.974701] hardirqs last disabled at (1207): [<ffffffff802952eb>]
> __slab_free+0x3b/0x190
> [ 39.982968] softirqs last enabled at (570): [<ffffffff8020cf0c>]
> call_softirq+0x1c/0x30
> [ 39.991148] softirqs last disabled at (1197): [<ffffffff8020cf0c>]
> call_softirq+0x1c/0x30
> [ 39.999415]
> [ 39.999416] other info that might help us debug this:
> [ 40.005990] no locks held by swapper/0.
> [ 40.010018]
> [ 40.010018] stack backtrace:
> [ 40.014429]
> [ 40.014429] Call Trace:
> [ 40.018407] <IRQ> [<ffffffff8025847c>] print_usage_bug+0x18c/0x1a0
> [ 40.024817] [<ffffffff802593ec>] mark_lock+0x64c/0x660
> [ 40.030057] [<ffffffff80259f6e>] __lock_acquire+0x39e/0x1140
> [ 40.035818] [<ffffffff80257717>] save_trace+0x37/0xa0
> [ 40.040972] [<ffffffff802492cd>] __rcu_process_callbacks+0x8d/0x250
> [ 40.047335] [<ffffffff8025ad65>] lock_acquire+0x55/0x70
> [ 40.052663] [<ffffffff802935c1>] add_partial+0x31/0xa0
> [ 40.057905] [<ffffffff802595d3>] trace_hardirqs_on+0x83/0x160
> [ 40.063750] [<ffffffff805c76de>] _spin_lock+0x1e/0x30
> [ 40.068905] [<ffffffff802935c1>] add_partial+0x31/0xa0
> [ 40.074311] [<ffffffff802953b0>] __slab_free+0x100/0x190
> [ 40.079724] [<ffffffff802492cd>] __rcu_process_callbacks+0x8d/0x250
> [ 40.086088] [<ffffffff8023b79c>] tasklet_action+0x2c/0xc0
> [ 40.091588] [<ffffffff802494b3>] rcu_process_callbacks+0x23/0x50
> [ 40.097694] [<ffffffff8023b7ba>] tasklet_action+0x4a/0xc0
> [ 40.103194] [<ffffffff8023b67a>] __do_softirq+0x7a/0x100
> [ 40.108607] [<ffffffff8020cf0c>] call_softirq+0x1c/0x30
> [ 40.113935] [<ffffffff8020f125>] do_softirq+0x55/0xb0
> [ 40.119089] [<ffffffff8023b5f7>] irq_exit+0x97/0xa0
> [ 40.124073] [<ffffffff8021bf2c>] smp_apic_timer_interrupt+0x7c/0xc0
> [ 40.130434] [<ffffffff8020ac70>] default_idle+0x0/0x60
> [ 40.135840] [<ffffffff8020ac70>] default_idle+0x0/0x60
> [ 40.141080] [<ffffffff8020c9bb>] apic_timer_interrupt+0x6b/0x70
> [ 40.147100] <EOI> [<ffffffff8020aca7>] default_idle+0x37/0x60
> [ 40.153066] [<ffffffff8020aca5>] default_idle+0x35/0x60
> [ 40.158393] [<ffffffff8020ad2f>] cpu_idle+0x5f/0x90
> [ 40.163374]
> [ 40.164888] INFO: lockdep is turned off.
>
> Don't know who to bug about that.
>
That's slub. It appears that list_lock is being taken from process context
in one place and from softirq in another.
* Torsten Kaiser <[email protected]> wrote:
> Sadly lockdep does not work for me, as it gets turned off early:
> [ 39.851594] ---------------------------------
> [ 39.855963] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
> [ 39.861981] swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
> [ 39.866963] (&n->list_lock){-+..}, at: [<ffffffff802935c1>]
hey, that means it found a bug - which is not sad at all :-)
Ingo
On Sat, 17 Nov 2007 19:09:46 +0100 Ingo Molnar <[email protected]> wrote:
>
> * Torsten Kaiser <[email protected]> wrote:
>
> > Sadly lockdep does not work for me, as it gets turned off early:
> > [ 39.851594] ---------------------------------
> > [ 39.855963] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
> > [ 39.861981] swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
> > [ 39.866963] (&n->list_lock){-+..}, at: [<ffffffff802935c1>]
>
> hey, that means it found a bug - which is not sad at all :-)
>
mutter.
Torsten, you could try CONFIG_SLAB=y, CONFIG_SLUB=n to see if you can make
some progress on the NFS problem.
On Sat, 2007-11-17 at 18:53 +0100, Torsten Kaiser wrote:
> On Nov 16, 2007 3:15 PM, Kamalesh Babulal <[email protected]> wrote:
> > Hi Andrew,
> >
> > The kernel enters the xmon state while running the file system
> > stress on nfs v4 mounted partition.
> [snip]
> > 0:mon> t
> > [c0000000dbd4fb50] c000000000069768 .__wake_up+0x54/0x88
> > [c0000000dbd4fc00] d00000000086b890 .nfs_sb_deactive+0x44/0x58 [nfs]
> > [c0000000dbd4fc80] d000000000872658 .nfs_free_unlinkdata+0x2c/0x74 [nfs]
> > [c0000000dbd4fd10] d000000000598510 .rpc_release_calldata+0x50/0x74 [sunrpc]
> > [c0000000dbd4fda0] c00000000008d960 .run_workqueue+0x10c/0x1f4
> > [c0000000dbd4fe50] c00000000008ec70 .worker_thread+0x118/0x138
> > [c0000000dbd4ff00] c0000000000939f4 .kthread+0x78/0xc4
> > [c0000000dbd4ff90] c00000000002b060 .kernel_thread+0x4c/0x68
Could you try with the attached patch.
Cheers
Trond
On Nov 17, 2007 7:58 PM, Trond Myklebust <[email protected]> wrote:
>
> On Sat, 2007-11-17 at 18:53 +0100, Torsten Kaiser wrote:
> > On Nov 16, 2007 3:15 PM, Kamalesh Babulal <[email protected]> wrote:
> > > Hi Andrew,
> > >
> > > The kernel enters the xmon state while running the file system
> > > stress on nfs v4 mounted partition.
> > [snip]
> > > 0:mon> t
> > > [c0000000dbd4fb50] c000000000069768 .__wake_up+0x54/0x88
> > > [c0000000dbd4fc00] d00000000086b890 .nfs_sb_deactive+0x44/0x58 [nfs]
> > > [c0000000dbd4fc80] d000000000872658 .nfs_free_unlinkdata+0x2c/0x74 [nfs]
> > > [c0000000dbd4fd10] d000000000598510 .rpc_release_calldata+0x50/0x74 [sunrpc]
> > > [c0000000dbd4fda0] c00000000008d960 .run_workqueue+0x10c/0x1f4
> > > [c0000000dbd4fe50] c00000000008ec70 .worker_thread+0x118/0x138
> > > [c0000000dbd4ff00] c0000000000939f4 .kthread+0x78/0xc4
> > > [c0000000dbd4ff90] c00000000002b060 .kernel_thread+0x4c/0x68
>
> Could you try with the attached patch.
[snip]
> Fix is to move the call to nfs_sb_deactive() into
> nfs_async_unlink_release().
I realley doubt that will fix it.
My stacktrace was like:
run_workqueue
called: rpc_async_schedule
that called: rpc_release_calldata
which points to: nfs_async_unlink_release
that called: nfs_free_unlinkdata
So it does not matter for me if nfs_sb_deactive is called one step earlier.
Currently building with SLAB instead SLUB to see if lockdep tells something...
Torsten
On Sat, 17 Nov 2007, Andrew Morton wrote:
> > Don't know who to bug about that.
>
> That's slub. It appears that list_lock is being taken from process context
> in one place and from softirq in another.
I kicked out some weird interrupt disable code in mm that was only run during
NUMA bootstrap.
This should fix it but isnt there some mechanism to convince lockdep that
it is okay to do these things during bootstrap?
---
mm/slub.c | 2 ++
1 file changed, 2 insertions(+)
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2007-11-17 11:31:21.044136631 -0800
+++ linux-2.6/mm/slub.c 2007-11-17 11:32:17.364386560 -0800
@@ -2044,7 +2044,9 @@ static struct kmem_cache_node *early_kme
#endif
init_kmem_cache_node(n);
atomic_long_inc(&n->nr_slabs);
+ local_irq_disable();
add_partial(kmalloc_caches, page, 0);
+ local_irq_enable();
return n;
}
On Nov 17, 2007 7:19 PM, Andrew Morton <[email protected]> wrote:
>
> On Sat, 17 Nov 2007 19:09:46 +0100 Ingo Molnar <[email protected]> wrote:
>
> >
> > * Torsten Kaiser <[email protected]> wrote:
> >
> > > Sadly lockdep does not work for me, as it gets turned off early:
> > > [ 39.851594] ---------------------------------
> > > [ 39.855963] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
> > > [ 39.861981] swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
> > > [ 39.866963] (&n->list_lock){-+..}, at: [<ffffffff802935c1>]
> >
> > hey, that means it found a bug - which is not sad at all :-)
It was sad, that it found a bug that I was not searching for. ;)
> mutter.
>
> Torsten, you could try CONFIG_SLAB=y, CONFIG_SLUB=n to see if you can make
> some progress on the NFS problem.
I should had thought of that myself... OK anyway here is the result:
The hang is reproducable, emerge froze the system again after download
the source.
Lockdep triggers immedetly before the freeze, but the result is still
not helpful:
[ 221.565011] INFO: trying to register non-static key.
[ 221.566999] the code is fine but needs lockdep annotation.
[ 221.569206] turning off the locking correctness validator.
[ 221.571404]
[ 221.571405] Call Trace:
[ 221.572996] [<ffffffff8025a1b4>] __lock_acquire+0x4c4/0x1140
[ 221.575298] [<ffffffff8025ae85>] lock_acquire+0x55/0x70
[ 221.577429] [<ffffffff8022d6fd>] __wake_up+0x2d/0x70
[ 221.579457] [<ffffffff805c5f04>] _spin_lock_irqsave+0x34/0x50
[ 221.581800] [<ffffffff805c5e45>] _spin_unlock_irqrestore+0x55/0x70
[ 221.584317] [<ffffffff8022d6fd>] __wake_up+0x2d/0x70
[ 221.586344] [<ffffffff805a88b0>] rpc_async_schedule+0x0/0x10
[ 221.588648] [<ffffffff802fface>] nfs_free_unlinkdata+0x1e/0x50
[ 221.591023] [<ffffffff805a7e96>] rpc_release_calldata+0x26/0x50
[ 221.593428] [<ffffffff8024778f>] run_workqueue+0x16f/0x210
[ 221.595662] [<ffffffff80259731>] trace_hardirqs_on+0xc1/0x160
[ 221.598004] [<ffffffff802483d0>] worker_thread+0x0/0xb0
[ 221.600130] [<ffffffff802483d0>] worker_thread+0x0/0xb0
[ 221.602265] [<ffffffff8024843d>] worker_thread+0x6d/0xb0
[ 221.604431] [<ffffffff8024bfc0>] autoremove_wake_function+0x0/0x30
[ 221.606939] [<ffffffff802483d0>] worker_thread+0x0/0xb0
[ 221.609067] [<ffffffff802483d0>] worker_thread+0x0/0xb0
[ 221.611199] [<ffffffff8024bbeb>] kthread+0x4b/0x80
[ 221.613156] [<ffffffff8020cb98>] child_rip+0xa/0x12
[ 221.615151] [<ffffffff8020c2af>] restore_args+0x0/0x30
[ 221.617247] [<ffffffff8024bba0>] kthread+0x0/0x80
[ 221.619162] [<ffffffff8020cb8e>] child_rip+0x0/0x12
[ 221.621147]
[ 221.621749] INFO: lockdep is turned off.
[ 226.369259] SysRq : Emergency Sync
[ 226.331342] Emergency Sync complete
[ 227.064545] SysRq : Emergency Remount R/O
[ 228.193491] SysRq : Emergency Sync
[ 228.155593] Emergency Sync complete
[ 228.767931] SysRq : Resetting
I also had another BUG output during system startup, but that should
be unrelated:
[ 103.254681] BUG: sleeping function called from invalid context at
kernel/rwsem.c:20
[ 103.257757] in_atomic():0, irqs_disabled():1
[ 103.259469] 1 lock held by artsd/5883:
[ 103.259470] #0: (pm_qos_lock){....}, at: [<ffffffff80250efb>]
pm_qos_add_requirement+0x6b/0xf0
[ 103.263316] irq event stamp: 49712
[ 103.263318] hardirqs last enabled at (49711): [<ffffffff802941ed>]
__kmalloc+0x10d/0x180
[ 103.263321] hardirqs last disabled at (49712): [<ffffffff805c5eea>]
_spin_lock_irqsave+0x1a/0x50
[ 103.263326] softirqs last enabled at (48820): [<ffffffff805954d9>]
unix_release_sock+0x79/0x240
[ 103.263330] softirqs last disabled at (48818): [<ffffffff805c5b89>]
_write_lock_bh+0x9/0x30
[ 103.263333]
[ 103.263333] Call Trace:
[ 103.263335] [<ffffffff8024fc25>] down_read+0x15/0x40
[ 103.263338] [<ffffffff802507e6>] __blocking_notifier_call_chain+0x46/0x90
[ 103.263341] [<ffffffff80250f23>] pm_qos_add_requirement+0x93/0xf0
[ 103.263344] [<ffffffff804fdc4a>] snd_pcm_hw_params+0x2fa/0x380
[ 103.263347] [<ffffffff804fe93c>] snd_pcm_common_ioctl1+0xb4c/0xdc0
[ 103.263350] [<ffffffff8027b167>] __do_fault+0x227/0x470
[ 103.263353] [<ffffffff8025a435>] __lock_acquire+0x745/0x1140
[ 103.263357] [<ffffffff805c5e45>] _spin_unlock_irqrestore+0x55/0x70
[ 103.263359] [<ffffffff80259731>] trace_hardirqs_on+0xc1/0x160
[ 103.263362] [<ffffffff804fee88>] snd_pcm_playback_ioctl1+0x48/0x240
[ 103.263365] [<ffffffff804ffa36>] snd_pcm_playback_ioctl+0x36/0x50
[ 103.263367] [<ffffffff802a80bf>] vfs_ioctl+0x2f/0xa0
[ 103.263369] [<ffffffff802a8390>] do_vfs_ioctl+0x260/0x2e0
[ 103.263371] [<ffffffff80259731>] trace_hardirqs_on+0xc1/0x160
[ 103.263373] [<ffffffff802a84a1>] sys_ioctl+0x91/0xb0
[ 103.263376] [<ffffffff8020bc5e>] system_call+0x7e/0x83
[ 103.263379]
Torsten
On Nov 17, 2007 8:33 PM, Christoph Lameter <[email protected]> wrote:
> On Sat, 17 Nov 2007, Andrew Morton wrote:
>
> > That's slub. It appears that list_lock is being taken from process context
> > in one place and from softirq in another.
>
> I kicked out some weird interrupt disable code in mm that was only run during
> NUMA bootstrap.
I'm using NUMA (Opteron), so this indeed fixes it.
A kernel complied with SLUB now outputs the same message as the SLAB
one, that lockdep annotations are needed at the place where nfs hangs.
> This should fix it but isnt there some mechanism to convince lockdep that
> it is okay to do these things during bootstrap?
>
> ---
> mm/slub.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c 2007-11-17 11:31:21.044136631 -0800
> +++ linux-2.6/mm/slub.c 2007-11-17 11:32:17.364386560 -0800
> @@ -2044,7 +2044,9 @@ static struct kmem_cache_node *early_kme
> #endif
> init_kmem_cache_node(n);
> atomic_long_inc(&n->nr_slabs);
> + local_irq_disable();
> add_partial(kmalloc_caches, page, 0);
> + local_irq_enable();
> return n;
> }
>
>
>
On Sat, Nov 17, 2007 at 07:09:46PM +0100, Ingo Molnar wrote:
>
> * Torsten Kaiser <[email protected]> wrote:
>
> > Sadly lockdep does not work for me, as it gets turned off early:
> > [ 39.851594] ---------------------------------
> > [ 39.855963] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
> > [ 39.861981] swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
> > [ 39.866963] (&n->list_lock){-+..}, at: [<ffffffff802935c1>]
>
> hey, that means it found a bug - which is not sad at all :-)
---
Subject: lockdep: slub: annotate boot time node->list_lock usage
inconsistent {softirq-on-W} -> {in-softirq-W} usage.
swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
(&n->list_lock){-+..}, at: [<ffffffff802935c1>] add_partial+0x31/0xa0
{softirq-on-W} state was registered at:
[<ffffffff80259fb8>] __lock_acquire+0x3e8/0x1140
[<ffffffff80259838>] debug_check_no_locks_freed+0x188/0x1a0
[<ffffffff8025ad65>] lock_acquire+0x55/0x70
[<ffffffff802935c1>] add_partial+0x31/0xa0
[<ffffffff805c76de>] _spin_lock+0x1e/0x30
[<ffffffff802935c1>] add_partial+0x31/0xa0
[<ffffffff80296f9c>] kmem_cache_open+0x1cc/0x330
[<ffffffff805c7984>] _spin_unlock_irq+0x24/0x30
[<ffffffff802974f4>] create_kmalloc_cache+0x64/0xf0
[<ffffffff80295640>] init_alloc_cpu_cpu+0x70/0x90
[<ffffffff8080ada5>] kmem_cache_init+0x65/0x1d0
[<ffffffff807f1b4e>] start_kernel+0x23e/0x350
[<ffffffff807f112d>] _sinittext+0x12d/0x140
[<ffffffffffffffff>] 0xffffffffffffffff
Signed-off-by: Peter Zijlstra <[email protected]>
CC: Christoph Lameter <[email protected]>
CC: Kamalesh Babulal <[email protected]>
---
mm/slub.c | 8 ++++++++
1 file changed, 8 insertions(+)
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c
+++ linux-2.6/mm/slub.c
@@ -2155,6 +2155,7 @@ static struct kmem_cache_node *early_kme
{
struct page *page;
struct kmem_cache_node *n;
+ unsigned long flags;
BUG_ON(kmalloc_caches->size < sizeof(struct kmem_cache_node));
@@ -2179,7 +2180,14 @@ static struct kmem_cache_node *early_kme
#endif
init_kmem_cache_node(n);
atomic_long_inc(&n->nr_slabs);
+ /*
+ * lockdep requires consistent irq usage for each lock
+ * so even though there cannot be a race this early in
+ * the boot sequence, we still disable irqs.
+ */
+ local_irq_save(flags);
add_partial(kmalloc_caches, page, 0);
+ local_irq_restore(flags);
return n;
}
On Sat, Nov 17, 2007 at 08:40:22PM +0100, Torsten Kaiser wrote:
> Lockdep triggers immedetly before the freeze, but the result is still
> not helpful:
>
> [ 221.565011] INFO: trying to register non-static key.
> [ 221.566999] the code is fine but needs lockdep annotation.
> [ 221.569206] turning off the locking correctness validator.
> [ 221.571404]
> [ 221.571405] Call Trace:
> [ 221.572996] [<ffffffff8025a1b4>] __lock_acquire+0x4c4/0x1140
> [ 221.575298] [<ffffffff8025ae85>] lock_acquire+0x55/0x70
> [ 221.577429] [<ffffffff8022d6fd>] __wake_up+0x2d/0x70
> [ 221.579457] [<ffffffff805c5f04>] _spin_lock_irqsave+0x34/0x50
> [ 221.581800] [<ffffffff805c5e45>] _spin_unlock_irqrestore+0x55/0x70
> [ 221.584317] [<ffffffff8022d6fd>] __wake_up+0x2d/0x70
> [ 221.586344] [<ffffffff805a88b0>] rpc_async_schedule+0x0/0x10
> [ 221.588648] [<ffffffff802fface>] nfs_free_unlinkdata+0x1e/0x50
> [ 221.591023] [<ffffffff805a7e96>] rpc_release_calldata+0x26/0x50
> [ 221.593428] [<ffffffff8024778f>] run_workqueue+0x16f/0x210
> [ 221.595662] [<ffffffff80259731>] trace_hardirqs_on+0xc1/0x160
> [ 221.598004] [<ffffffff802483d0>] worker_thread+0x0/0xb0
> [ 221.600130] [<ffffffff802483d0>] worker_thread+0x0/0xb0
> [ 221.602265] [<ffffffff8024843d>] worker_thread+0x6d/0xb0
> [ 221.604431] [<ffffffff8024bfc0>] autoremove_wake_function+0x0/0x30
> [ 221.606939] [<ffffffff802483d0>] worker_thread+0x0/0xb0
> [ 221.609067] [<ffffffff802483d0>] worker_thread+0x0/0xb0
> [ 221.611199] [<ffffffff8024bbeb>] kthread+0x4b/0x80
> [ 221.613156] [<ffffffff8020cb98>] child_rip+0xa/0x12
> [ 221.615151] [<ffffffff8020c2af>] restore_args+0x0/0x30
> [ 221.617247] [<ffffffff8024bba0>] kthread+0x0/0x80
> [ 221.619162] [<ffffffff8020cb8e>] child_rip+0x0/0x12
> [ 221.621147]
> [ 221.621749] INFO: lockdep is turned off.
I've been staring at this NFS code for a while an can't make any sense
out of it. It seems to correctly initialize the waitqueue. So this would
indicate corruption of some sort.
> I also had another BUG output during system startup, but that should
> be unrelated:
> [ 103.254681] BUG: sleeping function called from invalid context at
> kernel/rwsem.c:20
> [ 103.257757] in_atomic():0, irqs_disabled():1
> [ 103.259469] 1 lock held by artsd/5883:
> [ 103.259470] #0: (pm_qos_lock){....}, at: [<ffffffff80250efb>]
> pm_qos_add_requirement+0x6b/0xf0
> [ 103.263316] irq event stamp: 49712
> [ 103.263318] hardirqs last enabled at (49711): [<ffffffff802941ed>]
> __kmalloc+0x10d/0x180
> [ 103.263321] hardirqs last disabled at (49712): [<ffffffff805c5eea>]
> _spin_lock_irqsave+0x1a/0x50
> [ 103.263326] softirqs last enabled at (48820): [<ffffffff805954d9>]
> unix_release_sock+0x79/0x240
> [ 103.263330] softirqs last disabled at (48818): [<ffffffff805c5b89>]
> _write_lock_bh+0x9/0x30
> [ 103.263333]
> [ 103.263333] Call Trace:
> [ 103.263335] [<ffffffff8024fc25>] down_read+0x15/0x40
> [ 103.263338] [<ffffffff802507e6>] __blocking_notifier_call_chain+0x46/0x90
> [ 103.263341] [<ffffffff80250f23>] pm_qos_add_requirement+0x93/0xf0
> [ 103.263344] [<ffffffff804fdc4a>] snd_pcm_hw_params+0x2fa/0x380
> [ 103.263347] [<ffffffff804fe93c>] snd_pcm_common_ioctl1+0xb4c/0xdc0
> [ 103.263350] [<ffffffff8027b167>] __do_fault+0x227/0x470
> [ 103.263353] [<ffffffff8025a435>] __lock_acquire+0x745/0x1140
> [ 103.263357] [<ffffffff805c5e45>] _spin_unlock_irqrestore+0x55/0x70
> [ 103.263359] [<ffffffff80259731>] trace_hardirqs_on+0xc1/0x160
> [ 103.263362] [<ffffffff804fee88>] snd_pcm_playback_ioctl1+0x48/0x240
> [ 103.263365] [<ffffffff804ffa36>] snd_pcm_playback_ioctl+0x36/0x50
> [ 103.263367] [<ffffffff802a80bf>] vfs_ioctl+0x2f/0xa0
> [ 103.263369] [<ffffffff802a8390>] do_vfs_ioctl+0x260/0x2e0
> [ 103.263371] [<ffffffff80259731>] trace_hardirqs_on+0xc1/0x160
> [ 103.263373] [<ffffffff802a84a1>] sys_ioctl+0x91/0xb0
> [ 103.263376] [<ffffffff8020bc5e>] system_call+0x7e/0x83
> [ 103.263379]
This pm-qos code is fubar, it calls blocking_notifier_call_chain while
holding a spinlock (and that is after 'fixing' it from a
srcu_notifier_call_chain - which is equally wrong).
On Nov 18, 2007 12:05 AM, Peter Zijlstra <[email protected]> wrote:
>
> On Sat, Nov 17, 2007 at 08:40:22PM +0100, Torsten Kaiser wrote:
>
> > Lockdep triggers immedetly before the freeze, but the result is still
> > not helpful:
> >
> > [ 221.565011] INFO: trying to register non-static key.
> > [ 221.566999] the code is fine but needs lockdep annotation.
> > [ 221.569206] turning off the locking correctness validator.
> > [ 221.571404]
> > [ 221.571405] Call Trace:
> > [ 221.572996] [<ffffffff8025a1b4>] __lock_acquire+0x4c4/0x1140
> > [ 221.575298] [<ffffffff8025ae85>] lock_acquire+0x55/0x70
> > [ 221.577429] [<ffffffff8022d6fd>] __wake_up+0x2d/0x70
> > [ 221.579457] [<ffffffff805c5f04>] _spin_lock_irqsave+0x34/0x50
> > [ 221.581800] [<ffffffff805c5e45>] _spin_unlock_irqrestore+0x55/0x70
> > [ 221.584317] [<ffffffff8022d6fd>] __wake_up+0x2d/0x70
> > [ 221.586344] [<ffffffff805a88b0>] rpc_async_schedule+0x0/0x10
> > [ 221.588648] [<ffffffff802fface>] nfs_free_unlinkdata+0x1e/0x50
> > [ 221.591023] [<ffffffff805a7e96>] rpc_release_calldata+0x26/0x50
> > [ 221.593428] [<ffffffff8024778f>] run_workqueue+0x16f/0x210
> > [ 221.595662] [<ffffffff80259731>] trace_hardirqs_on+0xc1/0x160
> > [ 221.598004] [<ffffffff802483d0>] worker_thread+0x0/0xb0
> > [ 221.600130] [<ffffffff802483d0>] worker_thread+0x0/0xb0
> > [ 221.602265] [<ffffffff8024843d>] worker_thread+0x6d/0xb0
> > [ 221.604431] [<ffffffff8024bfc0>] autoremove_wake_function+0x0/0x30
> > [ 221.606939] [<ffffffff802483d0>] worker_thread+0x0/0xb0
> > [ 221.609067] [<ffffffff802483d0>] worker_thread+0x0/0xb0
> > [ 221.611199] [<ffffffff8024bbeb>] kthread+0x4b/0x80
> > [ 221.613156] [<ffffffff8020cb98>] child_rip+0xa/0x12
> > [ 221.615151] [<ffffffff8020c2af>] restore_args+0x0/0x30
> > [ 221.617247] [<ffffffff8024bba0>] kthread+0x0/0x80
> > [ 221.619162] [<ffffffff8020cb8e>] child_rip+0x0/0x12
> > [ 221.621147]
> > [ 221.621749] INFO: lockdep is turned off.
>
> I've been staring at this NFS code for a while an can't make any sense
> out of it. It seems to correctly initialize the waitqueue. So this would
> indicate corruption of some sort.
Not sure if this is helpful, but after looking into the code, the
above stacktrace looks somewhat damaged.
Might be my fault: # CONFIG_FRAME_POINTER is not set
On the other hand the stacktrace from the run with the SLUB lockdep
fix shows the same function names.
That trace contains this line:
[<ffffffff8030167e>] nfs_free_unlinkdata+0x1e/0x50
(gdb) list *0xffffffff8030167e
0xffffffff8030167e is in nfs_free_unlinkdata (fs/nfs/unlink.c:33).
28 */
29 static void
30 nfs_free_unlinkdata(struct nfs_unlinkdata *data)
31 {
32 nfs_sb_deactive(NFS_SERVER(data->dir));
33 iput(data->dir);
34 put_rpccred(data->cred);
35 kfree(data->args.name.name);
36 kfree(data);
37 }
Is some inode lock guilty?
Please ask, if you need more information.
Torsten
On Nov 18, 2007 12:05 AM, Peter Zijlstra <[email protected]> wrote:
> I've been staring at this NFS code for a while an can't make any sense
> out of it. It seems to correctly initialize the waitqueue. So this would
> indicate corruption of some sort.
No, it does not "correctly" initialize the waitqueue. It doesn't even
try to initialize it.
I now found the guilty patch and what is wrong with it.
nfs-stop-sillyname-renames-and-unmounts-from-racing.patch adds:
@@ -110,8 +112,22 @@ struct nfs_server {
filesystem */
#endif
void (*destroy)(struct nfs_server *);
+
+ atomic_t active; /* Keep trace of any activity to this server */
+ wait_queue_head_t active_wq; /* Wait for any activity to stop */
and tries to initialize it:
@@ -593,6 +593,10 @@ static int nfs_init_server(struct nfs_server *server,
server->namelen = data->namlen;
/* Create a client RPC handle for the NFSv3 ACL management interface */
nfs_init_server_aclclient(server);
+
+ init_waitqueue_head(&server->active_wq);
+ atomic_set(&server->active, 0);
+
and then uses it via nfs_sb_active and nfs_sb_deactive:
@@ -29,6 +29,7 @@ struct nfs_unlinkdata {
static void
nfs_free_unlinkdata(struct nfs_unlinkdata *data)
{
+ nfs_sb_deactive(NFS_SERVER(data->dir));
iput(data->dir);
put_rpccred(data->cred);
kfree(data->args.name.name);
@@ -151,6 +152,7 @@ static int nfs_do_call_unlink(struct dentry
*parent, struct inode *dir, struct n
nfs_dec_sillycount(dir);
return 0;
}
+ nfs_sb_active(NFS_SERVER(dir));
data->args.fh = NFS_FH(dir);
nfs_fattr_init(&data->res.dir_attr);
But it does not notice this:
struct dentry_operations nfs_dentry_operations = {
.d_revalidate = nfs_lookup_revalidate,
.d_delete = nfs_dentry_delete,
.d_iput = nfs_dentry_iput,
};
struct dentry_operations nfs4_dentry_operations = {
.d_revalidate = nfs_open_revalidate,
.d_delete = nfs_dentry_delete,
.d_iput = nfs_dentry_iput,
};
NFSv2/3 and NFSv4 share the same dentry_iput and so share the same
unlink and sillyrename logic.
But they do not share nfs_init_server()!
I wonder why this doesn't blow up more violently, but only hangs...
But as I don't know if it is correct to add the workqueue
initialization to nfs4_init_server() or remove the nfs_sb_active /
nfs_sb_deactive for the NFSv4 case, I can't offer a patch to fix this.
Torsten
On Sun, 2007-11-18 at 19:44 +0100, Torsten Kaiser wrote:
> On Nov 18, 2007 12:05 AM, Peter Zijlstra <[email protected]> wrote:
> > I've been staring at this NFS code for a while an can't make any sense
> > out of it. It seems to correctly initialize the waitqueue. So this would
> > indicate corruption of some sort.
>
> No, it does not "correctly" initialize the waitqueue. It doesn't even
> try to initialize it.
>
> I now found the guilty patch and what is wrong with it.
>
> nfs-stop-sillyname-renames-and-unmounts-from-racing.patch adds:
>
> @@ -110,8 +112,22 @@ struct nfs_server {
> filesystem */
> #endif
> void (*destroy)(struct nfs_server *);
> +
> + atomic_t active; /* Keep trace of any activity to this server */
> + wait_queue_head_t active_wq; /* Wait for any activity to stop */
>
> and tries to initialize it:
> @@ -593,6 +593,10 @@ static int nfs_init_server(struct nfs_server *server,
> server->namelen = data->namlen;
> /* Create a client RPC handle for the NFSv3 ACL management interface */
> nfs_init_server_aclclient(server);
> +
> + init_waitqueue_head(&server->active_wq);
> + atomic_set(&server->active, 0);
> +
>
> and then uses it via nfs_sb_active and nfs_sb_deactive:
>
> @@ -29,6 +29,7 @@ struct nfs_unlinkdata {
> static void
> nfs_free_unlinkdata(struct nfs_unlinkdata *data)
> {
> + nfs_sb_deactive(NFS_SERVER(data->dir));
> iput(data->dir);
> put_rpccred(data->cred);
> kfree(data->args.name.name);
> @@ -151,6 +152,7 @@ static int nfs_do_call_unlink(struct dentry
> *parent, struct inode *dir, struct n
> nfs_dec_sillycount(dir);
> return 0;
> }
> + nfs_sb_active(NFS_SERVER(dir));
> data->args.fh = NFS_FH(dir);
> nfs_fattr_init(&data->res.dir_attr);
>
>
> But it does not notice this:
> struct dentry_operations nfs_dentry_operations = {
> .d_revalidate = nfs_lookup_revalidate,
> .d_delete = nfs_dentry_delete,
> .d_iput = nfs_dentry_iput,
> };
> struct dentry_operations nfs4_dentry_operations = {
> .d_revalidate = nfs_open_revalidate,
> .d_delete = nfs_dentry_delete,
> .d_iput = nfs_dentry_iput,
> };
>
> NFSv2/3 and NFSv4 share the same dentry_iput and so share the same
> unlink and sillyrename logic.
> But they do not share nfs_init_server()!
>
> I wonder why this doesn't blow up more violently, but only hangs...
>
> But as I don't know if it is correct to add the workqueue
> initialization to nfs4_init_server() or remove the nfs_sb_active /
> nfs_sb_deactive for the NFSv4 case, I can't offer a patch to fix this.
>
> Torsten
I had already fixed that one in my own stack. Attached are the 3 patches
that I've got. 1 from SteveD, 2 fixes.
Andrew, could you please unapply the sillyrename patches you've got, and
apply these 3 instead?
Trond
On Nov 18, 2007 8:18 PM, Trond Myklebust <[email protected]> wrote:
> On Sun, 2007-11-18 at 19:44 +0100, Torsten Kaiser wrote:
> > NFSv2/3 and NFSv4 share the same dentry_iput and so share the same
> > unlink and sillyrename logic.
> > But they do not share nfs_init_server()!
> >
> > I wonder why this doesn't blow up more violently, but only hangs...
> >
> > But as I don't know if it is correct to add the workqueue
> > initialization to nfs4_init_server() or remove the nfs_sb_active /
> > nfs_sb_deactive for the NFSv4 case, I can't offer a patch to fix this.
> >
> > Torsten
>
> I had already fixed that one in my own stack. Attached are the 3 patches
> that I've got. 1 from SteveD, 2 fixes.
>
Moving the init_waitqueue_head() like patch
linux-2.6.24-006-fix_to_fix_sillyrename_bug_on_umount.dif and applying
linux-2.6.24-007-fix_nfs_free_unlinkdata.dif lets my testcase work.
Also lockdep no longer complains about the non-static key.
Torsten
On Mon, 19 Nov 2007 08:15:48 +0100 "Torsten Kaiser" <[email protected]> wrote:
> On Nov 18, 2007 8:18 PM, Trond Myklebust <[email protected]> wrote:
> > On Sun, 2007-11-18 at 19:44 +0100, Torsten Kaiser wrote:
> > > NFSv2/3 and NFSv4 share the same dentry_iput and so share the same
> > > unlink and sillyrename logic.
> > > But they do not share nfs_init_server()!
> > >
> > > I wonder why this doesn't blow up more violently, but only hangs...
> > >
> > > But as I don't know if it is correct to add the workqueue
> > > initialization to nfs4_init_server() or remove the nfs_sb_active /
> > > nfs_sb_deactive for the NFSv4 case, I can't offer a patch to fix this.
> > >
> > > Torsten
> >
> > I had already fixed that one in my own stack. Attached are the 3 patches
> > that I've got. 1 from SteveD, 2 fixes.
> >
>
> Moving the init_waitqueue_head() like patch
> linux-2.6.24-006-fix_to_fix_sillyrename_bug_on_umount.dif and applying
> linux-2.6.24-007-fix_nfs_free_unlinkdata.dif lets my testcase work.
> Also lockdep no longer complains about the non-static key.
>
Thanks.
To avoid goofups, could you please send the full fix against 2.6.24-rc2-mm1?
On Nov 19, 2007 10:00 AM, Andrew Morton <[email protected]> wrote:
> On Mon, 19 Nov 2007 08:15:48 +0100 "Torsten Kaiser" <[email protected]> wrote:
> > On Nov 18, 2007 8:18 PM, Trond Myklebust <[email protected]> wrote:
> > > I had already fixed that one in my own stack. Attached are the 3 patches
> > > that I've got. 1 from SteveD, 2 fixes.
> >
> > Moving the init_waitqueue_head() like patch
> > linux-2.6.24-006-fix_to_fix_sillyrename_bug_on_umount.dif and applying
> > linux-2.6.24-007-fix_nfs_free_unlinkdata.dif lets my testcase work.
> > Also lockdep no longer complains about the non-static key.
>
> Thanks.
>
> To avoid goofups, could you please send the full fix against 2.6.24-rc2-mm1?
Umm... As I applied this changes manually there is a not insignificant
change of goofups on my part...
For the hang problem I think Tronds suggestion with replacing the
patches from -mm with fresh versions would be the best.
Anyway, currently I have the patch from
http://lkml.org/lkml/2007/11/16/74 to fix the can't-create-files-bug.
To fix the hang bug I used Tronds
linux-2.6.24-007-fix_nfs_free_unlinkdata.dif and the first two hunks
from linux-2.6.24-006-fix_to_fix_sillyrename_bug_on_umount.dif.
Torsten
The needed 2 hunks for reference:
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -594,9 +594,6 @@ static int nfs_init_server(struct nfs_server *server,
/* Create a client RPC handle for the NFSv3 ACL management interface */
nfs_init_server_aclclient(server);
- init_waitqueue_head(&server->active_wq);
- atomic_set(&server->active, 0);
-
dprintk("<-- nfs_init_server() = 0 [new %p]\n", clp);
return 0;
@@ -736,6 +733,9 @@ static struct nfs_server *nfs_alloc_server(void)
INIT_LIST_HEAD(&server->client_link);
INIT_LIST_HEAD(&server->master_link);
+ init_waitqueue_head(&server->active_wq);
+ atomic_set(&server->active, 0);
+
server->io_stats = nfs_alloc_iostats();
if (!server->io_stats) {
kfree(server);
On Sun, 18 Nov 2007, root wrote:
> @@ -2155,6 +2155,7 @@ static struct kmem_cache_node *early_kme
> {
> struct page *page;
> struct kmem_cache_node *n;
> + unsigned long flags;
>
> BUG_ON(kmalloc_caches->size < sizeof(struct kmem_cache_node));
>
Well local_irq_save is a bit of an overkill. We know that interrupts are
enabled during this phase of the boot sequence.
On Sun, 18 Nov 2007 14:18:06 -0500 Trond Myklebust <[email protected]> wrote:
> >
> > Torsten
>
> I had already fixed that one in my own stack. Attached are the 3 patches
> that I've got. 1 from SteveD, 2 fixes.
>
> Andrew, could you please unapply the sillyrename patches you've got, and
> apply these 3 instead?
I'd expect to see things like this appear in git-nfs.patch. Did something change?
CC's somewhat trimmed...
On Nov 18, 2007 12:00 AM, root <[email protected]> wrote:
> On Sat, Nov 17, 2007 at 07:09:46PM +0100, Ingo Molnar wrote:
> > * Torsten Kaiser <[email protected]> wrote:
> >
> > > Sadly lockdep does not work for me, as it gets turned off early:
> > > [ 39.851594] ---------------------------------
> > > [ 39.855963] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
> > > [ 39.861981] swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
> > > [ 39.866963] (&n->list_lock){-+..}, at: [<ffffffff802935c1>]
> >
> > hey, that means it found a bug - which is not sad at all :-)
>
> ---
> Subject: lockdep: slub: annotate boot time node->list_lock usage
>
> inconsistent {softirq-on-W} -> {in-softirq-W} usage.
> swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
> (&n->list_lock){-+..}, at: [<ffffffff802935c1>] add_partial+0x31/0xa0
> {softirq-on-W} state was registered at:
> [<ffffffff80259fb8>] __lock_acquire+0x3e8/0x1140
> [<ffffffff80259838>] debug_check_no_locks_freed+0x188/0x1a0
> [<ffffffff8025ad65>] lock_acquire+0x55/0x70
> [<ffffffff802935c1>] add_partial+0x31/0xa0
> [<ffffffff805c76de>] _spin_lock+0x1e/0x30
> [<ffffffff802935c1>] add_partial+0x31/0xa0
> [<ffffffff80296f9c>] kmem_cache_open+0x1cc/0x330
> [<ffffffff805c7984>] _spin_unlock_irq+0x24/0x30
> [<ffffffff802974f4>] create_kmalloc_cache+0x64/0xf0
> [<ffffffff80295640>] init_alloc_cpu_cpu+0x70/0x90
> [<ffffffff8080ada5>] kmem_cache_init+0x65/0x1d0
> [<ffffffff807f1b4e>] start_kernel+0x23e/0x350
> [<ffffffff807f112d>] _sinittext+0x12d/0x140
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> Signed-off-by: Peter Zijlstra <[email protected]>
> CC: Christoph Lameter <[email protected]>
> CC: Kamalesh Babulal <[email protected]>
> ---
> mm/slub.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c
> +++ linux-2.6/mm/slub.c
> @@ -2155,6 +2155,7 @@ static struct kmem_cache_node *early_kme
> {
> struct page *page;
> struct kmem_cache_node *n;
> + unsigned long flags;
>
> BUG_ON(kmalloc_caches->size < sizeof(struct kmem_cache_node));
>
> @@ -2179,7 +2180,14 @@ static struct kmem_cache_node *early_kme
> #endif
> init_kmem_cache_node(n);
> atomic_long_inc(&n->nr_slabs);
> + /*
> + * lockdep requires consistent irq usage for each lock
> + * so even though there cannot be a race this early in
> + * the boot sequence, we still disable irqs.
> + */
> + local_irq_save(flags);
> add_partial(kmalloc_caches, page, 0);
> + local_irq_restore(flags);
> return n;
> }
I just tested something with vanilla 2.6.24-rc6 and had the same problem.
Should this patch, or something similar be included for 2.6.24?
The lockdep report:
[ 40.057281] PCI: BIOS Bug: MCFG area at f0000000 is not E820-reserved
[ 40.063736] PCI: Not using MMCONFIG.
[ 40.067329] PCI: Using configuration type 1
[ 40.063153]
[ 40.063154] =================================
[ 40.063156] [ INFO: inconsistent lock state ]
[ 40.063157] 2.6.24-rc6 #1
[ 40.063158] ---------------------------------
[ 40.063160] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
[ 40.063162] swapper/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
[ 40.063163] (&n->list_lock){-+..}, at: [<ffffffff8029409c>]
add_partial+0x1c/0x50
[ 40.063172] {softirq-on-W} state was registered at:
[ 40.063173] [<ffffffff8025a237>] __lock_acquire+0x3c7/0x1140
[ 40.063179] [<ffffffff8025991f>] trace_hardirqs_on+0xbf/0x160
[ 40.063182] [<ffffffff8025b00b>] lock_acquire+0x5b/0x80
[ 40.063185] [<ffffffff8029409c>] add_partial+0x1c/0x50
[ 40.063187] [<ffffffff805d0345>] _spin_lock+0x25/0x40
[ 40.063192] [<ffffffff8029409c>] add_partial+0x1c/0x50
[ 40.063195] [<ffffffff802979a7>] kmem_cache_open+0x1c7/0x330
[ 40.063198] [<ffffffff80297f23>] create_kmalloc_cache+0x63/0xc0
[ 40.063200] [<ffffffff80812d65>] kmem_cache_init+0x65/0x1d0
[ 40.063204] [<ffffffff807f9bc5>] start_kernel+0x245/0x360
[ 40.063208] [<ffffffff807f9131>] _sinittext+0x131/0x140
[ 40.063211] [<ffffffffffffffff>] 0xffffffffffffffff
[ 40.063214] irq event stamp: 569
[ 40.063215] hardirqs last enabled at (568): [<ffffffff8029677d>]
kmem_cache_free+0xcd/0x100
[ 40.063219] hardirqs last disabled at (569): [<ffffffff80296718>]
kmem_cache_free+0x68/0x100
[ 40.063222] softirqs last enabled at (550): [<ffffffff8023bb7f>]
__do_softirq+0xef/0x110
[ 40.063226] softirqs last disabled at (557): [<ffffffff8020cf0c>]
call_softirq+0x1c/0x30
[ 40.063230]
[ 40.063230] other info that might help us debug this:
[ 40.063231] no locks held by swapper/0.
[ 40.063232]
[ 40.063233] stack backtrace:
[ 40.063235] Pid: 0, comm: swapper Not tainted 2.6.24-rc6 #1
[ 40.063236]
[ 40.063236] Call Trace:
[ 40.063237] <IRQ> [<ffffffff80258699>] print_usage_bug+0x189/0x190
[ 40.063243] [<ffffffff802596fd>] mark_lock+0x63d/0x650
[ 40.063246] [<ffffffff8025a1ee>] __lock_acquire+0x37e/0x1140
[ 40.063248] [<ffffffff8020d1b7>] dump_trace+0xd7/0x2d0
[ 40.063250] [<ffffffff802144a8>] save_stack_trace+0x28/0x50
[ 40.063253] [<ffffffff802b4084>] free_fdtable_rcu+0x94/0xa0
[ 40.063255] [<ffffffff8025b00b>] lock_acquire+0x5b/0x80
[ 40.063257] [<ffffffff8029409c>] add_partial+0x1c/0x50
[ 40.063259] [<ffffffff805d0345>] _spin_lock+0x25/0x40
[ 40.063261] [<ffffffff8029409c>] add_partial+0x1c/0x50
[ 40.063264] [<ffffffff80295c3f>] __slab_free+0xaf/0x2f0
[ 40.063265] [<ffffffff802b4084>] free_fdtable_rcu+0x94/0xa0
[ 40.063267] [<ffffffff802b4084>] free_fdtable_rcu+0x94/0xa0
[ 40.063269] [<ffffffff80296751>] kmem_cache_free+0xa1/0x100
[ 40.063271] [<ffffffff802b4084>] free_fdtable_rcu+0x94/0xa0
[ 40.063274] [<ffffffff80249589>] __rcu_process_callbacks+0x89/0x240
[ 40.063276] [<ffffffff80249763>] rcu_process_callbacks+0x23/0x50
[ 40.063278] [<ffffffff8023bc43>] tasklet_action+0x53/0xd0
[ 40.063280] [<ffffffff8023bb14>] __do_softirq+0x84/0x110
[ 40.063284] [<ffffffff802377fe>] profile_tick+0x5e/0xa0
[ 40.063286] [<ffffffff8020ac50>] default_idle+0x0/0x60
[ 40.063288] [<ffffffff8020cf0c>] call_softirq+0x1c/0x30
[ 40.063290] [<ffffffff8020f115>] do_softirq+0x65/0xc0
[ 40.063292] [<ffffffff8023ba85>] irq_exit+0x95/0xa0
[ 40.063296] [<ffffffff8021bcf9>] smp_apic_timer_interrupt+0x49/0x70
[ 40.063297] [<ffffffff8020ac50>] default_idle+0x0/0x60
[ 40.063300] [<ffffffff8020c9bb>] apic_timer_interrupt+0x6b/0x70
[ 40.063301] <EOI> [<ffffffff8020ac87>] default_idle+0x37/0x60
[ 40.063304] [<ffffffff8020ac85>] default_idle+0x35/0x60
[ 40.063305] [<ffffffff8020ad1b>] cpu_idle+0x6b/0xa0
[ 40.063308] [<ffffffff808061d8>] start_secondary+0x2f8/0x430
[ 40.063310]
[ 40.419678] ACPI: EC: Look up EC in DSDT
[ 40.428641] ACPI: Interpreter enabled
[ 40.432320] ACPI: (supports S0 S5)
[ 40.435823] ACPI: Using IOAPIC for interrupt routing
On Wed, 2 Jan 2008, Torsten Kaiser wrote:
> I just tested something with vanilla 2.6.24-rc6 and had the same problem.
> Should this patch, or something similar be included for 2.6.24?
Such a patch is in Andrew's tree.
2.6.24-rc6-mm1:
tatic struct kmem_cache_node *early_kmem_cache_node_alloc(gfp_t gfpflags,
int node)
{
struct page *page;
struct kmem_cache_node *n;
unsigned long flags;
...
/*
* lockdep requires consistent irq usage for each lock
* so even though there cannot be a race this early in
* the boot sequence, we still disable irqs.
*/
local_irq_save(flags);
add_partial(kmalloc_caches, page, 0);
local_irq_restore(flags);
return n;
}
On Jan 2, 2008 9:51 PM, Christoph Lameter <[email protected]> wrote:
> On Wed, 2 Jan 2008, Torsten Kaiser wrote:
>
> > I just tested something with vanilla 2.6.24-rc6 and had the same problem.
> > Should this patch, or something similar be included for 2.6.24?
>
> Such a patch is in Andrew's tree.
>
> 2.6.24-rc6-mm1:
>
> tatic struct kmem_cache_node *early_kmem_cache_node_alloc(gfp_t gfpflags,
> int node)
> {
> struct page *page;
> struct kmem_cache_node *n;
> unsigned long flags;
> ...
>
> /*
>
> * lockdep requires consistent irq usage for each lock
> * so even though there cannot be a race this early in
> * the boot sequence, we still disable irqs.
> */
> local_irq_save(flags);
> add_partial(kmalloc_caches, page, 0);
> local_irq_restore(flags);
> return n;
> }
>
from 2.6.24-rc6-mm1 patch-series file:
slub-noinline-some-functions-to-avoid-them-being-folded-into-alloc-free.patch
slub-move-kmem_cache_node-determination-into-add_full-and-add_partial.patch
slub-move-kmem_cache_node-determination-into-add_full-and-add_partial-slub-workaround-for-lockdep-confusion.patch
slub-avoid-checking-for-a-valid-object-before-zeroing-on-the-fast-path.patch
It seems it got lumped into some other slub patches, but the bug does
not seem to be introduced by them, as I can see it in mainline
2.6.24-rc6.
Should this patch made a candidate for the merge-before-2.6.24-final-queue?
Torsten