2013-08-04 15:41:05

by Nix

Subject: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

I just got this panic on 3.10.4, in the middle of a large parallel
compilation (of Chromium, as it happens) over NFSv3:

[16364.527516] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[16364.527571] IP: [<ffffffff81245157>] nlmclnt_setlockargs+0x55/0xcf
[16364.527611] PGD 0
[16364.527626] Oops: 0000 [#1] PREEMPT SMP
[16364.527656] Modules linked in: [last unloaded: microcode]
[16364.527690] CPU: 0 PID: 17034 Comm: flock Not tainted 3.10.4-05315-gf4ce424-dirty #1
[16364.527730] Hardware name: System manufacturer System Product Name/P8H61-MX USB3, BIOS 0506 08/10/2012
[16364.527775] task: ffff88041a97ad60 ti: ffff8803501d4000 task.ti: ffff8803501d4000
[16364.527813] RIP: 0010:[<ffffffff81245157>] [<ffffffff81245157>] nlmclnt_setlockargs+0x55/0xcf
[16364.527860] RSP: 0018:ffff8803501d5c58 EFLAGS: 00010282
[16364.527889] RAX: ffff88041a97ad60 RBX: ffff8803e49c8800 RCX: 0000000000000000
[16364.527926] RDX: 0000000000000000 RSI: 000000000000004a RDI: ffff8803e49c8b54
[16364.527962] RBP: ffff8803501d5c68 R08: 0000000000015720 R09: 0000000000000000
[16364.527998] R10: 00007ffffffff000 R11: ffff8803501d5d58 R12: ffff8803501d5d58
[16364.528034] R13: ffff88041bd2bc00 R14: 0000000000000000 R15: ffff8803fc9e2900
[16364.528070] FS: 0000000000000000(0000) GS:ffff88042fa00000(0000) knlGS:0000000000000000
[16364.528111] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[16364.528142] CR2: 0000000000000008 CR3: 0000000001c0b000 CR4: 00000000001407f0
[16364.528177] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[16364.528214] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[16364.528303] Stack:
[16364.528316] ffff8803501d5d58 ffff8803e49c8800 ffff8803501d5cd8 ffffffff81245418
[16364.528369] 0000000000000000 ffff8803516f0bc0 ffff8803d7b7b6c0 ffffffff81215c81
[16364.528418] ffff880300000007 ffff88041bd2bdc8 ffff8801aabe9650 ffff8803fc9e2900
[16364.528467] Call Trace:
[16364.528485] [<ffffffff81245418>] nlmclnt_proc+0x148/0x5fb
[16364.528516] [<ffffffff81215c81>] ? nfs_put_lock_context+0x69/0x6e
[16364.528550] [<ffffffff812209a2>] nfs3_proc_lock+0x21/0x23
[16364.528581] [<ffffffff812149dd>] do_unlk+0x96/0xb2
[16364.528608] [<ffffffff81214b41>] nfs_flock+0x5a/0x71
[16364.528637] [<ffffffff8119a747>] locks_remove_flock+0x9e/0x113
[16364.528668] [<ffffffff8115cc68>] __fput+0xb6/0x1e6
[16364.528695] [<ffffffff8115cda6>] ____fput+0xe/0x10
[16364.528724] [<ffffffff810998da>] task_work_run+0x7e/0x98
[16364.528754] [<ffffffff81082bc5>] do_exit+0x3cc/0x8fa
[16364.528782] [<ffffffff81083501>] ? SyS_wait4+0xa5/0xc2
[16364.528811] [<ffffffff8108328d>] do_group_exit+0x6f/0xa2
[16364.528843] [<ffffffff810832d7>] SyS_exit_group+0x17/0x17
[16364.528876] [<ffffffff81613e92>] system_call_fastpath+0x16/0x1b
[16364.528907] Code: 00 00 65 48 8b 04 25 c0 b8 00 00 48 8b 72 20 48 81 ee c0 01 00 00 f3 a4 48 8d bb 54 03 00 00 be 4a 00 00 00 48 8b 90 68 05 00 00 <48> 8b 52 08 48 89 bb d0 00 00 00 48 83 c2 45 48 89 53 38 48 8b
[16364.529176] RIP [<ffffffff81245157>] nlmclnt_setlockargs+0x55/0xcf
[16364.529264] RSP <ffff8803501d5c58>
[16364.529283] CR2: 0000000000000008
[16364.539039] ---[ end trace 5a73fddf23441377 ]---
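
As a cross-check on the trace above, here is a minimal sketch (not part of the original report) that pulls the faulting instruction out of the "Code:" line and decodes it by hand. It assumes the usual oops convention that the byte in angle brackets is the first byte of the instruction at RIP, and it only handles the one encoding seen here (REX.W MOV with an 8-bit displacement). The helper name split_code_line is made up for illustration.

```python
# "Code:" line copied from the first oops above.
CODE_LINE = (
    "Code: 00 00 65 48 8b 04 25 c0 b8 00 00 48 8b 72 20 48 81 ee c0 01 "
    "00 00 f3 a4 48 8d bb 54 03 00 00 be 4a 00 00 00 48 8b 90 68 05 00 "
    "00 <48> 8b 52 08 48 89 bb d0 00 00 00 48 83 c2 45 48 89 53 38 48 8b"
)

# x86-64 register numbering for the low eight general-purpose registers.
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def split_code_line(line):
    """Return (raw_bytes, index_of_faulting_byte) for an oops Code: line."""
    tokens = line.split()[1:]  # drop the leading "Code:"
    fault_index = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    raw = bytes(int(t.strip("<>"), 16) for t in tokens)
    return raw, fault_index


raw, fault = split_code_line(CODE_LINE)
rex, opcode, modrm, disp8 = raw[fault:fault + 4]
mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7

# This sketch only understands "REX.W MOV r64, [r64 + disp8]" without a SIB byte.
assert rex == 0x48 and opcode == 0x8B and mod == 1 and rm != 4

rdx_at_fault = 0x0  # from the "RDX:" register line of the oops
effective = rdx_at_fault + disp8

print(f"mov {disp8:#x}(%{REGS64[rm]}),%{REGS64[reg]}")
print(f"faulting address: {effective:#x}")
```

The decoded instruction is a load from 8 bytes past RDX, and RDX is zero in the register dump, which agrees with the CR2 value of 0000000000000008. The instruction just before it (48 8b 90 68 05 00 00) appears to load RDX from offset 0x568 of RAX, and RAX matches the pointer on the "task:" line.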

This is the same machine on which the following panic has been occurring
at shutdown since 3.9.x. Al Viro has previously pointed out the problem,
but nothing has happened:

[50618.993226] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[50618.993904] IP: [<ffffffff81165e76>] path_init+0x11c/0x36f
[50618.994609] PGD 0
[50618.995329] Oops: 0000 [#1] PREEMPT SMP
[50618.996027] Modules linked in: [last unloaded: microcode]
[50618.996758] CPU: 3 PID: 1262 Comm: pulseaudio Not tainted 3.10.4-05315-gf4ce424-dirty #1
[50618.997506] Hardware name: System manufacturer System Product Name/P8H61-MX USB3, BIOS 0506 08/10/2012
[50618.998268] task: ffff88041bf1ad60 ti: ffff88041b19e000 task.ti: ffff88041b19e000
[50618.999017] RIP: 0010:[<ffffffff81165e76>] [<ffffffff81165e76>] path_init+0x11c/0x36f
[50618.999804] RSP: 0018:ffff88041b19f508 EFLAGS: 00010246
[50619.000592] RAX: 0000000000000000 RBX: ffff88041b19f658 RCX: 000000000000005c
[50619.001398] RDX: 0000000000005c5c RSI: ffff880419b3781a RDI: ffffffff81c34a10
[50619.002198] RBP: ffff88041b19f558 R08: ffff88041b19f588 R09: ffff88041b19f7c4
[50619.002999] R10: 00000000ffffff9c R11: ffff88041b19f658 R12: 0000000000000041
[50619.003816] R13: 0000000000000040 R14: ffff880419b3781a R15: ffff88041b19f7c4
[50619.004638] FS: 00007fca19bc2740(0000) GS:ffff88042fac0000(0000) knlGS:0000000000000000
[50619.005465] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[50619.006284] CR2: 0000000000000008 CR3: 0000000001c0b000 CR4: 00000000001407e0
[50619.007092] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[50619.007922] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[50619.008750] Stack:
[50619.009576] ffff88041b19f518 00000000ffbfaa5e 0000000000000000 ffffffff8151e735
[50619.010437] ffffc900080ae000 ffff88041b19f658 0000000000000041 ffff880419b3781a
[50619.011292] ffff88041b19f628 ffff88041b19f7c4 ffff88041b19f5e8 ffffffff811660fc
[50619.012119] Call Trace:
[50619.012947] [<ffffffff8151e735>] ? skb_checksum+0x4f/0x25b
[50619.013782] [<ffffffff811660fc>] path_lookupat+0x33/0x6c5
[50619.014618] [<ffffffff8152c623>] ? dev_hard_start_xmit+0x2e5/0x50b
[50619.015457] [<ffffffff811667b4>] filename_lookup.isra.27+0x26/0x5c
[50619.016298] [<ffffffff8116687e>] do_path_lookup+0x33/0x35
[50619.017123] [<ffffffff81166aac>] kern_path+0x2a/0x4d
[50619.017973] [<ffffffff815203d8>] ? __alloc_skb+0x75/0x186
[50619.018832] [<ffffffff81520324>] ? __kmalloc_reserve.isra.42+0x2d/0x6c
[50619.019702] [<ffffffff815871a3>] unix_find_other+0x38/0x1b9
[50619.020568] [<ffffffff81589043>] unix_stream_connect+0x102/0x3ed
[50619.021429] [<ffffffff81518cbc>] ? __sock_create+0x168/0x1c0
[50619.022301] [<ffffffff815de7e3>] ? call_refreshresult+0x91/0x91
[50619.023170] [<ffffffff81516531>] kernel_connect+0x10/0x12
[50619.024047] [<ffffffff815e1d36>] xs_local_setup_socket+0x122/0x191
[50619.024945] [<ffffffff815e2f50>] xs_local_connect+0x2c/0x48
[50619.025849] [<ffffffff815e01f6>] xprt_connect+0x112/0x11b
[50619.026756] [<ffffffff815de81c>] call_connect+0x39/0x3b
[50619.027662] [<ffffffff815e4e68>] __rpc_execute+0xe8/0x2ca
[50619.028567] [<ffffffff815e5109>] rpc_execute+0x76/0x9d
[50619.029473] [<ffffffff815debd1>] rpc_run_task+0x78/0x80
[50619.030376] [<ffffffff815ded0f>] rpc_call_sync+0x88/0x9e
[50619.031270] [<ffffffff815ebd2f>] rpcb_register_call+0x1f/0x2e
[50619.032143] [<ffffffff815ec216>] rpcb_v4_register+0xb2/0x13c
[50619.033031] [<ffffffff8108addb>] ? call_timer_fn+0x15e/0x15e
[50619.033918] [<ffffffff815e7816>] svc_unregister.isra.11+0x5a/0xcb
[50619.034804] [<ffffffff815e789b>] svc_rpcb_cleanup+0x14/0x21
[50619.035706] [<ffffffff815e70ef>] svc_shutdown_net+0x2b/0x30
[50619.036586] [<ffffffff812471c5>] lockd_down_net+0x7f/0xa3
[50619.037465] [<ffffffff81247413>] lockd_down+0x30/0xb2
[50619.038346] [<ffffffff8124439f>] nlmclnt_done+0x1f/0x23
[50619.039227] [<ffffffff8120fd72>] ? nfs_start_lockd+0xc8/0xc8
[50619.040086] [<ffffffff8120fd89>] nfs_destroy_server+0x17/0x19
[50619.040962] [<ffffffff8121024b>] nfs_free_server+0xeb/0x15c
[50619.041947] [<ffffffff812172c3>] nfs_kill_super+0x1f/0x23
[50619.042824] [<ffffffff8115da33>] deactivate_locked_super+0x26/0x52
[50619.043696] [<ffffffff8115e73d>] deactivate_super+0x42/0x47
[50619.044562] [<ffffffff8117453e>] mntput_no_expire+0x135/0x13e
[50619.045424] [<ffffffff81174574>] mntput+0x2d/0x2f
[50619.046287] [<ffffffff8115cd78>] __fput+0x1c6/0x1e6
[50619.047111] [<ffffffff8115cda6>] ____fput+0xe/0x10
[50619.047943] [<ffffffff810998da>] task_work_run+0x7e/0x98
[50619.048764] [<ffffffff81082bc5>] do_exit+0x3cc/0x8fa
[50619.049580] [<ffffffff81174449>] ? mntput_no_expire+0x40/0x13e
[50619.050399] [<ffffffff8108ca8b>] ? __dequeue_signal+0x1a/0x118
[50619.051215] [<ffffffff8108328d>] do_group_exit+0x6f/0xa2
[50619.052000] [<ffffffff8108f0e7>] get_signal_to_deliver+0x4f2/0x530
[50619.052797] [<ffffffff81036a39>] do_signal+0x4d/0x4a4
[50619.053577] [<ffffffff810f2810>] ? call_rcu+0x17/0x19
[50619.054344] [<ffffffff81036ebc>] do_notify_resume+0x2c/0x6b
[50619.055084] [<ffffffff81614098>] int_signal+0x12/0x17
[50619.055852] Code: c7 c7 10 4a c3 81 e8 79 c4 f3 ff e8 99 3a f3 ff 48 83 7b 20 00 0f 85 8d 00 00 00 65 48 8b 04 25 c0 b8 00 00 48 8b 80 58 05 00 00 <8b> 50 08 f6 c2 01 74 04 f3 90 eb f4 48 8b 48 18 48 89 4b 20 48
[50619.057735] RIP [<ffffffff81165e76>] path_init+0x11c/0x36f
[50619.058586] RSP <ffff88041b19f508>
[50619.059429] CR2: 0000000000000008

.config available on request, but it seems like I've been posting it to
l-k with various crashes too often and I don't want to be accused of
spamming!


2013-08-05 12:43:58

by Jeff Layton

Subject: Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

On Sun, 04 Aug 2013 16:40:58 +0100
Nix <[email protected]> wrote:

> I just got this panic on 3.10.4, in the middle of a large parallel
> compilation (of Chromium, as it happens) over NFSv3:
>
> [16364.527516] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> [16364.527571] IP: [<ffffffff81245157>] nlmclnt_setlockargs+0x55/0xcf
> [16364.527611] PGD 0
> [16364.527626] Oops: 0000 [#1] PREEMPT SMP
> [16364.527656] Modules linked in: [last unloaded: microcode]
> [16364.527690] CPU: 0 PID: 17034 Comm: flock Not tainted 3.10.4-05315-gf4ce424-dirty #1
> [16364.527730] Hardware name: System manufacturer System Product Name/P8H61-MX USB3, BIOS 0506 08/10/2012
> [16364.527775] task: ffff88041a97ad60 ti: ffff8803501d4000 task.ti: ffff8803501d4000
> [16364.527813] RIP: 0010:[<ffffffff81245157>] [<ffffffff81245157>] nlmclnt_setlockargs+0x55/0xcf
> [16364.527860] RSP: 0018:ffff8803501d5c58 EFLAGS: 00010282
> [16364.527889] RAX: ffff88041a97ad60 RBX: ffff8803e49c8800 RCX: 0000000000000000
> [16364.527926] RDX: 0000000000000000 RSI: 000000000000004a RDI: ffff8803e49c8b54
> [16364.527962] RBP: ffff8803501d5c68 R08: 0000000000015720 R09: 0000000000000000
> [16364.527998] R10: 00007ffffffff000 R11: ffff8803501d5d58 R12: ffff8803501d5d58
> [16364.528034] R13: ffff88041bd2bc00 R14: 0000000000000000 R15: ffff8803fc9e2900
> [16364.528070] FS: 0000000000000000(0000) GS:ffff88042fa00000(0000) knlGS:0000000000000000
> [16364.528111] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [16364.528142] CR2: 0000000000000008 CR3: 0000000001c0b000 CR4: 00000000001407f0
> [16364.528177] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [16364.528214] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [16364.528303] Stack:
> [16364.528316] ffff8803501d5d58 ffff8803e49c8800 ffff8803501d5cd8 ffffffff81245418
> [16364.528369] 0000000000000000 ffff8803516f0bc0 ffff8803d7b7b6c0 ffffffff81215c81
> [16364.528418] ffff880300000007 ffff88041bd2bdc8 ffff8801aabe9650 ffff8803fc9e2900
> [16364.528467] Call Trace:
> [16364.528485] [<ffffffff81245418>] nlmclnt_proc+0x148/0x5fb
> [16364.528516] [<ffffffff81215c81>] ? nfs_put_lock_context+0x69/0x6e
> [16364.528550] [<ffffffff812209a2>] nfs3_proc_lock+0x21/0x23
> [16364.528581] [<ffffffff812149dd>] do_unlk+0x96/0xb2
> [16364.528608] [<ffffffff81214b41>] nfs_flock+0x5a/0x71
> [16364.528637] [<ffffffff8119a747>] locks_remove_flock+0x9e/0x113
> [16364.528668] [<ffffffff8115cc68>] __fput+0xb6/0x1e6
> [16364.528695] [<ffffffff8115cda6>] ____fput+0xe/0x10
> [16364.528724] [<ffffffff810998da>] task_work_run+0x7e/0x98
> [16364.528754] [<ffffffff81082bc5>] do_exit+0x3cc/0x8fa
> [16364.528782] [<ffffffff81083501>] ? SyS_wait4+0xa5/0xc2
> [16364.528811] [<ffffffff8108328d>] do_group_exit+0x6f/0xa2
> [16364.528843] [<ffffffff810832d7>] SyS_exit_group+0x17/0x17
> [16364.528876] [<ffffffff81613e92>] system_call_fastpath+0x16/0x1b
> [16364.528907] Code: 00 00 65 48 8b 04 25 c0 b8 00 00 48 8b 72 20 48 81 ee c0 01 00 00 f3 a4 48 8d bb 54 03 00 00 be 4a 00 00 00 48 8b 90 68 05 00 00 <48> 8b 52 08 48 89 bb d0 00 00 00 48 83 c2 45 48 89 53 38 48 8b
> [16364.529176] RIP [<ffffffff81245157>] nlmclnt_setlockargs+0x55/0xcf
> [16364.529264] RSP <ffff8803501d5c58>
> [16364.529283] CR2: 0000000000000008
> [16364.539039] ---[ end trace 5a73fddf23441377 ]---
>

What might be most helpful is to figure out exactly where the above
panic occurred. The instructions here may be helpful:

http://wiki.samba.org/index.php/LinuxCIFS_troubleshooting#Oopses

...but you'll need to replace cifs.ko with lockd.ko in the gdb command.
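
For illustration only (the steps on the linked page are not reproduced here), a small sketch that turns the symbol+offset on an oops "IP:" line into a gdb expression. The `list *(symbol+offset)` form is ordinary gdb syntax; the helper name gdb_list_command is invented for this example, and which object file to load into gdb depends on how the kernel was built.

```python
import re

# Matches the address and "symbol+0xoff/0xsize" part of an oops line, e.g.
#   IP: [<ffffffff81245157>] nlmclnt_setlockargs+0x55/0xcf
IP_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\S+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def gdb_list_command(oops_line):
    """Build a gdb 'list' command for the symbol+offset found in oops_line."""
    m = IP_RE.search(oops_line)
    if m is None:
        raise ValueError("no symbol+offset found in line")
    symbol, offset = m.group(2), int(m.group(3), 16)
    return f"list *({symbol}+{offset:#x})"


line = "[16364.527571] IP: [<ffffffff81245157>] nlmclnt_setlockargs+0x55/0xcf"
print(gdb_list_command(line))
```

The printed command can be pasted into a gdb session that has the object with debugging information loaded.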

> This is the same machine on which this panic has been occurring on
> shutdown since 3.9.x: Al Viro has previously pointed out the problem and
> nothing has happened:
>
> [50618.993226] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> [50618.993904] IP: [<ffffffff81165e76>] path_init+0x11c/0x36f
> [50618.994609] PGD 0
> [50618.995329] Oops: 0000 [#1] PREEMPT SMP
> [50618.996027] Modules linked in: [last unloaded: microcode]
> [50618.996758] CPU: 3 PID: 1262 Comm: pulseaudio Not tainted 3.10.4-05315-gf4ce424-dirty #1
> [50618.997506] Hardware name: System manufacturer System Product Name/P8H61-MX USB3, BIOS 0506 08/10/2012
> [50618.998268] task: ffff88041bf1ad60 ti: ffff88041b19e000 task.ti: ffff88041b19e000
> [50618.999017] RIP: 0010:[<ffffffff81165e76>] [<ffffffff81165e76>] path_init+0x11c/0x36f
> [50618.999804] RSP: 0018:ffff88041b19f508 EFLAGS: 00010246
> [50619.000592] RAX: 0000000000000000 RBX: ffff88041b19f658 RCX: 000000000000005c
> [50619.001398] RDX: 0000000000005c5c RSI: ffff880419b3781a RDI: ffffffff81c34a10
> [50619.002198] RBP: ffff88041b19f558 R08: ffff88041b19f588 R09: ffff88041b19f7c4
> [50619.002999] R10: 00000000ffffff9c R11: ffff88041b19f658 R12: 0000000000000041
> [50619.003816] R13: 0000000000000040 R14: ffff880419b3781a R15: ffff88041b19f7c4
> [50619.004638] FS: 00007fca19bc2740(0000) GS:ffff88042fac0000(0000) knlGS:0000000000000000
> [50619.005465] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [50619.006284] CR2: 0000000000000008 CR3: 0000000001c0b000 CR4: 00000000001407e0
> [50619.007092] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [50619.007922] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [50619.008750] Stack:
> [50619.009576] ffff88041b19f518 00000000ffbfaa5e 0000000000000000 ffffffff8151e735
> [50619.010437] ffffc900080ae000 ffff88041b19f658 0000000000000041 ffff880419b3781a
> [50619.011292] ffff88041b19f628 ffff88041b19f7c4 ffff88041b19f5e8 ffffffff811660fc
> [50619.012119] Call Trace:
> [50619.012947] [<ffffffff8151e735>] ? skb_checksum+0x4f/0x25b
> [50619.013782] [<ffffffff811660fc>] path_lookupat+0x33/0x6c5
> [50619.014618] [<ffffffff8152c623>] ? dev_hard_start_xmit+0x2e5/0x50b
> [50619.015457] [<ffffffff811667b4>] filename_lookup.isra.27+0x26/0x5c
> [50619.016298] [<ffffffff8116687e>] do_path_lookup+0x33/0x35
> [50619.017123] [<ffffffff81166aac>] kern_path+0x2a/0x4d
> [50619.017973] [<ffffffff815203d8>] ? __alloc_skb+0x75/0x186
> [50619.018832] [<ffffffff81520324>] ? __kmalloc_reserve.isra.42+0x2d/0x6c
> [50619.019702] [<ffffffff815871a3>] unix_find_other+0x38/0x1b9
> [50619.020568] [<ffffffff81589043>] unix_stream_connect+0x102/0x3ed
> [50619.021429] [<ffffffff81518cbc>] ? __sock_create+0x168/0x1c0
> [50619.022301] [<ffffffff815de7e3>] ? call_refreshresult+0x91/0x91
> [50619.023170] [<ffffffff81516531>] kernel_connect+0x10/0x12
> [50619.024047] [<ffffffff815e1d36>] xs_local_setup_socket+0x122/0x191
> [50619.024945] [<ffffffff815e2f50>] xs_local_connect+0x2c/0x48
> [50619.025849] [<ffffffff815e01f6>] xprt_connect+0x112/0x11b
> [50619.026756] [<ffffffff815de81c>] call_connect+0x39/0x3b
> [50619.027662] [<ffffffff815e4e68>] __rpc_execute+0xe8/0x2ca
> [50619.028567] [<ffffffff815e5109>] rpc_execute+0x76/0x9d
> [50619.029473] [<ffffffff815debd1>] rpc_run_task+0x78/0x80
> [50619.030376] [<ffffffff815ded0f>] rpc_call_sync+0x88/0x9e
> [50619.031270] [<ffffffff815ebd2f>] rpcb_register_call+0x1f/0x2e
> [50619.032143] [<ffffffff815ec216>] rpcb_v4_register+0xb2/0x13c
> [50619.033031] [<ffffffff8108addb>] ? call_timer_fn+0x15e/0x15e
> [50619.033918] [<ffffffff815e7816>] svc_unregister.isra.11+0x5a/0xcb
> [50619.034804] [<ffffffff815e789b>] svc_rpcb_cleanup+0x14/0x21
> [50619.035706] [<ffffffff815e70ef>] svc_shutdown_net+0x2b/0x30
> [50619.036586] [<ffffffff812471c5>] lockd_down_net+0x7f/0xa3
> [50619.037465] [<ffffffff81247413>] lockd_down+0x30/0xb2
> [50619.038346] [<ffffffff8124439f>] nlmclnt_done+0x1f/0x23
> [50619.039227] [<ffffffff8120fd72>] ? nfs_start_lockd+0xc8/0xc8
> [50619.040086] [<ffffffff8120fd89>] nfs_destroy_server+0x17/0x19
> [50619.040962] [<ffffffff8121024b>] nfs_free_server+0xeb/0x15c
> [50619.041947] [<ffffffff812172c3>] nfs_kill_super+0x1f/0x23
> [50619.042824] [<ffffffff8115da33>] deactivate_locked_super+0x26/0x52
> [50619.043696] [<ffffffff8115e73d>] deactivate_super+0x42/0x47
> [50619.044562] [<ffffffff8117453e>] mntput_no_expire+0x135/0x13e
> [50619.045424] [<ffffffff81174574>] mntput+0x2d/0x2f
> [50619.046287] [<ffffffff8115cd78>] __fput+0x1c6/0x1e6
> [50619.047111] [<ffffffff8115cda6>] ____fput+0xe/0x10
> [50619.047943] [<ffffffff810998da>] task_work_run+0x7e/0x98
> [50619.048764] [<ffffffff81082bc5>] do_exit+0x3cc/0x8fa
> [50619.049580] [<ffffffff81174449>] ? mntput_no_expire+0x40/0x13e
> [50619.050399] [<ffffffff8108ca8b>] ? __dequeue_signal+0x1a/0x118
> [50619.051215] [<ffffffff8108328d>] do_group_exit+0x6f/0xa2
> [50619.052000] [<ffffffff8108f0e7>] get_signal_to_deliver+0x4f2/0x530
> [50619.052797] [<ffffffff81036a39>] do_signal+0x4d/0x4a4
> [50619.053577] [<ffffffff810f2810>] ? call_rcu+0x17/0x19
> [50619.054344] [<ffffffff81036ebc>] do_notify_resume+0x2c/0x6b
> [50619.055084] [<ffffffff81614098>] int_signal+0x12/0x17
> [50619.055852] Code: c7 c7 10 4a c3 81 e8 79 c4 f3 ff e8 99 3a f3 ff 48 83 7b 20 00 0f 85 8d 00 00 00 65 48 8b 04 25 c0 b8 00 00 48 8b 80 58 05 00 00 <8b> 50 08 f6 c2 01 74 04 f3 90 eb f4 48 8b 48 18 48 89 4b 20 48
> [50619.057735] RIP [<ffffffff81165e76>] path_init+0x11c/0x36f
> [50619.058586] RSP <ffff88041b19f508>
> [50619.059429] CR2: 0000000000000008
>
> .config available on request, but it seems like I've been posting it to
> l-k with various crashes too often and I don't want to be accused of
> spamming!

It probably would have been a good idea to cc linux-nfs; it can be easy
to miss things on LKML. In any case, here's what Al said:

> > [ 251.256556] EIP is at path_init+0xc7/0x27f
>
> Apparently that's set_root_rcu() with current->fs being NULL. Which comes from
> AF_UNIX connect done by some twisted call chain in context of hell knows what.
>

...and then:

> Why is it done in essentially random process context, anyway? There's such thing
> as chroot, after all, which would screw that sucker as hard as NULL ->fs, but in
> a less visible way...

Having not studied the problem, I can't offer up much of an idea on
how to fix it at this point.

--
Jeff Layton <[email protected]>

2013-08-05 14:48:26

by Nix

Subject: Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

On 5 Aug 2013, Jeff Layton stated:

> On Sun, 04 Aug 2013 16:40:58 +0100
> Nix <[email protected]> wrote:
>
>> I just got this panic on 3.10.4, in the middle of a large parallel
>> compilation (of Chromium, as it happens) over NFSv3:
>>
>> [16364.527516] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>> [16364.527571] IP: [<ffffffff81245157>] nlmclnt_setlockargs+0x55/0xcf
>> [16364.527611] PGD 0
>> [16364.527626] Oops: 0000 [#1] PREEMPT SMP
>> [16364.527656] Modules linked in: [last unloaded: microcode]
>> [16364.527690] CPU: 0 PID: 17034 Comm: flock Not tainted 3.10.4-05315-gf4ce424-dirty #1
>> [16364.527730] Hardware name: System manufacturer System Product Name/P8H61-MX USB3, BIOS 0506 08/10/2012
>> [16364.527775] task: ffff88041a97ad60 ti: ffff8803501d4000 task.ti: ffff8803501d4000
>> [16364.527813] RIP: 0010:[<ffffffff81245157>] [<ffffffff81245157>] nlmclnt_setlockargs+0x55/0xcf
>> [16364.527860] RSP: 0018:ffff8803501d5c58 EFLAGS: 00010282
>> [16364.527889] RAX: ffff88041a97ad60 RBX: ffff8803e49c8800 RCX: 0000000000000000
>> [16364.527926] RDX: 0000000000000000 RSI: 000000000000004a RDI: ffff8803e49c8b54
>> [16364.527962] RBP: ffff8803501d5c68 R08: 0000000000015720 R09: 0000000000000000
>> [16364.527998] R10: 00007ffffffff000 R11: ffff8803501d5d58 R12: ffff8803501d5d58
>> [16364.528034] R13: ffff88041bd2bc00 R14: 0000000000000000 R15: ffff8803fc9e2900
>> [16364.528070] FS: 0000000000000000(0000) GS:ffff88042fa00000(0000) knlGS:0000000000000000
>> [16364.528111] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [16364.528142] CR2: 0000000000000008 CR3: 0000000001c0b000 CR4: 00000000001407f0
>> [16364.528177] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [16364.528214] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> [16364.528303] Stack:
>> [16364.528316] ffff8803501d5d58 ffff8803e49c8800 ffff8803501d5cd8 ffffffff81245418
>> [16364.528369] 0000000000000000 ffff8803516f0bc0 ffff8803d7b7b6c0 ffffffff81215c81
>> [16364.528418] ffff880300000007 ffff88041bd2bdc8 ffff8801aabe9650 ffff8803fc9e2900
>> [16364.528467] Call Trace:
>> [16364.528485] [<ffffffff81245418>] nlmclnt_proc+0x148/0x5fb
>> [16364.528516] [<ffffffff81215c81>] ? nfs_put_lock_context+0x69/0x6e
>> [16364.528550] [<ffffffff812209a2>] nfs3_proc_lock+0x21/0x23
>> [16364.528581] [<ffffffff812149dd>] do_unlk+0x96/0xb2
>> [16364.528608] [<ffffffff81214b41>] nfs_flock+0x5a/0x71
>> [16364.528637] [<ffffffff8119a747>] locks_remove_flock+0x9e/0x113
>> [16364.528668] [<ffffffff8115cc68>] __fput+0xb6/0x1e6
>> [16364.528695] [<ffffffff8115cda6>] ____fput+0xe/0x10
>> [16364.528724] [<ffffffff810998da>] task_work_run+0x7e/0x98
>> [16364.528754] [<ffffffff81082bc5>] do_exit+0x3cc/0x8fa
>> [16364.528782] [<ffffffff81083501>] ? SyS_wait4+0xa5/0xc2
>> [16364.528811] [<ffffffff8108328d>] do_group_exit+0x6f/0xa2
>> [16364.528843] [<ffffffff810832d7>] SyS_exit_group+0x17/0x17
>> [16364.528876] [<ffffffff81613e92>] system_call_fastpath+0x16/0x1b
>> [16364.528907] Code: 00 00 65 48 8b 04 25 c0 b8 00 00 48 8b 72 20 48 81 ee c0 01 00 00 f3 a4 48 8d bb 54 03 00 00 be 4a 00 00 00 48 8b 90 68 05 00 00 <48> 8b 52 08 48 89 bb d0 00 00 00 48 83 c2 45 48 89 53 38 48 8b
>> [16364.529176] RIP [<ffffffff81245157>] nlmclnt_setlockargs+0x55/0xcf
>> [16364.529264] RSP <ffff8803501d5c58>
>> [16364.529283] CR2: 0000000000000008
>> [16364.539039] ---[ end trace 5a73fddf23441377 ]---
>
> What might be most helpful is to figure out exactly where the above
> panic occurred.

OK. My kernel is non-modular and was built without debugging information,
so I rebuilt it with debugging info and got this:

0xffffffff81245418 is in nlmclnt_proc (fs/lockd/clntproc.c:172).
167             return -ENOMEM;
168     }
169     /* Set up the argument struct */
170     nlmclnt_setlockargs(call, fl);
171
172     if (IS_SETLK(cmd) || IS_SETLKW(cmd)) {
173             if (fl->fl_type != F_UNLCK) {
174                     call->a_args.block = IS_SETLKW(cmd) ? 1 : 0;
175                     status = nlmclnt_lock(call, fl);
176             } else

That's offset 0x148 into nlmclnt_proc, i.e. decimal 328:

0xffffffff81245413 <+323>: callq 0xffffffff81245102 <nlmclnt_setlockargs>
0xffffffff81245418 <+328>: mov -0x40(%rbp),%eax
0xffffffff8124541b <+331>: sub $0x6,%eax
0xffffffff8124541e <+334>: cmp $0x1,%eax
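
The address arithmetic here can be checked with a short sketch (illustrative only; parse_frame is a made-up helper). It uses only numbers that appear in the oops and the disassembly: the call-trace entry nlmclnt_proc+0x148/0x5fb, the callq target 0xffffffff81245102, and the faulting RIP 0xffffffff81245157.

```python
def parse_frame(frame):
    """Split 'func+0xoff/0xsize' into (name, offset, size)."""
    name, rest = frame.split("+")
    off, size = rest.split("/")
    return name, int(off, 16), int(size, 16)


# Call-trace entry: the return address inside nlmclnt_proc.
ret_addr = 0xFFFFFFFF81245418
name, off, size = parse_frame("nlmclnt_proc+0x148/0x5fb")
func_start = ret_addr - off

# Faulting RIP from the oops, relative to the callq target in the disassembly.
fault_rip = 0xFFFFFFFF81245157
setlockargs_start = 0xFFFFFFFF81245102
fault_off = fault_rip - setlockargs_start

print(f"{name}: offset {off} decimal, starts at {func_start:#x}")
print(f"fault is {fault_off:#x} bytes into nlmclnt_setlockargs")
```

Note that 0xffffffff81245418 is the instruction after the callq, so it is the return address recorded in the call trace. The RIP line of the oops places the faulting instruction itself at nlmclnt_setlockargs+0x55, which the subtraction above reproduces.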

nlm_alloc_call() cannot have failed (there is a NULL check right there),
and fl cannot be NULL either, because it's dereferenced in nfs_flock(),
further up the call chain from where we are.

Time to stick some printk()s in, I suspect. (Not sure how to keep them
from utterly flooding the log, though.)
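
On the flooding concern: the kernel has printk_ratelimited() and printk_once() for this sort of thing. The sketch below is only a userspace model of the interval/burst idea behind rate-limited printing, not the kernel's implementation. The class name RateLimit is invented, and the defaults of 5 seconds and 10 messages are meant to mirror the kernel's default ratelimit settings as far as I know.

```python
class RateLimit:
    """Allow at most `burst` messages per `interval` seconds; count the rest."""

    def __init__(self, interval=5.0, burst=10):
        self.interval = interval
        self.burst = burst
        self.begin = None   # start of the current window
        self.printed = 0    # messages allowed in the current window
        self.missed = 0     # messages suppressed so far

    def allow(self, now):
        # Open a new window on first use or once the interval has elapsed.
        if self.begin is None or now - self.begin >= self.interval:
            self.begin = now
            self.printed = 0
        if self.printed < self.burst:
            self.printed += 1
            return True
        self.missed += 1
        return False


# Timestamps are passed in explicitly so the behaviour is deterministic.
rl = RateLimit(interval=5.0, burst=3)
decisions = [rl.allow(t) for t in (0.0, 0.1, 0.2, 0.3, 0.4, 5.0)]
print(decisions, rl.missed)
```

With a burst of 3, the fourth and fifth calls inside the first window are suppressed and the call at t=5.0 starts a fresh window.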

>> This is the same machine on which this panic has been occurring on
>> shutdown since 3.9.x: Al Viro has previously pointed out the problem and
>> nothing has happened:
>>
>> [50618.993226] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>> [50618.993904] IP: [<ffffffff81165e76>] path_init+0x11c/0x36f
>> [50618.994609] PGD 0
>> [50618.995329] Oops: 0000 [#1] PREEMPT SMP
>> [50618.996027] Modules linked in: [last unloaded: microcode]
>> [50618.996758] CPU: 3 PID: 1262 Comm: pulseaudio Not tainted 3.10.4-05315-gf4ce424-dirty #1
>> [50618.997506] Hardware name: System manufacturer System Product Name/P8H61-MX USB3, BIOS 0506 08/10/2012
>> [50618.998268] task: ffff88041bf1ad60 ti: ffff88041b19e000 task.ti: ffff88041b19e000
>> [50618.999017] RIP: 0010:[<ffffffff81165e76>] [<ffffffff81165e76>] path_init+0x11c/0x36f
>> [50618.999804] RSP: 0018:ffff88041b19f508 EFLAGS: 00010246
>> [50619.000592] RAX: 0000000000000000 RBX: ffff88041b19f658 RCX: 000000000000005c
>> [50619.001398] RDX: 0000000000005c5c RSI: ffff880419b3781a RDI: ffffffff81c34a10
>> [50619.002198] RBP: ffff88041b19f558 R08: ffff88041b19f588 R09: ffff88041b19f7c4
>> [50619.002999] R10: 00000000ffffff9c R11: ffff88041b19f658 R12: 0000000000000041
>> [50619.003816] R13: 0000000000000040 R14: ffff880419b3781a R15: ffff88041b19f7c4
>> [50619.004638] FS: 00007fca19bc2740(0000) GS:ffff88042fac0000(0000) knlGS:0000000000000000
>> [50619.005465] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [50619.006284] CR2: 0000000000000008 CR3: 0000000001c0b000 CR4: 00000000001407e0
>> [50619.007092] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [50619.007922] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> [50619.008750] Stack:
>> [50619.009576] ffff88041b19f518 00000000ffbfaa5e 0000000000000000 ffffffff8151e735
>> [50619.010437] ffffc900080ae000 ffff88041b19f658 0000000000000041 ffff880419b3781a
>> [50619.011292] ffff88041b19f628 ffff88041b19f7c4 ffff88041b19f5e8 ffffffff811660fc
>> [50619.012119] Call Trace:
>> [50619.012947] [<ffffffff8151e735>] ? skb_checksum+0x4f/0x25b
>> [50619.013782] [<ffffffff811660fc>] path_lookupat+0x33/0x6c5
>> [50619.014618] [<ffffffff8152c623>] ? dev_hard_start_xmit+0x2e5/0x50b
>> [50619.015457] [<ffffffff811667b4>] filename_lookup.isra.27+0x26/0x5c
>> [50619.016298] [<ffffffff8116687e>] do_path_lookup+0x33/0x35
>> [50619.017123] [<ffffffff81166aac>] kern_path+0x2a/0x4d
>> [50619.017973] [<ffffffff815203d8>] ? __alloc_skb+0x75/0x186
>> [50619.018832] [<ffffffff81520324>] ? __kmalloc_reserve.isra.42+0x2d/0x6c
>> [50619.019702] [<ffffffff815871a3>] unix_find_other+0x38/0x1b9
>> [50619.020568] [<ffffffff81589043>] unix_stream_connect+0x102/0x3ed
>> [50619.021429] [<ffffffff81518cbc>] ? __sock_create+0x168/0x1c0
>> [50619.022301] [<ffffffff815de7e3>] ? call_refreshresult+0x91/0x91
>> [50619.023170] [<ffffffff81516531>] kernel_connect+0x10/0x12
>> [50619.024047] [<ffffffff815e1d36>] xs_local_setup_socket+0x122/0x191
>> [50619.024945] [<ffffffff815e2f50>] xs_local_connect+0x2c/0x48
>> [50619.025849] [<ffffffff815e01f6>] xprt_connect+0x112/0x11b
>> [50619.026756] [<ffffffff815de81c>] call_connect+0x39/0x3b
>> [50619.027662] [<ffffffff815e4e68>] __rpc_execute+0xe8/0x2ca
>> [50619.028567] [<ffffffff815e5109>] rpc_execute+0x76/0x9d
>> [50619.029473] [<ffffffff815debd1>] rpc_run_task+0x78/0x80
>> [50619.030376] [<ffffffff815ded0f>] rpc_call_sync+0x88/0x9e
>> [50619.031270] [<ffffffff815ebd2f>] rpcb_register_call+0x1f/0x2e
>> [50619.032143] [<ffffffff815ec216>] rpcb_v4_register+0xb2/0x13c
>> [50619.033031] [<ffffffff8108addb>] ? call_timer_fn+0x15e/0x15e
>> [50619.033918] [<ffffffff815e7816>] svc_unregister.isra.11+0x5a/0xcb
>> [50619.034804] [<ffffffff815e789b>] svc_rpcb_cleanup+0x14/0x21
>> [50619.035706] [<ffffffff815e70ef>] svc_shutdown_net+0x2b/0x30
>> [50619.036586] [<ffffffff812471c5>] lockd_down_net+0x7f/0xa3
>> [50619.037465] [<ffffffff81247413>] lockd_down+0x30/0xb2
>> [50619.038346] [<ffffffff8124439f>] nlmclnt_done+0x1f/0x23
>> [50619.039227] [<ffffffff8120fd72>] ? nfs_start_lockd+0xc8/0xc8
>> [50619.040086] [<ffffffff8120fd89>] nfs_destroy_server+0x17/0x19
>> [50619.040962] [<ffffffff8121024b>] nfs_free_server+0xeb/0x15c
>> [50619.041947] [<ffffffff812172c3>] nfs_kill_super+0x1f/0x23
>> [50619.042824] [<ffffffff8115da33>] deactivate_locked_super+0x26/0x52
>> [50619.043696] [<ffffffff8115e73d>] deactivate_super+0x42/0x47
>> [50619.044562] [<ffffffff8117453e>] mntput_no_expire+0x135/0x13e
>> [50619.045424] [<ffffffff81174574>] mntput+0x2d/0x2f
>> [50619.046287] [<ffffffff8115cd78>] __fput+0x1c6/0x1e6
>> [50619.047111] [<ffffffff8115cda6>] ____fput+0xe/0x10
>> [50619.047943] [<ffffffff810998da>] task_work_run+0x7e/0x98
>> [50619.048764] [<ffffffff81082bc5>] do_exit+0x3cc/0x8fa
>> [50619.049580] [<ffffffff81174449>] ? mntput_no_expire+0x40/0x13e
>> [50619.050399] [<ffffffff8108ca8b>] ? __dequeue_signal+0x1a/0x118
>> [50619.051215] [<ffffffff8108328d>] do_group_exit+0x6f/0xa2
>> [50619.052000] [<ffffffff8108f0e7>] get_signal_to_deliver+0x4f2/0x530
>> [50619.052797] [<ffffffff81036a39>] do_signal+0x4d/0x4a4
>> [50619.053577] [<ffffffff810f2810>] ? call_rcu+0x17/0x19
>> [50619.054344] [<ffffffff81036ebc>] do_notify_resume+0x2c/0x6b
>> [50619.055084] [<ffffffff81614098>] int_signal+0x12/0x17
>> [50619.055852] Code: c7 c7 10 4a c3 81 e8 79 c4 f3 ff e8 99 3a f3 ff 48 83 7b 20 00 0f 85 8d 00 00 00 65 48 8b 04 25 c0 b8 00 00 48 8b 80 58 05 00 00 <8b> 50 08 f6 c2 01 74 04 f3 90 eb f4 48 8b 48 18 48 89 4b 20 48
>> [50619.057735] RIP [<ffffffff81165e76>] path_init+0x11c/0x36f
>> [50619.058586] RSP <ffff88041b19f508>
>> [50619.059429] CR2: 0000000000000008
[...]
>> Why is it done in essentially random process context, anyway? There's such thing
>> as chroot, after all, which would screw that sucker as hard as NULL ->fs, but in
>> a less visible way...
>
> Having not studied the problem, I can't offer up much of an idea on
> how to fix it at this point.

What mystifies me is why a v3 server shutdown and unregistration is
triggering a v4 registration. It's not really that surprising that a
registration that late in shutdown, after the disconnected fs has been
destroyed, would cause problems... (I have NFSv4 built in, but am not
using it yet.)

My .config seems useful at this juncture, if people on this list haven't
seen it:

CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_ARCH_HAS_CPU_AUTOPROBE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_HAVE_INTEL_TXT=y
CONFIG_X86_64_SMP=y
CONFIG_X86_HT=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11"
CONFIG_ARCH_CPU_PROBE_RELEASE=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_KERNEL_LZMA=y
CONFIG_DEFAULT_HOSTNAME="mutilate"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_FHANDLE=y
CONFIG_AUDIT=y
CONFIG_HAVE_GENERIC_HARDIRQS=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_IRQ_TIME_ACCOUNTING=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_FANOUT=8
CONFIG_RCU_FANOUT_LEAF=8
CONFIG_RCU_BOOST=y
CONFIG_RCU_BOOST_PRIO=1
CONFIG_RCU_BOOST_DELAY=500
CONFIG_LOG_BUF_SHIFT=18
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_WANTS_PROT_NUMA_PROT_NONE=y
CONFIG_CGROUPS=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_NAMESPACES=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
CONFIG_UIDGID_CONVERTED=y
CONFIG_SCHED_AUTOGROUP=y
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE="usr/initramfs.mutilate"
CONFIG_INITRAMFS_ROOT_UID=99
CONFIG_INITRAMFS_ROOT_GID=101
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
CONFIG_INITRAMFS_COMPRESSION_LZMA=y
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
CONFIG_HAVE_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_HOTPLUG=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
CONFIG_UID16=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_PCI_QUIRKS=y
CONFIG_HAVE_PERF_EVENTS=y
CONFIG_PERF_EVENTS=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
CONFIG_SLUB=y
CONFIG_TRACEPOINTS=y
CONFIG_HAVE_OPROFILE=y
CONFIG_OPROFILE_NMI_TIMER=y
CONFIG_JUMP_LABEL=y
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_ARCH_USE_BUILTIN_BSWAP=y
CONFIG_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_OPTPROBES=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_ATTRS=y
CONFIG_USE_GENERIC_SMP_HELPERS=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_DMA_API_DEBUG=y
CONFIG_HAVE_HW_BREAKPOINT=y
CONFIG_HAVE_MIXED_BREAKPOINTS_REGS=y
CONFIG_HAVE_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_PERF_EVENTS_NMI=y
CONFIG_HAVE_PERF_REGS=y
CONFIG_HAVE_PERF_USER_STACK_DUMP=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y
CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y
CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
CONFIG_HAVE_CMPXCHG_LOCAL=y
CONFIG_HAVE_CMPXCHG_DOUBLE=y
CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION=y
CONFIG_ARCH_WANT_OLD_COMPAT_IPC=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP_FILTER=y
CONFIG_HAVE_CONTEXT_TRACKING=y
CONFIG_HAVE_IRQ_TIME_ACCOUNTING=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_MODULES_USE_ELF_RELA=y
CONFIG_OLD_SIGSUSPEND3=y
CONFIG_COMPAT_OLD_SIGACTION=y
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
CONFIG_BLK_DEV_BSG=y
CONFIG_PARTITION_ADVANCED=y
CONFIG_MSDOS_PARTITION=y
CONFIG_EFI_PARTITION=y
CONFIG_BLOCK_COMPAT=y
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_DEADLINE=m
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_CFQ=y
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_UNINLINE_SPIN_UNLOCK=y
CONFIG_MUTEX_SPIN_ON_OWNER=y
CONFIG_FREEZER=y
CONFIG_ZONE_DMA=y
CONFIG_SMP=y
CONFIG_X86_SUPPORTS_MEMORY_FAILURE=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_NO_BOOTMEM=y
CONFIG_MCORE2=y
CONFIG_X86_INTERNODE_CACHE_SHIFT=6
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_P6_NOP=y
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=64
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_CENTAUR=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_DMI=y
CONFIG_GART_IOMMU=y
CONFIG_SWIOTLB=y
CONFIG_IOMMU_HELPER=y
CONFIG_NR_CPUS=8
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_THRESHOLD=y
CONFIG_X86_THERMAL_VECTOR=y
CONFIG_MICROCODE=m
CONFIG_MICROCODE_INTEL=y
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_MICROCODE_INTEL_LIB=y
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=y
CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_DIRECT_GBPAGES=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_HAVE_MEMORY_PRESENT=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_HAVE_MEMBLOCK=y
CONFIG_HAVE_MEMBLOCK_NODE_MAP=y
CONFIG_ARCH_DISCARD_MEMBLOCK=y
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_COMPACTION=y
CONFIG_MIGRATION=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_MMU_NOTIFIER=y
CONFIG_KSM=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_X86_CHECK_BIOS_CORRUPTION=y
CONFIG_X86_BOOTPARAM_MEMORY_CORRUPTION_CHECK=y
CONFIG_X86_RESERVE_LOW=64
CONFIG_MTRR=y
CONFIG_X86_PAT=y
CONFIG_ARCH_USES_PG_UNCACHED=y
CONFIG_ARCH_RANDOM=y
CONFIG_X86_SMAP=y
CONFIG_EFI=y
CONFIG_SECCOMP=y
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_SCHED_HRTICK=y
CONFIG_PHYSICAL_START=0x1000000
CONFIG_PHYSICAL_ALIGN=0x1000000
CONFIG_HOTPLUG_CPU=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ARCH_HIBERNATION_HEADER=y
CONFIG_HIBERNATE_CALLBACKS=y
CONFIG_HIBERNATION=y
CONFIG_PM_STD_PARTITION=""
CONFIG_TOI_CORE=y
CONFIG_TOI_SWAP=y
CONFIG_TOI_CRYPTO=y
CONFIG_TOI_USERUI=y
CONFIG_TOI_USERUI_DEFAULT_PATH="/usr/sbin/tuxoniceui_text"
CONFIG_TOI_DEFAULT_IMAGE_SIZE_LIMIT=-2
CONFIG_TOI_REPLACE_SWSUSP=y
CONFIG_TOI_IGNORE_LATE_INITCALL=y
CONFIG_TOI_DEFAULT_WAIT=-1
CONFIG_TOI_DEFAULT_EXTRA_PAGES_ALLOWANCE=50000
CONFIG_TOI_CHECKSUM=y
CONFIG_TOI=y
CONFIG_PM_SLEEP=y
CONFIG_PM_SLEEP_SMP=y
CONFIG_PM_RUNTIME=y
CONFIG_PM=y
CONFIG_ACPI=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_PROC_EVENT=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_FAN=y
CONFIG_ACPI_I2C=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_HOTPLUG_CPU=y
CONFIG_ACPI_THERMAL=y
CONFIG_ACPI_CUSTOM_DSDT_FILE=""
CONFIG_ACPI_BLACKLIST_YEAR=0
CONFIG_X86_PM_TIMER=y
CONFIG_ACPI_CONTAINER=y
CONFIG_ACPI_HED=y
CONFIG_ACPI_APEI=y
CONFIG_ACPI_APEI_GHES=y
CONFIG_ACPI_APEI_PCIEAER=y
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_TABLE=y
CONFIG_CPU_FREQ_GOV_COMMON=y
CONFIG_CPU_FREQ_STAT=y
CONFIG_CPU_FREQ_STAT_DETAILS=y
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_X86_INTEL_PSTATE=y
CONFIG_X86_ACPI_CPUFREQ=y
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_CPU_IDLE_GOV_MENU=y
CONFIG_INTEL_IDLE=y
CONFIG_I7300_IDLE_IOAT_CHANNEL=y
CONFIG_I7300_IDLE=y
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCIEPORTBUS=y
CONFIG_PCIEAER=y
CONFIG_PCIEASPM=y
CONFIG_PCIEASPM_DEFAULT=y
CONFIG_PCIE_PME=y
CONFIG_ARCH_SUPPORTS_MSI=y
CONFIG_PCI_MSI=y
CONFIG_HT_IRQ=y
CONFIG_PCI_ATS=y
CONFIG_PCI_IOV=y
CONFIG_PCI_IOAPIC=y
CONFIG_PCI_LABEL=y
CONFIG_ISA_DMA_API=y
CONFIG_AMD_NB=y
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
CONFIG_BINFMT_SCRIPT=y
CONFIG_BINFMT_MISC=y
CONFIG_COREDUMP=y
CONFIG_IA32_EMULATION=y
CONFIG_COMPAT=y
CONFIG_COMPAT_FOR_U64_ALIGNMENT=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_KEYS_COMPAT=y
CONFIG_HAVE_TEXT_POKE_SMP=y
CONFIG_X86_DEV_DMA_OPS=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_PACKET_DIAG=y
CONFIG_UNIX=y
CONFIG_UNIX_DIAG=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_INET_LRO=y
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
CONFIG_INET_UDP_DIAG=y
CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_TCP_CONG="cubic"
CONFIG_IPV6=y
CONFIG_IPV6_PRIVACY=y
CONFIG_HAVE_NET_DSA=y
CONFIG_DNS_RESOLVER=y
CONFIG_NETLINK_DIAG=y
CONFIG_RPS=y
CONFIG_RFS_ACCEL=y
CONFIG_XPS=y
CONFIG_BQL=y
CONFIG_BPF_JIT=y
CONFIG_BT=y
CONFIG_BT_RFCOMM=y
CONFIG_BT_HCIBTUSB=y
CONFIG_HAVE_BPF_JIT=y
CONFIG_UEVENT_HELPER_PATH=""
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
CONFIG_FIRMWARE_IN_KERNEL=y
CONFIG_EXTRA_FIRMWARE="radeon/BARTS_mc.bin radeon/BARTS_me.bin radeon/BARTS_pfp.bin radeon/SUMO_uvd.bin radeon/BTC_rlc.bin rtl_nic/rtl8168f-1.fw rtl_nic/rtl8168f-2.fw"
CONFIG_EXTRA_FIRMWARE_DIR="/usr/src/linux/linux-firmware"
CONFIG_FW_LOADER_USER_HELPER=y
CONFIG_DMA_SHARED_BUFFER=y
CONFIG_PNP=y
CONFIG_PNPACPI=y
CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_LOOP_MIN_COUNT=8
CONFIG_BLK_DEV_CRYPTOLOOP=m
CONFIG_BLK_DEV_NBD=m
CONFIG_CDROM_PKTCDVD=y
CONFIG_CDROM_PKTCDVD_BUFFERS=16
CONFIG_HAVE_IDE=y
CONFIG_SCSI_MOD=y
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
CONFIG_BLK_DEV_SD=y
CONFIG_BLK_DEV_SR=y
CONFIG_SCSI_MULTI_LUN=y
CONFIG_SCSI_SCAN_ASYNC=y
CONFIG_ATA=y
CONFIG_ATA_VERBOSE_ERROR=y
CONFIG_ATA_ACPI=y
CONFIG_SATA_ZPODD=y
CONFIG_SATA_AHCI=y
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_RAID1=y
CONFIG_BLK_DEV_DM=y
CONFIG_DM_SNAPSHOT=y
CONFIG_DM_MIRROR=y
CONFIG_DM_ZERO=y
CONFIG_NETDEVICES=y
CONFIG_NET_CORE=y
CONFIG_DUMMY=m
CONFIG_MII=y
CONFIG_NETCONSOLE=y
CONFIG_NETCONSOLE_DYNAMIC=y
CONFIG_NETPOLL=y
CONFIG_NET_POLL_CONTROLLER=y
CONFIG_TUN=y
CONFIG_VHOST_NET=y
CONFIG_VHOST_RING=y
CONFIG_ETHERNET=y
CONFIG_NET_VENDOR_REALTEK=y
CONFIG_R8169=y
CONFIG_USB_USBNET=m
CONFIG_USB_NET_CDCETHER=m
CONFIG_INPUT=y
CONFIG_INPUT_POLLDEV=y
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1680
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=1050
CONFIG_INPUT_JOYDEV=y
CONFIG_INPUT_EVDEV=y
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_CYPRESS=y
CONFIG_MOUSE_PS2_LIFEBOOK=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_LIBPS2=y
CONFIG_TTY=y
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_VT_CONSOLE_SLEEP=y
CONFIG_HW_CONSOLE=y
CONFIG_UNIX98_PTYS=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_PNP=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
CONFIG_SERIAL_CORE=y
CONFIG_HW_RANDOM=y
CONFIG_HW_RANDOM_INTEL=y
CONFIG_NVRAM=m
CONFIG_HPET=y
CONFIG_HPET_MMAP=y
CONFIG_DEVPORT=y
CONFIG_I2C=y
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_CHARDEV=y
CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_ALGOBIT=y
CONFIG_I2C_I801=y
CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y
CONFIG_GPIO_DEVRES=y
CONFIG_POWER_SUPPLY=y
CONFIG_HWMON=y
CONFIG_HWMON_VID=m
CONFIG_SENSORS_CORETEMP=y
CONFIG_SENSORS_IT87=m
CONFIG_SENSORS_W83627EHF=m
CONFIG_THERMAL=y
CONFIG_THERMAL_HWMON=y
CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE=y
CONFIG_THERMAL_GOV_STEP_WISE=y
CONFIG_SSB_POSSIBLE=y
CONFIG_BCMA_POSSIBLE=y
CONFIG_MFD_CORE=y
CONFIG_LPC_ICH=y
CONFIG_MEDIA_SUPPORT=y
CONFIG_MEDIA_CAMERA_SUPPORT=y
CONFIG_VIDEO_DEV=y
CONFIG_VIDEO_V4L2=y
CONFIG_VIDEOBUF2_CORE=y
CONFIG_VIDEOBUF2_MEMOPS=y
CONFIG_VIDEOBUF2_VMALLOC=y
CONFIG_MEDIA_USB_SUPPORT=y
CONFIG_USB_VIDEO_CLASS=y
CONFIG_USB_VIDEO_CLASS_INPUT_EVDEV=y
CONFIG_MEDIA_SUBDRV_AUTOSELECT=y
CONFIG_AGP=y
CONFIG_VGA_ARB=y
CONFIG_VGA_ARB_MAX_GPUS=2
CONFIG_DRM=y
CONFIG_DRM_KMS_HELPER=y
CONFIG_DRM_TTM=y
CONFIG_DRM_RADEON=y
CONFIG_HDMI=y
CONFIG_FB=y
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
CONFIG_BACKLIGHT_LCD_SUPPORT=y
CONFIG_BACKLIGHT_CLASS_DEVICE=y
CONFIG_VGA_CONSOLE=y
CONFIG_VGACON_SOFT_SCROLLBACK=y
CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=512
CONFIG_DUMMY_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY=y
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
CONFIG_SOUND=y
CONFIG_SOUND_OSS_CORE=y
CONFIG_SND=y
CONFIG_SND_TIMER=y
CONFIG_SND_PCM=y
CONFIG_SND_JACK=y
CONFIG_SND_SEQUENCER=y
CONFIG_SND_SEQ_DUMMY=m
CONFIG_SND_OSSEMUL=y
CONFIG_SND_MIXER_OSS=y
CONFIG_SND_PCM_OSS=y
CONFIG_SND_PCM_OSS_PLUGINS=y
CONFIG_SND_SEQUENCER_OSS=y
CONFIG_SND_HRTIMER=y
CONFIG_SND_SEQ_HRTIMER_DEFAULT=y
CONFIG_SND_DYNAMIC_MINORS=y
CONFIG_SND_VERBOSE_PROCFS=y
CONFIG_SND_VMASTER=y
CONFIG_SND_KCTL_JACK=y
CONFIG_SND_DMA_SGBUF=y
CONFIG_SND_PCI=y
CONFIG_SND_HDA_INTEL=y
CONFIG_SND_HDA_PREALLOC_SIZE=2048
CONFIG_SND_HDA_INPUT_JACK=y
CONFIG_SND_HDA_GENERIC=y
CONFIG_SND_HDA_POWER_SAVE_DEFAULT=0
CONFIG_HID=y
CONFIG_HID_GENERIC=y
CONFIG_HID_A4TECH=y
CONFIG_HID_APPLE=y
CONFIG_HID_BELKIN=y
CONFIG_HID_CHERRY=y
CONFIG_HID_CHICONY=y
CONFIG_HID_CYPRESS=y
CONFIG_HID_EZKEY=y
CONFIG_HID_KENSINGTON=y
CONFIG_HID_LOGITECH=y
CONFIG_HID_MICROSOFT=y
CONFIG_HID_MONTEREY=y
CONFIG_USB_HID=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB_ARCH_HAS_EHCI=y
CONFIG_USB_ARCH_HAS_XHCI=y
CONFIG_USB_SUPPORT=y
CONFIG_USB_COMMON=y
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB=y
CONFIG_USB_DEFAULT_PERSIST=y
CONFIG_USB_DYNAMIC_MINORS=y
CONFIG_USB_XHCI_HCD=y
CONFIG_USB_EHCI_HCD=y
CONFIG_USB_EHCI_PCI=y
CONFIG_USB_STORAGE=y
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
CONFIG_RTC_LIB=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_HCTOSYS=y
CONFIG_RTC_SYSTOHC=y
CONFIG_RTC_HCTOSYS_DEVICE="rtc0"
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
CONFIG_RTC_INTF_DEV=y
CONFIG_RTC_DRV_CMOS=y
CONFIG_CLKEVT_I8253=y
CONFIG_I8253_LOCK=y
CONFIG_CLKBLD_I8253=y
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y
CONFIG_DMAR_TABLE=y
CONFIG_INTEL_IOMMU=y
CONFIG_INTEL_IOMMU_DEFAULT_ON=y
CONFIG_INTEL_IOMMU_FLOPPY_WA=y
CONFIG_IRQ_REMAP=y
CONFIG_FIRMWARE_MEMMAP=y
CONFIG_DMIID=y
CONFIG_EFI_VARS=y
CONFIG_EFI_VARS_PSTORE=y
CONFIG_DCACHE_WORD_ACCESS=y
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT23=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_JBD2=y
CONFIG_FS_MBCACHE=y
CONFIG_FS_POSIX_ACL=y
CONFIG_EXPORTFS=y
CONFIG_FILE_LOCKING=y
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_FANOTIFY=y
CONFIG_QUOTA=y
CONFIG_QUOTA_NETLINK_INTERFACE=y
CONFIG_PRINT_QUOTA_WARNING=y
CONFIG_QUOTA_TREE=y
CONFIG_QFMT_V2=y
CONFIG_QUOTACTL=y
CONFIG_QUOTACTL_COMPAT=y
CONFIG_FUSE_FS=y
CONFIG_CUSE=y
CONFIG_GENERIC_ACL=y
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_UDF_FS=y
CONFIG_UDF_NLS=y
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
CONFIG_PROC_FS=y
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_TMPFS_XATTR=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_CONFIGFS_FS=y
CONFIG_MISC_FILESYSTEMS=y
CONFIG_PSTORE=y
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=y
CONFIG_NFS_USE_KERNEL_DNS=y
CONFIG_NFSD=y
CONFIG_NFSD_V2_ACL=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_V3_ACL=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_NFS_ACL_SUPPORT=y
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=y
CONFIG_SUNRPC_GSS=y
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ASCII=m
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_ISO8859_15=m
CONFIG_NLS_UTF8=m
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_PRINTK_TIME=y
CONFIG_DEFAULT_MESSAGE_LOGLEVEL=4
CONFIG_ENABLE_WARN_DEPRECATED=y
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_FRAME_WARN=1024
CONFIG_STRIP_ASM_SYMS=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_KERNEL=y
CONFIG_LOCKUP_DETECTOR=y
CONFIG_HARDLOCKUP_DETECTOR=y
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC_VALUE=0
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0
CONFIG_PANIC_ON_OOPS_VALUE=0
CONFIG_DETECT_HUNG_TASK=y
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=120
CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=0
CONFIG_SCHED_DEBUG=y
CONFIG_SCHEDSTATS=y
CONFIG_TIMER_STATS=y
CONFIG_HAVE_DEBUG_KMEMLEAK=y
CONFIG_STACKTRACE=y
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_ARCH_WANT_FRAME_POINTERS=y
CONFIG_FRAME_POINTER=y
CONFIG_RCU_CPU_STALL_TIMEOUT=60
CONFIG_RCU_CPU_STALL_VERBOSE=y
CONFIG_LATENCYTOP=y
CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS=y
CONFIG_DEBUG_STRICT_USER_COPY_CHECKS=y
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST=y
CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_FENTRY=y
CONFIG_HAVE_C_RECORDMCOUNT=y
CONFIG_TRACE_CLOCK=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_TRACING=y
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
CONFIG_FUNCTION_TRACER=y
CONFIG_BRANCH_PROFILE_NONE=y
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_ARCH_KGDB=y
CONFIG_HAVE_ARCH_KMEMCHECK=y
CONFIG_STRICT_DEVMEM=y
CONFIG_X86_VERBOSE_BOOTUP=y
CONFIG_EARLY_PRINTK=y
CONFIG_DEBUG_RODATA=y
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
CONFIG_IO_DELAY_TYPE_0X80=0
CONFIG_IO_DELAY_TYPE_0XED=1
CONFIG_IO_DELAY_TYPE_UDELAY=2
CONFIG_IO_DELAY_TYPE_NONE=3
CONFIG_IO_DELAY_0X80=y
CONFIG_DEFAULT_IO_DELAY_TYPE=0
CONFIG_KEYS=y
CONFIG_SECURITY=y
CONFIG_SECURITYFS=y
CONFIG_SECURITY_NETWORK=y
CONFIG_SECURITY_PATH=y
CONFIG_SECURITY_APPARMOR=y
CONFIG_SECURITY_APPARMOR_BOOTPARAM_VALUE=0
CONFIG_DEFAULT_SECURITY_DAC=y
CONFIG_DEFAULT_SECURITY=""
CONFIG_CRYPTO=y
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_BLKCIPHER=y
CONFIG_CRYPTO_BLKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_PCOMP2=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y
CONFIG_CRYPTO_GF128MUL=y
CONFIG_CRYPTO_WORKQUEUE=y
CONFIG_CRYPTO_CRYPTD=y
CONFIG_CRYPTO_ABLK_HELPER_X86=y
CONFIG_CRYPTO_GLUE_HELPER_X86=y
CONFIG_CRYPTO_CBC=y
CONFIG_CRYPTO_ECB=y
CONFIG_CRYPTO_LRW=y
CONFIG_CRYPTO_XTS=y
CONFIG_CRYPTO_CRC32C=y
CONFIG_CRYPTO_CRC32C_INTEL=y
CONFIG_CRYPTO_CRC32_PCLMUL=y
CONFIG_CRYPTO_MD4=y
CONFIG_CRYPTO_SHA256=y
CONFIG_CRYPTO_AES=y
CONFIG_CRYPTO_SERPENT=y
CONFIG_CRYPTO_SERPENT_AVX_X86_64=y
CONFIG_CRYPTO_TWOFISH_COMMON=y
CONFIG_CRYPTO_TWOFISH_X86_64=y
CONFIG_CRYPTO_TWOFISH_X86_64_3WAY=y
CONFIG_CRYPTO_TWOFISH_AVX_X86_64=y
CONFIG_CRYPTO_LZO=y
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_IRQ_ROUTING=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_APIC_ARCHITECTURE=y
CONFIG_KVM_MMIO=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_HAVE_KVM_MSI=y
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=y
CONFIG_KVM_INTEL=y
CONFIG_KVM_DEVICE_ASSIGNMENT=y
CONFIG_BINARY_PRINTF=y
CONFIG_BITREVERSE=y
CONFIG_GENERIC_STRNCPY_FROM_USER=y
CONFIG_GENERIC_STRNLEN_USER=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_GENERIC_PCI_IOMAP=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_IO=y
CONFIG_CRC16=y
CONFIG_CRC_ITU_T=y
CONFIG_CRC32=y
CONFIG_CRC32_SLICEBY8=y
CONFIG_ZLIB_INFLATE=y
CONFIG_LZO_COMPRESS=y
CONFIG_LZO_DECOMPRESS=y
CONFIG_XZ_DEC=y
CONFIG_XZ_DEC_X86=y
CONFIG_XZ_DEC_POWERPC=y
CONFIG_XZ_DEC_IA64=y
CONFIG_XZ_DEC_ARM=y
CONFIG_XZ_DEC_ARMTHUMB=y
CONFIG_XZ_DEC_SPARC=y
CONFIG_XZ_DEC_BCJ=y
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_DECOMPRESS_XZ=y
CONFIG_DECOMPRESS_LZO=y
CONFIG_GENERIC_ALLOCATOR=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT=y
CONFIG_HAS_DMA=y
CONFIG_CHECK_SIGNATURE=y
CONFIG_CPU_RMAP=y
CONFIG_DQL=y
CONFIG_NLATTR=y
CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE=y
CONFIG_OID_REGISTRY=y
CONFIG_UCS2_STRING=y

2013-08-05 15:04:36

by Jeff Layton

Subject: Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

On Mon, 05 Aug 2013 15:48:01 +0100
Nix <[email protected]> wrote:

> On 5 Aug 2013, Jeff Layton stated:
>
> > On Sun, 04 Aug 2013 16:40:58 +0100
> > Nix <[email protected]> wrote:
> >
> >> I just got this panic on 3.10.4, in the middle of a large parallel
> >> compilation (of Chromium, as it happens) over NFSv3:
> >>
> >> [16364.527516] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> >> [16364.527571] IP: [<ffffffff81245157>] nlmclnt_setlockargs+0x55/0xcf
> >> [16364.527611] PGD 0
> >> [16364.527626] Oops: 0000 [#1] PREEMPT SMP
> >> [16364.527656] Modules linked in: [last unloaded: microcode]
> >> [16364.527690] CPU: 0 PID: 17034 Comm: flock Not tainted 3.10.4-05315-gf4ce424-dirty #1
> >> [16364.527730] Hardware name: System manufacturer System Product Name/P8H61-MX USB3, BIOS 0506 08/10/2012
> >> [16364.527775] task: ffff88041a97ad60 ti: ffff8803501d4000 task.ti: ffff8803501d4000
> >> [16364.527813] RIP: 0010:[<ffffffff81245157>] [<ffffffff81245157>] nlmclnt_setlockargs+0x55/0xcf
> >> [16364.527860] RSP: 0018:ffff8803501d5c58 EFLAGS: 00010282
> >> [16364.527889] RAX: ffff88041a97ad60 RBX: ffff8803e49c8800 RCX: 0000000000000000
> >> [16364.527926] RDX: 0000000000000000 RSI: 000000000000004a RDI: ffff8803e49c8b54
> >> [16364.527962] RBP: ffff8803501d5c68 R08: 0000000000015720 R09: 0000000000000000
> >> [16364.527998] R10: 00007ffffffff000 R11: ffff8803501d5d58 R12: ffff8803501d5d58
> >> [16364.528034] R13: ffff88041bd2bc00 R14: 0000000000000000 R15: ffff8803fc9e2900
> >> [16364.528070] FS: 0000000000000000(0000) GS:ffff88042fa00000(0000) knlGS:0000000000000000
> >> [16364.528111] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [16364.528142] CR2: 0000000000000008 CR3: 0000000001c0b000 CR4: 00000000001407f0
> >> [16364.528177] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> [16364.528214] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> >> [16364.528303] Stack:
> >> [16364.528316] ffff8803501d5d58 ffff8803e49c8800 ffff8803501d5cd8 ffffffff81245418
> >> [16364.528369] 0000000000000000 ffff8803516f0bc0 ffff8803d7b7b6c0 ffffffff81215c81
> >> [16364.528418] ffff880300000007 ffff88041bd2bdc8 ffff8801aabe9650 ffff8803fc9e2900
> >> [16364.528467] Call Trace:
> >> [16364.528485] [<ffffffff81245418>] nlmclnt_proc+0x148/0x5fb
> >> [16364.528516] [<ffffffff81215c81>] ? nfs_put_lock_context+0x69/0x6e
> >> [16364.528550] [<ffffffff812209a2>] nfs3_proc_lock+0x21/0x23
> >> [16364.528581] [<ffffffff812149dd>] do_unlk+0x96/0xb2
> >> [16364.528608] [<ffffffff81214b41>] nfs_flock+0x5a/0x71
> >> [16364.528637] [<ffffffff8119a747>] locks_remove_flock+0x9e/0x113
> >> [16364.528668] [<ffffffff8115cc68>] __fput+0xb6/0x1e6
> >> [16364.528695] [<ffffffff8115cda6>] ____fput+0xe/0x10
> >> [16364.528724] [<ffffffff810998da>] task_work_run+0x7e/0x98
> >> [16364.528754] [<ffffffff81082bc5>] do_exit+0x3cc/0x8fa
> >> [16364.528782] [<ffffffff81083501>] ? SyS_wait4+0xa5/0xc2
> >> [16364.528811] [<ffffffff8108328d>] do_group_exit+0x6f/0xa2
> >> [16364.528843] [<ffffffff810832d7>] SyS_exit_group+0x17/0x17
> >> [16364.528876] [<ffffffff81613e92>] system_call_fastpath+0x16/0x1b
> >> [16364.528907] Code: 00 00 65 48 8b 04 25 c0 b8 00 00 48 8b 72 20 48 81 ee c0 01 00 00 f3 a4 48 8d bb 54 03 00 00 be 4a 00 00 00 48 8b 90 68 05 00 00 <48> 8b 52 08 48 89 bb d0 00 00 00 48 83 c2 45 48 89 53 38 48 8b
> >> [16364.529176] RIP [<ffffffff81245157>] nlmclnt_setlockargs+0x55/0xcf
> >> [16364.529264] RSP <ffff8803501d5c58>
> >> [16364.529283] CR2: 0000000000000008
> >> [16364.539039] ---[ end trace 5a73fddf23441377 ]---
> >
> > What might be most helpful is to figure out exactly where the above
> > panic occurred.
>
> OK. My kernel is non-modular and built without debugging information:
> rebuilt with debugging info, and got this:
>
> 0xffffffff81245418 is in nlmclnt_proc (fs/lockd/clntproc.c:172).
> 167 return -ENOMEM;
> 168 }
> 169 /* Set up the argument struct */
> 170 nlmclnt_setlockargs(call, fl);
> 171
> 172 if (IS_SETLK(cmd) || IS_SETLKW(cmd)) {
> 173 if (fl->fl_type != F_UNLCK) {
> 174 call->a_args.block = IS_SETLKW(cmd) ? 1 : 0;
> 175 status = nlmclnt_lock(call, fl);
> 176 } else
>
> That's decimal 328:
>
> 0xffffffff81245413 <+323>: callq 0xffffffff81245102 <nlmclnt_setlockargs>
> 0xffffffff81245418 <+328>: mov -0x40(%rbp),%eax
> 0xffffffff8124541b <+331>: sub $0x6,%eax
> 0xffffffff8124541e <+334>: cmp $0x1,%eax
>
> nlm_alloc_call() cannot fail (we have a NULL check right there), and fl
> also cannot be NULL because it's dereferenced in nfs_flock(), up the
> call chain from where we are.
>
> Time to stick some printk()s in, I suspect. (Not sure how to keep them
> from utterly flooding the log, though.)
>
> >> This is the same machine on which this panic has been occurring on
> >> shutdown since 3.9.x: Al Viro has previously pointed out the problem and
> >> nothing has happened:
> >>
> >> [50618.993226] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> >> [50618.993904] IP: [<ffffffff81165e76>] path_init+0x11c/0x36f
> >> [50618.994609] PGD 0
> >> [50618.995329] Oops: 0000 [#1] PREEMPT SMP
> >> [50618.996027] Modules linked in: [last unloaded: microcode]
> >> [50618.996758] CPU: 3 PID: 1262 Comm: pulseaudio Not tainted 3.10.4-05315-gf4ce424-dirty #1
> >> [50618.997506] Hardware name: System manufacturer System Product Name/P8H61-MX USB3, BIOS 0506 08/10/2012
> >> [50618.998268] task: ffff88041bf1ad60 ti: ffff88041b19e000 task.ti: ffff88041b19e000
> >> [50618.999017] RIP: 0010:[<ffffffff81165e76>] [<ffffffff81165e76>] path_init+0x11c/0x36f
> >> [50618.999804] RSP: 0018:ffff88041b19f508 EFLAGS: 00010246
> >> [50619.000592] RAX: 0000000000000000 RBX: ffff88041b19f658 RCX: 000000000000005c
> >> [50619.001398] RDX: 0000000000005c5c RSI: ffff880419b3781a RDI: ffffffff81c34a10
> >> [50619.002198] RBP: ffff88041b19f558 R08: ffff88041b19f588 R09: ffff88041b19f7c4
> >> [50619.002999] R10: 00000000ffffff9c R11: ffff88041b19f658 R12: 0000000000000041
> >> [50619.003816] R13: 0000000000000040 R14: ffff880419b3781a R15: ffff88041b19f7c4
> >> [50619.004638] FS: 00007fca19bc2740(0000) GS:ffff88042fac0000(0000) knlGS:0000000000000000
> >> [50619.005465] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [50619.006284] CR2: 0000000000000008 CR3: 0000000001c0b000 CR4: 00000000001407e0
> >> [50619.007092] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> [50619.007922] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> >> [50619.008750] Stack:
> >> [50619.009576] ffff88041b19f518 00000000ffbfaa5e 0000000000000000 ffffffff8151e735
> >> [50619.010437] ffffc900080ae000 ffff88041b19f658 0000000000000041 ffff880419b3781a
> >> [50619.011292] ffff88041b19f628 ffff88041b19f7c4 ffff88041b19f5e8 ffffffff811660fc
> >> [50619.012119] Call Trace:
> >> [50619.012947] [<ffffffff8151e735>] ? skb_checksum+0x4f/0x25b
> >> [50619.013782] [<ffffffff811660fc>] path_lookupat+0x33/0x6c5
> >> [50619.014618] [<ffffffff8152c623>] ? dev_hard_start_xmit+0x2e5/0x50b
> >> [50619.015457] [<ffffffff811667b4>] filename_lookup.isra.27+0x26/0x5c
> >> [50619.016298] [<ffffffff8116687e>] do_path_lookup+0x33/0x35
> >> [50619.017123] [<ffffffff81166aac>] kern_path+0x2a/0x4d
> >> [50619.017973] [<ffffffff815203d8>] ? __alloc_skb+0x75/0x186
> >> [50619.018832] [<ffffffff81520324>] ? __kmalloc_reserve.isra.42+0x2d/0x6c
> >> [50619.019702] [<ffffffff815871a3>] unix_find_other+0x38/0x1b9
> >> [50619.020568] [<ffffffff81589043>] unix_stream_connect+0x102/0x3ed
> >> [50619.021429] [<ffffffff81518cbc>] ? __sock_create+0x168/0x1c0
> >> [50619.022301] [<ffffffff815de7e3>] ? call_refreshresult+0x91/0x91
> >> [50619.023170] [<ffffffff81516531>] kernel_connect+0x10/0x12
> >> [50619.024047] [<ffffffff815e1d36>] xs_local_setup_socket+0x122/0x191
> >> [50619.024945] [<ffffffff815e2f50>] xs_local_connect+0x2c/0x48
> >> [50619.025849] [<ffffffff815e01f6>] xprt_connect+0x112/0x11b
> >> [50619.026756] [<ffffffff815de81c>] call_connect+0x39/0x3b
> >> [50619.027662] [<ffffffff815e4e68>] __rpc_execute+0xe8/0x2ca
> >> [50619.028567] [<ffffffff815e5109>] rpc_execute+0x76/0x9d
> >> [50619.029473] [<ffffffff815debd1>] rpc_run_task+0x78/0x80
> >> [50619.030376] [<ffffffff815ded0f>] rpc_call_sync+0x88/0x9e
> >> [50619.031270] [<ffffffff815ebd2f>] rpcb_register_call+0x1f/0x2e
> >> [50619.032143] [<ffffffff815ec216>] rpcb_v4_register+0xb2/0x13c
> >> [50619.033031] [<ffffffff8108addb>] ? call_timer_fn+0x15e/0x15e
> >> [50619.033918] [<ffffffff815e7816>] svc_unregister.isra.11+0x5a/0xcb
> >> [50619.034804] [<ffffffff815e789b>] svc_rpcb_cleanup+0x14/0x21
> >> [50619.035706] [<ffffffff815e70ef>] svc_shutdown_net+0x2b/0x30
> >> [50619.036586] [<ffffffff812471c5>] lockd_down_net+0x7f/0xa3
> >> [50619.037465] [<ffffffff81247413>] lockd_down+0x30/0xb2
> >> [50619.038346] [<ffffffff8124439f>] nlmclnt_done+0x1f/0x23
> >> [50619.039227] [<ffffffff8120fd72>] ? nfs_start_lockd+0xc8/0xc8
> >> [50619.040086] [<ffffffff8120fd89>] nfs_destroy_server+0x17/0x19
> >> [50619.040962] [<ffffffff8121024b>] nfs_free_server+0xeb/0x15c
> >> [50619.041947] [<ffffffff812172c3>] nfs_kill_super+0x1f/0x23
> >> [50619.042824] [<ffffffff8115da33>] deactivate_locked_super+0x26/0x52
> >> [50619.043696] [<ffffffff8115e73d>] deactivate_super+0x42/0x47
> >> [50619.044562] [<ffffffff8117453e>] mntput_no_expire+0x135/0x13e
> >> [50619.045424] [<ffffffff81174574>] mntput+0x2d/0x2f
> >> [50619.046287] [<ffffffff8115cd78>] __fput+0x1c6/0x1e6
> >> [50619.047111] [<ffffffff8115cda6>] ____fput+0xe/0x10
> >> [50619.047943] [<ffffffff810998da>] task_work_run+0x7e/0x98
> >> [50619.048764] [<ffffffff81082bc5>] do_exit+0x3cc/0x8fa
> >> [50619.049580] [<ffffffff81174449>] ? mntput_no_expire+0x40/0x13e
> >> [50619.050399] [<ffffffff8108ca8b>] ? __dequeue_signal+0x1a/0x118
> >> [50619.051215] [<ffffffff8108328d>] do_group_exit+0x6f/0xa2
> >> [50619.052000] [<ffffffff8108f0e7>] get_signal_to_deliver+0x4f2/0x530
> >> [50619.052797] [<ffffffff81036a39>] do_signal+0x4d/0x4a4
> >> [50619.053577] [<ffffffff810f2810>] ? call_rcu+0x17/0x19
> >> [50619.054344] [<ffffffff81036ebc>] do_notify_resume+0x2c/0x6b
> >> [50619.055084] [<ffffffff81614098>] int_signal+0x12/0x17
> >> [50619.055852] Code: c7 c7 10 4a c3 81 e8 79 c4 f3 ff e8 99 3a f3 ff 48 83 7b 20 00 0f 85 8d 00 00 00 65 48 8b 04 25 c0 b8 00 00 48 8b 80 58 05 00 00 <8b> 50 08 f6 c2 01 74 04 f3 90 eb f4 48 8b 48 18 48 89 4b 20 48
> >> [50619.057735] RIP [<ffffffff81165e76>] path_init+0x11c/0x36f
> >> [50619.058586] RSP <ffff88041b19f508>
> >> [50619.059429] CR2: 0000000000000008
> [...]
> >> Why is it done in essentially random process context, anyway? There's such thing
> >> as chroot, after all, which would screw that sucker as hard as NULL ->fs, but in
> >> a less visible way...
> >
> > Having not studied the problem, I can't offer up much of an idea on
> > how to fix it at this point.
>
> What mystifies me is why a v3 server shutdown and unregistration is
> triggering a v4 registration. It's not really that surprising that a
> registration that late in shutdown, after the disconnected fs has been
> destroyed, would cause problems... (I have NFSv4 built in, but am not
> using it yet.)
>

It's not. This is an rpcbind v4 (de)registration -- nothing to do with
NFSv4:

[50619.032143] [<ffffffff815ec216>] rpcb_v4_register+0xb2/0x13c

What's happening here is that we need to take down lockd on the last
reference to the lazily-umounted NFS filesystem. That means that we
need to upcall to rpcbind to tell it to remove the port registration.
That requires opening the unix socket that rpcbind listens on, which
involves walking a path. But, current->fs has already been torn down
and set to NULL at this point since we're doing this at delayed fput
time.

David Howells mentioned to me on IRC that the bug is really in
do_exit() and that we ought to be calling exit_task_work() before
calling exit_fs() (and maybe before exit_task_namespaces() too). I
don't have enough of a feel for the delayed fput code to know whether
that's the best fix, but it sounds plausible to me.

At the same time though, we probably ought to be doing this pathwalk
from some root that is guaranteed to be able to reach that pathname.
It's certainly possible that the socket might not be reachable to the
last task that's accessing the NFS mount...

--
Jeff Layton <[email protected]>

2013-08-05 15:11:14

by Jeff Layton

Subject: Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

On Mon, 5 Aug 2013 11:04:27 -0400
Jeff Layton <[email protected]> wrote:

> On Mon, 05 Aug 2013 15:48:01 +0100
> Nix <[email protected]> wrote:
>
> > On 5 Aug 2013, Jeff Layton stated:
> >
> > > On Sun, 04 Aug 2013 16:40:58 +0100
> > > Nix <[email protected]> wrote:
> > >
> > >> I just got this panic on 3.10.4, in the middle of a large parallel
> > >> compilation (of Chromium, as it happens) over NFSv3:
> > >>
> > >> [16364.527516] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> > >> [16364.527571] IP: [<ffffffff81245157>] nlmclnt_setlockargs+0x55/0xcf
> > >> [16364.527611] PGD 0
> > >> [16364.527626] Oops: 0000 [#1] PREEMPT SMP
> > >> [16364.527656] Modules linked in: [last unloaded: microcode]
> > >> [16364.527690] CPU: 0 PID: 17034 Comm: flock Not tainted 3.10.4-05315-gf4ce424-dirty #1
> > >> [16364.527730] Hardware name: System manufacturer System Product Name/P8H61-MX USB3, BIOS 0506 08/10/2012
> > >> [16364.527775] task: ffff88041a97ad60 ti: ffff8803501d4000 task.ti: ffff8803501d4000
> > >> [16364.527813] RIP: 0010:[<ffffffff81245157>] [<ffffffff81245157>] nlmclnt_setlockargs+0x55/0xcf
> > >> [16364.527860] RSP: 0018:ffff8803501d5c58 EFLAGS: 00010282
> > >> [16364.527889] RAX: ffff88041a97ad60 RBX: ffff8803e49c8800 RCX: 0000000000000000
> > >> [16364.527926] RDX: 0000000000000000 RSI: 000000000000004a RDI: ffff8803e49c8b54
> > >> [16364.527962] RBP: ffff8803501d5c68 R08: 0000000000015720 R09: 0000000000000000
> > >> [16364.527998] R10: 00007ffffffff000 R11: ffff8803501d5d58 R12: ffff8803501d5d58
> > >> [16364.528034] R13: ffff88041bd2bc00 R14: 0000000000000000 R15: ffff8803fc9e2900
> > >> [16364.528070] FS: 0000000000000000(0000) GS:ffff88042fa00000(0000) knlGS:0000000000000000
> > >> [16364.528111] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >> [16364.528142] CR2: 0000000000000008 CR3: 0000000001c0b000 CR4: 00000000001407f0
> > >> [16364.528177] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > >> [16364.528214] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > >> [16364.528303] Stack:
> > >> [16364.528316] ffff8803501d5d58 ffff8803e49c8800 ffff8803501d5cd8 ffffffff81245418
> > >> [16364.528369] 0000000000000000 ffff8803516f0bc0 ffff8803d7b7b6c0 ffffffff81215c81
> > >> [16364.528418] ffff880300000007 ffff88041bd2bdc8 ffff8801aabe9650 ffff8803fc9e2900
> > >> [16364.528467] Call Trace:
> > >> [16364.528485] [<ffffffff81245418>] nlmclnt_proc+0x148/0x5fb
> > >> [16364.528516] [<ffffffff81215c81>] ? nfs_put_lock_context+0x69/0x6e
> > >> [16364.528550] [<ffffffff812209a2>] nfs3_proc_lock+0x21/0x23
> > >> [16364.528581] [<ffffffff812149dd>] do_unlk+0x96/0xb2
> > >> [16364.528608] [<ffffffff81214b41>] nfs_flock+0x5a/0x71
> > >> [16364.528637] [<ffffffff8119a747>] locks_remove_flock+0x9e/0x113
> > >> [16364.528668] [<ffffffff8115cc68>] __fput+0xb6/0x1e6
> > >> [16364.528695] [<ffffffff8115cda6>] ____fput+0xe/0x10
> > >> [16364.528724] [<ffffffff810998da>] task_work_run+0x7e/0x98
> > >> [16364.528754] [<ffffffff81082bc5>] do_exit+0x3cc/0x8fa
> > >> [16364.528782] [<ffffffff81083501>] ? SyS_wait4+0xa5/0xc2
> > >> [16364.528811] [<ffffffff8108328d>] do_group_exit+0x6f/0xa2
> > >> [16364.528843] [<ffffffff810832d7>] SyS_exit_group+0x17/0x17
> > >> [16364.528876] [<ffffffff81613e92>] system_call_fastpath+0x16/0x1b
> > >> [16364.528907] Code: 00 00 65 48 8b 04 25 c0 b8 00 00 48 8b 72 20 48 81 ee c0 01 00 00 f3 a4 48 8d bb 54 03 00 00 be 4a 00 00 00 48 8b 90 68 05 00 00 <48> 8b 52 08 48 89 bb d0 00 00 00 48 83 c2 45 48 89 53 38 48 8b
> > >> [16364.529176] RIP [<ffffffff81245157>] nlmclnt_setlockargs+0x55/0xcf
> > >> [16364.529264] RSP <ffff8803501d5c58>
> > >> [16364.529283] CR2: 0000000000000008
> > >> [16364.539039] ---[ end trace 5a73fddf23441377 ]---
> > >
> > > What might be most helpful is to figure out exactly where the above
> > > panic occurred.
> >
> > OK. My kernel is non-modular and was built without debugging information,
> > so I rebuilt it with debugging info and got this:
> >
> > 0xffffffff81245418 is in nlmclnt_proc (fs/lockd/clntproc.c:172).
> > 167 return -ENOMEM;
> > 168 }
> > 169 /* Set up the argument struct */
> > 170 nlmclnt_setlockargs(call, fl);
> > 171
> > 172 if (IS_SETLK(cmd) || IS_SETLKW(cmd)) {
> > 173 if (fl->fl_type != F_UNLCK) {
> > 174 call->a_args.block = IS_SETLKW(cmd) ? 1 : 0;
> > 175 status = nlmclnt_lock(call, fl);
> > 176 } else
> >
> > That's decimal 328:
> >
> > 0xffffffff81245413 <+323>: callq 0xffffffff81245102 <nlmclnt_setlockargs>
> > 0xffffffff81245418 <+328>: mov -0x40(%rbp),%eax
> > 0xffffffff8124541b <+331>: sub $0x6,%eax
> > 0xffffffff8124541e <+334>: cmp $0x1,%eax
> >
> > nlm_alloc_call() cannot fail (we have a NULL check right there), and fl
> > also cannot be NULL because it's dereferenced in nfs_flock(), up the
> > call chain from where we are.
> >
> > Time to stick some printk()s in, I suspect. (Not sure how to keep them
> > from utterly flooding the log, though.)
> >

The listing and disassembly from nlmclnt_proc is not terribly
interesting unfortunately. You really want to do the listing and
disassembly of the RIP at panic time (nlmclnt_setlockargs+0x55).

--
Jeff Layton <[email protected]>

2013-08-05 15:50:46

by Nix

Subject: Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

On 5 Aug 2013, Jeff Layton said:

> On Mon, 5 Aug 2013 11:04:27 -0400
> Jeff Layton <[email protected]> wrote:
>
>> On Mon, 05 Aug 2013 15:48:01 +0100
>> Nix <[email protected]> wrote:
>>
>> > On 5 Aug 2013, Jeff Layton stated:
>> >
>> > > On Sun, 04 Aug 2013 16:40:58 +0100
>> > > Nix <[email protected]> wrote:
>> > >
>> > >> I just got this panic on 3.10.4, in the middle of a large parallel
>> > >> compilation (of Chromium, as it happens) over NFSv3:
>> > >>
>> > >> [16364.527516] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>> > >> [16364.527571] IP: [<ffffffff81245157>] nlmclnt_setlockargs+0x55/0xcf
>> > >> [16364.527611] PGD 0
>> > >> [16364.527626] Oops: 0000 [#1] PREEMPT SMP
>> > >> [16364.527656] Modules linked in: [last unloaded: microcode]
>> > >> [16364.527690] CPU: 0 PID: 17034 Comm: flock Not tainted 3.10.4-05315-gf4ce424-dirty #1
>> > >> [16364.527730] Hardware name: System manufacturer System Product Name/P8H61-MX USB3, BIOS 0506 08/10/2012
>> > >> [16364.527775] task: ffff88041a97ad60 ti: ffff8803501d4000 task.ti: ffff8803501d4000
>> > >> [16364.527813] RIP: 0010:[<ffffffff81245157>] [<ffffffff81245157>] nlmclnt_setlockargs+0x55/0xcf
>> > >> [16364.527860] RSP: 0018:ffff8803501d5c58 EFLAGS: 00010282
>> > >> [16364.527889] RAX: ffff88041a97ad60 RBX: ffff8803e49c8800 RCX: 0000000000000000
>> > >> [16364.527926] RDX: 0000000000000000 RSI: 000000000000004a RDI: ffff8803e49c8b54
>> > >> [16364.527962] RBP: ffff8803501d5c68 R08: 0000000000015720 R09: 0000000000000000
>> > >> [16364.527998] R10: 00007ffffffff000 R11: ffff8803501d5d58 R12: ffff8803501d5d58
>> > >> [16364.528034] R13: ffff88041bd2bc00 R14: 0000000000000000 R15: ffff8803fc9e2900
>> > >> [16364.528070] FS: 0000000000000000(0000) GS:ffff88042fa00000(0000) knlGS:0000000000000000
>> > >> [16364.528111] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > >> [16364.528142] CR2: 0000000000000008 CR3: 0000000001c0b000 CR4: 00000000001407f0
>> > >> [16364.528177] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> > >> [16364.528214] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> > >> [16364.528303] Stack:
>> > >> [16364.528316] ffff8803501d5d58 ffff8803e49c8800 ffff8803501d5cd8 ffffffff81245418
>> > >> [16364.528369] 0000000000000000 ffff8803516f0bc0 ffff8803d7b7b6c0 ffffffff81215c81
>> > >> [16364.528418] ffff880300000007 ffff88041bd2bdc8 ffff8801aabe9650 ffff8803fc9e2900
>> > >> [16364.528467] Call Trace:
>> > >> [16364.528485] [<ffffffff81245418>] nlmclnt_proc+0x148/0x5fb
>> > >> [16364.528516] [<ffffffff81215c81>] ? nfs_put_lock_context+0x69/0x6e
>> > >> [16364.528550] [<ffffffff812209a2>] nfs3_proc_lock+0x21/0x23
>> > >> [16364.528581] [<ffffffff812149dd>] do_unlk+0x96/0xb2
>> > >> [16364.528608] [<ffffffff81214b41>] nfs_flock+0x5a/0x71
>> > >> [16364.528637] [<ffffffff8119a747>] locks_remove_flock+0x9e/0x113
>> > >> [16364.528668] [<ffffffff8115cc68>] __fput+0xb6/0x1e6
>> > >> [16364.528695] [<ffffffff8115cda6>] ____fput+0xe/0x10
>> > >> [16364.528724] [<ffffffff810998da>] task_work_run+0x7e/0x98
>> > >> [16364.528754] [<ffffffff81082bc5>] do_exit+0x3cc/0x8fa
>> > >> [16364.528782] [<ffffffff81083501>] ? SyS_wait4+0xa5/0xc2
>> > >> [16364.528811] [<ffffffff8108328d>] do_group_exit+0x6f/0xa2
>> > >> [16364.528843] [<ffffffff810832d7>] SyS_exit_group+0x17/0x17
>> > >> [16364.528876] [<ffffffff81613e92>] system_call_fastpath+0x16/0x1b
>> > >> [16364.528907] Code: 00 00 65 48 8b 04 25 c0 b8 00 00 48 8b 72 20 48 81 ee c0 01 00 00 f3 a4 48 8d bb 54 03 00 00 be 4a 00 00 00 48 8b 90 68 05 00 00 <48> 8b 52 08 48 89 bb d0 00 00 00 48 83 c2 45 48 89 53 38 48 8b
>> > >> [16364.529176] RIP [<ffffffff81245157>] nlmclnt_setlockargs+0x55/0xcf
>> > >> [16364.529264] RSP <ffff8803501d5c58>
>> > >> [16364.529283] CR2: 0000000000000008
>> > >> [16364.539039] ---[ end trace 5a73fddf23441377 ]---
[...]
> The listing and disassembly from nlmclnt_proc is not terribly
> interesting unfortunately. You really want to do the listing and
> disassembly of the RIP at panic time (nlmclnt_setlockargs+0x55).

Oh, sorry! Wrong end of the oops :)

0xffffffff81245157 is in nlmclnt_setlockargs (fs/lockd/clntproc.c:131).
126 struct nlm_args *argp = &req->a_args;
127 struct nlm_lock *lock = &argp->lock;
128
129 nlmclnt_next_cookie(&argp->cookie);
130 memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), sizeof(struct nfs_fh));
131 lock->caller = utsname()->nodename;
132 lock->oh.data = req->a_owner;
133 lock->oh.len = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
134 (unsigned int)fl->fl_u.nfs_fl.owner->pid,
135 utsname()->nodename);

0xffffffff81245102 <+0>: callq 0xffffffff81613b00 <__fentry__>
0xffffffff81245107 <+5>: push %rbp
0xffffffff81245108 <+6>: mov %rsp,%rbp
0xffffffff8124510b <+9>: push %r12
0xffffffff8124510d <+11>: mov %rsi,%r12
0xffffffff81245110 <+14>: push %rbx
0xffffffff81245111 <+15>: mov %rdi,%rbx
0xffffffff81245114 <+18>: lea 0x10(%rdi),%rdi
0xffffffff81245118 <+22>: callq 0xffffffff812450df <nlmclnt_next_cookie>
0xffffffff8124511d <+27>: mov 0x60(%r12),%rdx
0xffffffff81245122 <+32>: lea 0x44(%rbx),%rax
0xffffffff81245126 <+36>: mov %rax,%rdi
0xffffffff81245129 <+39>: mov $0x82,%ecx
0xffffffff8124512e <+44>: mov %gs:0xb8c0,%rax
0xffffffff81245137 <+53>: mov 0x20(%rdx),%rsi
0xffffffff8124513b <+57>: sub $0x1c0,%rsi
0xffffffff81245142 <+64>: rep movsb %ds:(%rsi),%es:(%rdi)
0xffffffff81245144 <+66>: lea 0x354(%rbx),%rdi
0xffffffff8124514b <+73>: mov $0x4a,%esi
0xffffffff81245150 <+78>: mov 0x568(%rax),%rdx
-> 0xffffffff81245157 <+85>: mov 0x8(%rdx),%rdx
0xffffffff8124515b <+89>: mov %rdi,0xd0(%rbx)
0xffffffff81245162 <+96>: add $0x45,%rdx

(aside: wish GDB reported those offsets in hex, or the kernel reported
them in decimal. Every time I look at these I forget to convert and get
confused...)
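
A small helper makes the conversion less error-prone. This is just a
sketch: oops_offset() is a hypothetical name, and it assumes the symbol
has the "name+0xOFF/0xLEN" shape seen in the traces above.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return the offset part of an oops symbol such as
 * "nlmclnt_proc+0x148/0x5fb" as a number, or -1 if there is no '+'.
 */
long oops_offset(const char *sym)
{
    const char *plus = strchr(sym, '+');

    if (plus == NULL)
        return -1;
    /* Base 16 accepts the "0x" prefix; parsing stops at the '/'. */
    return strtol(plus + 1, NULL, 16);
}
```

Printed with %ld, the result for "nlmclnt_setlockargs+0x55/0xcf" lines up
with the <+85> that GDB shows, and "nlmclnt_proc+0x148/0x5fb" with <+328>.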

I wonder if req is NULL (possible if the assignment to argp at the top
of the function's been pushed down by the optimizer). Time to stick a
printk() in and find out (after work is over so I can reboot this box
like billy-o).

--
NULL && (void)

2013-08-05 16:15:25

by Myklebust, Trond

Subject: Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

On Mon, 2013-08-05 at 16:50 +0100, Nix wrote:
> On 5 Aug 2013, Jeff Layton said:
>
> > On Mon, 5 Aug 2013 11:04:27 -0400
> > Jeff Layton <[email protected]> wrote:
> >
> >> On Mon, 05 Aug 2013 15:48:01 +0100
> >> Nix <[email protected]> wrote:
> >>
> >> > On 5 Aug 2013, Jeff Layton stated:
> >> >
> >> > > On Sun, 04 Aug 2013 16:40:58 +0100
> >> > > Nix <[email protected]> wrote:
> >> > >
> >> > >> I just got this panic on 3.10.4, in the middle of a large parallel
> >> > >> compilation (of Chromium, as it happens) over NFSv3:
> >> > >>
> >> > >> [16364.527516] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> >> > >> [16364.527571] IP: [<ffffffff81245157>] nlmclnt_setlockargs+0x55/0xcf
> >> > >> [16364.527611] PGD 0
> >> > >> [16364.527626] Oops: 0000 [#1] PREEMPT SMP
> >> > >> [16364.527656] Modules linked in: [last unloaded: microcode]
> >> > >> [16364.527690] CPU: 0 PID: 17034 Comm: flock Not tainted 3.10.4-05315-gf4ce424-dirty #1
> >> > >> [16364.527730] Hardware name: System manufacturer System Product Name/P8H61-MX USB3, BIOS 0506 08/10/2012
> >> > >> [16364.527775] task: ffff88041a97ad60 ti: ffff8803501d4000 task.ti: ffff8803501d4000
> >> > >> [16364.527813] RIP: 0010:[<ffffffff81245157>] [<ffffffff81245157>] nlmclnt_setlockargs+0x55/0xcf
> >> > >> [16364.527860] RSP: 0018:ffff8803501d5c58 EFLAGS: 00010282
> >> > >> [16364.527889] RAX: ffff88041a97ad60 RBX: ffff8803e49c8800 RCX: 0000000000000000
> >> > >> [16364.527926] RDX: 0000000000000000 RSI: 000000000000004a RDI: ffff8803e49c8b54
> >> > >> [16364.527962] RBP: ffff8803501d5c68 R08: 0000000000015720 R09: 0000000000000000
> >> > >> [16364.527998] R10: 00007ffffffff000 R11: ffff8803501d5d58 R12: ffff8803501d5d58
> >> > >> [16364.528034] R13: ffff88041bd2bc00 R14: 0000000000000000 R15: ffff8803fc9e2900
> >> > >> [16364.528070] FS: 0000000000000000(0000) GS:ffff88042fa00000(0000) knlGS:0000000000000000
> >> > >> [16364.528111] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> > >> [16364.528142] CR2: 0000000000000008 CR3: 0000000001c0b000 CR4: 00000000001407f0
> >> > >> [16364.528177] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> > >> [16364.528214] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> >> > >> [16364.528303] Stack:
> >> > >> [16364.528316] ffff8803501d5d58 ffff8803e49c8800 ffff8803501d5cd8 ffffffff81245418
> >> > >> [16364.528369] 0000000000000000 ffff8803516f0bc0 ffff8803d7b7b6c0 ffffffff81215c81
> >> > >> [16364.528418] ffff880300000007 ffff88041bd2bdc8 ffff8801aabe9650 ffff8803fc9e2900
> >> > >> [16364.528467] Call Trace:
> >> > >> [16364.528485] [<ffffffff81245418>] nlmclnt_proc+0x148/0x5fb
> >> > >> [16364.528516] [<ffffffff81215c81>] ? nfs_put_lock_context+0x69/0x6e
> >> > >> [16364.528550] [<ffffffff812209a2>] nfs3_proc_lock+0x21/0x23
> >> > >> [16364.528581] [<ffffffff812149dd>] do_unlk+0x96/0xb2
> >> > >> [16364.528608] [<ffffffff81214b41>] nfs_flock+0x5a/0x71
> >> > >> [16364.528637] [<ffffffff8119a747>] locks_remove_flock+0x9e/0x113
> >> > >> [16364.528668] [<ffffffff8115cc68>] __fput+0xb6/0x1e6
> >> > >> [16364.528695] [<ffffffff8115cda6>] ____fput+0xe/0x10
> >> > >> [16364.528724] [<ffffffff810998da>] task_work_run+0x7e/0x98
> >> > >> [16364.528754] [<ffffffff81082bc5>] do_exit+0x3cc/0x8fa
> >> > >> [16364.528782] [<ffffffff81083501>] ? SyS_wait4+0xa5/0xc2
> >> > >> [16364.528811] [<ffffffff8108328d>] do_group_exit+0x6f/0xa2
> >> > >> [16364.528843] [<ffffffff810832d7>] SyS_exit_group+0x17/0x17
> >> > >> [16364.528876] [<ffffffff81613e92>] system_call_fastpath+0x16/0x1b
> >> > >> [16364.528907] Code: 00 00 65 48 8b 04 25 c0 b8 00 00 48 8b 72 20 48 81 ee c0 01 00 00 f3 a4 48 8d bb 54 03 00 00 be 4a 00 00 00 48 8b 90 68 05 00 00 <48> 8b 52 08 48 89 bb d0 00 00 00 48 83 c2 45 48 89 53 38 48 8b
> >> > >> [16364.529176] RIP [<ffffffff81245157>] nlmclnt_setlockargs+0x55/0xcf
> >> > >> [16364.529264] RSP <ffff8803501d5c58>
> >> > >> [16364.529283] CR2: 0000000000000008
> >> > >> [16364.539039] ---[ end trace 5a73fddf23441377 ]---
> [...]
> > The listing and disassembly from nlmclnt_proc is not terribly
> > interesting unfortunately. You really want to do the listing and
> > disassembly of the RIP at panic time (nlmclnt_setlockargs+0x55).
>
> Oh, sorry! Wrong end of the oops :)
>
> 0xffffffff81245157 is in nlmclnt_setlockargs (fs/lockd/clntproc.c:131).
> 126 struct nlm_args *argp = &req->a_args;
> 127 struct nlm_lock *lock = &argp->lock;
> 128
> 129 nlmclnt_next_cookie(&argp->cookie);
> 130 memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), sizeof(struct nfs_fh));
> 131 lock->caller = utsname()->nodename;
> 132 lock->oh.data = req->a_owner;
> 133 lock->oh.len = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
> 134 (unsigned int)fl->fl_u.nfs_fl.owner->pid,
> 135 utsname()->nodename);
>
> 0xffffffff81245102 <+0>: callq 0xffffffff81613b00 <__fentry__>
> 0xffffffff81245107 <+5>: push %rbp
> 0xffffffff81245108 <+6>: mov %rsp,%rbp
> 0xffffffff8124510b <+9>: push %r12
> 0xffffffff8124510d <+11>: mov %rsi,%r12
> 0xffffffff81245110 <+14>: push %rbx
> 0xffffffff81245111 <+15>: mov %rdi,%rbx
> 0xffffffff81245114 <+18>: lea 0x10(%rdi),%rdi
> 0xffffffff81245118 <+22>: callq 0xffffffff812450df <nlmclnt_next_cookie>
> 0xffffffff8124511d <+27>: mov 0x60(%r12),%rdx
> 0xffffffff81245122 <+32>: lea 0x44(%rbx),%rax
> 0xffffffff81245126 <+36>: mov %rax,%rdi
> 0xffffffff81245129 <+39>: mov $0x82,%ecx
> 0xffffffff8124512e <+44>: mov %gs:0xb8c0,%rax
> 0xffffffff81245137 <+53>: mov 0x20(%rdx),%rsi
> 0xffffffff8124513b <+57>: sub $0x1c0,%rsi
> 0xffffffff81245142 <+64>: rep movsb %ds:(%rsi),%es:(%rdi)
> 0xffffffff81245144 <+66>: lea 0x354(%rbx),%rdi
> 0xffffffff8124514b <+73>: mov $0x4a,%esi
> 0xffffffff81245150 <+78>: mov 0x568(%rax),%rdx
> -> 0xffffffff81245157 <+85>: mov 0x8(%rdx),%rdx
> 0xffffffff8124515b <+89>: mov %rdi,0xd0(%rbx)
> 0xffffffff81245162 <+96>: add $0x45,%rdx
>
> (aside: wish GDB reported those offsets in hex, or the kernel reported
> them in decimal. Every time I look at these I forget to convert and get
> confused...)
>
> I wonder if req is NULL (possible if the assignment to argp at the top
> of the function's been pushed down by the optimizer). Time to stick a
> printk() in and find out (after work is over so I can reboot this box
> like billy-o).
>

Does the attached patch fix the problem?

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


Attachments:
0001-LOCKD-Don-t-call-utsname-nodename-from-nlmclnt_setlo.patch (1.87 kB)

2013-08-05 16:21:20

by Jeff Layton

Subject: Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

On Mon, 05 Aug 2013 16:50:37 +0100
Nix <[email protected]> wrote:

> On 5 Aug 2013, Jeff Layton said:
>
> > On Mon, 5 Aug 2013 11:04:27 -0400
> > Jeff Layton <[email protected]> wrote:
> >
> >> On Mon, 05 Aug 2013 15:48:01 +0100
> >> Nix <[email protected]> wrote:
> >>
> >> > On 5 Aug 2013, Jeff Layton stated:
> >> >
> >> > > On Sun, 04 Aug 2013 16:40:58 +0100
> >> > > Nix <[email protected]> wrote:
> >> > >
> >> > >> I just got this panic on 3.10.4, in the middle of a large parallel
> >> > >> compilation (of Chromium, as it happens) over NFSv3:
> >> > >>
> >> > >> [16364.527516] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> >> > >> [16364.527571] IP: [<ffffffff81245157>] nlmclnt_setlockargs+0x55/0xcf
> >> > >> [16364.527611] PGD 0
> >> > >> [16364.527626] Oops: 0000 [#1] PREEMPT SMP
> >> > >> [16364.527656] Modules linked in: [last unloaded: microcode]
> >> > >> [16364.527690] CPU: 0 PID: 17034 Comm: flock Not tainted 3.10.4-05315-gf4ce424-dirty #1
> >> > >> [16364.527730] Hardware name: System manufacturer System Product Name/P8H61-MX USB3, BIOS 0506 08/10/2012
> >> > >> [16364.527775] task: ffff88041a97ad60 ti: ffff8803501d4000 task.ti: ffff8803501d4000
> >> > >> [16364.527813] RIP: 0010:[<ffffffff81245157>] [<ffffffff81245157>] nlmclnt_setlockargs+0x55/0xcf
> >> > >> [16364.527860] RSP: 0018:ffff8803501d5c58 EFLAGS: 00010282
> >> > >> [16364.527889] RAX: ffff88041a97ad60 RBX: ffff8803e49c8800 RCX: 0000000000000000
> >> > >> [16364.527926] RDX: 0000000000000000 RSI: 000000000000004a RDI: ffff8803e49c8b54
> >> > >> [16364.527962] RBP: ffff8803501d5c68 R08: 0000000000015720 R09: 0000000000000000
> >> > >> [16364.527998] R10: 00007ffffffff000 R11: ffff8803501d5d58 R12: ffff8803501d5d58
> >> > >> [16364.528034] R13: ffff88041bd2bc00 R14: 0000000000000000 R15: ffff8803fc9e2900
> >> > >> [16364.528070] FS: 0000000000000000(0000) GS:ffff88042fa00000(0000) knlGS:0000000000000000
> >> > >> [16364.528111] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> > >> [16364.528142] CR2: 0000000000000008 CR3: 0000000001c0b000 CR4: 00000000001407f0
> >> > >> [16364.528177] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> > >> [16364.528214] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> >> > >> [16364.528303] Stack:
> >> > >> [16364.528316] ffff8803501d5d58 ffff8803e49c8800 ffff8803501d5cd8 ffffffff81245418
> >> > >> [16364.528369] 0000000000000000 ffff8803516f0bc0 ffff8803d7b7b6c0 ffffffff81215c81
> >> > >> [16364.528418] ffff880300000007 ffff88041bd2bdc8 ffff8801aabe9650 ffff8803fc9e2900
> >> > >> [16364.528467] Call Trace:
> >> > >> [16364.528485] [<ffffffff81245418>] nlmclnt_proc+0x148/0x5fb
> >> > >> [16364.528516] [<ffffffff81215c81>] ? nfs_put_lock_context+0x69/0x6e
> >> > >> [16364.528550] [<ffffffff812209a2>] nfs3_proc_lock+0x21/0x23
> >> > >> [16364.528581] [<ffffffff812149dd>] do_unlk+0x96/0xb2
> >> > >> [16364.528608] [<ffffffff81214b41>] nfs_flock+0x5a/0x71
> >> > >> [16364.528637] [<ffffffff8119a747>] locks_remove_flock+0x9e/0x113
> >> > >> [16364.528668] [<ffffffff8115cc68>] __fput+0xb6/0x1e6
> >> > >> [16364.528695] [<ffffffff8115cda6>] ____fput+0xe/0x10
> >> > >> [16364.528724] [<ffffffff810998da>] task_work_run+0x7e/0x98
> >> > >> [16364.528754] [<ffffffff81082bc5>] do_exit+0x3cc/0x8fa
> >> > >> [16364.528782] [<ffffffff81083501>] ? SyS_wait4+0xa5/0xc2
> >> > >> [16364.528811] [<ffffffff8108328d>] do_group_exit+0x6f/0xa2
> >> > >> [16364.528843] [<ffffffff810832d7>] SyS_exit_group+0x17/0x17
> >> > >> [16364.528876] [<ffffffff81613e92>] system_call_fastpath+0x16/0x1b
> >> > >> [16364.528907] Code: 00 00 65 48 8b 04 25 c0 b8 00 00 48 8b 72 20 48 81 ee c0 01 00 00 f3 a4 48 8d bb 54 03 00 00 be 4a 00 00 00 48 8b 90 68 05 00 00 <48> 8b 52 08 48 89 bb d0 00 00 00 48 83 c2 45 48 89 53 38 48 8b
> >> > >> [16364.529176] RIP [<ffffffff81245157>] nlmclnt_setlockargs+0x55/0xcf
> >> > >> [16364.529264] RSP <ffff8803501d5c58>
> >> > >> [16364.529283] CR2: 0000000000000008
> >> > >> [16364.539039] ---[ end trace 5a73fddf23441377 ]---
> [...]
> > The listing and disassembly from nlmclnt_proc is not terribly
> > interesting unfortunately. You really want to do the listing and
> > disassembly of the RIP at panic time (nlmclnt_setlockargs+0x55).
>
> Oh, sorry! Wrong end of the oops :)
>
> 0xffffffff81245157 is in nlmclnt_setlockargs (fs/lockd/clntproc.c:131).
> 126 struct nlm_args *argp = &req->a_args;
> 127 struct nlm_lock *lock = &argp->lock;
> 128
> 129 nlmclnt_next_cookie(&argp->cookie);
> 130 memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), sizeof(struct nfs_fh));
> 131 lock->caller = utsname()->nodename;
> 132 lock->oh.data = req->a_owner;
> 133 lock->oh.len = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
> 134 (unsigned int)fl->fl_u.nfs_fl.owner->pid,
> 135 utsname()->nodename);
>
> 0xffffffff81245102 <+0>: callq 0xffffffff81613b00 <__fentry__>
> 0xffffffff81245107 <+5>: push %rbp
> 0xffffffff81245108 <+6>: mov %rsp,%rbp
> 0xffffffff8124510b <+9>: push %r12
> 0xffffffff8124510d <+11>: mov %rsi,%r12
> 0xffffffff81245110 <+14>: push %rbx
> 0xffffffff81245111 <+15>: mov %rdi,%rbx
> 0xffffffff81245114 <+18>: lea 0x10(%rdi),%rdi
> 0xffffffff81245118 <+22>: callq 0xffffffff812450df <nlmclnt_next_cookie>
> 0xffffffff8124511d <+27>: mov 0x60(%r12),%rdx
> 0xffffffff81245122 <+32>: lea 0x44(%rbx),%rax
> 0xffffffff81245126 <+36>: mov %rax,%rdi
> 0xffffffff81245129 <+39>: mov $0x82,%ecx
> 0xffffffff8124512e <+44>: mov %gs:0xb8c0,%rax
> 0xffffffff81245137 <+53>: mov 0x20(%rdx),%rsi
> 0xffffffff8124513b <+57>: sub $0x1c0,%rsi
> 0xffffffff81245142 <+64>: rep movsb %ds:(%rsi),%es:(%rdi)
> 0xffffffff81245144 <+66>: lea 0x354(%rbx),%rdi
> 0xffffffff8124514b <+73>: mov $0x4a,%esi
> 0xffffffff81245150 <+78>: mov 0x568(%rax),%rdx
> -> 0xffffffff81245157 <+85>: mov 0x8(%rdx),%rdx
> 0xffffffff8124515b <+89>: mov %rdi,0xd0(%rbx)
> 0xffffffff81245162 <+96>: add $0x45,%rdx
>
> (aside: wish GDB reported those offsets in hex, or the kernel reported
> them in decimal. Every time I look at these I forget to convert and get
> confused...)
>
> I wonder if req is NULL (possible if the assignment to argp at the top
> of the function's been pushed down by the optimizer). Time to stick a
> printk() in and find out (after work is over so I can reboot this box
> like billy-o).
>

Ah-ha! That same bug was discussed earlier this week. See the thread with this title:

Subject: fuzz tested user mode linux core dumps in fs/lockd/clntproc.c:131

I haven't followed it too closely, unfortunately, but both oopses look
like similar problems. The kernel is tearing things down in the fput
codepath prior to running exit tasks that require those things...

--
Jeff Layton <[email protected]>

2013-08-05 17:37:52

by Jeff Layton

Subject: Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

On Mon, 5 Aug 2013 16:15:01 +0000
"Myklebust, Trond" <[email protected]> wrote:

> From 3c50ba80105464a28d456d9a1e0f1d81d4af92a8 Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <[email protected]>
> Date: Mon, 5 Aug 2013 12:06:12 -0400
> Subject: [PATCH] LOCKD: Don't call utsname()->nodename from
> nlmclnt_setlockargs
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> Firstly, nlmclnt_setlockargs can be called from a reclaimer thread, in
> which case we're in entirely the wrong namespace.
> Secondly, commit 8aac62706adaaf0fab02c4327761561c8bda9448 (move
> exit_task_namespaces() outside of exit_notify()) now means that
> exit_task_work() is called after exit_task_namespaces(), which
> triggers an Oops when we're freeing up the locks.
>
> Signed-off-by: Trond Myklebust <[email protected]>
> Cc: Toralf Förster <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Cc: Nix <[email protected]>
> Cc: Jeff Layton <[email protected]>
> ---
> fs/lockd/clntproc.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
> index 9760ecb..acd3947 100644
> --- a/fs/lockd/clntproc.c
> +++ b/fs/lockd/clntproc.c
> @@ -125,14 +125,15 @@ static void nlmclnt_setlockargs(struct nlm_rqst *req, struct file_lock *fl)
> {
> struct nlm_args *argp = &req->a_args;
> struct nlm_lock *lock = &argp->lock;
> + char *nodename = req->a_host->h_rpcclnt->cl_nodename;
>
> nlmclnt_next_cookie(&argp->cookie);
> memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), sizeof(struct nfs_fh));
> - lock->caller = utsname()->nodename;
> + lock->caller = nodename;
> lock->oh.data = req->a_owner;
> lock->oh.len = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
> (unsigned int)fl->fl_u.nfs_fl.owner->pid,
> - utsname()->nodename);
> + nodename);
> lock->svid = fl->fl_u.nfs_fl.owner->pid;
> lock->fl.fl_start = fl->fl_start;
> lock->fl.fl_end = fl->fl_end;

Looks good to me...

Reviewed-by: Jeff Layton <[email protected]>
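
For anyone following along, a rough userspace model of why the patch
helps. It assumes utsname() resolves the name through the current task's
nsproxy; the register dump (RAX equal to the task pointer, RDX zero, CR2
of 8) looks consistent with a NULL nsproxy, but that is one reading of
the listing, not something confirmed here. All the *_model types are
hypothetical and much simpler than the real structures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct uts_model      { char nodename[65]; };
struct nsproxy_model  { int count; struct uts_model *uts_ns; };
struct lock_task_model { struct nsproxy_model *nsproxy; };
struct rpc_clnt_model { char cl_nodename[65]; };

/* Old path: models utsname()->nodename, which goes through the task. */
const char *nodename_via_task(const struct lock_task_model *t)
{
    if (t->nsproxy == NULL)
        return NULL;  /* the real kernel has no such check and faults */
    return t->nsproxy->uts_ns->nodename;
}

/* New path: models req->a_host->h_rpcclnt->cl_nodename. */
const char *nodename_via_client(const struct rpc_clnt_model *clnt)
{
    return clnt->cl_nodename;
}

/*
 * Returns 1 if the name cached in the client is still usable after the
 * task's namespaces are gone, while the per-task lookup is not.
 */
int cached_name_survives_exit(void)
{
    struct uts_model uts = { "hostA" };
    struct nsproxy_model ns = { 1, &uts };
    struct lock_task_model task = { &ns };
    struct rpc_clnt_model clnt;

    /* Copy the name once, while the namespace is still reachable. */
    strncpy(clnt.cl_nodename, nodename_via_task(&task),
            sizeof(clnt.cl_nodename) - 1);
    clnt.cl_nodename[sizeof(clnt.cl_nodename) - 1] = '\0';

    task.nsproxy = NULL;  /* models exit_task_namespaces() */

    return nodename_via_task(&task) == NULL &&
           strcmp(nodename_via_client(&clnt), "hostA") == 0;
}
```

The cached copy also does not depend on which task happens to be
running, which is the reclaimer-thread point in the patch description.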

Trond, any thoughts on the other oops that Nix posted? The issue there
seems to be that we're trying to do the pathwalk to the rpcbind unix
socket from exit_task_work(), but that's happening after we've already
called exit_fs().

The trivial answer seems to be to simply call exit_task_work() before
exit_fs() there, but it seems like we ought to be doing the upcall to
rpcbind in a mount namespace from which we know we can reach the
socket...

2013-08-05 18:18:12

by Myklebust, Trond

Subject: Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

On Mon, 2013-08-05 at 13:37 -0400, Jeff Layton wrote:
> On Mon, 5 Aug 2013 16:15:01 +0000
> "Myklebust, Trond" <[email protected]> wrote:
>
> > From 3c50ba80105464a28d456d9a1e0f1d81d4af92a8 Mon Sep 17 00:00:00 2001
> > From: Trond Myklebust <[email protected]>
> > Date: Mon, 5 Aug 2013 12:06:12 -0400
> > Subject: [PATCH] LOCKD: Don't call utsname()->nodename from
> > nlmclnt_setlockargs
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=UTF-8
> > Content-Transfer-Encoding: 8bit
> >
> > Firstly, nlmclnt_setlockargs can be called from a reclaimer thread, in
> > which case we're in entirely the wrong namespace.
> > Secondly, commit 8aac62706adaaf0fab02c4327761561c8bda9448 (move
> > exit_task_namespaces() outside of exit_notify()) now means that
> > exit_task_work() is called after exit_task_namespaces(), which
> > triggers an Oops when we're freeing up the locks.
> >
> > Signed-off-by: Trond Myklebust <[email protected]>
> > Cc: Toralf Förster <[email protected]>
> > Cc: Oleg Nesterov <[email protected]>
> > Cc: Nix <[email protected]>
> > Cc: Jeff Layton <[email protected]>
> > ---
> > fs/lockd/clntproc.c | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
> > index 9760ecb..acd3947 100644
> > --- a/fs/lockd/clntproc.c
> > +++ b/fs/lockd/clntproc.c
> > @@ -125,14 +125,15 @@ static void nlmclnt_setlockargs(struct nlm_rqst *req, struct file_lock *fl)
> > {
> > struct nlm_args *argp = &req->a_args;
> > struct nlm_lock *lock = &argp->lock;
> > + char *nodename = req->a_host->h_rpcclnt->cl_nodename;
> >
> > nlmclnt_next_cookie(&argp->cookie);
> > memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), sizeof(struct nfs_fh));
> > - lock->caller = utsname()->nodename;
> > + lock->caller = nodename;
> > lock->oh.data = req->a_owner;
> > lock->oh.len = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
> > (unsigned int)fl->fl_u.nfs_fl.owner->pid,
> > - utsname()->nodename);
> > + nodename);
> > lock->svid = fl->fl_u.nfs_fl.owner->pid;
> > lock->fl.fl_start = fl->fl_start;
> > lock->fl.fl_end = fl->fl_end;
>
> Looks good to me...
>
> Reviewed-by: Jeff Layton <[email protected]>
>
> Trond, any thoughts on the other oops that Nix posted? The issue there
> seems to be that we're trying to do the pathwalk to the rpcbind unix
> socket from exit_task_work(), but that's happening after we've already
> called exit_fs().
>
> The trivial answer seems to be to simply call exit_task_work() before
> exit_fs() there, but it seems like we ought to be doing the upcall to
> rpcbind in a mount namespace from which we know we can reach the
> socket...

Isn't it enough to just do the same thing as we did for gss proxy? i.e.
set the RPC_CLNT_CREATE_NO_IDLE_TIMEOUT flag.
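
As a sketch of the idea only (conn_model and the functions below are
hypothetical userspace stand-ins, not the SUNRPC code): if the transport
never idles out, the unregister call at exit time has no reason to
reconnect, so no path walk is needed after exit_fs().

```c
#include <stdbool.h>

/* Hypothetical model of the kernel's connection to the local rpcbind. */
struct conn_model {
    bool connected;
    bool no_idle_timeout;  /* models RPC_CLNT_CREATE_NO_IDLE_TIMEOUT */
};

/* Models the idle timer firing some time after the last request. */
void idle_timer_fires(struct conn_model *c)
{
    if (!c->no_idle_timeout)
        c->connected = false;
}

/*
 * Models sending the unregister request. have_fs models whether
 * current->fs is still valid; reconnecting needs a path walk.
 */
int send_unregister(struct conn_model *c, bool have_fs)
{
    if (!c->connected) {
        if (!have_fs)
            return -1;  /* the real kernel would oops in the path walk */
        c->connected = true;
    }
    return 0;
}

/* Unregister from a task that has already gone through exit_fs(). */
int unregister_at_exit(bool no_idle_timeout)
{
    struct conn_model c = { true, no_idle_timeout };

    idle_timer_fires(&c);
    return send_unregister(&c, false);
}
```

In the model, unregister_at_exit(true) succeeds and
unregister_at_exit(false) fails. It does not cover any other way the
socket might end up disconnected.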

See attachment.
--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


Attachments:
0001-SUNRPC-Don-t-auto-disconnect-from-the-local-rpcbind-.patch (1.42 kB)

2013-08-05 18:33:22

by Jeff Layton

Subject: Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

On Mon, 5 Aug 2013 18:18:03 +0000
"Myklebust, Trond" <[email protected]> wrote:

> On Mon, 2013-08-05 at 13:37 -0400, Jeff Layton wrote:
> > On Mon, 5 Aug 2013 16:15:01 +0000
> > "Myklebust, Trond" <[email protected]> wrote:
> >
> > > From 3c50ba80105464a28d456d9a1e0f1d81d4af92a8 Mon Sep 17 00:00:00 2001
> > > From: Trond Myklebust <[email protected]>
> > > Date: Mon, 5 Aug 2013 12:06:12 -0400
> > > Subject: [PATCH] LOCKD: Don't call utsname()->nodename from
> > > nlmclnt_setlockargs
> > > MIME-Version: 1.0
> > > Content-Type: text/plain; charset=UTF-8
> > > Content-Transfer-Encoding: 8bit
> > >
> > > Firstly, nlmclnt_setlockargs can be called from a reclaimer thread, in
> > > which case we're in entirely the wrong namespace.
> > > Secondly, commit 8aac62706adaaf0fab02c4327761561c8bda9448 (move
> > > exit_task_namespaces() outside of exit_notify()) now means that
> > > exit_task_work() is called after exit_task_namespaces(), which
> > > triggers an Oops when we're freeing up the locks.
> > >
> > > Signed-off-by: Trond Myklebust <[email protected]>
> > > Cc: Toralf Förster <[email protected]>
> > > Cc: Oleg Nesterov <[email protected]>
> > > Cc: Nix <[email protected]>
> > > Cc: Jeff Layton <[email protected]>
> > > ---
> > > fs/lockd/clntproc.c | 5 +++--
> > > 1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
> > > index 9760ecb..acd3947 100644
> > > --- a/fs/lockd/clntproc.c
> > > +++ b/fs/lockd/clntproc.c
> > > @@ -125,14 +125,15 @@ static void nlmclnt_setlockargs(struct nlm_rqst *req, struct file_lock *fl)
> > > {
> > > struct nlm_args *argp = &req->a_args;
> > > struct nlm_lock *lock = &argp->lock;
> > > + char *nodename = req->a_host->h_rpcclnt->cl_nodename;
> > >
> > > nlmclnt_next_cookie(&argp->cookie);
> > > memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), sizeof(struct nfs_fh));
> > > - lock->caller = utsname()->nodename;
> > > + lock->caller = nodename;
> > > lock->oh.data = req->a_owner;
> > > lock->oh.len = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
> > > (unsigned int)fl->fl_u.nfs_fl.owner->pid,
> > > - utsname()->nodename);
> > > + nodename);
> > > lock->svid = fl->fl_u.nfs_fl.owner->pid;
> > > lock->fl.fl_start = fl->fl_start;
> > > lock->fl.fl_end = fl->fl_end;
> >
> > Looks good to me...
> >
> > Reviewed-by: Jeff Layton <[email protected]>
> >
> > Trond, any thoughts on the other oops that Nix posted? The issue there
> > seems to be that we're trying to do the pathwalk to the rpcbind unix
> > socket from exit_task_work(), but that's happening after we've already
> > called exit_fs().
> >
> > The trivial answer seems to be to simply call exit_task_work() before
> > exit_fs() there, but it seems like we ought to be doing the upcall to
> > rpcbind in a mount namespace from which we know we can reach the
> > socket...
>
> Isn't it enough to just do the same thing as we did for gss proxy? i.e.
> set the RPC_CLNT_CREATE_NO_IDLE_TIMEOUT flag.
>
> See attachment.

Yeah, that looks like a reasonable thing to do...

OTOH, is there any other way for a unix socket to end up disconnected
other than if we were to close it? Maybe if rpcbind were stopped, the
socket unlinked and recreated, and rpcbind then started again?

If so then you still could potentially end up in this situation even if
you didn't autoclose it.
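
To make the question concrete, here is a minimal userspace sketch,
assuming only POSIX socketpair(), of a unix stream socket ending up
disconnected without the local side ever closing it. The helper name
peer_close_is_visible() is made up for illustration, and the kernel's
rpcbind transport is not modelled here.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if the local end of an AF_UNIX stream socket observes
 * end-of-file after the peer closes, 0 if it does not, -1 on error.
 */
static int peer_close_is_visible(void)
{
	int sv[2];
	char byte;
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return -1;

	close(sv[1]);			/* the peer (think: rpcbind) goes away */
	n = recv(sv[0], &byte, sizeof(byte), 0);
	close(sv[0]);

	if (n < 0)
		return -1;
	return n == 0;			/* 0 bytes on a stream socket means EOF */
}
```

The local descriptor stays open the whole time, yet the connection is
gone as soon as the other end closes. Disabling an idle timeout on the
local side would not prevent this case.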

--
Jeff Layton <[email protected]>

2013-08-05 18:33:21

by Nix

Subject: Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

On 5 Aug 2013, Trond Myklebust told this:
> Does the attached patch fix the problem?

> From 3c50ba80105464a28d456d9a1e0f1d81d4af92a8 Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <[email protected]>
> Date: Mon, 5 Aug 2013 12:06:12 -0400
> Subject: [PATCH] LOCKD: Don't call utsname()->nodename from
> nlmclnt_setlockargs
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit

It makes it worse. Much, much worse. From a crash every so often when
I'm doing compilations over NFS, I get an immediate panic on startx,
long long before I even try to replicate the earlier panic:

[ 83.432358] task: ffff88041aaa5ac0 ti: ffff8804199e2000 task.ti: ffff8804199e2000
[ 83.432428] RIP: 0010:[<ffffffff8124af69>] [<ffffffff8124af69>] encode_nlm4_lock+0x26/0xbe
[ 83.432512] RSP: 0018:ffff8804199e3a78 EFLAGS: 00010286
[ 83.432564] RAX: 0000000000000000 RBX: ffff88041a577038 RCX: ffffffffffffffff
[ 83.432630] RDX: ffff8804193b3098 RSI: ffff88041a577038 RDI: 000000000000008c
[ 83.432697] RBP: ffff8804199e3aa8 R08: ffff8804193b3098 R09: 0000000000000001
[ 83.432763] R10: ffff88042fa12980 R11: ffff88042fa12980 R12: ffff8804199e3ae8
[ 83.432830] R13: 000000000000008c R14: ffff8804199e3fd8 R15: ffffffff815de80e
[ 83.432898] FS: 00007f594b40c740(0000) GS:ffff88042fa00000(0000) knlGS:0000000000000000
[ 83.432974] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 83.433028] CR2: 000000000000008c CR3: 000000041ab3d000 CR4: 00000000001407f0
[ 83.433095] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 83.433176] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 83.433255] Stack:
[ 83.433276] ffff88041a44fb70 ffff880400000004 ffff8804199e3ae8 ffff88041a577010
[ 83.433360] ffff8804188e0e00 ffff8804199e3fd8 ffff8804199e3ac8 ffffffff8124b0d7
[ 83.433443] ffff8804188e0e00 ffffffff8124b086 ffff8804199e3b38 ffffffff815e6032
[ 83.433616] Call Trace:
[ 83.433646] [<ffffffff8124b0d7>] nlm4_xdr_enc_lockargs+0x51/0x76
[ 83.433707] [<ffffffff8124b086>] ? nlm4_xdr_enc_cancargs+0x56/0x56
[ 83.433769] [<ffffffff815e6032>] rpcauth_wrap_req+0x57/0x62
[ 83.433826] [<ffffffff815de98a>] call_transmit+0x17c/0x1f9
[ 83.433880] [<ffffffff815e4e58>] __rpc_execute+0xe8/0x2ca
[ 83.433935] [<ffffffff815e50f9>] rpc_execute+0x76/0x9d
[ 83.433986] [<ffffffff815debc1>] rpc_run_task+0x78/0x80
[ 83.434039] [<ffffffff815decff>] rpc_call_sync+0x88/0x9e
[ 83.434092] [<ffffffff81244b3c>] nlmclnt_call+0xb5/0x240
[ 83.434146] [<ffffffff812454f0>] nlmclnt_proc+0x226/0x5fb
[ 83.434226] [<ffffffff812209a2>] nfs3_proc_lock+0x21/0x23
[ 83.434280] [<ffffffff81214a5e>] do_setlk+0x65/0xee
[ 83.434329] [<ffffffff81214ca6>] nfs_lock+0x14e/0x162
[ 83.434382] [<ffffffff81199661>] vfs_lock_file+0x29/0x35
[ 83.434435] [<ffffffff8119a51d>] fcntl_setlk+0x139/0x2c5
[ 83.434490] [<ffffffff81169621>] SyS_fcntl+0x2b6/0x47d
[ 83.434543] [<ffffffff81613e92>] system_call_fastpath+0x16/0x1b
[ 83.434600] Code: 5b 41 5c 5d c3 0f 1f 44 00 00 55 31 c0 48 83 c9 ff 48 89 e5 41 56 41 55 41 54 49 89 fc 53 48 89 f3 48 83 ec 10 4c 8b 2e 4c 89 ef <f2> ae 4c 89 e7 48 f7 d1 4c 8d 71 ff 41 8d 76 04 e8 9f 16 3a 00
[ 83.435077] RIP [<ffffffff8124af69>] encode_nlm4_lock+0x26/0xbe
[ 83.435140] RSP <ffff8804199e3a78>
[ 83.435197] CR2: 000000000000008c

That's here:

(gdb) list *(encode_nlm4_lock+0x26)
0xffffffff8124af69 is in encode_nlm4_lock (fs/lockd/clnt4xdr.c:329).
324 * string caller_name<LM_MAXSTRLEN>;
325 */
326 static void encode_caller_name(struct xdr_stream *xdr, const char *name)
327 {
328 /* NB: client-side does not set lock->len */
329 u32 length = strlen(name);
330 __be32 *p;
331
332 p = xdr_reserve_space(xdr, 4 + length);
333 xdr_encode_opaque(p, name, length);

0xffffffff8124af69 <+38>: repnz scas %es:(%rdi),%al

Pretty clearly, "name" can be NULL after this patch...

--
NULL && (void)

2013-08-05 19:12:59

by Myklebust, Trond

Subject: Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

On Mon, 2013-08-05 at 19:33 +0100, Nix wrote:
> On 5 Aug 2013, Trond Myklebust told this:
> > Does the attached patch fix the problem?
>
> > From 3c50ba80105464a28d456d9a1e0f1d81d4af92a8 Mon Sep 17 00:00:00 2001
> > From: Trond Myklebust <[email protected]>
> > Date: Mon, 5 Aug 2013 12:06:12 -0400
> > Subject: [PATCH] LOCKD: Don't call utsname()->nodename from
> > nlmclnt_setlockargs
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=UTF-8
> > Content-Transfer-Encoding: 8bit
>
> It makes it worse. Much, much worse. From a crash every so often when
> I'm doing compilations over NFS, I get an immediate panic on startx,
> long long before I even try to replicate the earlier panic:
>
> [ 83.432358] task: ffff88041aaa5ac0 ti: ffff8804199e2000 task.ti: ffff8804199e2000
> [ 83.432428] RIP: 0010:[<ffffffff8124af69>] [<ffffffff8124af69>] encode_nlm4_lock+0x26/0xbe
> [ 83.432512] RSP: 0018:ffff8804199e3a78 EFLAGS: 00010286
> [ 83.432564] RAX: 0000000000000000 RBX: ffff88041a577038 RCX: ffffffffffffffff
> [ 83.432630] RDX: ffff8804193b3098 RSI: ffff88041a577038 RDI: 000000000000008c
> [ 83.432697] RBP: ffff8804199e3aa8 R08: ffff8804193b3098 R09: 0000000000000001
> [ 83.432763] R10: ffff88042fa12980 R11: ffff88042fa12980 R12: ffff8804199e3ae8
> [ 83.432830] R13: 000000000000008c R14: ffff8804199e3fd8 R15: ffffffff815de80e
> [ 83.432898] FS: 00007f594b40c740(0000) GS:ffff88042fa00000(0000) knlGS:0000000000000000
> [ 83.432974] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 83.433028] CR2: 000000000000008c CR3: 000000041ab3d000 CR4: 00000000001407f0
> [ 83.433095] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 83.433176] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 83.433255] Stack:
> [ 83.433276] ffff88041a44fb70 ffff880400000004 ffff8804199e3ae8 ffff88041a577010
> [ 83.433360] ffff8804188e0e00 ffff8804199e3fd8 ffff8804199e3ac8 ffffffff8124b0d7
> [ 83.433443] ffff8804188e0e00 ffffffff8124b086 ffff8804199e3b38 ffffffff815e6032
> [ 83.433616] Call Trace:
> [ 83.433646] [<ffffffff8124b0d7>] nlm4_xdr_enc_lockargs+0x51/0x76
> [ 83.433707] [<ffffffff8124b086>] ? nlm4_xdr_enc_cancargs+0x56/0x56
> [ 83.433769] [<ffffffff815e6032>] rpcauth_wrap_req+0x57/0x62
> [ 83.433826] [<ffffffff815de98a>] call_transmit+0x17c/0x1f9
> [ 83.433880] [<ffffffff815e4e58>] __rpc_execute+0xe8/0x2ca
> [ 83.433935] [<ffffffff815e50f9>] rpc_execute+0x76/0x9d
> [ 83.433986] [<ffffffff815debc1>] rpc_run_task+0x78/0x80
> [ 83.434039] [<ffffffff815decff>] rpc_call_sync+0x88/0x9e
> [ 83.434092] [<ffffffff81244b3c>] nlmclnt_call+0xb5/0x240
> [ 83.434146] [<ffffffff812454f0>] nlmclnt_proc+0x226/0x5fb
> [ 83.434226] [<ffffffff812209a2>] nfs3_proc_lock+0x21/0x23
> [ 83.434280] [<ffffffff81214a5e>] do_setlk+0x65/0xee
> [ 83.434329] [<ffffffff81214ca6>] nfs_lock+0x14e/0x162
> [ 83.434382] [<ffffffff81199661>] vfs_lock_file+0x29/0x35
> [ 83.434435] [<ffffffff8119a51d>] fcntl_setlk+0x139/0x2c5
> [ 83.434490] [<ffffffff81169621>] SyS_fcntl+0x2b6/0x47d
> [ 83.434543] [<ffffffff81613e92>] system_call_fastpath+0x16/0x1b
> [ 83.434600] Code: 5b 41 5c 5d c3 0f 1f 44 00 00 55 31 c0 48 83 c9 ff 48 89 e5 41 56 41 55 41 54 49 89 fc 53 48 89 f3 48 83 ec 10 4c 8b 2e 4c 89 ef <f2> ae 4c 89 e7 48 f7 d1 4c 8d 71 ff 41 8d 76 04 e8 9f 16 3a 00
> [ 83.435077] RIP [<ffffffff8124af69>] encode_nlm4_lock+0x26/0xbe
> [ 83.435140] RSP <ffff8804199e3a78>
> [ 83.435197] CR2: 000000000000008c
>
> That's here:
>
> (gdb) list *(encode_nlm4_lock+0x26)
> 0xffffffff8124af69 is in encode_nlm4_lock (fs/lockd/clnt4xdr.c:329).
> 324 * string caller_name<LM_MAXSTRLEN>;
> 325 */
> 326 static void encode_caller_name(struct xdr_stream *xdr, const char *name)
> 327 {
> 328 /* NB: client-side does not set lock->len */
> 329 u32 length = strlen(name);
> 330 __be32 *p;
> 331
> 332 p = xdr_reserve_space(xdr, 4 + length);
> 333 xdr_encode_opaque(p, name, length);
>
> 0xffffffff8124af69 <+38>: repnz scas %es:(%rdi),%al
>
> Pretty clearly, "name" can be NULL after this patch...
>
Yes. This scheme will only work if we make sure that host->h_rpcclnt is
initialised at mount time. Here is a v2 patch that should do the right
thing.
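
As a rough illustration of why the initialisation order matters, here is
a userspace sketch with hypothetical stand-ins (struct client, struct
host, host_bind(), host_nodename()) for the RPC client and NLM host. The
field names and layout are assumptions, not the kernel's. Note that CR2
in the trace above is 0x8c rather than 0. That would be consistent with
taking the address of an array member through a NULL client pointer,
though this reading is an assumption. If it holds, the check has to be
on the client pointer, not on the resulting name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NODENAME_MAX 65

/* Hypothetical stand-ins; names and layout are illustrative only. */
struct client {
	int  flags;
	char nodename[NODENAME_MAX];	/* copied once, at creation time */
};

struct host {
	struct client *clnt;		/* NULL until a client is bound */
};

/* Bind the client up front, caching the node name while it is known. */
static void host_bind(struct host *h, struct client *c, const char *name)
{
	assert(h != NULL && c != NULL && name != NULL);
	strncpy(c->nodename, name, NODENAME_MAX - 1);
	c->nodename[NODENAME_MAX - 1] = '\0';
	h->clnt = c;
}

/* Return the cached name, or NULL if the host has no client yet. */
static const char *host_nodename(const struct host *h)
{
	if (h == NULL || h->clnt == NULL)
		return NULL;
	return h->clnt->nodename;
}
```

In this model, calling host_bind() before the first lock request plays
the role of initialising the client at mount time. host_nodename() then
never has to reach through an unset pointer.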
--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


Attachments:
0001-LOCKD-Don-t-call-utsname-nodename-from-nlmclnt_setlo.patch (2.92 kB)

2013-08-06 02:21:38

by Myklebust, Trond

Subject: Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

On Mon, 2013-08-05 at 14:33 -0400, Jeff Layton wrote:
> On Mon, 5 Aug 2013 18:18:03 +0000
> "Myklebust, Trond" <[email protected]> wrote:
>
> > On Mon, 2013-08-05 at 13:37 -0400, Jeff Layton wrote:
> > > On Mon, 5 Aug 2013 16:15:01 +0000
> > > "Myklebust, Trond" <[email protected]> wrote:
> > >
> > > > From 3c50ba80105464a28d456d9a1e0f1d81d4af92a8 Mon Sep 17 00:00:00 2001
> > > > From: Trond Myklebust <[email protected]>
> > > > Date: Mon, 5 Aug 2013 12:06:12 -0400
> > > > Subject: [PATCH] LOCKD: Don't call utsname()->nodename from
> > > > nlmclnt_setlockargs
> > > > MIME-Version: 1.0
> > > > Content-Type: text/plain; charset=UTF-8
> > > > Content-Transfer-Encoding: 8bit
> > > >
> > > > Firstly, nlmclnt_setlockargs can be called from a reclaimer thread, in
> > > > which case we're in entirely the wrong namespace.
> > > > Secondly, commit 8aac62706adaaf0fab02c4327761561c8bda9448 (move
> > > > exit_task_namespaces() outside of exit_notify()) now means that
> > > > exit_task_work() is called after exit_task_namespaces(), which
> > > > triggers an Oops when we're freeing up the locks.
> > > >
> > > > Signed-off-by: Trond Myklebust <[email protected]>
> > > > Cc: Toralf Förster <[email protected]>
> > > > Cc: Oleg Nesterov <[email protected]>
> > > > Cc: Nix <[email protected]>
> > > > Cc: Jeff Layton <[email protected]>
> > > > ---
> > > > fs/lockd/clntproc.c | 5 +++--
> > > > 1 file changed, 3 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
> > > > index 9760ecb..acd3947 100644
> > > > --- a/fs/lockd/clntproc.c
> > > > +++ b/fs/lockd/clntproc.c
> > > > @@ -125,14 +125,15 @@ static void nlmclnt_setlockargs(struct nlm_rqst *req, struct file_lock *fl)
> > > > {
> > > > struct nlm_args *argp = &req->a_args;
> > > > struct nlm_lock *lock = &argp->lock;
> > > > + char *nodename = req->a_host->h_rpcclnt->cl_nodename;
> > > >
> > > > nlmclnt_next_cookie(&argp->cookie);
> > > > memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), sizeof(struct nfs_fh));
> > > > - lock->caller = utsname()->nodename;
> > > > + lock->caller = nodename;
> > > > lock->oh.data = req->a_owner;
> > > > lock->oh.len = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
> > > > (unsigned int)fl->fl_u.nfs_fl.owner->pid,
> > > > - utsname()->nodename);
> > > > + nodename);
> > > > lock->svid = fl->fl_u.nfs_fl.owner->pid;
> > > > lock->fl.fl_start = fl->fl_start;
> > > > lock->fl.fl_end = fl->fl_end;
> > >
> > > Looks good to me...
> > >
> > > Reviewed-by: Jeff Layton <[email protected]>
> > >
> > > Trond, any thoughts on the other oops that Nix posted? The issue there
> > > seems to be that we're trying to do the pathwalk to the rpcbind unix
> > > socket from exit_task_work(), but that's happening after we've already
> > > called exit_fs().
> > >
> > > The trivial answer seems to be to simply call exit_task_work() before
> > > exit_fs() there, but it seems like we ought to be doing the upcall to
> > > rpcbind in a mount namespace from which we know we can reach the
> > > socket...
> >
> > Isn't it enough to just do the same thing as we did for gss proxy? i.e.
> > set the RPC_CLNT_CREATE_NO_IDLE_TIMEOUT flag.
> >
> > See attachment.
>
> Yeah, that looks like a reasonable thing to do...
>
> OTOH, is there any other way for a unix socket to end up disconnected
> other than if we were to close it? Maybe if rpcbind were stopped, the
> socket unlinked and recreated, and rpcbind then started again?
>
> If so then you still could potentially end up in this situation even if
> you didn't autoclose it.

True. How about something like the following instead. Note the change to
the original patch...
--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


Attachments:
0001-SUNRPC-Don-t-auto-disconnect-from-the-local-rpcbind-.patch (1.39 kB)
0002-SUNRPC-If-the-rpcbind-channel-is-disconnected-fail-t.patch (6.72 kB)

2013-08-06 09:24:20

by Jeff Layton

Subject: Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

On Tue, 6 Aug 2013 02:21:35 +0000
"Myklebust, Trond" <[email protected]> wrote:

> On Mon, 2013-08-05 at 14:33 -0400, Jeff Layton wrote:
> > On Mon, 5 Aug 2013 18:18:03 +0000
> > "Myklebust, Trond" <[email protected]> wrote:
> >
> > > On Mon, 2013-08-05 at 13:37 -0400, Jeff Layton wrote:
> > > > On Mon, 5 Aug 2013 16:15:01 +0000
> > > > "Myklebust, Trond" <[email protected]> wrote:
> > > >
> > > > > From 3c50ba80105464a28d456d9a1e0f1d81d4af92a8 Mon Sep 17 00:00:00 2001
> > > > > From: Trond Myklebust <[email protected]>
> > > > > Date: Mon, 5 Aug 2013 12:06:12 -0400
> > > > > Subject: [PATCH] LOCKD: Don't call utsname()->nodename from
> > > > > nlmclnt_setlockargs
> > > > > MIME-Version: 1.0
> > > > > Content-Type: text/plain; charset=UTF-8
> > > > > Content-Transfer-Encoding: 8bit
> > > > >
> > > > > Firstly, nlmclnt_setlockargs can be called from a reclaimer thread, in
> > > > > which case we're in entirely the wrong namespace.
> > > > > Secondly, commit 8aac62706adaaf0fab02c4327761561c8bda9448 (move
> > > > > exit_task_namespaces() outside of exit_notify()) now means that
> > > > > exit_task_work() is called after exit_task_namespaces(), which
> > > > > triggers an Oops when we're freeing up the locks.
> > > > >
> > > > > Signed-off-by: Trond Myklebust <[email protected]>
> > > > > Cc: Toralf Förster <[email protected]>
> > > > > Cc: Oleg Nesterov <[email protected]>
> > > > > Cc: Nix <[email protected]>
> > > > > Cc: Jeff Layton <[email protected]>
> > > > > ---
> > > > > fs/lockd/clntproc.c | 5 +++--
> > > > > 1 file changed, 3 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
> > > > > index 9760ecb..acd3947 100644
> > > > > --- a/fs/lockd/clntproc.c
> > > > > +++ b/fs/lockd/clntproc.c
> > > > > @@ -125,14 +125,15 @@ static void nlmclnt_setlockargs(struct nlm_rqst *req, struct file_lock *fl)
> > > > > {
> > > > > struct nlm_args *argp = &req->a_args;
> > > > > struct nlm_lock *lock = &argp->lock;
> > > > > + char *nodename = req->a_host->h_rpcclnt->cl_nodename;
> > > > >
> > > > > nlmclnt_next_cookie(&argp->cookie);
> > > > > memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), sizeof(struct nfs_fh));
> > > > > - lock->caller = utsname()->nodename;
> > > > > + lock->caller = nodename;
> > > > > lock->oh.data = req->a_owner;
> > > > > lock->oh.len = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
> > > > > (unsigned int)fl->fl_u.nfs_fl.owner->pid,
> > > > > - utsname()->nodename);
> > > > > + nodename);
> > > > > lock->svid = fl->fl_u.nfs_fl.owner->pid;
> > > > > lock->fl.fl_start = fl->fl_start;
> > > > > lock->fl.fl_end = fl->fl_end;
> > > >
> > > > Looks good to me...
> > > >
> > > > Reviewed-by: Jeff Layton <[email protected]>
> > > >
> > > > Trond, any thoughts on the other oops that Nix posted? The issue there
> > > > seems to be that we're trying to do the pathwalk to the rpcbind unix
> > > > socket from exit_task_work(), but that's happening after we've already
> > > > called exit_fs().
> > > >
> > > > The trivial answer seems to be to simply call exit_task_work() before
> > > > exit_fs() there, but it seems like we ought to be doing the upcall to
> > > > rpcbind in a mount namespace from which we know we can reach the
> > > > socket...
> > >
> > > Isn't it enough to just do the same thing as we did for gss proxy? i.e.
> > > set the RPC_CLNT_CREATE_NO_IDLE_TIMEOUT flag.
> > >
> > > See attachment.
> >
> > Yeah, that looks like a reasonable thing to do...
> >
> > OTOH, is there any other way for a unix socket to end up disconnected
> > other than if we were to close it? Maybe if rpcbind were stopped, the
> > socket unlinked and recreated, and rpcbind then started again?
> >
> > If so then you still could potentially end up in this situation even if
> > you didn't autoclose it.
>
> True. How about something like the following instead. Note the change to
> the original patch...

Looks good to me.

Acked-by: Jeff Layton <[email protected]>

2013-08-06 20:46:16

by Nix

Subject: Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

On 5 Aug 2013, Trond Myklebust uttered the following:
> Yes. This scheme will only work if we make sure that host->h_rpcclnt is
> initialised at mount time. Here is a v2 patch that should do the right
> thing.

Confirmed, that fixes it! I'll try your shutdown crash fix next.

--
NULL && (void)

2013-08-07 10:19:09

by Nix

Subject: Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

On 6 Aug 2013, Trond Myklebust verbalised:
> True. How about something like the following instead. Note the change to
> the original patch...

Well, with those applied I could reboot without a panic for the first
time since 3.8.x: looking good. I'll give it a reboot or two with a
system that's not hot from booting though.

--
NULL && (void)

2013-08-07 15:27:36

by Myklebust, Trond

Subject: Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

On Wed, 2013-08-07 at 11:18 +0100, Nix wrote:
> On 6 Aug 2013, Trond Myklebust verbalised:
> > True. How about something like the following instead. Note the change to
> > the original patch...
>
> Well, with those applied I could reboot without a panic for the first
> time since 3.8.x: looking good. I'll give it a reboot or two with a
> system that's not hot from booting though.
>

Could you please also try applying only the 1/2 patch, to see if that
suffices to quell the shutdown panic?

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2013-08-07 21:01:37

by Nix

Subject: Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

On 7 Aug 2013, Trond Myklebust said:

> On Wed, 2013-08-07 at 11:18 +0100, Nix wrote:
>> On 6 Aug 2013, Trond Myklebust verbalised:
>> > True. How about something like the following instead. Note the change to
>> > the original patch...
>>
>> Well, with those applied I could reboot without a panic for the first
>> time since 3.8.x: looking good. I'll give it a reboot or two with a
>> system that's not hot from booting though.
>
> Could you please also try applying only the 1/2 patch, to see if that
> suffices to quell the shutdown panic?

It doesn't suffice. I see this severely truncated oops:

[ 115.799092] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 115.800284] IP: [<ffffffff81165ec6>] path_init+0x11c/0x36f
[ 115.801463] PGD 0
[ 115.802625] Oops: 0000 [#1] PREEMPT SMP
[ 115.803805] Modules linked in: [last unloaded: microcode]
[ 115.804995] CPU: 3 PID: 1191 Comm: sleep Not tainted 3.10.5-05317-g3c9f6fa-dirty #2
[ 115.806207] Hardware name: System manufacturer System Product Name/P8H61-MX USB3, BIOS 0506 08/10/2012
[ 115.807453] task: ffff8804189a0000 ti: ffff8803f74d6000 task.ti: ffff8803f74d6000
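
For reference, the ordering problem suspected earlier in the thread (a
path walk run from exit_task_work() after exit_fs()) can be sketched in
userspace C. Everything below is a hypothetical model: struct task,
struct fs_ctx and the function names are invented for illustration, and
the -1 return only marks the spot where the real path walk would
presumably dereference a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

struct fs_ctx {
	const char *root;
};

struct task {
	struct fs_ctx *fs;		/* cleared by task_exit_fs() */
	int (*work)(struct task *);	/* deferred work, run at exit */
};

/* Deferred work that needs a path lookup, and therefore needs t->fs. */
static int lookup_work(struct task *t)
{
	assert(t != NULL);
	return t->fs != NULL ? 0 : -1;	/* -1 marks where the oops would be */
}

static void task_exit_fs(struct task *t)
{
	t->fs = NULL;
}

/* Ordering described in the thread: fs context torn down first. */
static int exit_fs_first(struct task *t)
{
	task_exit_fs(t);
	return t->work(t);
}

/* The "trivial answer" ordering: run the deferred work first. */
static int exit_work_first(struct task *t)
{
	int ret = t->work(t);

	task_exit_fs(t);
	return ret;
}
```

In this model only exit_work_first() lets the lookup succeed. Reordering
was not the approach taken in the patches under test, which instead
change how the rpcbind connection is handled.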

--
NULL && (void)

2013-08-07 21:10:00

by Myklebust, Trond

Subject: Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

On Wed, 2013-08-07 at 22:01 +0100, Nix wrote:
> On 7 Aug 2013, Trond Myklebust said:
>
> > On Wed, 2013-08-07 at 11:18 +0100, Nix wrote:
> >> On 6 Aug 2013, Trond Myklebust verbalised:
> >> > True. How about something like the following instead. Note the change to
> >> > the original patch...
> >>
> >> Well, with those applied I could reboot without a panic for the first
> >> time since 3.8.x: looking good. I'll give it a reboot or two with a
> >> system that's not hot from booting though.
> >
> > Could you please also try applying only the 1/2 patch, to see if that
> > suffices to quell the shutdown panic?
>
> It doesn't suffice. I see this severely truncated oops:
>
> [ 115.799092] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> [ 115.800284] IP: [<ffffffff81165ec6>] path_init+0x11c/0x36f
> [ 115.801463] PGD 0
> [ 115.802625] Oops: 0000 [#1] PREEMPT SMP
> [ 115.803805] Modules linked in: [last unloaded: microcode]
> [ 115.804995] CPU: 3 PID: 1191 Comm: sleep Not tainted 3.10.5-05317-g3c9f6fa-dirty #2
> [ 115.806207] Hardware name: System manufacturer System Product Name/P8H61-MX USB3, BIOS 0506 08/10/2012
> [ 115.807453] task: ffff8804189a0000 ti: ffff8803f74d6000 task.ti: ffff8803f74d6000
>
OK. Then I'll mark them both for stable inclusion in 3.9+.

Thanks for testing!
--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com