2012-06-10 09:11:04

by Jamie Heilman

Subject: NFSv4 regression, kernel BUG at fs/nfsd/nfs4state.c:1044!

Upgrading my NFSv4 server from 3.3.8 to 3.4.2, I've managed to
reproduce this reliably:

------------[ cut here ]------------
kernel BUG at fs/nfsd/nfs4state.c:1044!
invalid opcode: 0000 [#1]
Modules linked in: md5 cpufreq_conservative cpufreq_stats cpufreq_powersave cpufreq_ondemand autofs4 quota_v2 quota_tree nfsd nfs_acl exportfs nfs lockd fscache auth_rpcgss sunrpc xt_mark xt_dscp cls_fw sch_htb iptable_nat nf_nat ipt_REJECT xt_multiport xt_mac xt_tcpudp iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_LOG xt_limit iptable_filter ip_tables x_tables dm_crypt dm_mod snd_hda_codec_via tpm_tis tpm via_rhine snd_hda_intel snd_hda_codec tpm_bios mii via_velocity snd_hwdep snd_pcm snd_timer snd soundcore snd_page_alloc evdev crc_ccitt via_agp agpgart button

Pid: 1804, comm: nfsd Not tainted 3.4.2 #3 To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M.
EIP: 0060:[<f86ac683>] EFLAGS: 00010246 CPU: 0
EIP is at free_client.isra.47+0x3/0x5 [nfsd]
EAX: 00000000 EBX: e9d48800 ECX: e9d48030 EDX: f86b20f8
ESI: e9d48810 EDI: e9cf9ec0 EBP: e9cf9eb4 ESP: e9cf9eb4
DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
CR0: 8005003b CR2: b760bee0 CR3: 34d04000 CR4: 000006b0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process nfsd (pid: 1804, ti=e9cf8000 task=e9cf4000 task.ti=e9cf8000)
Stack:
e9cf9ed4 f86a6c00 e9d48828 e9cf9ec0 e9cf9ec0 e9d48800 e9d48000 00000000
e9cf9f00 f86a6d36 00000000 000555c0 f53c0001 4fd43f69 00000001 7ad0b2a0
f4c45000 f4c44060 00000360 e9cf9f3c f869f63c f86aff20 e9cb9000 00000000
Call Trace:
[<f86a6c00>] expire_client+0xb1/0xb9 [nfsd]
[<f86a6d36>] nfsd4_setclientid_confirm+0x12e/0x1c7 [nfsd]
[<f869f63c>] nfsd4_proc_compound+0x1fd/0x354 [nfsd]
[<f86946e3>] nfsd_dispatch+0x89/0x134 [nfsd]
[<f86c6d64>] svc_process+0x327/0x4e3 [sunrpc]
[<f86940d2>] nfsd+0xd2/0x10b [nfsd]
[<f8694000>] ? 0xf8693fff
[<c102d53a>] kthread+0x66/0x6b
[<c102d4d4>] ? flush_kthread_worker+0x74/0x74
[<c123193e>] kernel_thread_helper+0x6/0xd
Code: 01 80 3e 00 74 04 89 f2 eb a2 8b 4d e8 8b 55 ec 8b 45 f0 0f c8 89 19 89 02 31 c0 83 c4 18 5b 5e 5f 5d c3 55 89 e5 0f 0b 55 89 e5 <0f> 0b 55 89 e5 57 56 89 d6 53 89 c3 8b 78 34 89 d0 e8 b4 8f ff
EIP: [<f86ac683>] free_client.isra.47+0x3/0x5 [nfsd] SS:ESP 0068:e9cf9eb4
---[ end trace bf1613e00e6f3bbe ]---

System is a 32-bit Via C7; to reproduce I need only establish an NFSv4
mount from a client, then reboot the client. The BUG happens when the
client attempts to reconnect. At this point NFS service on the server
becomes something of a lost cause. Userspace is Debian stable
(nfs-utils 1.2.2 based). I haven't had a chance to bisect the issue
yet; I'll give that a shot in the coming week if nobody knows of any
obvious fixes. Let me know if there's any other info I can provide
that's useful.

--
Jamie Heilman http://audible.transient.net/~jamie/


2012-06-11 03:42:21

by Jamie Heilman

Subject: Re: NFSv4 regression, kernel BUG at fs/nfsd/nfs4state.c:1044!

Jamie Heilman wrote:
> Upgrading my NFSv4 server from 3.3.8 to 3.4.2, I've managed to
> reproduce this reliably:
>
> ------------[ cut here ]------------
> kernel BUG at fs/nfsd/nfs4state.c:1044!
> invalid opcode: 0000 [#1]
> Modules linked in: md5 cpufreq_conservative cpufreq_stats cpufreq_powersave cpufreq_ondemand autofs4 quota_v2 quota_tree nfsd nfs_acl exportfs nfs lockd fscache auth_rpcgss sunrpc xt_mark xt_dscp cls_fw sch_htb iptable_nat nf_nat ipt_REJECT xt_multiport xt_mac xt_tcpudp iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_LOG xt_limit iptable_filter ip_tables x_tables dm_crypt dm_mod snd_hda_codec_via tpm_tis tpm via_rhine snd_hda_intel snd_hda_codec tpm_bios mii via_velocity snd_hwdep snd_pcm snd_timer snd soundcore snd_page_alloc evdev crc_ccitt via_agp agpgart button
>
> Pid: 1804, comm: nfsd Not tainted 3.4.2 #3 To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M.
> EIP: 0060:[<f86ac683>] EFLAGS: 00010246 CPU: 0
> EIP is at free_client.isra.47+0x3/0x5 [nfsd]
> EAX: 00000000 EBX: e9d48800 ECX: e9d48030 EDX: f86b20f8
> ESI: e9d48810 EDI: e9cf9ec0 EBP: e9cf9eb4 ESP: e9cf9eb4
> DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
> CR0: 8005003b CR2: b760bee0 CR3: 34d04000 CR4: 000006b0
> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> DR6: ffff0ff0 DR7: 00000400
> Process nfsd (pid: 1804, ti=e9cf8000 task=e9cf4000 task.ti=e9cf8000)
> Stack:
> e9cf9ed4 f86a6c00 e9d48828 e9cf9ec0 e9cf9ec0 e9d48800 e9d48000 00000000
> e9cf9f00 f86a6d36 00000000 000555c0 f53c0001 4fd43f69 00000001 7ad0b2a0
> f4c45000 f4c44060 00000360 e9cf9f3c f869f63c f86aff20 e9cb9000 00000000
> Call Trace:
> [<f86a6c00>] expire_client+0xb1/0xb9 [nfsd]
> [<f86a6d36>] nfsd4_setclientid_confirm+0x12e/0x1c7 [nfsd]
> [<f869f63c>] nfsd4_proc_compound+0x1fd/0x354 [nfsd]
> [<f86946e3>] nfsd_dispatch+0x89/0x134 [nfsd]
> [<f86c6d64>] svc_process+0x327/0x4e3 [sunrpc]
> [<f86940d2>] nfsd+0xd2/0x10b [nfsd]
> [<f8694000>] ? 0xf8693fff
> [<c102d53a>] kthread+0x66/0x6b
> [<c102d4d4>] ? flush_kthread_worker+0x74/0x74
> [<c123193e>] kernel_thread_helper+0x6/0xd
> Code: 01 80 3e 00 74 04 89 f2 eb a2 8b 4d e8 8b 55 ec 8b 45 f0 0f c8 89 19 89 02 31 c0 83 c4 18 5b 5e 5f 5d c3 55 89 e5 0f 0b 55 89 e5 <0f> 0b 55 89 e5 57 56 89 d6 53 89 c3 8b 78 34 89 d0 e8 b4 8f ff
> EIP: [<f86ac683>] free_client.isra.47+0x3/0x5 [nfsd] SS:ESP 0068:e9cf9eb4
> ---[ end trace bf1613e00e6f3bbe ]---
>
> System is a 32-bit Via C7; to reproduce I need only establish an NFSv4
> mount from a client, then reboot the client. The BUG happens when the
> client attempts to reconnect. At this point NFS service on the server
> becomes something of a lost cause. Userspace is Debian stable
> (nfs-utils 1.2.2 based). I haven't had a chance to bisect the issue
> yet; I'll give that a shot in the coming week if nobody knows of any
> obvious fixes. Let me know if there's any other info I can provide
> that's useful.

Turns out bisection isn't going to be straightforward: between v3.3
and v3.4-rc1 some other issue was introduced (and later fixed) that
prevents my test system from booting at all; it freezes up after the
MSR00000000: 0000000000000000 ... spew, and I haven't figured out what
it was. I tried 3.5-rc2 to see whether the BUG had been fixed by
further nfs tree merges, and it hasn't been. I was also able to
trigger the BUG a couple of times just by attempting to mv a ~4MB file
onto a recovered NFS volume after an NFS server reboot:

RPC: fragment too large: 0x000800cc
RPC: fragment too large: 0x000800cc
RPC: fragment too large: 0x000800cc
[same message repeated ~30 times]
------------[ cut here ]------------
kernel BUG at fs/nfsd/nfs4state.c:1083!
invalid opcode: 0000 [#1]
Modules linked in: md5 cpufreq_conservative cpufreq_stats cpufreq_powersave cpufreq_ondemand autofs4 quota_v2 quota_tree nfsd nfs_acl exportfs nfs fscache auth_rpcgss lockd sunrpc xt_mark xt_dscp cls_fw sch_htb iptable_nat nf_nat ipt_REJECT xt_multiport xt_mac xt_tcpudp iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_LOG xt_limit iptable_filter ip_tables x_tables dm_crypt dm_mod snd_hda_codec_via snd_hda_intel snd_hda_codec snd_hwdep tpm_tis tpm tpm_bios snd_pcm snd_timer snd soundcore via_rhine evdev snd_page_alloc mii via_agp via_velocity crc_ccitt agpgart button

Pid: 515, comm: kworker/u:5 Not tainted 3.5.0-rc2 #7 To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M.
EIP: 0060:[<f8740eb5>] EFLAGS: 00010246 CPU: 0
EIP is at free_client.isra.46+0x3/0x5 [nfsd]
EAX: 00000000 EBX: f5be9000 ECX: f5ba5f58 EDX: f5ba5f58
ESI: f5be9010 EDI: f5ba5f28 EBP: f5ba5f1c ESP: f5ba5f1c
DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
CR0: 8005003b CR2: b7732000 CR3: 3513b000 CR4: 000006b0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process kworker/u:5 (pid: 515, ti=f5ba4000 task=f59e5d00 task.ti=f5ba4000)
Stack:
f5ba5f3c f873a72b f5be9028 f5ba5f28 f5ba5f28 00000029 f5ba5f58 f5be9000
f5ba5f6c f873b434 f59e5d00 00000000 f5be9030 4fd5563d f5ba5f58 f5ba5f58
f5ba5f58 f59ffb00 f8746368 00000000 f5ba5fa0 c102b950 0000007b 00000000
Call Trace:
[<f873a72b>] expire_client+0xb1/0xb9 [nfsd]
[<f873b434>] laundromat_main+0x132/0x1fa [nfsd]
[<c102b950>] process_one_work+0xe0/0x1aa
[<f873b302>] ? nfsd4_exchange_id+0x292/0x292 [nfsd]
[<c102bbd3>] worker_thread+0xa4/0x12c
[<c102bb2f>] ? rescuer_thread+0xf4/0xf4
[<c102edda>] kthread+0x66/0x6b
[<c102ed74>] ? flush_kthread_worker+0x74/0x74
[<c12392fe>] kernel_thread_helper+0x6/0xd
Code: 03 8d 73 01 8a 06 84 c0 75 84 8b 45 e4 89 10 8b 55 ec 8b 45 f0 0f c8 89 02 31 c0 83 c4 20 5b 5e 5f 5d c3 55 89 e5 0f 0b 55 89 e5 <0f> 0b 55 89 e5 57 56 89 d6 53 53 89 c3 8b 78 34 89 d0 e8 cb 82
EIP: [<f8740eb5>] free_client.isra.46+0x3/0x5 [nfsd] SS:ESP 0068:f5ba5f1c
---[ end trace 59132c98eb4d8731 ]---
RPC: fragment too large: 0x000800cc
BUG: unable to handle kernel paging request at fffffffc
IP: [<c102ede9>] kthread_data+0xa/0xe
*pdpt = 0000000001397001 *pde = 000000000139b067 *pte = 0000000000000000
Oops: 0000 [#2]
Modules linked in: md5 cpufreq_conservative cpufreq_stats cpufreq_powersave cpufreq_ondemand autofs4 quota_v2 quota_tree nfsd nfs_acl exportfs nfs fscache auth_rpcgss lockd sunrpc xt_mark xt_dscp cls_fw sch_htb iptable_nat nf_nat ipt_REJECT xt_multiport xt_mac xt_tcpudp iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_LOG xt_limit iptable_filter ip_tables x_tables dm_crypt dm_mod snd_hda_codec_via snd_hda_intel snd_hda_codec snd_hwdep tpm_tis tpm tpm_bios snd_pcm snd_timer snd soundcore via_rhine evdev snd_page_alloc mii via_agp via_velocity crc_ccitt agpgart button

Pid: 515, comm: kworker/u:5 Tainted: G D 3.5.0-rc2 #7 To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M.
EIP: 0060:[<c102ede9>] EFLAGS: 00010046 CPU: 0
EIP is at kthread_data+0xa/0xe
EAX: 00000000 EBX: c13b49a4 ECX: ffd23940 EDX: 00000000
ESI: 00000000 EDI: f59e5e70 EBP: f5ba5d20 ESP: f5ba5d14
DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
CR0: 8005003b CR2: fffffffc CR3: 34da6000 CR4: 000006b0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process kworker/u:5 (pid: 515, ti=f5ba4000 task=f59e5d00 task.ti=f5ba4000)
Stack:
c102c884 f59e5d00 f5ba5ce4 f5ba5d9c c1237e5d f5fe9600 00000046 f5ba5d58
c102c587 00000000 f59e5d00 f514b2f8 f5bee600 f59e5d00 00000000 00000000
00000202 f5ba5d64 c102c5d2 f4a07c80 f5ba5d6c c102c5fe f5ba5d74 00000202
Call Trace:
[<c102c884>] ? wq_worker_sleeping+0x11/0x5b
[<c1237e5d>] __schedule+0xa8/0x3bf
[<c102c587>] ? __queue_work+0x163/0x181
[<c102c5d2>] ? queue_work_on+0x13/0x1f
[<c102c5fe>] ? queue_work+0xe/0x10
[<c1100bc7>] ? put_io_context+0x3e/0x55
[<c1100c55>] ? put_io_context_active+0x40/0x45
[<c1238219>] schedule+0x51/0x53
[<c10208c7>] do_exit+0x5c6/0x5c8
[<c100430d>] oops_end+0x6b/0x70
[<c100444c>] die+0x54/0x5c
[<c10025c6>] do_trap+0x8a/0xa3
[<c1002814>] ? do_bounds+0x69/0x69
[<c100289c>] do_invalid_op+0x88/0x92
[<f8740eb5>] ? free_client.isra.46+0x3/0x5 [nfsd]
[<c1238219>] ? schedule+0x51/0x53
[<c1237540>] ? schedule_timeout+0x15/0x96
[<c10346a8>] ? check_preempt_curr+0x27/0x62
[<c1237cea>] ? wait_for_common+0x5c/0xa4
[<c1034753>] ? try_to_wake_up+0x70/0x70
[<c1237db3>] ? wait_for_completion+0x12/0x14
[<c1238b45>] error_code+0x65/0x6c
[<f8740eb5>] ? free_client.isra.46+0x3/0x5 [nfsd]
[<f873a72b>] expire_client+0xb1/0xb9 [nfsd]
[<f873b434>] laundromat_main+0x132/0x1fa [nfsd]
[<c102b950>] process_one_work+0xe0/0x1aa
[<f873b302>] ? nfsd4_exchange_id+0x292/0x292 [nfsd]
[<c102bbd3>] worker_thread+0xa4/0x12c
[<c102bb2f>] ? rescuer_thread+0xf4/0xf4
[<c102edda>] kthread+0x66/0x6b
[<c102ed74>] ? flush_kthread_worker+0x74/0x74
[<c12392fe>] kernel_thread_helper+0x6/0xd
Code: 8d 43 10 e8 fe 53 00 00 e8 fd 93 20 00 b8 fc ff ff ff 83 7d e0 00 75 04 89 f0 ff d7 e8 22 15 ff ff 55 8b 80 44 01 00 00 89 e5 5d <8b> 40 fc c3 55 31 c0 89 e5 5d c3 55 ba fa 11 2b c1 89 e5 57 56
EIP: [<c102ede9>] kthread_data+0xa/0xe SS:ESP 0068:f5ba5d14
CR2: 00000000fffffffc
---[ end trace 59132c98eb4d8732 ]---
Fixing recursive fault but reboot is needed!
BUG: unable to handle kernel paging request at fffffffc
IP: [<c102ede9>] kthread_data+0xa/0xe
*pdpt = 0000000001397001 *pde = 000000000139b067 *pte = 0000000000000000
Oops: 0000 [#3]

and at this point the system gets thrown into a loop and spews errors
until it eventually reboots, possibly because my watchdog goes off;
I'm not sure, as I couldn't capture all of the output.


--
Jamie Heilman http://audible.transient.net/~jamie/

2012-06-12 11:18:07

by J. Bruce Fields

Subject: Re: NFSv4 regression, kernel BUG at fs/nfsd/nfs4state.c:1044!

On Tue, Jun 12, 2012 at 10:13:39AM +0000, Jamie Heilman wrote:
> J. Bruce Fields wrote:
> > On Sun, Jun 10, 2012 at 09:03:42AM +0000, Jamie Heilman wrote:
> > > Upgrading my NFSv4 server from 3.3.8 to 3.4.2, I've managed to
> > > reproduce this reliably:
> > >
> > > ------------[ cut here ]------------
> > > kernel BUG at fs/nfsd/nfs4state.c:1044!
> >
> > That's
> >
> > free_client(struct nfs4_client *clp)
> > {
> > ---> BUG_ON(!spin_is_locked(&client_lock));
> >
> >
> > > invalid opcode: 0000 [#1]
> > > Modules linked in: md5 cpufreq_conservative cpufreq_stats cpufreq_powersave cpufreq_ondemand autofs4 quota_v2 quota_tree nfsd nfs_acl exportfs nfs lockd fscache auth_rpcgss sunrpc xt_mark xt_dscp cls_fw sch_htb iptable_nat nf_nat ipt_REJECT xt_multiport xt_mac xt_tcpudp iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_LOG xt_limit iptable_filter ip_tables x_tables dm_crypt dm_mod snd_hda_codec_via tpm_tis tpm via_rhine snd_hda_intel snd_hda_codec tpm_bios mii via_velocity snd_hwdep snd_pcm snd_timer snd soundcore snd_page_alloc evdev crc_ccitt via_agp agpgart button
> > >
> > > Pid: 1804, comm: nfsd Not tainted 3.4.2 #3 To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M.
> > > EIP: 0060:[<f86ac683>] EFLAGS: 00010246 CPU: 0
> > > EIP is at free_client.isra.47+0x3/0x5 [nfsd]
> > > EAX: 00000000 EBX: e9d48800 ECX: e9d48030 EDX: f86b20f8
> > > ESI: e9d48810 EDI: e9cf9ec0 EBP: e9cf9eb4 ESP: e9cf9eb4
> > > DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
> > > CR0: 8005003b CR2: b760bee0 CR3: 34d04000 CR4: 000006b0
> > > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > > DR6: ffff0ff0 DR7: 00000400
> > > Process nfsd (pid: 1804, ti=e9cf8000 task=e9cf4000 task.ti=e9cf8000)
> > > Stack:
> > > e9cf9ed4 f86a6c00 e9d48828 e9cf9ec0 e9cf9ec0 e9d48800 e9d48000 00000000
> > > e9cf9f00 f86a6d36 00000000 000555c0 f53c0001 4fd43f69 00000001 7ad0b2a0
> > > f4c45000 f4c44060 00000360 e9cf9f3c f869f63c f86aff20 e9cb9000 00000000
> > > Call Trace:
> > > [<f86a6c00>] expire_client+0xb1/0xb9 [nfsd]
> >
> > And the only free_client call there is:
> >
> > spin_lock(&client_lock);
> > unhash_client_locked(clp);
> > if (atomic_read(&clp->cl_refcount) == 0)
> > ---> free_client(clp);
> > spin_unlock(&client_lock);
> >
> > So, that's strange.
>
> OK, other observations which may or may not have any value:
> - I can't trigger the BUG if I build with CONFIG_SMP (which obviously
> changes the RCU as well)
> - I can't trigger the BUG if I build with CONFIG_DEBUG_SPINLOCK

Oh, OK, so probably spin_is_locked() just isn't reliable in the
!CONFIG_SMP case. That would make sense.

So at a minimum we should change that BUG_ON to a WARN_ON_ONCE.
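
Roughly what's going on there, if I remember the UP headers right:
with !CONFIG_SMP and !CONFIG_DEBUG_SPINLOCK the spinlock carries no
state at all, so spin_is_locked() can never report it held.
Approximately, from include/linux/spinlock_up.h of this era:

	/* UP, non-debug: the lock is empty, so this is always 0 */
	#define arch_spin_is_locked(lock)	((void)(lock), 0)

which makes BUG_ON(!spin_is_locked(&client_lock)) fire unconditionally
whenever free_client() runs, lock held or not. It would also explain
the observations above: SMP builds keep real lock state, and
CONFIG_DEBUG_SPINLOCK tracks it even on UP.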

--b.

> - the "RPC: fragment too large" errors I mentioned appear to only occur
> with a 3.5-rc client; I can't repro them with a 3.4 client
>
> --
> Jamie Heilman http://audible.transient.net/~jamie/

2012-06-11 16:14:28

by J. Bruce Fields

Subject: Re: NFSv4 regression, kernel BUG at fs/nfsd/nfs4state.c:1044!

On Sun, Jun 10, 2012 at 09:03:42AM +0000, Jamie Heilman wrote:
> Upgrading my NFSv4 server from 3.3.8 to 3.4.2, I've managed to
> reproduce this reliably:
>
> ------------[ cut here ]------------
> kernel BUG at fs/nfsd/nfs4state.c:1044!

That's

free_client(struct nfs4_client *clp)
{
---> BUG_ON(!spin_is_locked(&client_lock));


> invalid opcode: 0000 [#1]
> Modules linked in: md5 cpufreq_conservative cpufreq_stats cpufreq_powersave cpufreq_ondemand autofs4 quota_v2 quota_tree nfsd nfs_acl exportfs nfs lockd fscache auth_rpcgss sunrpc xt_mark xt_dscp cls_fw sch_htb iptable_nat nf_nat ipt_REJECT xt_multiport xt_mac xt_tcpudp iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_LOG xt_limit iptable_filter ip_tables x_tables dm_crypt dm_mod snd_hda_codec_via tpm_tis tpm via_rhine snd_hda_intel snd_hda_codec tpm_bios mii via_velocity snd_hwdep snd_pcm snd_timer snd soundcore snd_page_alloc evdev crc_ccitt via_agp agpgart button
>
> Pid: 1804, comm: nfsd Not tainted 3.4.2 #3 To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M.
> EIP: 0060:[<f86ac683>] EFLAGS: 00010246 CPU: 0
> EIP is at free_client.isra.47+0x3/0x5 [nfsd]
> EAX: 00000000 EBX: e9d48800 ECX: e9d48030 EDX: f86b20f8
> ESI: e9d48810 EDI: e9cf9ec0 EBP: e9cf9eb4 ESP: e9cf9eb4
> DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
> CR0: 8005003b CR2: b760bee0 CR3: 34d04000 CR4: 000006b0
> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> DR6: ffff0ff0 DR7: 00000400
> Process nfsd (pid: 1804, ti=e9cf8000 task=e9cf4000 task.ti=e9cf8000)
> Stack:
> e9cf9ed4 f86a6c00 e9d48828 e9cf9ec0 e9cf9ec0 e9d48800 e9d48000 00000000
> e9cf9f00 f86a6d36 00000000 000555c0 f53c0001 4fd43f69 00000001 7ad0b2a0
> f4c45000 f4c44060 00000360 e9cf9f3c f869f63c f86aff20 e9cb9000 00000000
> Call Trace:
> [<f86a6c00>] expire_client+0xb1/0xb9 [nfsd]

And the only free_client call there is:

spin_lock(&client_lock);
unhash_client_locked(clp);
if (atomic_read(&clp->cl_refcount) == 0)
---> free_client(clp);
spin_unlock(&client_lock);

So, that's strange.

--b.


> [<f86a6d36>] nfsd4_setclientid_confirm+0x12e/0x1c7 [nfsd]
> [<f869f63c>] nfsd4_proc_compound+0x1fd/0x354 [nfsd]
> [<f86946e3>] nfsd_dispatch+0x89/0x134 [nfsd]
> [<f86c6d64>] svc_process+0x327/0x4e3 [sunrpc]
> [<f86940d2>] nfsd+0xd2/0x10b [nfsd]
> [<f8694000>] ? 0xf8693fff
> [<c102d53a>] kthread+0x66/0x6b
> [<c102d4d4>] ? flush_kthread_worker+0x74/0x74
> [<c123193e>] kernel_thread_helper+0x6/0xd
> Code: 01 80 3e 00 74 04 89 f2 eb a2 8b 4d e8 8b 55 ec 8b 45 f0 0f c8 89 19 89 02 31 c0 83 c4 18 5b 5e 5f 5d c3 55 89 e5 0f 0b 55 89 e5 <0f> 0b 55 89 e5 57 56 89 d6 53 89 c3 8b 78 34 89 d0 e8 b4 8f ff
> EIP: [<f86ac683>] free_client.isra.47+0x3/0x5 [nfsd] SS:ESP 0068:e9cf9eb4
> ---[ end trace bf1613e00e6f3bbe ]---
>
> System is a 32-bit Via C7; to reproduce I need only establish an NFSv4
> mount from a client, then reboot the client. The BUG happens when the
> client attempts to reconnect. At this point NFS service on the server
> becomes something of a lost cause. Userspace is Debian stable
> (nfs-utils 1.2.2 based). I haven't had a chance to bisect the issue
> yet; I'll give that a shot in the coming week if nobody knows of any
> obvious fixes. Let me know if there's any other info I can provide
> that's useful.
>
> --
> Jamie Heilman http://audible.transient.net/~jamie/

2012-06-13 09:27:07

by Jamie Heilman

Subject: RPC: fragment too large with transition to 3.4

Jamie Heilman wrote:
> It's looking like my issues with "RPC: fragment too large" may be
> something else entirely at this point; I've noticed other weird
> network behavior that I'm gonna have to try to isolate before I keep
> blaming nfs changes. Though for some reason my
> /proc/fs/nfsd/max_block_size ends up only 128KiB w/3.4 where it was
> 512KiB w/3.3.

OK, I get it now. My NFS server is a 32-bit PAE system w/4G of RAM
(minus a chunk for the IGP video etc.), and the max_block_size
calculation changed significantly in commit
508f92275624fc755104b17945bdc822936f1918 to account for rpc buffers
only being in low memory. That means whereas in 3.3 the math came out
to a target size of roughly 843241, my new target size in 3.4 is only
219959-ish, so choosing 128KiB is understandable. The problem was
that all my clients had negotiated their nfs mounts against the v3.3
value of 512KiB, and when I rebooted into 3.4... they hit the wall
attempting larger transfers and became uselessly stuck at that point.
If I remount everything before doing any large transfers, each mount
negotiates a lower wsize and things work fine. So everything is
working as planned, I suppose... the transition between 3.3 and 3.4 is
just a bit rough.
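
If I'm reading the new code right, the sizing logic after that commit
boils down to roughly this (a paraphrase, not the literal kernel
source):

	/*
	 * Rough paraphrase of nfsd's default block size selection
	 * after commit 508f9227: only low memory counts, since that
	 * is where the rpc buffers have to live on a 32-bit kernel.
	 */
	target = lowmem_bytes >> 12;	/* allow ~1/4096 of low memory */
	size = NFSSVC_MAXBLKSIZE;	/* 1MB upper bound */
	while (size > target)
		size /= 2;		/* on this box: stops at 128KiB */

Which would also explain the "RPC: fragment too large: 0x000800cc"
spew, if I'm decoding the record marker right: the low 31 bits give a
524492-byte fragment (a 512KiB write plus RPC overhead), well over the
new limit.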

--
Jamie Heilman http://audible.transient.net/~jamie/

2012-06-19 06:15:51

by Benny Halevy

Subject: Re: NFSv4 regression, kernel BUG at fs/nfsd/nfs4state.c:1044!

On 2012-06-12 15:36, J. Bruce Fields wrote:
> On Tue, Jun 12, 2012 at 07:18:03AM -0400, J. Bruce Fields wrote:
>> On Tue, Jun 12, 2012 at 10:13:39AM +0000, Jamie Heilman wrote:
>>> J. Bruce Fields wrote:
>>>> On Sun, Jun 10, 2012 at 09:03:42AM +0000, Jamie Heilman wrote:
>>>>> Upgrading my NFSv4 server from 3.3.8 to 3.4.2, I've managed to
>>>>> reproduce this reliably:
>>>>>
>>>>> ------------[ cut here ]------------
>>>>> kernel BUG at fs/nfsd/nfs4state.c:1044!
>>>>
>>>> That's
>>>>
>>>> free_client(struct nfs4_client *clp)
>>>> {
>>>> ---> BUG_ON(!spin_is_locked(&client_lock));
>>>>
>>>>
>>>>> invalid opcode: 0000 [#1]
>>>>> Modules linked in: md5 cpufreq_conservative cpufreq_stats cpufreq_powersave cpufreq_ondemand autofs4 quota_v2 quota_tree nfsd nfs_acl exportfs nfs lockd fscache auth_rpcgss sunrpc xt_mark xt_dscp cls_fw sch_htb iptable_nat nf_nat ipt_REJECT xt_multiport xt_mac xt_tcpudp iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_LOG xt_limit iptable_filter ip_tables x_tables dm_crypt dm_mod snd_hda_codec_via tpm_tis tpm via_rhine snd_hda_intel snd_hda_codec tpm_bios mii via_velocity snd_hwdep snd_pcm snd_timer snd soundcore snd_page_alloc evdev crc_ccitt via_agp agpgart button
>>>>>
>>>>> Pid: 1804, comm: nfsd Not tainted 3.4.2 #3 To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M.
>>>>> EIP: 0060:[<f86ac683>] EFLAGS: 00010246 CPU: 0
>>>>> EIP is at free_client.isra.47+0x3/0x5 [nfsd]
>>>>> EAX: 00000000 EBX: e9d48800 ECX: e9d48030 EDX: f86b20f8
>>>>> ESI: e9d48810 EDI: e9cf9ec0 EBP: e9cf9eb4 ESP: e9cf9eb4
>>>>> DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
>>>>> CR0: 8005003b CR2: b760bee0 CR3: 34d04000 CR4: 000006b0
>>>>> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
>>>>> DR6: ffff0ff0 DR7: 00000400
>>>>> Process nfsd (pid: 1804, ti=e9cf8000 task=e9cf4000 task.ti=e9cf8000)
>>>>> Stack:
>>>>> e9cf9ed4 f86a6c00 e9d48828 e9cf9ec0 e9cf9ec0 e9d48800 e9d48000 00000000
>>>>> e9cf9f00 f86a6d36 00000000 000555c0 f53c0001 4fd43f69 00000001 7ad0b2a0
>>>>> f4c45000 f4c44060 00000360 e9cf9f3c f869f63c f86aff20 e9cb9000 00000000
>>>>> Call Trace:
>>>>> [<f86a6c00>] expire_client+0xb1/0xb9 [nfsd]
>>>>
>>>> And the only free_client call there is:
>>>>
>>>> spin_lock(&client_lock);
>>>> unhash_client_locked(clp);
>>>> if (atomic_read(&clp->cl_refcount) == 0)
>>>> ---> free_client(clp);
>>>> spin_unlock(&client_lock);
>>>>
>>>> So, that's strange.
>>>
>>> OK, other observations which may or may not have any value:
>>> - I can't trigger the BUG if I build with CONFIG_SMP (which obviously
>>> changes the RCU as well)
>>> - I can't trigger the BUG if I build with CONFIG_DEBUG_SPINLOCK
>>
>> Oh, OK, so probably spin_is_locked() just isn't reliable in the
>> !CONFIG_SMP case. That would make sense.
>>
>> So at a minimum we should change that BUG_ON to a WARN_ON_ONCE.
>
> Doh. Googling around--looks like we want something like this.
>
> I've got a couple other bugfixes to collect and then I'll try to get
> this to Linus by the end of the week.
>
> --b.
>
>
> commit 4cdc12ca178dfa74a9619451be41419617424a07
> Author: J. Bruce Fields <[email protected]>
> Date: Tue Jun 12 08:28:48 2012 -0400
>
> nfsd4: BUG_ON(!is_spin_locked()) no good on UP kernels
>
> Most frequent symptom was a BUG triggering in expire_client, with the
> server locking up shortly thereafter.
>
> Introduced by 508dc6e110c6dbdc0bbe84298ccfe22de7538486 "nfsd41:
> free_session/free_client must be called under the client_lock".
>
> Cc: [email protected]
> Cc: Benny Halevy <[email protected]>

Ack. Thanks!

Benny

> Signed-off-by: J. Bruce Fields <[email protected]>
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 8fdc9ec..94effd5 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -900,7 +900,7 @@ static void free_session(struct kref *kref)
> struct nfsd4_session *ses;
> int mem;
>
> - BUG_ON(!spin_is_locked(&client_lock));
> + lockdep_assert_held(&client_lock);
> ses = container_of(kref, struct nfsd4_session, se_ref);
> nfsd4_del_conns(ses);
> spin_lock(&nfsd_drc_lock);
> @@ -1080,7 +1080,7 @@ static struct nfs4_client *alloc_client(struct xdr_netobj name)
> static inline void
> free_client(struct nfs4_client *clp)
> {
> - BUG_ON(!spin_is_locked(&client_lock));
> + lockdep_assert_held(&client_lock);
> while (!list_empty(&clp->cl_sessions)) {
> struct nfsd4_session *ses;
> ses = list_entry(clp->cl_sessions.next, struct nfsd4_session,
> --

2012-06-13 15:56:52

by J. Bruce Fields

Subject: Re: RPC: fragment too large with transition to 3.4

On Wed, Jun 13, 2012 at 09:27:03AM +0000, Jamie Heilman wrote:
> Jamie Heilman wrote:
> > It's looking like my issues with "RPC: fragment too large" may be
> > something else entirely at this point; I've noticed other weird
> > network behavior that I'm gonna have to try to isolate before I keep
> > blaming nfs changes. Though for some reason my
> > /proc/fs/nfsd/max_block_size ends up only 128KiB w/3.4 where it was
> > 512KiB w/3.3.
>
> OK, I get it now. My NFS server is a 32-bit PAE system w/4G of RAM
> (minus a chunk for the IGP video etc.), and the max_block_size
> calculation changed significantly in commit
> 508f92275624fc755104b17945bdc822936f1918 to account for rpc buffers
> only being in low memory. That means whereas in 3.3 the math came out
> to a target size of roughly 843241, my new target size in 3.4 is only
> 219959-ish, so choosing 128KiB is understandable. The problem was
> that all my clients had negotiated their nfs mounts against the v3.3
> value of 512KiB, and when I rebooted into 3.4... they hit the wall
> attempting larger transfers and became uselessly stuck at that point.
> If I remount everything before doing any large transfers, each mount
> negotiates a lower wsize and things work fine. So everything is
> working as planned, I suppose... the transition between 3.3 and 3.4
> is just a bit rough.

Oh, got it, thanks. Yes, now I remember I've seen that problem before.

Perhaps we should be more careful about tweaks to that calculation that
may result in a decreased r/wsize. You could also get into the same
situation if you took the server down to change the amount of RAM, but
only if you were *removing* memory, which is probably unusual.

Best might be if distributions set max_block_size--it should be easier
for userspace to remember the value across reboots.

While we're at it, we also want to create a /etc/nfsd.conf that
rpc.nfsd could read, for setting this, the number of threads, and a
few other things. The systemd people would prefer that to the current
practice of sourcing a shell script in /etc/sysconfig or /etc/default.

We could warn about this problem ("don't decrease max_block_size on a
server without unmounting clients first") next to that variable in the
config file.
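
Something along these lines, say (a purely hypothetical sketch; no
such file or format exists yet):

	# /etc/nfsd.conf -- hypothetical sketch only; rpc.nfsd does
	# not read any such file today.
	#
	# Don't decrease max_block_size while clients still hold
	# mounts negotiated against a larger value, or they'll fail
	# with "RPC: fragment too large" until remounted.
	threads = 8
	max_block_size = 524288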

I think we can do the same calculation from userspace as
nfsd_create_serv() does in the kernel, to set an initial default. I
don't know if that should happen on package install or on first run
of rpc.nfsd.
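
For illustration, a rough and untested userspace sketch of that
calculation, assuming LowTotal in /proc/meminfo is an acceptable
stand-in for what the kernel sees as low memory (falling back to
MemTotal where LowTotal is absent):

	/*
	 * Hypothetical sketch: derive approximately the same default
	 * max_block_size a 3.4 kernel would pick, from userspace.
	 */
	#include <stdio.h>
	#include <string.h>

	static unsigned long long meminfo_kb(const char *key)
	{
		char line[128];
		unsigned long long kb = 0;
		FILE *f = fopen("/proc/meminfo", "r");

		if (!f)
			return 0;
		while (fgets(line, sizeof(line), f)) {
			if (!strncmp(line, key, strlen(key)) &&
			    line[strlen(key)] == ':') {
				sscanf(line + strlen(key) + 1, "%llu", &kb);
				break;
			}
		}
		fclose(f);
		return kb;
	}

	int main(void)
	{
		unsigned long long low_kb = meminfo_kb("LowTotal");
		unsigned long long target;
		unsigned long size = 1024 * 1024;	/* NFSSVC_MAXBLKSIZE */

		if (!low_kb)
			low_kb = meminfo_kb("MemTotal");
		target = (low_kb * 1024) >> 12;	/* ~1/4096 of low memory */
		while (size > target && size > 8192)
			size /= 2;	/* floor is arbitrary here */
		printf("%lu\n", size);	/* candidate max_block_size */
		return 0;
	}

On a box with ~860MB of low memory that lands at 131072, matching the
128KiB Jamie reports; the result could then be written to
/proc/fs/nfsd/max_block_size before nfsd is started.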

For now that's a project looking for a volunteer, though.

--b.

2012-06-12 10:13:41

by Jamie Heilman

Subject: Re: NFSv4 regression, kernel BUG at fs/nfsd/nfs4state.c:1044!

J. Bruce Fields wrote:
> On Sun, Jun 10, 2012 at 09:03:42AM +0000, Jamie Heilman wrote:
> > Upgrading my NFSv4 server from 3.3.8 to 3.4.2, I've managed to
> > reproduce this reliably:
> >
> > ------------[ cut here ]------------
> > kernel BUG at fs/nfsd/nfs4state.c:1044!
>
> That's
>
> free_client(struct nfs4_client *clp)
> {
> ---> BUG_ON(!spin_is_locked(&client_lock));
>
>
> > invalid opcode: 0000 [#1]
> > Modules linked in: md5 cpufreq_conservative cpufreq_stats cpufreq_powersave cpufreq_ondemand autofs4 quota_v2 quota_tree nfsd nfs_acl exportfs nfs lockd fscache auth_rpcgss sunrpc xt_mark xt_dscp cls_fw sch_htb iptable_nat nf_nat ipt_REJECT xt_multiport xt_mac xt_tcpudp iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_LOG xt_limit iptable_filter ip_tables x_tables dm_crypt dm_mod snd_hda_codec_via tpm_tis tpm via_rhine snd_hda_intel snd_hda_codec tpm_bios mii via_velocity snd_hwdep snd_pcm snd_timer snd soundcore snd_page_alloc evdev crc_ccitt via_agp agpgart button
> >
> > Pid: 1804, comm: nfsd Not tainted 3.4.2 #3 To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M.
> > EIP: 0060:[<f86ac683>] EFLAGS: 00010246 CPU: 0
> > EIP is at free_client.isra.47+0x3/0x5 [nfsd]
> > EAX: 00000000 EBX: e9d48800 ECX: e9d48030 EDX: f86b20f8
> > ESI: e9d48810 EDI: e9cf9ec0 EBP: e9cf9eb4 ESP: e9cf9eb4
> > DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
> > CR0: 8005003b CR2: b760bee0 CR3: 34d04000 CR4: 000006b0
> > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > DR6: ffff0ff0 DR7: 00000400
> > Process nfsd (pid: 1804, ti=e9cf8000 task=e9cf4000 task.ti=e9cf8000)
> > Stack:
> > e9cf9ed4 f86a6c00 e9d48828 e9cf9ec0 e9cf9ec0 e9d48800 e9d48000 00000000
> > e9cf9f00 f86a6d36 00000000 000555c0 f53c0001 4fd43f69 00000001 7ad0b2a0
> > f4c45000 f4c44060 00000360 e9cf9f3c f869f63c f86aff20 e9cb9000 00000000
> > Call Trace:
> > [<f86a6c00>] expire_client+0xb1/0xb9 [nfsd]
>
> And the only free_client call there is:
>
> spin_lock(&client_lock);
> unhash_client_locked(clp);
> if (atomic_read(&clp->cl_refcount) == 0)
> ---> free_client(clp);
> spin_unlock(&client_lock);
>
> So, that's strange.

OK, other observations which may or may not have any value:
- I can't trigger the BUG if I build with CONFIG_SMP (which obviously
changes the RCU as well)
- I can't trigger the BUG if I build with CONFIG_DEBUG_SPINLOCK
- the "RPC: fragment too large" errors I mentioned appear to only occur
with a 3.5-rc client; I can't repro them with a 3.4 client

--
Jamie Heilman http://audible.transient.net/~jamie/

2012-06-12 12:36:45

by J. Bruce Fields

Subject: Re: NFSv4 regression, kernel BUG at fs/nfsd/nfs4state.c:1044!

On Tue, Jun 12, 2012 at 07:18:03AM -0400, J. Bruce Fields wrote:
> On Tue, Jun 12, 2012 at 10:13:39AM +0000, Jamie Heilman wrote:
> > J. Bruce Fields wrote:
> > > On Sun, Jun 10, 2012 at 09:03:42AM +0000, Jamie Heilman wrote:
> > > > Upgrading my NFSv4 server from 3.3.8 to 3.4.2, I've managed to
> > > > reproduce this reliably:
> > > >
> > > > ------------[ cut here ]------------
> > > > kernel BUG at fs/nfsd/nfs4state.c:1044!
> > >
> > > That's
> > >
> > > free_client(struct nfs4_client *clp)
> > > {
> > > ---> BUG_ON(!spin_is_locked(&client_lock));
> > >
> > >
> > > > invalid opcode: 0000 [#1]
> > > > Modules linked in: md5 cpufreq_conservative cpufreq_stats cpufreq_powersave cpufreq_ondemand autofs4 quota_v2 quota_tree nfsd nfs_acl exportfs nfs lockd fscache auth_rpcgss sunrpc xt_mark xt_dscp cls_fw sch_htb iptable_nat nf_nat ipt_REJECT xt_multiport xt_mac xt_tcpudp iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_LOG xt_limit iptable_filter ip_tables x_tables dm_crypt dm_mod snd_hda_codec_via tpm_tis tpm via_rhine snd_hda_intel snd_hda_codec tpm_bios mii via_velocity snd_hwdep snd_pcm snd_timer snd soundcore snd_page_alloc evdev crc_ccitt via_agp agpgart button
> > > >
> > > > Pid: 1804, comm: nfsd Not tainted 3.4.2 #3 To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M.
> > > > EIP: 0060:[<f86ac683>] EFLAGS: 00010246 CPU: 0
> > > > EIP is at free_client.isra.47+0x3/0x5 [nfsd]
> > > > EAX: 00000000 EBX: e9d48800 ECX: e9d48030 EDX: f86b20f8
> > > > ESI: e9d48810 EDI: e9cf9ec0 EBP: e9cf9eb4 ESP: e9cf9eb4
> > > > DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
> > > > CR0: 8005003b CR2: b760bee0 CR3: 34d04000 CR4: 000006b0
> > > > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > > > DR6: ffff0ff0 DR7: 00000400
> > > > Process nfsd (pid: 1804, ti=e9cf8000 task=e9cf4000 task.ti=e9cf8000)
> > > > Stack:
> > > > e9cf9ed4 f86a6c00 e9d48828 e9cf9ec0 e9cf9ec0 e9d48800 e9d48000 00000000
> > > > e9cf9f00 f86a6d36 00000000 000555c0 f53c0001 4fd43f69 00000001 7ad0b2a0
> > > > f4c45000 f4c44060 00000360 e9cf9f3c f869f63c f86aff20 e9cb9000 00000000
> > > > Call Trace:
> > > > [<f86a6c00>] expire_client+0xb1/0xb9 [nfsd]
> > >
> > > And the only free_client call there is:
> > >
> > > spin_lock(&client_lock);
> > > unhash_client_locked(clp);
> > > if (atomic_read(&clp->cl_refcount) == 0)
> > > ---> free_client(clp);
> > > spin_unlock(&client_lock);
> > >
> > > So, that's strange.
> >
> > OK, other observations which may or may not have any value:
> > - I can't trigger the BUG if I build with CONFIG_SMP (which obviously
> > changes the RCU as well)
> > - I can't trigger the BUG if I build with CONFIG_DEBUG_SPINLOCK
>
> Oh, OK, so probably spin_is_locked() just isn't reliable in the
> !CONFIG_SMP case. That would make sense.
>
> So at a minimum we should change that BUG_ON to a WARN_ON_ONCE.

Doh. Googling around--looks like we want something like this.

I've got a couple other bugfixes to collect and then I'll try to get
this to Linus by the end of the week.

--b.


commit 4cdc12ca178dfa74a9619451be41419617424a07
Author: J. Bruce Fields <[email protected]>
Date: Tue Jun 12 08:28:48 2012 -0400

nfsd4: BUG_ON(!is_spin_locked()) no good on UP kernels

Most frequent symptom was a BUG triggering in expire_client, with the
server locking up shortly thereafter.

Introduced by 508dc6e110c6dbdc0bbe84298ccfe22de7538486 "nfsd41:
free_session/free_client must be called under the client_lock".

Cc: [email protected]
Cc: Benny Halevy <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 8fdc9ec..94effd5 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -900,7 +900,7 @@ static void free_session(struct kref *kref)
struct nfsd4_session *ses;
int mem;

- BUG_ON(!spin_is_locked(&client_lock));
+ lockdep_assert_held(&client_lock);
ses = container_of(kref, struct nfsd4_session, se_ref);
nfsd4_del_conns(ses);
spin_lock(&nfsd_drc_lock);
@@ -1080,7 +1080,7 @@ static struct nfs4_client *alloc_client(struct xdr_netobj name)
static inline void
free_client(struct nfs4_client *clp)
{
- BUG_ON(!spin_is_locked(&client_lock));
+ lockdep_assert_held(&client_lock);
while (!list_empty(&clp->cl_sessions)) {
struct nfsd4_session *ses;
ses = list_entry(clp->cl_sessions.next, struct nfsd4_session,
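
Worth noting why this form is safe where the BUG_ON wasn't:
lockdep_assert_held() doesn't probe the lock word at runtime; it asks
lockdep's ownership tracking when that's compiled in, and compiles
away entirely otherwise. Approximately (a paraphrase of the
include/linux/lockdep.h definitions of this era):

	#ifdef CONFIG_PROVE_LOCKING
	# define lockdep_assert_held(l)	\
		WARN_ON(debug_locks && !lockdep_is_held(l))
	#else
	# define lockdep_assert_held(l)	do { } while (0)
	#endif

So UP builds without lockdep simply skip the check instead of tripping
over a spin_is_locked() that always returns false.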

2012-06-13 06:06:30

by Jamie Heilman

Subject: Re: NFSv4 regression, kernel BUG at fs/nfsd/nfs4state.c:1044!

J. Bruce Fields wrote:
> On Tue, Jun 12, 2012 at 07:18:03AM -0400, J. Bruce Fields wrote:
> > On Tue, Jun 12, 2012 at 10:13:39AM +0000, Jamie Heilman wrote:
> > > J. Bruce Fields wrote:
> > > > On Sun, Jun 10, 2012 at 09:03:42AM +0000, Jamie Heilman wrote:
> > > > > Upgrading my NFSv4 server from 3.3.8 to 3.4.2, I've managed to
> > > > > reproduce this reliably:
> > > > >
> > > > > ------------[ cut here ]------------
> > > > > kernel BUG at fs/nfsd/nfs4state.c:1044!
> > > >
> > > > That's
> > > >
> > > > free_client(struct nfs4_client *clp)
> > > > {
> > > > ---> BUG_ON(!spin_is_locked(&client_lock));
> > > >
> > > >
> > > > > invalid opcode: 0000 [#1]
> > > > > Modules linked in: md5 cpufreq_conservative cpufreq_stats cpufreq_powersave cpufreq_ondemand autofs4 quota_v2 quota_tree nfsd nfs_acl exportfs nfs lockd fscache auth_rpcgss sunrpc xt_mark xt_dscp cls_fw sch_htb iptable_nat nf_nat ipt_REJECT xt_multiport xt_mac xt_tcpudp iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_LOG xt_limit iptable_filter ip_tables x_tables dm_crypt dm_mod snd_hda_codec_via tpm_tis tpm via_rhine snd_hda_intel snd_hda_codec tpm_bios mii via_velocity snd_hwdep snd_pcm snd_timer snd soundcore snd_page_alloc evdev crc_ccitt via_agp agpgart button
> > > > >
> > > > > Pid: 1804, comm: nfsd Not tainted 3.4.2 #3 To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M.
> > > > > EIP: 0060:[<f86ac683>] EFLAGS: 00010246 CPU: 0
> > > > > EIP is at free_client.isra.47+0x3/0x5 [nfsd]
> > > > > EAX: 00000000 EBX: e9d48800 ECX: e9d48030 EDX: f86b20f8
> > > > > ESI: e9d48810 EDI: e9cf9ec0 EBP: e9cf9eb4 ESP: e9cf9eb4
> > > > > DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
> > > > > CR0: 8005003b CR2: b760bee0 CR3: 34d04000 CR4: 000006b0
> > > > > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > > > > DR6: ffff0ff0 DR7: 00000400
> > > > > Process nfsd (pid: 1804, ti=e9cf8000 task=e9cf4000 task.ti=e9cf8000)
> > > > > Stack:
> > > > > e9cf9ed4 f86a6c00 e9d48828 e9cf9ec0 e9cf9ec0 e9d48800 e9d48000 00000000
> > > > > e9cf9f00 f86a6d36 00000000 000555c0 f53c0001 4fd43f69 00000001 7ad0b2a0
> > > > > f4c45000 f4c44060 00000360 e9cf9f3c f869f63c f86aff20 e9cb9000 00000000
> > > > > Call Trace:
> > > > > [<f86a6c00>] expire_client+0xb1/0xb9 [nfsd]
> > > >
> > > > And the only free_client call there is:
> > > >
> > > > spin_lock(&client_lock);
> > > > unhash_client_locked(clp);
> > > > if (atomic_read(&clp->cl_refcount) == 0)
> > > > ---> free_client(clp);
> > > > spin_unlock(&client_lock);
> > > >
> > > > So, that's strange.
> > >
> > > OK, other observations which may or may not have any value:
> > > - I can't trigger the BUG if I build with CONFIG_SMP (which obviously
> > > changes the RCU as well)
> > > - I can't trigger the BUG if I build with CONFIG_DEBUG_SPINLOCK
> >
> > Oh, OK, so probably spin_is_locked() just isn't reliable in the
> > !CONFIG_SMP case. That would make sense.
> >
> > So at a minimum we should change that BUG_ON to a WARN_ON_ONCE.
>
> Doh. Googling around--looks like we want something like this.
>
> I've got a couple other bugfixes to collect and then I'll try to get
> this to Linus by the end of the week.
>
> --b.
>
>
> commit 4cdc12ca178dfa74a9619451be41419617424a07
> Author: J. Bruce Fields <[email protected]>
> Date: Tue Jun 12 08:28:48 2012 -0400
>
> nfsd4: BUG_ON(!is_spin_locked()) no good on UP kernels
>
> Most frequent symptom was a BUG triggering in expire_client, with the
> server locking up shortly thereafter.
>
> Introduced by 508dc6e110c6dbdc0bbe84298ccfe22de7538486 "nfsd41:
> free_session/free_client must be called under the client_lock".
>
> Cc: [email protected]
> Cc: Benny Halevy <[email protected]>
> Signed-off-by: J. Bruce Fields <[email protected]>

Yeah, this fixes the BUG for me. It's looking like my issues with
"RPC: fragment too large" may be something else entirely at this
point, I've noticed other weird network behavior that I'm gonna have
to try to isolate before I keep blaming nfs changes. Though for some
reason my /proc/fs/nfsd/max_block_size ends up only 128KiB w/3.4 where
it was 512KiB w/3.3. But all that aside, this patch fixes the BUG I was
seeing, so thanks!

--
Jamie Heilman http://audible.transient.net/~jamie/