2013-03-24 17:07:46

by Toralf Förster

[permalink] [raw]
Subject: kernel 3.8.4 : kernel BUG at fs/locks.c:2093! part #2

The following bug was triggered twice under a stable 3.8.4 kernel if I export a NFS (v4) directory
and mount it within a user mode linux (running at the same system) and run within that UML trinity
(16 UML trinity processes at a 4-core host CPU) with victim files+directories within that NFS directory.

I could not trigger that bug at the host kernel if I run trinity within UML without victim files
- therefore the issue seems to be tied to NFS. The syslog messages are :

2013-03-24T17:33:16.380+01:00 n22 kernel: ------------[ cut here ]------------
2013-03-24T17:33:16.380+01:00 n22 kernel: kernel BUG at fs/locks.c:2093!
2013-03-24T17:33:16.380+01:00 n22 kernel: invalid opcode: 0000 [#1] SMP
2013-03-24T17:33:16.380+01:00 n22 kernel: Modules linked in: rc_dib0700_rc5 dvb_usb_dib0700 dib3000mc dib8000 dvb_usb dib0070 dib7000m dib7000p dvb_core dibx000_common dib0090 rc_core md5 nfsd auth_rpcgss ipt_MASQUERADE xt_owner xt_multiport ipt_REJECT xt_tcpudp xt_recent xt_conntrack nf_conntrack_ftp xt_limit xt_LOG iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables af_packet pppoe pppox ppp_generic slhc bridge stp llc tun msr i915 cfbfillrect cfbimgblt i2c_algo_bit cfbcopyarea intel_agp coretemp intel_gtt fbcon snd_hda_codec_conexant snd_hda_intel snd_hda_codec snd_pcm sdhci_pci acpi_cpufreq bitblit mperf softcursor font drm_kms_helper uvcvideo thinkpad_acpi drm arc4 iwldvm mac80211 sdhci videobuf2_vmalloc videobuf2_memops videobuf2_core agpgart videodev kvm_intel kvm processor i2c_i801 usblp iwlwifi cfg80211 thermal video nvram snd_page_alloc mmc_core snd_timer thermal_sys fb fbdev rfkill i2c_core snd hwmon e1000e 8250_pci
8250_core serial_core soundcore tpm_tis tpm psmouse tpm_bios button ac evdev battery wmi aesni_intel ablk_helper cryptd lrw aes_i586 xts gf128mul cbc sha256_generic fuse nfs lockd sunrpc dm_crypt dm_mod hid_monterey hid_microsoft hid_logitech hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech hid_generic usbhid hid sr_mod cdrom sg [last unloaded: microcode]
2013-03-24T17:33:16.380+01:00 n22 kernel: Pid: 58, comm: kworker/3:1 Not tainted 3.8.4 #7 LENOVO 4180F65/4180F65
2013-03-24T17:33:16.380+01:00 n22 kernel: EIP: 0060:[<c11600c9>] EFLAGS: 00010246 CPU: 3
2013-03-24T17:33:16.380+01:00 n22 kernel: EIP is at locks_remove_flock+0xe9/0xf0
2013-03-24T17:33:16.380+01:00 n22 kernel: EAX: 00000001 EBX: e6c7f3c0 ECX: 0000002a EDX: 00002a2a
2013-03-24T17:33:16.380+01:00 n22 kernel: ESI: f1f6de88 EDI: e241f880 EBP: f190bec0 ESP: f190be44
2013-03-24T17:33:16.380+01:00 n22 kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
2013-03-24T17:33:16.382+01:00 n22 kernel: CR0: 80050033 CR2: 402f6060 CR3: 2140e000 CR4: 000407f0
2013-03-24T17:33:16.382+01:00 n22 kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
2013-03-24T17:33:16.382+01:00 n22 kernel: DR6: ffff0ff0 DR7: 00000400
2013-03-24T17:33:16.382+01:00 n22 kernel: Process kworker/3:1 (pid: 58, ti=f190a000 task=f2124ec0 task.ti=f190a000)
2013-03-24T17:33:16.382+01:00 n22 kernel: Stack:
2013-03-24T17:33:16.382+01:00 n22 kernel: f1f6ddc8 f190be70 c1065355 f190be84 f37e0100 000062de 00000000 eddeba20
2013-03-24T17:33:16.382+01:00 n22 kernel: f37e0100 f2124ec0 f37e0100 0000000a ec84fa28 c13d6fc0 e17e6074 ec84fa00
2013-03-24T17:33:16.382+01:00 n22 kernel: e17e607c f190be9c c1066011 ec84fa00 e17e6074 e6c7f3c8 f1f6ddc8 00000008
2013-03-24T17:33:16.382+01:00 n22 kernel: Call Trace:
2013-03-24T17:33:16.382+01:00 n22 kernel: [<c1065355>] ? sched_clock_cpu+0xf5/0x160
2013-03-24T17:33:16.383+01:00 n22 kernel: [<c1066011>] ? set_next_entity+0x31/0x80
2013-03-24T17:33:16.383+01:00 n22 kernel: [<c111df59>] __fput+0x79/0x1f0
2013-03-24T17:33:16.383+01:00 n22 kernel: [<c111e15c>] delayed_fput+0x7c/0x90
2013-03-24T17:33:16.383+01:00 n22 kernel: [<c104d992>] process_one_work+0x132/0x3a0
2013-03-24T17:33:16.383+01:00 n22 kernel: [<c104f211>] worker_thread+0x121/0x350
2013-03-24T17:33:16.383+01:00 n22 kernel: [<c104f0f0>] ? manage_workers+0x270/0x270
2013-03-24T17:33:16.383+01:00 n22 kernel: [<c1053974>] kthread+0x94/0xa0
2013-03-24T17:33:16.383+01:00 n22 kernel: [<c13d0677>] ret_from_kernel_thread+0x1b/0x28
2013-03-24T17:33:16.383+01:00 n22 kernel: [<c10538e0>] ? flush_kthread_work+0xd0/0xd0
2013-03-24T17:33:16.383+01:00 n22 kernel: Code: f0 e8 6c de ff ff 8b 06 85 c0 75 de 8d b6 00 00 00 00 80 05 84 56 62 c1 01 83 c4 70 5b 5e 5f 5d c3 90 89 f0 e8 f9 dd ff ff eb b9 <0f> 0b 90 90 90 90 90 55 89 e5 53 81 ec 8c 00 00 00 66 66 66 66
2013-03-24T17:33:16.384+01:00 n22 kernel: EIP: [<c11600c9>] locks_remove_flock+0xe9/0xf0 SS:ESP 0068:f190be44
2013-03-24T17:33:16.384+01:00 n22 kernel: ---[ end trace 77babad7e3f33561 ]---
2013-03-24T17:34:16.406+01:00 n22 kernel: INFO: rcu_sched self-detected stall on CPU { 3} (t=60000 jiffies g=753704 c=753703 q=3091)
2013-03-24T17:34:16.406+01:00 n22 kernel: Pid: 11464, comm: nfsd Tainted: G D 3.8.4 #7
2013-03-24T17:34:16.406+01:00 n22 kernel: Call Trace:
2013-03-24T17:34:16.406+01:00 n22 kernel: [<c10a8d5b>] rcu_check_callbacks+0x1cb/0x680
2013-03-24T17:34:16.406+01:00 n22 kernel: [<c104384b>] update_process_times+0x3b/0x70
2013-03-24T17:34:16.406+01:00 n22 kernel: [<c1081745>] tick_sched_timer+0x65/0xa0
2013-03-24T17:34:16.406+01:00 n22 kernel: [<c1057450>] ? __remove_hrtimer+0x40/0xa0
2013-03-24T17:34:16.406+01:00 n22 kernel: [<c1057903>] __run_hrtimer+0x73/0x1a0
2013-03-24T17:34:16.406+01:00 n22 kernel: [<c10816e0>] ? tick_nohz_handler+0xe0/0xe0
2013-03-24T17:34:16.406+01:00 n22 kernel: [<c1058538>] hrtimer_interrupt+0xf8/0x2d0
2013-03-24T17:34:16.406+01:00 n22 kernel: [<c13d0e24>] smp_apic_timer_interrupt+0x54/0x88
2013-03-24T17:34:16.406+01:00 n22 kernel: [<c103bbcc>] ? irq_exit+0x5c/0xa0
2013-03-24T17:34:16.406+01:00 n22 kernel: [<c13d0200>] apic_timer_interrupt+0x34/0x3c
2013-03-24T17:34:16.406+01:00 n22 kernel: [<c13cfd52>] ? _raw_spin_lock+0x22/0x30
2013-03-24T17:34:16.406+01:00 n22 kernel: [<c115db42>] lock_flocks+0x12/0x20
2013-03-24T17:34:16.406+01:00 n22 kernel: [<f8c31c23>] check_for_locks.isra.64+0x13/0x60 [nfsd]
2013-03-24T17:34:16.406+01:00 n22 kernel: [<f8c377f8>] nfsd4_release_lockowner+0x168/0x1f0 [nfsd]
2013-03-24T17:34:16.406+01:00 n22 kernel: [<f8c37690>] ? nfsd4_locku+0x2e0/0x2e0 [nfsd]
2013-03-24T17:34:16.406+01:00 n22 kernel: [<f8c2894b>] nfsd4_proc_compound+0x41b/0x530 [nfsd]
2013-03-24T17:34:16.406+01:00 n22 kernel: [<f8c1aba0>] nfsd_dispatch+0xa0/0x1b0 [nfsd]
2013-03-24T17:34:16.407+01:00 n22 kernel: [<f857b8a8>] svc_process+0x3f8/0x6d0 [sunrpc]
2013-03-24T17:34:16.407+01:00 n22 kernel: [<f8586d5e>] ? svc_xprt_received+0x3e/0x80 [sunrpc]
2013-03-24T17:34:16.407+01:00 n22 kernel: [<f85879e1>] ? svc_recv+0x2d1/0x3c0 [sunrpc]
2013-03-24T17:34:16.407+01:00 n22 kernel: [<f8c1a65d>] nfsd+0xad/0x110 [nfsd]
2013-03-24T17:34:16.407+01:00 n22 kernel: [<f8c1a5b0>] ? nfsd_destroy+0x70/0x70 [nfsd]
2013-03-24T17:34:16.407+01:00 n22 kernel: [<c1053974>] kthread+0x94/0xa0
2013-03-24T17:34:16.407+01:00 n22 kernel: [<c13d0677>] ret_from_kernel_thread+0x1b/0x28
2013-03-24T17:34:16.407+01:00 n22 kernel: [<c10538e0>] ? flush_kthread_work+0xd0/0xd0


--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3


2013-03-25 22:01:46

by J. Bruce Fields

[permalink] [raw]
Subject: Re: kernel 3.8.4 : kernel BUG at fs/locks.c:2093! part #2

On Sun, Mar 24, 2013 at 06:02:39PM +0100, Toralf Förster wrote:
> The following bug was triggered twice under a stable 3.8.4 kernel if I export a NFS (v4) directory
> and mount it within a user mode linux (running at the same system) and run within that UML trinity
> (16 UML trinity processes at a 4-core host CPU) with victim files+directories within that NFS directory.
>
> I could not trigger that bug at the host kernel if I run trinity within UML without victim files
> - therefore the issue seems to be tied to NFS. The syslog messages are :

Thanks!

fs/locks.c:2093 says that we did the final fput of a file that still had
posix locks held on it.

I can't see how that would happen, but admittedly the nfsd4 code here is
much too complicated for its own good.

If it were possible it would be useful to know what lead up to this. A
network trace (tcpdump -s0 -wtmp.pcap, then send me tmp.pcap) would be
useful for that, but I fear it's probably too huge by the time you hit
the bug.

(The following warning ("rcu_sched self-detected stall") looks like the
result of BUGing while holding file_lock_lock. That BUG should probably
be downgraded to a WARN_ON_ONCE.)

--b.

>
> 2013-03-24T17:33:16.380+01:00 n22 kernel: ------------[ cut here ]------------
> 2013-03-24T17:33:16.380+01:00 n22 kernel: kernel BUG at fs/locks.c:2093!
> 2013-03-24T17:33:16.380+01:00 n22 kernel: invalid opcode: 0000 [#1] SMP
> 2013-03-24T17:33:16.380+01:00 n22 kernel: Modules linked in: rc_dib0700_rc5 dvb_usb_dib0700 dib3000mc dib8000 dvb_usb dib0070 dib7000m dib7000p dvb_core dibx000_common dib0090 rc_core md5 nfsd auth_rpcgss ipt_MASQUERADE xt_owner xt_multiport ipt_REJECT xt_tcpudp xt_recent xt_conntrack nf_conntrack_ftp xt_limit xt_LOG iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables af_packet pppoe pppox ppp_generic slhc bridge stp llc tun msr i915 cfbfillrect cfbimgblt i2c_algo_bit cfbcopyarea intel_agp coretemp intel_gtt fbcon snd_hda_codec_conexant snd_hda_intel snd_hda_codec snd_pcm sdhci_pci acpi_cpufreq bitblit mperf softcursor font drm_kms_helper uvcvideo thinkpad_acpi drm arc4 iwldvm mac80211 sdhci videobuf2_vmalloc videobuf2_memops videobuf2_core agpgart videodev kvm_intel kvm processor i2c_i801 usblp iwlwifi cfg80211 thermal video nvram snd_page_alloc mmc_core snd_timer thermal_sys fb fbdev rfkill i2c_core snd hwmon e1000e 8250_pci
> 8250_core serial_core soundcore tpm_tis tpm psmouse tpm_bios button ac evdev battery wmi aesni_intel ablk_helper cryptd lrw aes_i586 xts gf128mul cbc sha256_generic fuse nfs lockd sunrpc dm_crypt dm_mod hid_monterey hid_microsoft hid_logitech hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech hid_generic usbhid hid sr_mod cdrom sg [last unloaded: microcode]
> 2013-03-24T17:33:16.380+01:00 n22 kernel: Pid: 58, comm: kworker/3:1 Not tainted 3.8.4 #7 LENOVO 4180F65/4180F65
> 2013-03-24T17:33:16.380+01:00 n22 kernel: EIP: 0060:[<c11600c9>] EFLAGS: 00010246 CPU: 3
> 2013-03-24T17:33:16.380+01:00 n22 kernel: EIP is at locks_remove_flock+0xe9/0xf0
> 2013-03-24T17:33:16.380+01:00 n22 kernel: EAX: 00000001 EBX: e6c7f3c0 ECX: 0000002a EDX: 00002a2a
> 2013-03-24T17:33:16.380+01:00 n22 kernel: ESI: f1f6de88 EDI: e241f880 EBP: f190bec0 ESP: f190be44
> 2013-03-24T17:33:16.380+01:00 n22 kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> 2013-03-24T17:33:16.382+01:00 n22 kernel: CR0: 80050033 CR2: 402f6060 CR3: 2140e000 CR4: 000407f0
> 2013-03-24T17:33:16.382+01:00 n22 kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> 2013-03-24T17:33:16.382+01:00 n22 kernel: DR6: ffff0ff0 DR7: 00000400
> 2013-03-24T17:33:16.382+01:00 n22 kernel: Process kworker/3:1 (pid: 58, ti=f190a000 task=f2124ec0 task.ti=f190a000)
> 2013-03-24T17:33:16.382+01:00 n22 kernel: Stack:
> 2013-03-24T17:33:16.382+01:00 n22 kernel: f1f6ddc8 f190be70 c1065355 f190be84 f37e0100 000062de 00000000 eddeba20
> 2013-03-24T17:33:16.382+01:00 n22 kernel: f37e0100 f2124ec0 f37e0100 0000000a ec84fa28 c13d6fc0 e17e6074 ec84fa00
> 2013-03-24T17:33:16.382+01:00 n22 kernel: e17e607c f190be9c c1066011 ec84fa00 e17e6074 e6c7f3c8 f1f6ddc8 00000008
> 2013-03-24T17:33:16.382+01:00 n22 kernel: Call Trace:
> 2013-03-24T17:33:16.382+01:00 n22 kernel: [<c1065355>] ? sched_clock_cpu+0xf5/0x160
> 2013-03-24T17:33:16.383+01:00 n22 kernel: [<c1066011>] ? set_next_entity+0x31/0x80
> 2013-03-24T17:33:16.383+01:00 n22 kernel: [<c111df59>] __fput+0x79/0x1f0
> 2013-03-24T17:33:16.383+01:00 n22 kernel: [<c111e15c>] delayed_fput+0x7c/0x90
> 2013-03-24T17:33:16.383+01:00 n22 kernel: [<c104d992>] process_one_work+0x132/0x3a0
> 2013-03-24T17:33:16.383+01:00 n22 kernel: [<c104f211>] worker_thread+0x121/0x350
> 2013-03-24T17:33:16.383+01:00 n22 kernel: [<c104f0f0>] ? manage_workers+0x270/0x270
> 2013-03-24T17:33:16.383+01:00 n22 kernel: [<c1053974>] kthread+0x94/0xa0
> 2013-03-24T17:33:16.383+01:00 n22 kernel: [<c13d0677>] ret_from_kernel_thread+0x1b/0x28
> 2013-03-24T17:33:16.383+01:00 n22 kernel: [<c10538e0>] ? flush_kthread_work+0xd0/0xd0
> 2013-03-24T17:33:16.383+01:00 n22 kernel: Code: f0 e8 6c de ff ff 8b 06 85 c0 75 de 8d b6 00 00 00 00 80 05 84 56 62 c1 01 83 c4 70 5b 5e 5f 5d c3 90 89 f0 e8 f9 dd ff ff eb b9 <0f> 0b 90 90 90 90 90 55 89 e5 53 81 ec 8c 00 00 00 66 66 66 66
> 2013-03-24T17:33:16.384+01:00 n22 kernel: EIP: [<c11600c9>] locks_remove_flock+0xe9/0xf0 SS:ESP 0068:f190be44
> 2013-03-24T17:33:16.384+01:00 n22 kernel: ---[ end trace 77babad7e3f33561 ]---
> 2013-03-24T17:34:16.406+01:00 n22 kernel: INFO: rcu_sched self-detected stall on CPU { 3} (t=60000 jiffies g=753704 c=753703 q=3091)
> 2013-03-24T17:34:16.406+01:00 n22 kernel: Pid: 11464, comm: nfsd Tainted: G D 3.8.4 #7
> 2013-03-24T17:34:16.406+01:00 n22 kernel: Call Trace:
> 2013-03-24T17:34:16.406+01:00 n22 kernel: [<c10a8d5b>] rcu_check_callbacks+0x1cb/0x680
> 2013-03-24T17:34:16.406+01:00 n22 kernel: [<c104384b>] update_process_times+0x3b/0x70
> 2013-03-24T17:34:16.406+01:00 n22 kernel: [<c1081745>] tick_sched_timer+0x65/0xa0
> 2013-03-24T17:34:16.406+01:00 n22 kernel: [<c1057450>] ? __remove_hrtimer+0x40/0xa0
> 2013-03-24T17:34:16.406+01:00 n22 kernel: [<c1057903>] __run_hrtimer+0x73/0x1a0
> 2013-03-24T17:34:16.406+01:00 n22 kernel: [<c10816e0>] ? tick_nohz_handler+0xe0/0xe0
> 2013-03-24T17:34:16.406+01:00 n22 kernel: [<c1058538>] hrtimer_interrupt+0xf8/0x2d0
> 2013-03-24T17:34:16.406+01:00 n22 kernel: [<c13d0e24>] smp_apic_timer_interrupt+0x54/0x88
> 2013-03-24T17:34:16.406+01:00 n22 kernel: [<c103bbcc>] ? irq_exit+0x5c/0xa0
> 2013-03-24T17:34:16.406+01:00 n22 kernel: [<c13d0200>] apic_timer_interrupt+0x34/0x3c
> 2013-03-24T17:34:16.406+01:00 n22 kernel: [<c13cfd52>] ? _raw_spin_lock+0x22/0x30
> 2013-03-24T17:34:16.406+01:00 n22 kernel: [<c115db42>] lock_flocks+0x12/0x20
> 2013-03-24T17:34:16.406+01:00 n22 kernel: [<f8c31c23>] check_for_locks.isra.64+0x13/0x60 [nfsd]
> 2013-03-24T17:34:16.406+01:00 n22 kernel: [<f8c377f8>] nfsd4_release_lockowner+0x168/0x1f0 [nfsd]
> 2013-03-24T17:34:16.406+01:00 n22 kernel: [<f8c37690>] ? nfsd4_locku+0x2e0/0x2e0 [nfsd]
> 2013-03-24T17:34:16.406+01:00 n22 kernel: [<f8c2894b>] nfsd4_proc_compound+0x41b/0x530 [nfsd]
> 2013-03-24T17:34:16.406+01:00 n22 kernel: [<f8c1aba0>] nfsd_dispatch+0xa0/0x1b0 [nfsd]
> 2013-03-24T17:34:16.407+01:00 n22 kernel: [<f857b8a8>] svc_process+0x3f8/0x6d0 [sunrpc]
> 2013-03-24T17:34:16.407+01:00 n22 kernel: [<f8586d5e>] ? svc_xprt_received+0x3e/0x80 [sunrpc]
> 2013-03-24T17:34:16.407+01:00 n22 kernel: [<f85879e1>] ? svc_recv+0x2d1/0x3c0 [sunrpc]
> 2013-03-24T17:34:16.407+01:00 n22 kernel: [<f8c1a65d>] nfsd+0xad/0x110 [nfsd]
> 2013-03-24T17:34:16.407+01:00 n22 kernel: [<f8c1a5b0>] ? nfsd_destroy+0x70/0x70 [nfsd]
> 2013-03-24T17:34:16.407+01:00 n22 kernel: [<c1053974>] kthread+0x94/0xa0
> 2013-03-24T17:34:16.407+01:00 n22 kernel: [<c13d0677>] ret_from_kernel_thread+0x1b/0x28
> 2013-03-24T17:34:16.407+01:00 n22 kernel: [<c10538e0>] ? flush_kthread_work+0xd0/0xd0
>
>
> --
> MfG/Sincerely
> Toralf Förster
> pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2013-03-26 14:46:43

by J. Bruce Fields

[permalink] [raw]
Subject: Re: kernel 3.8.4 : kernel BUG at fs/locks.c:2093! part #2

On Mon, Mar 25, 2013 at 06:01:43PM -0400, bfields wrote:
> On Sun, Mar 24, 2013 at 06:02:39PM +0100, Toralf Förster wrote:
> > The following bug was triggered twice under a stable 3.8.4 kernel if I export a NFS (v4) directory
> > and mount it within a user mode linux (running at the same system) and run within that UML trinity
> > (16 UML trinity processes at a 4-core host CPU) with victim files+directories within that NFS directory.
> >
> > I could not trigger that bug at the host kernel if I run trinity within UML without victim files
> > - therefore the issue seems to be tied to NFS. The syslog messages are :
>
> Thanks!
>
> fs/locks.c:2093 says that we did the final fput of a file that still had
> posix locks held on it.
>
> I can't see how that would happen, but admittedly the nfsd4 code here is
> much too complicated for its own good.
>
> If it were possible it would be useful to know what lead up to this. A
> network trace (tcpdump -s0 -wtmp.pcap, then send me tmp.pcap) would be
> useful for that, but I fear it's probably too huge by the time you hit
> the bug.
>
> (The following warning ("rcu_sched self-detected stall") looks like the
> result of BUGing while holding file_lock_lock. That BUG should probably
> be downgraded to a WARN_ON_ONCE.)

Can you run with test patches?

This just makes nfsd's fput calls synchronous so that we see in the
backtrace who called them.

(But assuming it does turn out to be one of these callers, we may then
need some more printk's in the nfsd code...).

--b.

commit 27a3395a72e1d9be121f5493e7a330d60b3d1de3
Author: J. Bruce Fields <[email protected]>
Date: Tue Mar 26 09:37:07 2013 -0400

nfsd4: convert fput to fput_sync for debugging purposes

Just a temporary hack to get backtraces out of __fput failures.

Also, demote a BUG to a WARN_ON_ONCE.

Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/fs/file_table.c b/fs/file_table.c
index de9e965..79cb999 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -337,6 +337,8 @@ void __fput_sync(struct file *file)
}
}

+EXPORT_SYMBOL(__fput_sync);
+
EXPORT_SYMBOL(fput);

void put_filp(struct file *file)
diff --git a/fs/locks.c b/fs/locks.c
index a94e331..c962161 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2090,7 +2090,7 @@ void locks_remove_flock(struct file *filp)
continue;
}
/* What? */
- BUG();
+ WARN_ON_ONCE(1);
}
before = &fl->fl_next;
}
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 9d1c5db..c01c60c 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -951,7 +951,7 @@ nfsd4_write(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
write->wr_offset, rqstp->rq_vec, nvecs,
&cnt, &write->wr_how_written);
if (filp)
- fput(filp);
+ __fput_sync(filp);

write->wr_bytes_written = cnt;

@@ -1325,7 +1325,7 @@ encode_op:
}
/* XXX Ugh, we need to get rid of this kind of special case: */
if (op->opnum == OP_READ && op->u.read.rd_filp)
- fput(op->u.read.rd_filp);
+ __fput_sync(op->u.read.rd_filp);

nfsd4_increment_op_stats(op->opnum);
}
diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
index ba6fdd4..dfdf54c 100644
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -546,7 +546,7 @@ nfsd4_shutdown_recdir(struct nfsd_net *nn)
{
if (!nn->rec_file)
return;
- fput(nn->rec_file);
+ __fput_sync(nn->rec_file);
nn->rec_file = NULL;
}

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index a8309c6..0f38b39 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -201,7 +201,7 @@ static void nfs4_file_get_access(struct nfs4_file *fp, int oflag)
static void nfs4_file_put_fd(struct nfs4_file *fp, int oflag)
{
if (fp->fi_fds[oflag]) {
- fput(fp->fi_fds[oflag]);
+ __fput_sync(fp->fi_fds[oflag]);
fp->fi_fds[oflag] = NULL;
}
}
@@ -353,7 +353,7 @@ static void nfs4_put_deleg_lease(struct nfs4_file *fp)
if (atomic_dec_and_test(&fp->fi_delegees)) {
vfs_setlease(fp->fi_deleg_file, F_UNLCK, &fp->fi_lease);
fp->fi_lease = NULL;
- fput(fp->fi_deleg_file);
+ __fput_sync(fp->fi_deleg_file);
fp->fi_deleg_file = NULL;
}
}
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index d586117..76bb67c 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -826,7 +826,7 @@ out:
void
nfsd_close(struct file *filp)
{
- fput(filp);
+ __fput_sync(filp);
}

/*

2013-03-26 17:37:53

by Al Viro

[permalink] [raw]
Subject: Re: kernel 3.8.4 : kernel BUG at fs/locks.c:2093! part #2

On Tue, Mar 26, 2013 at 10:46:40AM -0400, J. Bruce Fields wrote:
> > fs/locks.c:2093 says that we did the final fput of a file that still had
> > posix locks held on it.
> >
> > I can't see how that would happen, but admittedly the nfsd4 code here is
> > much too complicated for its own good.
> >
> > If it were possible it would be useful to know what lead up to this. A
> > network trace (tcpdump -s0 -wtmp.pcap, then send me tmp.pcap) would be
> > useful for that, but I fear it's probably too huge by the time you hit
> > the bug.
> >
> > (The following warning ("rcu_sched self-detected stall") looks like the
> > result of BUGing while holding file_lock_lock. That BUG should probably
> > be downgraded to a WARN_ON_ONCE.)
>
> Can you run with test patches?
>
> This just makes nfsd's fput calls synchronous so that we see in the
> backtrace who called them.
>
> (But assuming it does turn out to be one of these callers, we may then
> need some more printk's in the nfsd code...).

Might also be a good idea to dump the offending struct file_lock, while
we are at it...

2013-03-26 17:46:46

by Toralf Förster

[permalink] [raw]
Subject: Re: kernel 3.8.4 : kernel BUG at fs/locks.c:2093! part #2

On 03/26/2013 03:46 PM, J. Bruce Fields wrote:
> Can you run with test patches?
>
> This just makes nfsd's fput calls synchronous so that we see in the
> backtrace who called them.

Well - the patched 3.8.4 host kernel now survives the stress test of the UML system.
What I get at the host is now :

2013-03-26T18:32:17.487+01:00 n22 kernel: ------------[ cut here ]------------
2013-03-26T18:32:17.487+01:00 n22 kernel: WARNING: at mm/page_alloc.c:2376 __alloc_pages_nodemask+0x262/0x7c0()
2013-03-26T18:32:17.487+01:00 n22 kernel: Hardware name: 4180F65
2013-03-26T18:32:17.487+01:00 n22 kernel: Modules linked in: md5 nfsd auth_rpcgss ipt_MASQUERADE xt_owner xt_multiport ipt_REJECT xt_tcpudp xt_recent xt_conntrack nf_conntrack_ftp xt_limit xt_LOG iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables af_packet pppoe pppox ppp_generic slhc bridge stp llc tun msr snd_hda_codec_conexant snd_hda_intel uvcvideo i915 usblp snd_hda_codec sdhci_pci snd_pcm cfbfillrect videobuf2_vmalloc arc4 iwldvm cfbimgblt i2c_algo_bit cfbcopyarea intel_agp videobuf2_memops sdhci coretemp videobuf2_core mac80211 snd_page_alloc kvm_intel acpi_cpufreq fbcon mperf bitblit softcursor font intel_gtt videodev mmc_core snd_timer kvm drm_kms_helper thinkpad_acpi drm nvram agpgart fb snd i2c_i801 video i2c_core tpm_tis soundcore processor 8250_pci 8250_core iwlwifi thermal battery cfg80211 fbdev ac tpm rfkill psmouse serial_core e1000e evdev thermal_sys button tpm_bios wmi hwmon aesni_intel ablk_helper cryptd lr
w aes_i586 xts gf128mul cbc sha256_generic fuse nfs lockd sunrpc dm_crypt dm_mod hid_monterey hid_microsoft hid_logitech hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech hid_generic usbhid hid sr_mod cdrom sg [last unloaded: microcode]
2013-03-26T18:32:17.487+01:00 n22 kernel: Pid: 6614, comm: nfsd Not tainted 3.8.4 #9
2013-03-26T18:32:17.487+01:00 n22 kernel: Call Trace:
2013-03-26T18:32:17.487+01:00 n22 kernel: [<c1033ba2>] warn_slowpath_common+0x72/0xa0
2013-03-26T18:32:17.487+01:00 n22 kernel: [<c10db802>] ? __alloc_pages_nodemask+0x262/0x7c0
2013-03-26T18:32:17.487+01:00 n22 kernel: [<c10db802>] ? __alloc_pages_nodemask+0x262/0x7c0
2013-03-26T18:32:17.487+01:00 n22 kernel: [<c1033bf2>] warn_slowpath_null+0x22/0x30
2013-03-26T18:32:17.487+01:00 n22 kernel: [<c10db802>] __alloc_pages_nodemask+0x262/0x7c0
2013-03-26T18:32:17.487+01:00 n22 kernel: [<c1069758>] ? dequeue_task_fair+0x258/0x6f0
2013-03-26T18:32:17.487+01:00 n22 kernel: [<c13a8803>] ? inet_recvmsg+0x73/0x90
2013-03-26T18:32:17.487+01:00 n22 kernel: [<c10dbd7c>] __get_free_pages+0x1c/0x30
2013-03-26T18:32:17.487+01:00 n22 kernel: [<c110e0f9>] kmalloc_order_trace+0x29/0xb0
2013-03-26T18:32:17.487+01:00 n22 kernel: [<f857e480>] ? ip_map_request+0xd0/0xd0 [sunrpc]
2013-03-26T18:32:17.487+01:00 n22 kernel: [<f85824d6>] ? cache_revisit_request+0x26/0xd0 [sunrpc]
2013-03-26T18:32:17.487+01:00 n22 kernel: [<c110f149>] __kmalloc+0x1b9/0x1e0
2013-03-26T18:32:17.487+01:00 n22 kernel: [<f85835bf>] ? cache_check+0x22f/0x340 [sunrpc]
2013-03-26T18:32:17.487+01:00 n22 kernel: [<f90a0b1c>] nfs4_acl_new+0x1c/0x30 [nfsd]
2013-03-26T18:32:17.488+01:00 n22 kernel: [<f9095482>] nfsd4_decode_fattr+0x302/0x6c0 [nfsd]
2013-03-26T18:32:17.488+01:00 n22 kernel: [<c110f068>] ? __kmalloc+0xd8/0x1e0
2013-03-26T18:32:17.488+01:00 n22 kernel: [<f9095871>] nfsd4_decode_setattr+0x31/0x40 [nfsd]
2013-03-26T18:32:17.488+01:00 n22 kernel: [<f909804d>] nfs4svc_decode_compoundargs+0x23d/0x280 [nfsd]
2013-03-26T18:32:17.488+01:00 n22 kernel: [<f9097e10>] ? nfsd4_release_compoundargs+0x70/0x70 [nfsd]
2013-03-26T18:32:17.488+01:00 n22 kernel: [<f9081b3f>] nfsd_dispatch+0x3f/0x1b0 [nfsd]
2013-03-26T18:32:17.488+01:00 n22 kernel: [<f857b8a8>] svc_process+0x3f8/0x6d0 [sunrpc]
2013-03-26T18:32:17.488+01:00 n22 kernel: [<f8586d5e>] ? svc_xprt_received+0x3e/0x80 [sunrpc]
2013-03-26T18:32:17.488+01:00 n22 kernel: [<f85879e1>] ? svc_recv+0x2d1/0x3c0 [sunrpc]
2013-03-26T18:32:17.488+01:00 n22 kernel: [<f908165d>] nfsd+0xad/0x110 [nfsd]
2013-03-26T18:32:17.488+01:00 n22 kernel: [<f90815b0>] ? nfsd_destroy+0x70/0x70 [nfsd]
2013-03-26T18:32:17.488+01:00 n22 kernel: [<c1053b94>] kthread+0x94/0xa0
2013-03-26T18:32:17.488+01:00 n22 kernel: [<c13d0f77>] ret_from_kernel_thread+0x1b/0x28
2013-03-26T18:32:17.488+01:00 n22 kernel: [<c1053b00>] ? flush_kthread_work+0xd0/0xd0
2013-03-26T18:32:17.488+01:00 n22 kernel: ---[ end trace 82b96ff8bcd5453f ]---


--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

2013-03-26 18:17:20

by J. Bruce Fields

[permalink] [raw]
Subject: Re: kernel 3.8.4 : kernel BUG at fs/locks.c:2093! part #2

On Tue, Mar 26, 2013 at 06:46:31PM +0100, Toralf Förster wrote:
> On 03/26/2013 03:46 PM, J. Bruce Fields wrote:
> > Can you run with test patches?
> >
> > This just makes nfsd's fput calls synchronous so that we see in the
> > backtrace who called them.
>
> Well - the patched 3.8.4 host kernel now survives the stress test of the UML system.

Bah, too bad. That patch was definitely not a fix, so there may be some
race here.

> What I get at the host is now :
>
> 2013-03-26T18:32:17.487+01:00 n22 kernel: ------------[ cut here ]------------
> 2013-03-26T18:32:17.487+01:00 n22 kernel: WARNING: at mm/page_alloc.c:2376 __alloc_pages_nodemask+0x262/0x7c0()
...
> 2013-03-26T18:32:17.487+01:00 n22 kernel: [<c110f149>] __kmalloc+0x1b9/0x1e0
> 2013-03-26T18:32:17.487+01:00 n22 kernel: [<f85835bf>] ? cache_check+0x22f/0x340 [sunrpc]
> 2013-03-26T18:32:17.487+01:00 n22 kernel: [<f90a0b1c>] nfs4_acl_new+0x1c/0x30 [nfsd]
> 2013-03-26T18:32:17.488+01:00 n22 kernel: [<f9095482>] nfsd4_decode_fattr+0x302/0x6c0 [nfsd]
...

A different bug, but thanks for catching it, I suspect the following is
all we need.

--b.

commit 814d9d4f9164c3d778dadd093a54bb55d9a0c576
Author: J. Bruce Fields <[email protected]>
Date: Tue Mar 26 14:11:13 2013 -0400

nfsd4: reject "negative" acl lengths

Since we only enforce an upper bound, not a lower bound, a "negative"
length can get through here.

The symptom seen was a warning when we attempt to a kmalloc with an
excessive size.

Reported-by: Toralf Förster <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 0dc1158..d1dd710 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -264,7 +264,7 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval,
iattr->ia_valid |= ATTR_SIZE;
}
if (bmval[0] & FATTR4_WORD0_ACL) {
- int nace;
+ u32 nace;
struct nfs4_ace *ace;

READ_BUF(4); len += 4;

2013-03-27 17:25:53

by Toralf Förster

[permalink] [raw]
Subject: Re: kernel 3.8.4 : kernel BUG at fs/locks.c:2093! part #2

On 03/26/2013 07:17 PM, J. Bruce Fields wrote:
>
> Bah, too bad. That patch was definitely not a fix, so there may be some
> race here.
>
>> What I get at the host is now :
>>
>> 2013-03-26T18:32:17.487+01:00 n22 kernel: ------------[ cut here ]------------
>> 2013-03-26T18:32:17.487+01:00 n22 kernel: WARNING: at mm/page_alloc.c:2376 __alloc_pages_nodemask+0x262/0x7c0()
> ...
>> 2013-03-26T18:32:17.487+01:00 n22 kernel: [<c110f149>] __kmalloc+0x1b9/0x1e0
>> 2013-03-26T18:32:17.487+01:00 n22 kernel: [<f85835bf>] ? cache_check+0x22f/0x340 [sunrpc]
>> 2013-03-26T18:32:17.487+01:00 n22 kernel: [<f90a0b1c>] nfs4_acl_new+0x1c/0x30 [nfsd]
>> 2013-03-26T18:32:17.488+01:00 n22 kernel: [<f9095482>] nfsd4_decode_fattr+0x302/0x6c0 [nfsd]
> ...
>
> A different bug, but thanks for catching it, I suspect the following is
> all we need.
>
> --b.
>
> commit 814d9d4f9164c3d778dadd093a54bb55d9a0c576
> Author: J. Bruce Fields <[email protected]>
> Date: Tue Mar 26 14:11:13 2013 -0400
>
> nfsd4: reject "negative" acl lengths
>
> Since we only enforce an upper bound, not a lower bound, a "negative"
> length can get through here.
>
> The symptom seen was a warning when we attempt to a kmalloc with an
> excessive size.
>
> Reported-by: Toralf Förster <[email protected]>
> Signed-off-by: J. Bruce Fields <[email protected]>
>
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 0dc1158..d1dd710 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -264,7 +264,7 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval,
> iattr->ia_valid |= ATTR_SIZE;
> }
> if (bmval[0] & FATTR4_WORD0_ACL) {
> - int nace;
> + u32 nace;
> struct nfs4_ace *ace;
>
> READ_BUF(4); len += 4;
>

I applied that patach on top of 3.8.4 and wonders now, whether the
following is the consequence :

$ df -m /tmp/forT/victims/
Filesystem 1M-blocks Used Available Use% Mounted on
/dev/sdb3 183851 34907 139599 21% /

$ sudo ls -lh --color /tmp/forT/victims/f062
---xr-S--T 2 tfoerste users 985G Mar 27 18:15 /tmp/forT/victims/f062

ls shows a 1 TB file within a partition where just 34 MB are used at all
(its only one partition in that system and a separate small /boot too).

--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

2013-03-27 17:46:29

by J. Bruce Fields

[permalink] [raw]
Subject: Re: kernel 3.8.4 : kernel BUG at fs/locks.c:2093! part #2

On Wed, Mar 27, 2013 at 06:20:42PM +0100, Toralf Förster wrote:
> On 03/26/2013 07:17 PM, J. Bruce Fields wrote:
> >
> > Bah, too bad. That patch was definitely not a fix, so there may be some
> > race here.
> >
> >> What I get at the host is now :
> >>
> >> 2013-03-26T18:32:17.487+01:00 n22 kernel: ------------[ cut here ]------------
> >> 2013-03-26T18:32:17.487+01:00 n22 kernel: WARNING: at mm/page_alloc.c:2376 __alloc_pages_nodemask+0x262/0x7c0()
> > ...
> >> 2013-03-26T18:32:17.487+01:00 n22 kernel: [<c110f149>] __kmalloc+0x1b9/0x1e0
> >> 2013-03-26T18:32:17.487+01:00 n22 kernel: [<f85835bf>] ? cache_check+0x22f/0x340 [sunrpc]
> >> 2013-03-26T18:32:17.487+01:00 n22 kernel: [<f90a0b1c>] nfs4_acl_new+0x1c/0x30 [nfsd]
> >> 2013-03-26T18:32:17.488+01:00 n22 kernel: [<f9095482>] nfsd4_decode_fattr+0x302/0x6c0 [nfsd]
> > ...
> >
> > A different bug, but thanks for catching it, I suspect the following is
> > all we need.
> >
> > --b.
> >
> > commit 814d9d4f9164c3d778dadd093a54bb55d9a0c576
> > Author: J. Bruce Fields <[email protected]>
> > Date: Tue Mar 26 14:11:13 2013 -0400
> >
> > nfsd4: reject "negative" acl lengths
> >
> > Since we only enforce an upper bound, not a lower bound, a "negative"
> > length can get through here.
> >
> > The symptom seen was a warning when we attempt to a kmalloc with an
> > excessive size.
> >
> > Reported-by: Toralf Förster <[email protected]>
> > Signed-off-by: J. Bruce Fields <[email protected]>
> >
> > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> > index 0dc1158..d1dd710 100644
> > --- a/fs/nfsd/nfs4xdr.c
> > +++ b/fs/nfsd/nfs4xdr.c
> > @@ -264,7 +264,7 @@ nfsd4_decode_fattr(struct nfsd4_compoundargs *argp, u32 *bmval,
> > iattr->ia_valid |= ATTR_SIZE;
> > }
> > if (bmval[0] & FATTR4_WORD0_ACL) {
> > - int nace;
> > + u32 nace;
> > struct nfs4_ace *ace;
> >
> > READ_BUF(4); len += 4;
> >
>
> I applied that patach on top of 3.8.4 and wonders now, whether the
> following is the consequence :
>
> $ df -m /tmp/forT/victims/
> Filesystem 1M-blocks Used Available Use% Mounted on
> /dev/sdb3 183851 34907 139599 21% /
>
> $ sudo ls -lh --color /tmp/forT/victims/f062
> ---xr-S--T 2 tfoerste users 985G Mar 27 18:15 /tmp/forT/victims/f062
>
> ls shows a 1 TB file within a partition where just 34 MB are used at all
> (its only one partition in that system and a separate small /boot too).

It's completely normal for a large file to only take up a small amount
of space, if the file is sparse. Is that what you're asking about?

Otherwise, are you seeing any problems or any backtraces in the logs?

--b.

2013-03-27 21:35:25

by Toralf Förster

[permalink] [raw]
Subject: Re: kernel 3.8.4 : kernel BUG at fs/locks.c:2093! part #2

On 03/26/2013 03:46 PM, J. Bruce Fields wrote:
> Can you run with test patches?

applied - today I got a warning instead a bug at the host kernel (3.8.4):


2013-03-27T22:28:43.480+01:00 n22 kernel: ------------[ cut here ]------------
2013-03-27T22:28:43.480+01:00 n22 kernel: WARNING: at fs/locks.c:2093 locks_remove_flock+0xe5/0x110()
2013-03-27T22:28:43.480+01:00 n22 kernel: Hardware name: 4180F65
2013-03-27T22:28:43.480+01:00 n22 kernel: Modules linked in: loop md5 nfsd auth_rpcgss ipt_MASQUERADE xt_owner xt_multiport ipt_REJECT xt_tcpudp xt_recent xt_conntrack nf_conntrack_ftp xt_limit xt_LOG iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables af_packet pppoe pppox ppp_generic slhc bridge stp llc tun msr i915 cfbfillrect cfbimgblt i2c_algo_bit cfbcopyarea intel_agp intel_gtt sdhci_pci fbcon bitblit softcursor font drm_kms_helper sdhci usblp drm arc4 mmc_core iwldvm mac80211 uvcvideo videobuf2_vmalloc snd_hda_codec_conexant videobuf2_memops videobuf2_core coretemp videodev acpi_cpufreq iwlwifi mperf cfg80211 snd_hda_intel agpgart fb fbdev i2c_i801 thinkpad_acpi processor video thermal ac snd_hda_codec snd_pcm snd_page_alloc 8250_pci i2c_core kvm_intel snd_timer nvram snd soundcore 8250_core e1000e serial_core button tpm_tis tpm kvm rfkill tpm_bios psmouse thermal_sys battery wmi evdev hwmon aesni_intel ablk_helper cryp
td lrw aes_i586 xts gf128mul cbc sha256_generic fuse nfs lockd sunrpc dm_crypt dm_mod hid_monterey hid_microsoft hid_logitech hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech hid_generic usbhid hid sr_mod cdrom sg [last unloaded: microcode]
2013-03-27T22:28:43.480+01:00 n22 kernel: Pid: 7442, comm: nfsd Not tainted 3.8.4 #13
2013-03-27T22:28:43.480+01:00 n22 kernel: Call Trace:
2013-03-27T22:28:43.480+01:00 n22 kernel: [<c1033ba2>] warn_slowpath_common+0x72/0xa0
2013-03-27T22:28:43.480+01:00 n22 kernel: [<c1160885>] ? locks_remove_flock+0xe5/0x110
2013-03-27T22:28:43.480+01:00 n22 kernel: [<c1160885>] ? locks_remove_flock+0xe5/0x110
2013-03-27T22:28:43.480+01:00 n22 kernel: [<c1033bf2>] warn_slowpath_null+0x22/0x30
2013-03-27T22:28:43.480+01:00 n22 kernel: [<c1160885>] locks_remove_flock+0xe5/0x110
2013-03-27T22:28:43.480+01:00 n22 kernel: [<c110eb53>] ? kmem_cache_alloc+0x73/0x140
2013-03-27T22:28:43.480+01:00 n22 kernel: [<c111e719>] __fput+0x79/0x1f0
2013-03-27T22:28:43.480+01:00 n22 kernel: [<c111ed1b>] __fput_sync+0x4b/0x60
2013-03-27T22:28:43.480+01:00 n22 kernel: [<f8bb8b17>] __nfs4_file_put_access+0x57/0x80 [nfsd]
2013-03-27T22:28:43.480+01:00 n22 kernel: [<f8bb8b73>] nfs4_file_put_access+0x33/0x40 [nfsd]
2013-03-27T22:28:43.480+01:00 n22 kernel: [<f8bbd4f6>] nfsd4_open_downgrade+0x1b6/0x210 [nfsd]
2013-03-27T22:28:43.480+01:00 n22 kernel: [<f8bbd340>] ? nfsd4_open_confirm+0x130/0x130 [nfsd]
2013-03-27T22:28:43.480+01:00 n22 kernel: [<f8baf94b>] nfsd4_proc_compound+0x41b/0x530 [nfsd]
2013-03-27T22:28:43.480+01:00 n22 kernel: [<f8bbf180>] ? nfsd4_set_lockstateid+0x50/0x50 [nfsd]
2013-03-27T22:28:43.481+01:00 n22 kernel: [<f8ba1ba0>] nfsd_dispatch+0xa0/0x1b0 [nfsd]
2013-03-27T22:28:43.481+01:00 n22 kernel: [<f857b8a8>] svc_process+0x3f8/0x6d0 [sunrpc]
2013-03-27T22:28:43.481+01:00 n22 kernel: [<f8586d5e>] ? svc_xprt_received+0x3e/0x80 [sunrpc]
2013-03-27T22:28:43.481+01:00 n22 kernel: [<f85879e1>] ? svc_recv+0x2d1/0x3c0 [sunrpc]
2013-03-27T22:28:43.481+01:00 n22 kernel: [<f8ba165d>] nfsd+0xad/0x110 [nfsd]
2013-03-27T22:28:43.481+01:00 n22 kernel: [<f8ba15b0>] ? nfsd_destroy+0x70/0x70 [nfsd]
2013-03-27T22:28:43.481+01:00 n22 kernel: [<c1053b94>] kthread+0x94/0xa0
2013-03-27T22:28:43.481+01:00 n22 kernel: [<c13d0f77>] ret_from_kernel_thread+0x1b/0x28
2013-03-27T22:28:43.481+01:00 n22 kernel: [<c1053b00>] ? flush_kthread_work+0xd0/0xd0
2013-03-27T22:28:43.481+01:00 n22 kernel: ---[ end trace 66f6fe36b0d13e8f ]---


The UML kernel was 3.9.0-rc4-00144-ga8c4528, and the trinity command just :
$> trinity --children 2 --victims /mnt/nfs/n22/victims -x mremap

--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

2013-03-27 21:57:29

by J. Bruce Fields

[permalink] [raw]
Subject: Re: kernel 3.8.4 : kernel BUG at fs/locks.c:2093! part #2

On Wed, Mar 27, 2013 at 10:35:12PM +0100, Toralf Förster wrote:
> On 03/26/2013 03:46 PM, J. Bruce Fields wrote:
> > Can you run with test patches?
>
> applied - today I got a warning instead a bug at the host kernel (3.8.4):

Excellent! Thanks, I think I see the bug now....

--b.

> 2013-03-27T22:28:43.480+01:00 n22 kernel: ------------[ cut here ]------------
> 2013-03-27T22:28:43.480+01:00 n22 kernel: WARNING: at fs/locks.c:2093 locks_remove_flock+0xe5/0x110()
> 2013-03-27T22:28:43.480+01:00 n22 kernel: Hardware name: 4180F65
> 2013-03-27T22:28:43.480+01:00 n22 kernel: Modules linked in: loop md5 nfsd auth_rpcgss ipt_MASQUERADE xt_owner xt_multiport ipt_REJECT xt_tcpudp xt_recent xt_conntrack nf_conntrack_ftp xt_limit xt_LOG iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables af_packet pppoe pppox ppp_generic slhc bridge stp llc tun msr i915 cfbfillrect cfbimgblt i2c_algo_bit cfbcopyarea intel_agp intel_gtt sdhci_pci fbcon bitblit softcursor font drm_kms_helper sdhci usblp drm arc4 mmc_core iwldvm mac80211 uvcvideo videobuf2_vmalloc snd_hda_codec_conexant videobuf2_memops videobuf2_core coretemp videodev acpi_cpufreq iwlwifi mperf cfg80211 snd_hda_intel agpgart fb fbdev i2c_i801 thinkpad_acpi processor video thermal ac snd_hda_codec snd_pcm snd_page_alloc 8250_pci i2c_core kvm_intel snd_timer nvram snd soundcore 8250_core e1000e serial_core button tpm_tis tpm kvm rfkill tpm_bios psmouse thermal_sys battery wmi evdev hwmon aesni_intel ablk_helper cryp
> td lrw aes_i586 xts gf128mul cbc sha256_generic fuse nfs lockd sunrpc dm_crypt dm_mod hid_monterey hid_microsoft hid_logitech hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech hid_generic usbhid hid sr_mod cdrom sg [last unloaded: microcode]
> 2013-03-27T22:28:43.480+01:00 n22 kernel: Pid: 7442, comm: nfsd Not tainted 3.8.4 #13
> 2013-03-27T22:28:43.480+01:00 n22 kernel: Call Trace:
> 2013-03-27T22:28:43.480+01:00 n22 kernel: [<c1033ba2>] warn_slowpath_common+0x72/0xa0
> 2013-03-27T22:28:43.480+01:00 n22 kernel: [<c1160885>] ? locks_remove_flock+0xe5/0x110
> 2013-03-27T22:28:43.480+01:00 n22 kernel: [<c1160885>] ? locks_remove_flock+0xe5/0x110
> 2013-03-27T22:28:43.480+01:00 n22 kernel: [<c1033bf2>] warn_slowpath_null+0x22/0x30
> 2013-03-27T22:28:43.480+01:00 n22 kernel: [<c1160885>] locks_remove_flock+0xe5/0x110
> 2013-03-27T22:28:43.480+01:00 n22 kernel: [<c110eb53>] ? kmem_cache_alloc+0x73/0x140
> 2013-03-27T22:28:43.480+01:00 n22 kernel: [<c111e719>] __fput+0x79/0x1f0
> 2013-03-27T22:28:43.480+01:00 n22 kernel: [<c111ed1b>] __fput_sync+0x4b/0x60
> 2013-03-27T22:28:43.480+01:00 n22 kernel: [<f8bb8b17>] __nfs4_file_put_access+0x57/0x80 [nfsd]
> 2013-03-27T22:28:43.480+01:00 n22 kernel: [<f8bb8b73>] nfs4_file_put_access+0x33/0x40 [nfsd]
> 2013-03-27T22:28:43.480+01:00 n22 kernel: [<f8bbd4f6>] nfsd4_open_downgrade+0x1b6/0x210 [nfsd]
> 2013-03-27T22:28:43.480+01:00 n22 kernel: [<f8bbd340>] ? nfsd4_open_confirm+0x130/0x130 [nfsd]
> 2013-03-27T22:28:43.480+01:00 n22 kernel: [<f8baf94b>] nfsd4_proc_compound+0x41b/0x530 [nfsd]
> 2013-03-27T22:28:43.480+01:00 n22 kernel: [<f8bbf180>] ? nfsd4_set_lockstateid+0x50/0x50 [nfsd]
> 2013-03-27T22:28:43.481+01:00 n22 kernel: [<f8ba1ba0>] nfsd_dispatch+0xa0/0x1b0 [nfsd]
> 2013-03-27T22:28:43.481+01:00 n22 kernel: [<f857b8a8>] svc_process+0x3f8/0x6d0 [sunrpc]
> 2013-03-27T22:28:43.481+01:00 n22 kernel: [<f8586d5e>] ? svc_xprt_received+0x3e/0x80 [sunrpc]
> 2013-03-27T22:28:43.481+01:00 n22 kernel: [<f85879e1>] ? svc_recv+0x2d1/0x3c0 [sunrpc]
> 2013-03-27T22:28:43.481+01:00 n22 kernel: [<f8ba165d>] nfsd+0xad/0x110 [nfsd]
> 2013-03-27T22:28:43.481+01:00 n22 kernel: [<f8ba15b0>] ? nfsd_destroy+0x70/0x70 [nfsd]
> 2013-03-27T22:28:43.481+01:00 n22 kernel: [<c1053b94>] kthread+0x94/0xa0
> 2013-03-27T22:28:43.481+01:00 n22 kernel: [<c13d0f77>] ret_from_kernel_thread+0x1b/0x28
> 2013-03-27T22:28:43.481+01:00 n22 kernel: [<c1053b00>] ? flush_kthread_work+0xd0/0xd0
> 2013-03-27T22:28:43.481+01:00 n22 kernel: ---[ end trace 66f6fe36b0d13e8f ]---
>
>
> The UML kernel was 3.9.0-rc4-00144-ga8c4528, and the trinity command just :
> $> trinity --children 2 --victims /mnt/nfs/n22/victims -x mremap
>
> --
> MfG/Sincerely
> Toralf Förster
> pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

2013-03-29 18:43:35

by J. Bruce Fields

[permalink] [raw]
Subject: Re: kernel 3.8.4 : kernel BUG at fs/locks.c:2093! part #2

On Wed, Mar 27, 2013 at 05:57:27PM -0400, J. Bruce Fields wrote:
> On Wed, Mar 27, 2013 at 10:35:12PM +0100, Toralf Förster wrote:
> > On 03/26/2013 03:46 PM, J. Bruce Fields wrote:
> > > Can you run with test patches?
> >
> > applied - today I got a warning instead a bug at the host kernel (3.8.4):
>
> Excellent! Thanks, I think I see the bug now....

This logic is just terrible; I'll see what if I can make this more
sensible.

For now, though, this should fix the bug.

Thanks again for this testing.

--b.

commit 01463db0a864d43bd2c9a69ff37d5a75b7822a49
Author: J. Bruce Fields <[email protected]>
Date: Thu Mar 28 20:37:14 2013 -0400

nfsd4: don't close read-write opens too soon

Don't actually close any opens until we don't need them at all.

This means being left with write access when it's not really necessary,
but that's better than putting a file that might still have posix locks
held on it, as we have been.

Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index aac878e..285a0c8 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -210,13 +210,7 @@ static void __nfs4_file_put_access(struct nfs4_file *fp, int oflag)
{
if (atomic_dec_and_test(&fp->fi_access[oflag])) {
nfs4_file_put_fd(fp, oflag);
- /*
- * It's also safe to get rid of the RDWR open *if*
- * we no longer have need of the other kind of access
- * or if we already have the other kind of open:
- */
- if (fp->fi_fds[1-oflag]
- || atomic_read(&fp->fi_access[1 - oflag]) == 0)
+ if (atomic_read(&fp->fi_access[1 - oflag]) == 0)
nfs4_file_put_fd(fp, O_RDWR);
}
}

2013-04-01 20:26:35

by Toralf Förster

[permalink] [raw]
Subject: Re: kernel 3.8.4 : kernel BUG at fs/locks.c:2093! part #2

On 03/29/2013 07:43 PM, J. Bruce Fields wrote:
> For now, though, this should fix the bug.
>
> Thanks again for this testing.
With this patch applied on top of 3.8.5 the bug could not be reproduced
any longer (till now).

@Dave
Just Cc:'ed you for your trinity score card
:-)

--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3