2015-07-27 15:20:29

by Andrew W Elble

Subject: list_del corruption / unhash_ol_stateid()


We've seen this a few times now. This will eventually crash.

This is 4.1.3

I'm still trying to get more information.

Any ideas/things to try?

Thanks,

Andy

[16562.212036] ------------[ cut here ]------------
[16562.217481] WARNING: CPU: 46 PID: 32242 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
[16562.227126] list_del corruption, ffff8814774ff2b0->next is LIST_POISON1 (dead000000100100)
[16562.236637] Modules linked in: gfs2 dlm sctp drbd lru_cache cts rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q garp mrp stp llc bonding nf_log_ipv4 nf_log_common xt_LOG xt_conntrack iptable_filter nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ip_tables dm_multipath x86_pkg_temp_thermal coretemp kvm_intel kvm nfsd ipmi_devintf crct10dif_pclmul iTCO_wdt crc32_pclmul crc32c_intel iTCO_vendor_support ghash_clmulni_intel auth_rpcgss lpc_ich aesni_intel ipmi_si lrw gf128mul glue_helper ablk_helper ipmi_msghandler cryptd mfd_core nfs_acl dcdbas wmi acpi_pad mei_me sb_edac mei edac_core pcspkr shpchp lockd acpi_power_meter grace sunrpc binfmt_misc xfs mgag200 syscopyarea sr_mod sysfillrect sysimgblt cdrom i2c_algo_bit
[16562.319032] sd_mod drm_kms_helper ttm ixgbe drm ahci tg3 mdio libahci dca ptp libata megaraid_sas i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
[16562.334398] CPU: 46 PID: 32242 Comm: nfsd Tainted: G W 4.1.3 #1
[16562.342844] Hardware name: Dell Inc. PowerEdge R720/0X3D66, BIOS 2.4.3 07/09/2014
[16562.351830] 0000000000000000 000000003f77bdfd ffff881f419afba8 ffffffff81683d19
[16562.360778] 0000000000000000 ffff881f419afc00 ffff881f419afbe8 ffffffff8107d15a
[16562.369715] ffff881f419afbc8 ffff8814774ff2b0 ffff8814777b4c64 ffff881f419afcd8
[16562.378659] Call Trace:
[16562.382054] [<ffffffff81683d19>] dump_stack+0x45/0x57
[16562.388472] [<ffffffff8107d15a>] warn_slowpath_common+0x8a/0xc0
[16562.395895] [<ffffffff8107d1e5>] warn_slowpath_fmt+0x55/0x70
[16562.403045] [<ffffffff8132a983>] __list_del_entry+0x63/0xd0
[16562.410065] [<ffffffff8132aa01>] list_del+0x11/0x40
[16562.416403] [<ffffffffa060eaa8>] unhash_ol_stateid+0x28/0x40 [nfsd]
[16562.424245] [<ffffffffa061660d>] nfsd4_close+0xdd/0x2c0 [nfsd]
[16562.431642] [<ffffffffa0604147>] nfsd4_proc_compound+0x4d7/0x7e0 [nfsd]
[16562.439897] [<ffffffffa05efff3>] nfsd_dispatch+0xc3/0x210 [nfsd]
[16562.447534] [<ffffffffa019d483>] svc_process_common+0x453/0x6f0 [sunrpc]
[16562.455915] [<ffffffffa019d833>] svc_process+0x113/0x1b0 [sunrpc]
[16562.463613] [<ffffffffa05ef9ff>] nfsd+0xff/0x170 [nfsd]
[16562.470356] [<ffffffffa05ef900>] ? nfsd_destroy+0x80/0x80 [nfsd]
[16562.477987] [<ffffffff8109c8d8>] kthread+0xd8/0xf0
[16562.484254] [<ffffffff8109c800>] ? kthread_create_on_node+0x1b0/0x1b0
[16562.492412] [<ffffffff8168b7a2>] ret_from_fork+0x42/0x70
[16562.499294] [<ffffffff8109c800>] ? kthread_create_on_node+0x1b0/0x1b0
[16562.507488] ---[ end trace cc575d5b980d16bb ]---
[16562.513551] ------------[ cut here ]------------

This was preceded by:

[12492.267703] ------------[ cut here ]------------
[12492.273425] WARNING: CPU: 0 PID: 32238 at fs/nfsd/nfs4state.c:3937 nfsd4_process_open2+0x120d/0x1230 [nfsd]()
[12492.285010] Modules linked in: gfs2 dlm sctp drbd lru_cache cts rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q garp mrp stp llc bonding nf_log_ipv4 nf_log_common xt_LOG xt_conntrack iptable_filter nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ip_tables dm_multipath x86_pkg_temp_thermal coretemp kvm_intel kvm nfsd ipmi_devintf crct10dif_pclmul iTCO_wdt crc32_pclmul crc32c_intel iTCO_vendor_support ghash_clmulni_intel auth_rpcgss lpc_ich aesni_intel ipmi_si lrw gf128mul glue_helper ablk_helper ipmi_msghandler cryptd mfd_core nfs_acl dcdbas wmi acpi_pad mei_me sb_edac mei edac_core pcspkr shpchp lockd acpi_power_meter grace sunrpc binfmt_misc xfs mgag200 syscopyarea sr_mod sysfillrect sysimgblt cdrom i2c_algo_bit
[12492.368457] sd_mod drm_kms_helper ttm ixgbe drm ahci tg3 mdio libahci dca ptp libata megaraid_sas i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
[12492.383931] CPU: 0 PID: 32238 Comm: nfsd Not tainted 4.1.3 #1
[12492.391019] Hardware name: Dell Inc. PowerEdge R720/0X3D66, BIOS 2.4.3 07/09/2014
[12492.400011] 0000000000000000 000000001d7c2753 ffff881f47d5fbb8 ffffffff81683d19
[12492.408969] 0000000000000000 0000000000000000 ffff881f47d5fbf8 ffffffff8107d15a
[12492.417973] ffff881f3f151dc0 ffff881f36d303e0 ffff881f3f151dc0 ffff881f42393398
[12492.426916] Call Trace:
[12492.430380] [<ffffffff81683d19>] dump_stack+0x45/0x57
[12492.436857] [<ffffffff8107d15a>] warn_slowpath_common+0x8a/0xc0
[12492.444343] [<ffffffff8107d28a>] warn_slowpath_null+0x1a/0x20
[12492.451615] [<ffffffffa06150fd>] nfsd4_process_open2+0x120d/0x1230 [nfsd]
[12492.460049] [<ffffffffa0603972>] nfsd4_open+0x532/0x830 [nfsd]
[12492.467366] [<ffffffffa0604147>] nfsd4_proc_compound+0x4d7/0x7e0 [nfsd]
[12492.475664] [<ffffffffa05efff3>] nfsd_dispatch+0xc3/0x210 [nfsd]
[12492.483169] [<ffffffffa019d483>] svc_process_common+0x453/0x6f0 [sunrpc]
[12492.491520] [<ffffffffa019d833>] svc_process+0x113/0x1b0 [sunrpc]
[12492.499232] [<ffffffffa05ef9ff>] nfsd+0xff/0x170 [nfsd]
[12492.505932] [<ffffffffa05ef900>] ? nfsd_destroy+0x80/0x80 [nfsd]
[12492.513494] [<ffffffff8109c8d8>] kthread+0xd8/0xf0
[12492.519728] [<ffffffff8109c800>] ? kthread_create_on_node+0x1b0/0x1b0
[12492.527791] [<ffffffff8168b7a2>] ret_from_fork+0x42/0x70
[12492.534535] [<ffffffff8109c800>] ? kthread_create_on_node+0x1b0/0x1b0
[12492.542504] ---[ end trace cc575d5b980d16ba ]---


--
Andrew W. Elble
[email protected]
Infrastructure Engineer, Communications Technical Lead
Rochester Institute of Technology
PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912


2015-07-27 18:06:26

by Andrew W Elble

Subject: Re: list_del corruption / unhash_ol_stateid()


> [12492.273425] WARNING: CPU: 0 PID: 32238 at fs/nfsd/nfs4state.c:3937
> nfsd4_process_open2+0x120d/0x1230 [nfsd]()

3931         fl = nfs4_alloc_init_lease(fp, NFS4_OPEN_DELEGATE_READ);
3932         if (!fl)
3933                 return -ENOMEM;
3934         filp = find_readable_file(fp);
3935         if (!filp) {
3936                 /* We should always have a readable file here */
3937                 WARN_ON_ONCE(1);
3938                 return -EBADF;
3939         }

We're at least leaking fl on return @3938 here? Can't yet speak to the
trigger from find_readable_file().

1007 static void unhash_ol_stateid(struct nfs4_ol_stateid *stp)
1008 {
1009         struct nfs4_file *fp = stp->st_stid.sc_file;
1010
1011         lockdep_assert_held(&stp->st_stateowner->so_client->cl_lock);
1012
1013         spin_lock(&fp->fi_lock);
1014         list_del(&stp->st_perfile);
1015         spin_unlock(&fp->fi_lock);
1016         list_del(&stp->st_perstateowner);
1017 }

The list_del corruption warning is triggered from here:

1014 list_del(&stp->st_perfile);

Actual crash looks like so:

PID: 32237 TASK: ffff881f391cdef0 CPU: 22 COMMAND: "nfsd"
#0 [ffff881f48ed36f0] machine_kexec at ffffffff8105bf3b
#1 [ffff881f48ed3760] crash_kexec at ffffffff81109b52
#2 [ffff881f48ed3830] oops_end at ffffffff81019768
#3 [ffff881f48ed3860] no_context at ffffffff8167e502
#4 [ffff881f48ed38c0] __bad_area_nosemaphore at ffffffff8167e5ed
#5 [ffff881f48ed3910] bad_area_nosemaphore at ffffffff8167e759
#6 [ffff881f48ed3920] __do_page_fault at ffffffff810687e6
#7 [ffff881f48ed3990] do_page_fault at ffffffff81068bb0
#8 [ffff881f48ed39d0] page_fault at ffffffff8168d398
[exception RIP: __kmalloc+150]
RIP: ffffffff811dab66 RSP: ffff881f48ed3a88 RFLAGS: 00010286
RAX: 0000000000000000 RBX: 000000000000000a RCX: 00000000009f26fa
RDX: 00000000009f26f9 RSI: 0000000000000000 RDI: ffffffff8124cfc0
RBP: ffff881f48ed3ac8 R8: 000000000001ab00 R9: 0000000000000000
R10: ffff881f48ed3918 R11: ffffffffa0852070 R12: 0000000000000050
R13: 0000000000000068 R14: ffff881fff403900 R15: 00000000ffffffff
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffff881f48ed3ad0] posix_acl_alloc at ffffffff8124cfc0
#10 [ffff881f48ed3af0] posix_acl_from_xattr at ffffffff8124da44
#11 [ffff881f48ed3b40] gfs2_get_acl at ffffffffa0852064 [gfs2]
#12 [ffff881f48ed3b70] get_acl at ffffffff8124d557
#13 [ffff881f48ed3b90] generic_permission at ffffffff811fb4a2
#14 [ffff881f48ed3bd0] gfs2_permission at ffffffffa086d98d [gfs2]
#15 [ffff881f48ed3c70] __inode_permission at ffffffff811fb572
#16 [ffff881f48ed3ca0] inode_permission at ffffffff811fb5e8
#17 [ffff881f48ed3cb0] nfsd_permission at ffffffffa05f6552 [nfsd]
#18 [ffff881f48ed3ce0] nfsd_access at ffffffffa05f77a8 [nfsd]
#19 [ffff881f48ed3d40] nfsd4_access at ffffffffa06022ec [nfsd]
#20 [ffff881f48ed3d50] nfsd4_proc_compound at ffffffffa0604147 [nfsd]
#21 [ffff881f48ed3db0] nfsd_dispatch at ffffffffa05efff3 [nfsd]
#22 [ffff881f48ed3df0] svc_process_common at ffffffffa019d483 [sunrpc]
#23 [ffff881f48ed3e60] svc_process at ffffffffa019d833 [sunrpc]
#24 [ffff881f48ed3e90] nfsd at ffffffffa05ef9ff [nfsd]
#25 [ffff881f48ed3ec0] kthread at ffffffff8109c8d8
#26 [ffff881f48ed3f50] ret_from_fork at ffffffff8168b7a2

Thanks,

Andy

--
Andrew W. Elble
[email protected]
Infrastructure Engineer, Communications Technical Lead
Rochester Institute of Technology
PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912

2015-07-27 20:40:26

by J. Bruce Fields

Subject: Re: list_del corruption / unhash_ol_stateid()

This looks a lot like the same thing Anna's been hitting, which I
haven't been able to reliably reproduce yet. How are you hitting this?

--b.

On Mon, Jul 27, 2015 at 02:06:25PM -0400, Andrew W Elble wrote:
>
> > [12492.273425] WARNING: CPU: 0 PID: 32238 at fs/nfsd/nfs4state.c:3937
> > nfsd4_process_open2+0x120d/0x1230 [nfsd]()
>
> 3931 fl = nfs4_alloc_init_lease(fp, NFS4_OPEN_DELEGATE_READ);
> 3932 if (!fl)
> 3933 return -ENOMEM;
> 3934 filp = find_readable_file(fp);
> 3935 if (!filp) {
> 3936 /* We should always have a readable file here */
> 3937 WARN_ON_ONCE(1);
> 3938 return -EBADF;
> 3939 }
>
> We're at least leaking fl on return @3938 here? Can't yet speak to the
> trigger from find_readable_file().
>
> 1007 static void unhash_ol_stateid(struct nfs4_ol_stateid *stp)
> 1008 {
> 1009 struct nfs4_file *fp = stp->st_stid.sc_file;
> 1010
> 1011 lockdep_assert_held(&stp->st_stateowner->so_client->cl_lock);
> 1012
> 1013 spin_lock(&fp->fi_lock);
> 1014 list_del(&stp->st_perfile);
> 1015 spin_unlock(&fp->fi_lock);
> 1016 list_del(&stp->st_perstateowner);
> 1017 }
>
> The list_del corruption warning is triggered from here:
>
> 1014 list_del(&stp->st_perfile);
>
> Actual crash looks like so:
>
> PID: 32237 TASK: ffff881f391cdef0 CPU: 22 COMMAND: "nfsd"
> #0 [ffff881f48ed36f0] machine_kexec at ffffffff8105bf3b
> #1 [ffff881f48ed3760] crash_kexec at ffffffff81109b52
> #2 [ffff881f48ed3830] oops_end at ffffffff81019768
> #3 [ffff881f48ed3860] no_context at ffffffff8167e502
> #4 [ffff881f48ed38c0] __bad_area_nosemaphore at ffffffff8167e5ed
> #5 [ffff881f48ed3910] bad_area_nosemaphore at ffffffff8167e759
> #6 [ffff881f48ed3920] __do_page_fault at ffffffff810687e6
> #7 [ffff881f48ed3990] do_page_fault at ffffffff81068bb0
> #8 [ffff881f48ed39d0] page_fault at ffffffff8168d398
> [exception RIP: __kmalloc+150]
> RIP: ffffffff811dab66 RSP: ffff881f48ed3a88 RFLAGS: 00010286
> RAX: 0000000000000000 RBX: 000000000000000a RCX: 00000000009f26fa
> RDX: 00000000009f26f9 RSI: 0000000000000000 RDI: ffffffff8124cfc0
> RBP: ffff881f48ed3ac8 R8: 000000000001ab00 R9: 0000000000000000
> R10: ffff881f48ed3918 R11: ffffffffa0852070 R12: 0000000000000050
> R13: 0000000000000068 R14: ffff881fff403900 R15: 00000000ffffffff
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #9 [ffff881f48ed3ad0] posix_acl_alloc at ffffffff8124cfc0
> #10 [ffff881f48ed3af0] posix_acl_from_xattr at ffffffff8124da44
> #11 [ffff881f48ed3b40] gfs2_get_acl at ffffffffa0852064 [gfs2]
> #12 [ffff881f48ed3b70] get_acl at ffffffff8124d557
> #13 [ffff881f48ed3b90] generic_permission at ffffffff811fb4a2
> #14 [ffff881f48ed3bd0] gfs2_permission at ffffffffa086d98d [gfs2]
> #15 [ffff881f48ed3c70] __inode_permission at ffffffff811fb572
> #16 [ffff881f48ed3ca0] inode_permission at ffffffff811fb5e8
> #17 [ffff881f48ed3cb0] nfsd_permission at ffffffffa05f6552 [nfsd]
> #18 [ffff881f48ed3ce0] nfsd_access at ffffffffa05f77a8 [nfsd]
> #19 [ffff881f48ed3d40] nfsd4_access at ffffffffa06022ec [nfsd]
> #20 [ffff881f48ed3d50] nfsd4_proc_compound at ffffffffa0604147 [nfsd]
> #21 [ffff881f48ed3db0] nfsd_dispatch at ffffffffa05efff3 [nfsd]
> #22 [ffff881f48ed3df0] svc_process_common at ffffffffa019d483 [sunrpc]
> #23 [ffff881f48ed3e60] svc_process at ffffffffa019d833 [sunrpc]
> #24 [ffff881f48ed3e90] nfsd at ffffffffa05ef9ff [nfsd]
> #25 [ffff881f48ed3ec0] kthread at ffffffff8109c8d8
> #26 [ffff881f48ed3f50] ret_from_fork at ffffffff8168b7a2
>
> Thanks,
>
> Andy
>
> --
> Andrew W. Elble
> [email protected]
> Infrastructure Engineer, Communications Technical Lead
> Rochester Institute of Technology
> PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912

2015-07-27 21:03:44

by Andrew W Elble

Subject: Re: list_del corruption / unhash_ol_stateid()


Well, the primary load on the nfs server is from 4.1.3 nfs clients
(mounted with vers=4.1) running Apache against the exported filesystems.
There is contending load being placed simultaneously on the same
filesystems from the server itself (i.e. running git adds on the web
homedirs directly on the nfs server). We were reliably reproducing "it"
every 2 hours this morning, although when not under actual load it may
take weeks to manifest and may not actually crash.

We will probably try some slub_debug options tomorrow morning and will
try some load generation to see if we can duplicate this without the
production traffic.

"J. Bruce Fields" <[email protected]> writes:

> This looks a lot like the same thing Anna's been hitting, which I
> haven't been able to reliably reproduce yet. How are you hitting this?
>
> --b.
>
> On Mon, Jul 27, 2015 at 02:06:25PM -0400, Andrew W Elble wrote:
>>
>> > [12492.273425] WARNING: CPU: 0 PID: 32238 at fs/nfsd/nfs4state.c:3937
>> > nfsd4_process_open2+0x120d/0x1230 [nfsd]()
>>
>> 3931 fl = nfs4_alloc_init_lease(fp, NFS4_OPEN_DELEGATE_READ);
>> 3932 if (!fl)
>> 3933 return -ENOMEM;
>> 3934 filp = find_readable_file(fp);
>> 3935 if (!filp) {
>> 3936 /* We should always have a readable file here */
>> 3937 WARN_ON_ONCE(1);
>> 3938 return -EBADF;
>> 3939 }
>>
>> We're at least leaking fl on return @3938 here? Can't yet speak to the
>> trigger from find_readable_file().
>>
>> 1007 static void unhash_ol_stateid(struct nfs4_ol_stateid *stp)
>> 1008 {
>> 1009 struct nfs4_file *fp = stp->st_stid.sc_file;
>> 1010
>> 1011 lockdep_assert_held(&stp->st_stateowner->so_client->cl_lock);
>> 1012
>> 1013 spin_lock(&fp->fi_lock);
>> 1014 list_del(&stp->st_perfile);
>> 1015 spin_unlock(&fp->fi_lock);
>> 1016 list_del(&stp->st_perstateowner);
>> 1017 }
>>
>> The list_del corruption warning is triggered from here:
>>
>> 1014 list_del(&stp->st_perfile);
>>
>> Actual crash looks like so:
>>
>> PID: 32237 TASK: ffff881f391cdef0 CPU: 22 COMMAND: "nfsd"
>> #0 [ffff881f48ed36f0] machine_kexec at ffffffff8105bf3b
>> #1 [ffff881f48ed3760] crash_kexec at ffffffff81109b52
>> #2 [ffff881f48ed3830] oops_end at ffffffff81019768
>> #3 [ffff881f48ed3860] no_context at ffffffff8167e502
>> #4 [ffff881f48ed38c0] __bad_area_nosemaphore at ffffffff8167e5ed
>> #5 [ffff881f48ed3910] bad_area_nosemaphore at ffffffff8167e759
>> #6 [ffff881f48ed3920] __do_page_fault at ffffffff810687e6
>> #7 [ffff881f48ed3990] do_page_fault at ffffffff81068bb0
>> #8 [ffff881f48ed39d0] page_fault at ffffffff8168d398
>> [exception RIP: __kmalloc+150]
>> RIP: ffffffff811dab66 RSP: ffff881f48ed3a88 RFLAGS: 00010286
>> RAX: 0000000000000000 RBX: 000000000000000a RCX: 00000000009f26fa
>> RDX: 00000000009f26f9 RSI: 0000000000000000 RDI: ffffffff8124cfc0
>> RBP: ffff881f48ed3ac8 R8: 000000000001ab00 R9: 0000000000000000
>> R10: ffff881f48ed3918 R11: ffffffffa0852070 R12: 0000000000000050
>> R13: 0000000000000068 R14: ffff881fff403900 R15: 00000000ffffffff
>> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
>> #9 [ffff881f48ed3ad0] posix_acl_alloc at ffffffff8124cfc0
>> #10 [ffff881f48ed3af0] posix_acl_from_xattr at ffffffff8124da44
>> #11 [ffff881f48ed3b40] gfs2_get_acl at ffffffffa0852064 [gfs2]
>> #12 [ffff881f48ed3b70] get_acl at ffffffff8124d557
>> #13 [ffff881f48ed3b90] generic_permission at ffffffff811fb4a2
>> #14 [ffff881f48ed3bd0] gfs2_permission at ffffffffa086d98d [gfs2]
>> #15 [ffff881f48ed3c70] __inode_permission at ffffffff811fb572
>> #16 [ffff881f48ed3ca0] inode_permission at ffffffff811fb5e8
>> #17 [ffff881f48ed3cb0] nfsd_permission at ffffffffa05f6552 [nfsd]
>> #18 [ffff881f48ed3ce0] nfsd_access at ffffffffa05f77a8 [nfsd]
>> #19 [ffff881f48ed3d40] nfsd4_access at ffffffffa06022ec [nfsd]
>> #20 [ffff881f48ed3d50] nfsd4_proc_compound at ffffffffa0604147 [nfsd]
>> #21 [ffff881f48ed3db0] nfsd_dispatch at ffffffffa05efff3 [nfsd]
>> #22 [ffff881f48ed3df0] svc_process_common at ffffffffa019d483 [sunrpc]
>> #23 [ffff881f48ed3e60] svc_process at ffffffffa019d833 [sunrpc]
>> #24 [ffff881f48ed3e90] nfsd at ffffffffa05ef9ff [nfsd]
>> #25 [ffff881f48ed3ec0] kthread at ffffffff8109c8d8
>> #26 [ffff881f48ed3f50] ret_from_fork at ffffffff8168b7a2
>>
>> Thanks,
>>
>> Andy
>>
>> --
>> Andrew W. Elble
>> [email protected]
>> Infrastructure Engineer, Communications Technical Lead
>> Rochester Institute of Technology
>> PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912
>

--
Andrew W. Elble
[email protected]
Infrastructure Engineer, Communications Technical Lead
Rochester Institute of Technology
PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912

2015-07-28 13:02:12

by Jeff Layton

Subject: Re: list_del corruption / unhash_ol_stateid()

On Mon, 27 Jul 2015 14:06:25 -0400
Andrew W Elble <[email protected]> wrote:

>
> > [12492.273425] WARNING: CPU: 0 PID: 32238 at fs/nfsd/nfs4state.c:3937
> > nfsd4_process_open2+0x120d/0x1230 [nfsd]()
>
> 3931 fl = nfs4_alloc_init_lease(fp, NFS4_OPEN_DELEGATE_READ);
> 3932 if (!fl)
> 3933 return -ENOMEM;
> 3934 filp = find_readable_file(fp);
> 3935 if (!filp) {
> 3936 /* We should always have a readable file here */
> 3937 WARN_ON_ONCE(1);
> 3938 return -EBADF;
> 3939 }
>
> We're at least leaking fl on return @3938 here? Can't yet speak to the
> trigger from find_readable_file().
>

Yeah, I think Kinglong fixed the file_lock leak recently. There's a
patch for it in Bruce's tree, at least.

Still, if you're going down that error path, then something seems quite
wrong (hence the WARN_ON_ONCE). We've got a nfs4_file, and we should
have already taken references to the read filp associated with it
before trying to get a delegation.

It might not hurt to roll up some extra sanity checking before we
get to this point and see if we can nail down whether we're failing to
get a reference to the read filp altogether, or whether it's going away
for some reason.

> 1007 static void unhash_ol_stateid(struct nfs4_ol_stateid *stp)
> 1008 {
> 1009 struct nfs4_file *fp = stp->st_stid.sc_file;
> 1010
> 1011 lockdep_assert_held(&stp->st_stateowner->so_client->cl_lock);
> 1012
> 1013 spin_lock(&fp->fi_lock);
> 1014 list_del(&stp->st_perfile);
> 1015 spin_unlock(&fp->fi_lock);
> 1016 list_del(&stp->st_perstateowner);
> 1017 }
>
> The list_del corruption warning is triggered from here:
>
> 1014 list_del(&stp->st_perfile);
>

Which probably means that this thing got list_del'ed twice. Hard to say
whether that's related to the problem above though.

> Actual crash looks like so:
>
> PID: 32237 TASK: ffff881f391cdef0 CPU: 22 COMMAND: "nfsd"
> #0 [ffff881f48ed36f0] machine_kexec at ffffffff8105bf3b
> #1 [ffff881f48ed3760] crash_kexec at ffffffff81109b52
> #2 [ffff881f48ed3830] oops_end at ffffffff81019768
> #3 [ffff881f48ed3860] no_context at ffffffff8167e502
> #4 [ffff881f48ed38c0] __bad_area_nosemaphore at ffffffff8167e5ed
> #5 [ffff881f48ed3910] bad_area_nosemaphore at ffffffff8167e759
> #6 [ffff881f48ed3920] __do_page_fault at ffffffff810687e6
> #7 [ffff881f48ed3990] do_page_fault at ffffffff81068bb0
> #8 [ffff881f48ed39d0] page_fault at ffffffff8168d398
> [exception RIP: __kmalloc+150]
> RIP: ffffffff811dab66 RSP: ffff881f48ed3a88 RFLAGS: 00010286
> RAX: 0000000000000000 RBX: 000000000000000a RCX: 00000000009f26fa
> RDX: 00000000009f26f9 RSI: 0000000000000000 RDI: ffffffff8124cfc0
> RBP: ffff881f48ed3ac8 R8: 000000000001ab00 R9: 0000000000000000
> R10: ffff881f48ed3918 R11: ffffffffa0852070 R12: 0000000000000050
> R13: 0000000000000068 R14: ffff881fff403900 R15: 00000000ffffffff
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #9 [ffff881f48ed3ad0] posix_acl_alloc at ffffffff8124cfc0
> #10 [ffff881f48ed3af0] posix_acl_from_xattr at ffffffff8124da44
> #11 [ffff881f48ed3b40] gfs2_get_acl at ffffffffa0852064 [gfs2]
> #12 [ffff881f48ed3b70] get_acl at ffffffff8124d557
> #13 [ffff881f48ed3b90] generic_permission at ffffffff811fb4a2
> #14 [ffff881f48ed3bd0] gfs2_permission at ffffffffa086d98d [gfs2]
> #15 [ffff881f48ed3c70] __inode_permission at ffffffff811fb572
> #16 [ffff881f48ed3ca0] inode_permission at ffffffff811fb5e8
> #17 [ffff881f48ed3cb0] nfsd_permission at ffffffffa05f6552 [nfsd]
> #18 [ffff881f48ed3ce0] nfsd_access at ffffffffa05f77a8 [nfsd]
> #19 [ffff881f48ed3d40] nfsd4_access at ffffffffa06022ec [nfsd]
> #20 [ffff881f48ed3d50] nfsd4_proc_compound at ffffffffa0604147 [nfsd]
> #21 [ffff881f48ed3db0] nfsd_dispatch at ffffffffa05efff3 [nfsd]
> #22 [ffff881f48ed3df0] svc_process_common at ffffffffa019d483 [sunrpc]
> #23 [ffff881f48ed3e60] svc_process at ffffffffa019d833 [sunrpc]
> #24 [ffff881f48ed3e90] nfsd at ffffffffa05ef9ff [nfsd]
> #25 [ffff881f48ed3ec0] kthread at ffffffff8109c8d8
> #26 [ffff881f48ed3f50] ret_from_fork at ffffffff8168b7a2
>

My guess is that that may be collateral damage from the double
unhashing of the stateid. It certainly doesn't look directly related to
anything in knfsd -- it's a crash down deep in the POSIX ACL code,
apparently.

--
Jeff Layton <[email protected]>

2015-07-28 15:02:00

by Andrew W Elble

Subject: Re: list_del corruption / unhash_ol_stateid()


Still trying to develop some kind of test to reliably trigger this.

> My guess is that that may be collateral damage from the double
> unhashing of the stateid. It certainly doesn't look directly related to
> anything in knfsd -- it's a crash down deep in the POSIX ACL code,
> apparently.

Thinking that it's collateral - here's the backtrace from a similar
event.

(was preceded by both the WARN_ON_ONCE() and the list_del corruption messages)

running 3.18.9:

[846121.007040] general protection fault: 0000 [#1] SMP
[846121.013461] Modules linked in: nfnetlink_queue nfnetlink_log nfnetlink bluetooth gfs2 dlm sctp sg dm_service_time cts iscsi_tcp libiscsi_tcp libiscsi rpcsec_gss_krb5 nfsv4 dns_resolver nfs scsi_transport_iscsi fscache 8021q nf_log_ipv4 nf_log_common garp xt_LOG mrp stp llc bonding xt_conntrack cfg80211 iptable_filter nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack rfkill ip_tables dm_multipath x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mei_me mei aesni_intel lrw gf128mul glue_helper nfsd ablk_helper sb_edac iTCO_wdt ipmi_devintf pcspkr acpi_power_meter iTCO_vendor_support dcdbas ipmi_si edac_core cryptd lpc_ich mfd_core wmi shpchp ipmi_msghandler acpi_pad auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc xfs sr_mod cdrom mgag200 syscopyarea sd_mod sysfillrect sysimgblt i2c_algo_bit ixgbe drm_kms_helper mdio tg3 ttm ahci dca drm libahci ptp libata megaraid_sas i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
[846121.119250] CPU: 8 PID: 46216 Comm: nfsd Tainted: G W 3.18.9 #1
[846121.127982] Hardware name: Dell Inc. PowerEdge R720/0X3D66, BIOS 2.4.3 07/09/2014
[846121.137268] task: ffff881ecd854330 ti: ffff881ecebf4000 task.ti: ffff881ecebf4000
[846121.146540] RIP: 0010:[<ffffffff811c7595>] [<ffffffff811c7595>] kmem_cache_alloc+0x65/0x140
[846121.156839] RSP: 0018:ffff881ecebf7c28 EFLAGS: 00010286
[846121.163626] RAX: 0000000000000000 RBX: ffff8811bd0cf3e0 RCX: 0000000000003a0c
[846121.172438] RDX: 0000000000003a0b RSI: 00000000000080d0 RDI: ffff881fff489000
[846121.181267] RBP: ffff881ecebf7c68 R08: 0000000000016ae0 R09: 00000000b9acddf8
[846121.190049] R10: ffff883fe43df000 R11: ffff881856038437 R12: ff88185603834800
[846121.198826] R13: 00000000000080d0 R14: ffff881fff489000 R15: ffffffffa0517b79
[846121.207623] FS: 0000000000000000(0000) GS:ffff881fffa80000(0000) knlGS:0000000000000000
[846121.217475] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[846121.224720] CR2: 00000000006e8360 CR3: 00000000019bc000 CR4: 00000000001407e0
[846121.233525] Stack:
[846121.236601] 0000000000000014 ffff881fff403c00 ffff881ecebf7c88 ffff8811bd0cf3e0
[846121.245723] ffff883fe43df000 ffff8835d38afa80 0000000000000000 ffff881fff489000
[846121.254854] ffff881ecebf7ca8 ffffffffa0517b79 0000000000000000 ffff8811bd0cf3e0
[846121.263984] Call Trace:
[846121.267559] [<ffffffffa0517b79>] nfs4_alloc_stid+0x29/0xc0 [nfsd]
[846121.275355] [<ffffffffa051fcf8>] nfsd4_lock+0x6f8/0xda0 [nfsd]
[846121.282870] [<ffffffffa050cf07>] nfsd4_proc_compound+0x4d7/0x7f0 [nfsd]
[846121.291191] [<ffffffffa04f8e43>] nfsd_dispatch+0xc3/0x210 [nfsd]
[846121.298869] [<ffffffffa01c8e02>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc]
[846121.307590] [<ffffffffa01c7d23>] svc_process_common+0x453/0x6f0 [sunrpc]
[846121.316018] [<ffffffffa01c80c7>] svc_process+0x107/0x170 [sunrpc]
[846121.323764] [<ffffffffa04f87c7>] nfsd+0xf7/0x160 [nfsd]
[846121.330541] [<ffffffffa04f86d0>] ? nfsd_destroy+0x80/0x80 [nfsd]
[846121.338226] [<ffffffff81093861>] kthread+0xe1/0x100
[846121.344624] [<ffffffff81093780>] ? kthread_create_on_node+0x1b0/0x1b0
[846121.352776] [<ffffffff816912fc>] ret_from_fork+0x7c/0xb0
[846121.359661] [<ffffffff81093780>] ? kthread_create_on_node+0x1b0/0x1b0
[846121.367961] Code: cc 00 00 49 8b 50 08 4d 8b 20 49 8b 40 10 4d 85 e4 0f 84 a7 00 00 00 48 85 c0 0f 84 9e 00 00 00 49 63 46 20 48 8d 4a 01 4d 8b 06 <49> 8b 1c 04 4c 89 e0 65 49 0f c7 08 0f 94 c0 84 c0 74 b9 49 63
[846121.391248] RIP [<ffffffff811c7595>] kmem_cache_alloc+0x65/0x140
[846121.398936] RSP <ffff881ecebf7c28>

--
Andrew W. Elble
[email protected]
Infrastructure Engineer, Communications Technical Lead
Rochester Institute of Technology
PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912

2015-07-28 15:49:39

by Jeff Layton

Subject: Re: list_del corruption / unhash_ol_stateid()

On Tue, 28 Jul 2015 11:01:59 -0400
Andrew W Elble <[email protected]> wrote:

>
> Still trying to develop some kind of test to reliably trigger this.
>
> > My guess is that that may be collateral damage from the double
> > unhashing of the stateid. It certainly doesn't look directly related to
> > anything in knfsd -- it's a crash down deep in the POSIX ACL code,
> > apparently.
>
> Thinking that it's collateral - here's the backtrace from a similar
> event.
>
> (was preceded by both the WARN_ON_ONCE() and the list_del corruption messages)
>
> running 3.18.9:
>
> [846121.007040] general protection fault: 0000 [#1] SMP
> [846121.013461] Modules linked in: nfnetlink_queue nfnetlink_log nfnetlink bluetooth gfs2 dlm sctp sg dm_service_time cts iscsi_tcp libiscsi_tcp libiscsi rpcsec_gss_krb5 nfsv4 dns_resolver nfs scsi_transport_iscsi fscache 8021q nf_log_ipv4 nf_log_common garp xt_LOG mrp stp llc bonding xt_conntrack cfg80211 iptable_filter nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack rfkill ip_tables dm_multipath x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mei_me mei aesni_intel lrw gf128mul glue_helper nfsd ablk_helper sb_edac iTCO_wdt ipmi_devintf pcspkr acpi_power_meter iTCO_vendor_support dcdbas ipmi_si edac_core cryptd lpc_ich mfd_core wmi shpchp ipmi_msghandler acpi_pad auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc xfs sr_mod cdrom mgag200 syscopyarea sd_mod sysfillrect sysimgblt i2c_algo_bit ixgbe drm_kms_helper mdio tg3 ttm ahci dca drm libahci ptp libata megaraid_sas i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
> [846121.119250] CPU: 8 PID: 46216 Comm: nfsd Tainted: G W 3.18.9 #1
> [846121.127982] Hardware name: Dell Inc. PowerEdge R720/0X3D66, BIOS 2.4.3 07/09/2014
> [846121.137268] task: ffff881ecd854330 ti: ffff881ecebf4000 task.ti: ffff881ecebf4000
> [846121.146540] RIP: 0010:[<ffffffff811c7595>] [<ffffffff811c7595>] kmem_cache_alloc+0x65/0x140
> [846121.156839] RSP: 0018:ffff881ecebf7c28 EFLAGS: 00010286
> [846121.163626] RAX: 0000000000000000 RBX: ffff8811bd0cf3e0 RCX: 0000000000003a0c
> [846121.172438] RDX: 0000000000003a0b RSI: 00000000000080d0 RDI: ffff881fff489000
> [846121.181267] RBP: ffff881ecebf7c68 R08: 0000000000016ae0 R09: 00000000b9acddf8
> [846121.190049] R10: ffff883fe43df000 R11: ffff881856038437 R12: ff88185603834800
> [846121.198826] R13: 00000000000080d0 R14: ffff881fff489000 R15: ffffffffa0517b79
> [846121.207623] FS: 0000000000000000(0000) GS:ffff881fffa80000(0000) knlGS:0000000000000000
> [846121.217475] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [846121.224720] CR2: 00000000006e8360 CR3: 00000000019bc000 CR4: 00000000001407e0
> [846121.233525] Stack:
> [846121.236601] 0000000000000014 ffff881fff403c00 ffff881ecebf7c88 ffff8811bd0cf3e0
> [846121.245723] ffff883fe43df000 ffff8835d38afa80 0000000000000000 ffff881fff489000
> [846121.254854] ffff881ecebf7ca8 ffffffffa0517b79 0000000000000000 ffff8811bd0cf3e0
> [846121.263984] Call Trace:
> [846121.267559] [<ffffffffa0517b79>] nfs4_alloc_stid+0x29/0xc0 [nfsd]
> [846121.275355] [<ffffffffa051fcf8>] nfsd4_lock+0x6f8/0xda0 [nfsd]
> [846121.282870] [<ffffffffa050cf07>] nfsd4_proc_compound+0x4d7/0x7f0 [nfsd]
> [846121.291191] [<ffffffffa04f8e43>] nfsd_dispatch+0xc3/0x210 [nfsd]
> [846121.298869] [<ffffffffa01c8e02>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc]
> [846121.307590] [<ffffffffa01c7d23>] svc_process_common+0x453/0x6f0 [sunrpc]
> [846121.316018] [<ffffffffa01c80c7>] svc_process+0x107/0x170 [sunrpc]
> [846121.323764] [<ffffffffa04f87c7>] nfsd+0xf7/0x160 [nfsd]
> [846121.330541] [<ffffffffa04f86d0>] ? nfsd_destroy+0x80/0x80 [nfsd]
> [846121.338226] [<ffffffff81093861>] kthread+0xe1/0x100
> [846121.344624] [<ffffffff81093780>] ? kthread_create_on_node+0x1b0/0x1b0
> [846121.352776] [<ffffffff816912fc>] ret_from_fork+0x7c/0xb0
> [846121.359661] [<ffffffff81093780>] ? kthread_create_on_node+0x1b0/0x1b0
> [846121.367961] Code: cc 00 00 49 8b 50 08 4d 8b 20 49 8b 40 10 4d 85 e4 0f 84 a7 00 00 00 48 85 c0 0f 84 9e 00 00 00 49 63 46 20 48 8d 4a 01 4d 8b 06 <49> 8b 1c 04 4c 89 e0 65 49 0f c7 08 0f 94 c0 84 c0 74 b9 49 63
> [846121.391248] RIP [<ffffffff811c7595>] kmem_cache_alloc+0x65/0x140
> [846121.398936] RSP <ffff881ecebf7c28>
>

I wonder if this could be the race. In nfsd4_process_open2:

-------------------[snip]-------------------
	if (stp) {
		/* Stateid was found, this is an OPEN upgrade */
		status = nfs4_upgrade_open(rqstp, fp, current_fh, stp, open);
		if (status)
			goto out;
	} else {
		stp = open->op_stp;
		open->op_stp = NULL;
		init_open_stateid(stp, fp, open);
		status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open);
		if (status) {
			release_open_stateid(stp);
			goto out;
		}

		stp->st_clnt_odstate = find_or_hash_clnt_odstate(fp,
							open->op_odstate);
		if (stp->st_clnt_odstate == open->op_odstate)
			open->op_odstate = NULL;
	}
-------------------[snip]-------------------

init_open_stateid will hash the stateid, and then call
nfs4_get_vfs_file, which does the actual dentry_open, sets the access
mode, etc.

If that fails, it calls release_open_stateid which basically
unrolls the whole thing -- tears down the stateid, unhashing it in the
process.

In that window where it's hashed though, another open could come along
and find that stateid. If it does, then it will call nfs4_upgrade_open.
If it requests the same access as the first open then it won't try to
actually open the file at all. It'll just assume that the stateid has
the correct access and proceed.

At that point, you may have all the access bits set in the nfs4_file,
but no actual filp associated with it. That may explain the WARN
popping.

In a similar race, you could have two opens racing for different access
(maybe one for read and one for write), or a race between an OPEN and
CLOSE.

In short, I think that there are a couple of potential races here:

1) That release_open_stateid could be problematic, simply because you
could have another user of that stateid race in before it's fully set
up. We probably need to do that a bit more gracefully though I'm a
little unclear on the right solution.

2) We probably need nfs4_upgrade_open to block until the necessary
fi->fi_fds field has been set up.

OTOH, maybe this code needs a more fundamental rethink to close these
races -- I'm not sure...

Bruce, any thoughts?
--
Jeff Layton <[email protected]>

2015-07-28 21:04:35

by J. Bruce Fields

Subject: Re: list_del corruption / unhash_ol_stateid()

On Tue, Jul 28, 2015 at 11:49:33AM -0400, Jeff Layton wrote:
> On Tue, 28 Jul 2015 11:01:59 -0400
> Andrew W Elble <[email protected]> wrote:
>
> >
> > Still trying to develop some kind of test to reliably trigger this.
> >
> > > My guess is that that may be collateral damage from the double
> > > unhashing of the stateid. It certainly doesn't look directly related to
> > > anything in knfsd -- it's a crash down deep in the POSIX ACL code,
> > > apparently.
> >
> > Thinking that it's collateral - here's the backtrace from a similar
> > event.
> >
> > (was preceded by both the WARN_ON_ONCE() and the list_del corruption messages)
> >
> > running 3.18.9:
> >
> > [846121.007040] general protection fault: 0000 [#1] SMP
> > [846121.013461] Modules linked in: nfnetlink_queue nfnetlink_log nfnetlink bluetooth gfs2 dlm sctp sg dm_service_time cts iscsi_tcp libiscsi_tcp libiscsi rpcsec_gss_
> > krb5 nfsv4 dns_resolver nfs scsi_transport_iscsi fscache 8021q nf_log_ipv4 nf_log_common garp xt_LOG mrp stp llc bonding xt_conntrack cfg80211 iptable_filter nf_conn
> > track_ftp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack rfkill ip_tables dm_multipath x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32
> > c_intel ghash_clmulni_intel mei_me mei aesni_intel lrw gf128mul glue_helper nfsd ablk_helper sb_edac iTCO_wdt ipmi_devintf pcspkr acpi_power_meter iTCO_vendor_suppor
> > t dcdbas ipmi_si edac_core cryptd lpc_ich mfd_core wmi shpchp ipmi_msghandler acpi_pad auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc xfs sr_mod cdrom mgag200 sy
> > scopyarea sd_mod sysfillrect sysimgblt i2c_algo_bit ixgbe drm_kms_helper mdio tg3 ttm ahci dca drm libahci ptp libata megaraid_sas i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
> > [846121.119250] CPU: 8 PID: 46216 Comm: nfsd Tainted: G W 3.18.9 #1
> > [846121.127982] Hardware name: Dell Inc. PowerEdge R720/0X3D66, BIOS 2.4.3 07/09/2014
> > [846121.137268] task: ffff881ecd854330 ti: ffff881ecebf4000 task.ti: ffff881ecebf4000
> > [846121.146540] RIP: 0010:[<ffffffff811c7595>] [<ffffffff811c7595>] kmem_cache_alloc+0x65/0x140
> > [846121.156839] RSP: 0018:ffff881ecebf7c28 EFLAGS: 00010286
> > [846121.163626] RAX: 0000000000000000 RBX: ffff8811bd0cf3e0 RCX: 0000000000003a0c
> > [846121.172438] RDX: 0000000000003a0b RSI: 00000000000080d0 RDI: ffff881fff489000
> > [846121.181267] RBP: ffff881ecebf7c68 R08: 0000000000016ae0 R09: 00000000b9acddf8
> > [846121.190049] R10: ffff883fe43df000 R11: ffff881856038437 R12: ff88185603834800
> > [846121.198826] R13: 00000000000080d0 R14: ffff881fff489000 R15: ffffffffa0517b79
> > [846121.207623] FS: 0000000000000000(0000) GS:ffff881fffa80000(0000) knlGS:0000000000000000
> > [846121.217475] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [846121.224720] CR2: 00000000006e8360 CR3: 00000000019bc000 CR4: 00000000001407e0
> > [846121.233525] Stack:
> > [846121.236601] 0000000000000014 ffff881fff403c00 ffff881ecebf7c88 ffff8811bd0cf3e0
> > [846121.245723] ffff883fe43df000 ffff8835d38afa80 0000000000000000 ffff881fff489000
> > [846121.254854] ffff881ecebf7ca8 ffffffffa0517b79 0000000000000000 ffff8811bd0cf3e0
> > [846121.263984] Call Trace:
> > [846121.267559] [<ffffffffa0517b79>] nfs4_alloc_stid+0x29/0xc0 [nfsd]
> > [846121.275355] [<ffffffffa051fcf8>] nfsd4_lock+0x6f8/0xda0 [nfsd]
> > [846121.282870] [<ffffffffa050cf07>] nfsd4_proc_compound+0x4d7/0x7f0 [nfsd]
> > [846121.291191] [<ffffffffa04f8e43>] nfsd_dispatch+0xc3/0x210 [nfsd]
> > [846121.298869] [<ffffffffa01c8e02>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc]
> > [846121.307590] [<ffffffffa01c7d23>] svc_process_common+0x453/0x6f0 [sunrpc]
> > [846121.316018] [<ffffffffa01c80c7>] svc_process+0x107/0x170 [sunrpc]
> > [846121.323764] [<ffffffffa04f87c7>] nfsd+0xf7/0x160 [nfsd]
> > [846121.330541] [<ffffffffa04f86d0>] ? nfsd_destroy+0x80/0x80 [nfsd]
> > [846121.338226] [<ffffffff81093861>] kthread+0xe1/0x100
> > [846121.344624] [<ffffffff81093780>] ? kthread_create_on_node+0x1b0/0x1b0
> > [846121.352776] [<ffffffff816912fc>] ret_from_fork+0x7c/0xb0
> > [846121.359661] [<ffffffff81093780>] ? kthread_create_on_node+0x1b0/0x1b0
> > [846121.367961] Code: cc 00 00 49 8b 50 08 4d 8b 20 49 8b 40 10 4d 85 e4 0f 84 a7 00 00 00 48 85 c0 0f 84 9e 00 00 00 49 63 46 20 48 8d 4a 01 4d 8b 06 <49> 8b 1c 04 4c 89 e0 65 49 0f c7 08 0f 94 c0 84 c0 74 b9 49 63
> > [846121.391248] RIP [<ffffffff811c7595>] kmem_cache_alloc+0x65/0x140
> > [846121.398936] RSP <ffff881ecebf7c28>
> >
>
> I wonder if this could be the race. In nfsd4_process_open2:
>
> -------------------[snip]-------------------
> if (stp) {
> 	/* Stateid was found, this is an OPEN upgrade */
> 	status = nfs4_upgrade_open(rqstp, fp, current_fh, stp, open);
> 	if (status)
> 		goto out;
> } else {
> 	stp = open->op_stp;
> 	open->op_stp = NULL;
> 	init_open_stateid(stp, fp, open);
> 	status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open);
> 	if (status) {
> 		release_open_stateid(stp);
> 		goto out;
> 	}
> 
> 	stp->st_clnt_odstate = find_or_hash_clnt_odstate(fp,
> 						open->op_odstate);
> 	if (stp->st_clnt_odstate == open->op_odstate)
> 		open->op_odstate = NULL;
> }
> -------------------[snip]-------------------
>
> init_open_stateid will hash the stateid, and then call
> nfs4_get_vfs_file, which does the actual dentry_open, sets the access
> mode, etc.
>
> If that fails, it calls release_open_stateid which basically
> unrolls the whole thing -- tears down the stateid, unhashing it in the
> process.
>
> In that window where it's hashed though, another open could come along
> and find that stateid. If it does, then it will call nfs4_upgrade_open.
> If it requests the same access as the first open then it won't try to
> actually open the file at all. It'll just assume that the stateid has
> the correct access and proceed.
>
> At that point, you may have all the access bits set in the nfs4_file,
> but no actual filp associated with it. That may explain the WARN
> popping.
>
> In a similar race, you could have two opens racing for different access
> (maybe one for read and one for write), or a race between an OPEN and
> CLOSE.
>
> In short, I think that there are a couple of potential races here:
>
> 1) That release_open_stateid could be problematic, simply because you
> could have another user of that stateid race in before it's fully set
> up. We probably need to do that a bit more gracefully though I'm a
> little unclear on the right solution.
>
> 2) We probably need nfs4_upgrade_open to block until the necessary
> fi->fi_fds field has been set up.
>
> OTOH, maybe this code needs a more fundamental rethink to close these
> races -- I'm not sure...
>
> Bruce, any thoughts?

Thanks for catching that, it does look pretty suspicious! But I need to
look at it closer.

It does sound like both Andrew and Anna are able to trigger this fairly
quickly, so might be worth adding a WARN_ON_ONCE() or two here in
nfsd4_process_open2 to confirm whether either the upgrade or
release_open_stateid cases are dealing with a partially setup stateid.

--b.

2015-07-29 15:17:58

by Andrew W Elble

Subject: Re: list_del corruption / unhash_ol_stateid()


Update: slub_debug seems to interfere with this manifesting.

> so might be worth adding a WARN_ON_ONCE() or two here in
> nfsd4_process_open2 to confirm whether either the upgrade or
> release_open_stateid cases are dealing with a partially setup stateid.

I started with this - not completely sure this covers all the cases
we're interested in.

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 039f9c8a95e8..bb54c7d7b787 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -3866,6 +3866,8 @@ nfs4_upgrade_open(struct svc_rqst *rqstp, struct nfs4_file *fp, struct svc_fh *c
 	if (!test_access(open->op_share_access, stp))
 		return nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, open);
 
+	WARN_ON_ONCE(!find_readable_file(fp));
+
 	/* test and set deny mode */
 	spin_lock(&fp->fi_lock);
 	status = nfs4_file_check_deny(fp, open->op_share_deny);
@@ -4171,6 +4173,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
 	init_open_stateid(stp, fp, open);
 	status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open);
 	if (status) {
+		WARN_ON_ONCE(!find_readable_file(fp));
 		release_open_stateid(stp);
 		goto out;
 	}


...and quickly got the warning at the usual place, and inside of nfs4_upgrade_open()

[ 836.784251] WARNING: CPU: 0 PID: 12114 at fs/nfsd/nfs4state.c:3869 nfsd4_process_open2+0xffd/0x1200 [nfsd]()
[ 930.708782] WARNING: CPU: 2 PID: 12114 at fs/nfsd/nfs4state.c:3939 nfsd4_process_open2+0xe5a/0x1200 [nfsd]()
[ 1331.966806] WARNING: CPU: 2 PID: 12108 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
[ 1332.294149] WARNING: CPU: 2 PID: 12108 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()

--
Andrew W. Elble
[email protected]
Infrastructure Engineer, Communications Technical Lead
Rochester Institute of Technology
PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912

2015-07-29 19:52:06

by Andrew W Elble

Subject: Re: list_del corruption / unhash_ol_stateid()


> I started with this - not completely sure this covers all the cases
> we're interested in.

Oops. Apparently I like to take lots of extra refs on files. Will redo this tomorrow.

--
Andrew W. Elble
[email protected]
Infrastructure Engineer, Communications Technical Lead
Rochester Institute of Technology
PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912

2015-07-30 11:11:38

by Andrew W Elble

Subject: Re: list_del corruption / unhash_ol_stateid()


Update: can get this to fail with slub_debug=FPU,nfsd4_stateids

The results seem to be in line with Jeff's theory.

The patch applied to test:

=========================

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 039f9c8a95e8..0b3a9637d199 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -734,6 +734,8 @@ nfs4_put_stid(struct nfs4_stid *s)
 
 	might_lock(&clp->cl_lock);
 
+	WARN_ON_ONCE(s->sc_type == 0x6b);
+
 	if (!atomic_dec_and_lock(&s->sc_count, &clp->cl_lock)) {
 		wake_up_all(&close_wq);
 		return;
@@ -939,6 +941,23 @@ static int nfs4_access_to_omode(u32 access)
 	return O_RDONLY;
 }
 
+static int warn_on_nofile(struct nfs4_file *fp, struct nfsd4_open *open)
+{
+	struct file *filp = NULL;
+	int oflag = nfs4_access_to_omode(open->op_share_access);
+
+	spin_lock(&fp->fi_lock);
+	filp = fp->fi_fds[oflag];
+	spin_unlock(&fp->fi_lock);
+
+	if (filp)
+		return 1;
+
+	return 0;
+}
+
+
+
 /*
  * A stateid that had a deny mode associated with it is being released
  * or downgraded. Recalculate the deny mode on the file.
@@ -1024,6 +1043,7 @@ static void nfs4_free_ol_stateid(struct nfs4_stid *stid)
 	release_all_access(stp);
 	if (stp->st_stateowner)
 		nfs4_put_stateowner(stp->st_stateowner);
+	WARN_ON_ONCE(atomic_read(&stid->sc_count) > 0);
 	kmem_cache_free(stateid_slab, stid);
 }
 
@@ -1054,6 +1074,8 @@ static void put_ol_stateid_locked(struct nfs4_ol_stateid *stp,
 
 	WARN_ON_ONCE(!list_empty(&stp->st_locks));
 
+	WARN_ON_ONCE(s->sc_type == 0x6b);
+
 	if (!atomic_dec_and_test(&s->sc_count)) {
 		wake_up_all(&close_wq);
 		return;
@@ -3866,6 +3888,8 @@ nfs4_upgrade_open(struct svc_rqst *rqstp, struct nfs4_file *fp, struct svc_fh *c
 	if (!test_access(open->op_share_access, stp))
 		return nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, open);
 
+	WARN_ON_ONCE(!warn_on_nofile(fp, open));
+
 	/* test and set deny mode */
 	spin_lock(&fp->fi_lock);
 	status = nfs4_file_check_deny(fp, open->op_share_deny);
@@ -3935,6 +3959,7 @@ static int nfs4_setlease(struct nfs4_delegation *dp)
 	if (!filp) {
 		/* We should always have a readable file here */
 		WARN_ON_ONCE(1);
+		locks_free_lock(fl);
 		return -EBADF;
 	}
 	fl->fl_file = filp;
@@ -4171,10 +4196,13 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
 	init_open_stateid(stp, fp, open);
 	status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open);
 	if (status) {
+		WARN_ON_ONCE(!warn_on_nofile(fp, open));
 		release_open_stateid(stp);
 		goto out;
 	}
 
+	WARN_ON_ONCE(!warn_on_nofile(fp, open));
+
 	stp->st_clnt_odstate = find_or_hash_clnt_odstate(fp,
 							 open->op_odstate);
 	if (stp->st_clnt_odstate == open->op_odstate)



=========================

The results:

[ 1185.524196] ------------[ cut here ]------------
[ 1185.529940] WARNING: CPU: 10 PID: 30157 at fs/nfsd/nfs4state.c:3891 nfsd4_process_open2+0xec8/0x1270 [nfsd]()
[ 1185.541634] Modules linked in: gfs2 dlm sctp drbd(OE) cts rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q garp mrp stp llc bonding nf_log_ipv4 nf_log_common xt_LOG xt_conntrack iptable_filter nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ip_tables x86_pkg_temp_thermal coretemp dm_multipath kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel nfsd lrw gf128mul ipmi_devintf glue_helper iTCO_wdt ablk_helper cryptd iTCO_vendor_support dcdbas auth_rpcgss ipmi_si pcspkr mei_me sb_edac acpi_power_meter mei shpchp wmi lpc_ich edac_core mfd_core ipmi_msghandler acpi_pad nfs_acl lockd grace sunrpc binfmt_misc xfs mgag200 sr_mod syscopyarea sysfillrect cdrom sysimgblt i2c_algo_bit drm_kms_helper
[ 1185.625780] ttm drm sd_mod ixgbe ahci tg3 mdio libahci dca ptp libata megaraid_sas i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
[ 1185.639956] CPU: 10 PID: 30157 Comm: nfsd Tainted: G OE 4.1.3_7 #1
[ 1185.648711] Hardware name: Dell Inc. PowerEdge R720/0X3D66, BIOS 2.2.2 01/16/2014
[ 1185.657864] ffffffffa067558e ffff883fcaa8fbc8 ffffffff81657c67 0000000000000001
[ 1185.666854] 0000000000000000 ffff883fcaa8fc08 ffffffff8107853a ffff883fcaa8fc18
[ 1185.675992] ffff881fed9263e0 ffff881f30480540 ffff881f30480528 ffff881f36a00000
[ 1185.685007] Call Trace:
[ 1185.688536] [<ffffffff81657c67>] dump_stack+0x45/0x57
[ 1185.695026] [<ffffffff8107853a>] warn_slowpath_common+0x8a/0xc0
[ 1185.702464] [<ffffffff8107862a>] warn_slowpath_null+0x1a/0x20
[ 1185.709815] [<ffffffffa06674e8>] nfsd4_process_open2+0xec8/0x1270 [nfsd]
[ 1185.718139] [<ffffffffa0666382>] ? nfsd4_process_open1+0x152/0x3f0 [nfsd]
[ 1185.726595] [<ffffffffa06562a2>] nfsd4_open+0x532/0x830 [nfsd]
[ 1185.733965] [<ffffffffa0656a77>] nfsd4_proc_compound+0x4d7/0x7e0 [nfsd]
[ 1185.742185] [<ffffffffa0642f83>] nfsd_dispatch+0xc3/0x210 [nfsd]
[ 1185.749733] [<ffffffffa018cca2>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc]
[ 1185.758348] [<ffffffffa018bbb0>] svc_process_common+0x440/0x6d0 [sunrpc]
[ 1185.766675] [<ffffffffa018bf53>] svc_process+0x113/0x1b0 [sunrpc]
[ 1185.774298] [<ffffffffa064298f>] nfsd+0xff/0x170 [nfsd]
[ 1185.781001] [<ffffffffa0642890>] ? nfsd_destroy+0x80/0x80 [nfsd]
[ 1185.788538] [<ffffffff810968b9>] kthread+0xc9/0xe0
[ 1185.794734] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
[ 1185.802733] [<ffffffff8165f2a2>] ret_from_fork+0x42/0x70
[ 1185.809469] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
[ 1185.817458] ---[ end trace ae15837291fe646a ]---
[ 1186.313360] ------------[ cut here ]------------
[ 1186.319334] WARNING: CPU: 6 PID: 30156 at fs/nfsd/nfs4state.c:3961 nfsd4_process_open2+0xe6a/0x1270 [nfsd]()
[ 1186.331134] Modules linked in: gfs2 dlm sctp drbd(OE) cts rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q garp mrp stp llc bonding nf_log_ipv4 nf_log_common xt_LOG xt_conntrack iptable_filter nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ip_tables x86_pkg_temp_thermal coretemp dm_multipath kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel nfsd lrw gf128mul ipmi_devintf glue_helper iTCO_wdt ablk_helper cryptd iTCO_vendor_support dcdbas auth_rpcgss ipmi_si pcspkr mei_me sb_edac acpi_power_meter mei shpchp wmi lpc_ich edac_core mfd_core ipmi_msghandler acpi_pad nfs_acl lockd grace sunrpc binfmt_misc xfs mgag200 sr_mod syscopyarea sysfillrect cdrom sysimgblt i2c_algo_bit drm_kms_helper
[ 1186.417242] ttm drm sd_mod ixgbe ahci tg3 mdio libahci dca ptp libata megaraid_sas i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
[ 1186.431744] CPU: 6 PID: 30156 Comm: nfsd Tainted: G W OE 4.1.3_7 #1
[ 1186.440610] Hardware name: Dell Inc. PowerEdge R720/0X3D66, BIOS 2.2.2 01/16/2014
[ 1186.449869] ffffffffa067558e ffff883fceef3bc8 ffffffff81657c67 0000000000000001
[ 1186.459097] 0000000000000000 ffff883fceef3c08 ffffffff8107853a ffff881f6e414540
[ 1186.468328] ffff881feee3e3e0 ffff881f6e414540 ffff881f3912e000 ffff881f3c3c0000
[ 1186.477507] Call Trace:
[ 1186.481105] [<ffffffff81657c67>] dump_stack+0x45/0x57
[ 1186.487731] [<ffffffff8107853a>] warn_slowpath_common+0x8a/0xc0
[ 1186.495273] [<ffffffff8107862a>] warn_slowpath_null+0x1a/0x20
[ 1186.502640] [<ffffffffa066748a>] nfsd4_process_open2+0xe6a/0x1270 [nfsd]
[ 1186.511023] [<ffffffffa06562a2>] nfsd4_open+0x532/0x830 [nfsd]
[ 1186.518431] [<ffffffffa0656a77>] nfsd4_proc_compound+0x4d7/0x7e0 [nfsd]
[ 1186.526697] [<ffffffffa0642f83>] nfsd_dispatch+0xc3/0x210 [nfsd]
[ 1186.534252] [<ffffffffa018cca2>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc]
[ 1186.542870] [<ffffffffa018bbb0>] svc_process_common+0x440/0x6d0 [sunrpc]
[ 1186.551210] [<ffffffffa018bf53>] svc_process+0x113/0x1b0 [sunrpc]
[ 1186.558839] [<ffffffffa064298f>] nfsd+0xff/0x170 [nfsd]
[ 1186.565517] [<ffffffffa0642890>] ? nfsd_destroy+0x80/0x80 [nfsd]
[ 1186.573028] [<ffffffff810968b9>] kthread+0xc9/0xe0
[ 1186.579186] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
[ 1186.587220] [<ffffffff8165f2a2>] ret_from_fork+0x42/0x70
[ 1186.593951] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
[ 1186.601996] ---[ end trace ae15837291fe646b ]---
[ 1308.795050] ------------[ cut here ]------------
[ 1308.801044] WARNING: CPU: 24 PID: 30158 at fs/nfsd/nfs4state.c:4199 nfsd4_process_open2+0x1261/0x1270 [nfsd]()
[ 1308.812986] Modules linked in: gfs2 dlm sctp drbd(OE) cts rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q garp mrp stp llc bonding nf_log_ipv4 nf_log_common xt_LOG xt_conntrack iptable_filter nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ip_tables x86_pkg_temp_thermal coretemp dm_multipath kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel nfsd lrw gf128mul ipmi_devintf glue_helper iTCO_wdt ablk_helper cryptd iTCO_vendor_support dcdbas auth_rpcgss ipmi_si pcspkr mei_me sb_edac acpi_power_meter mei shpchp wmi lpc_ich edac_core mfd_core ipmi_msghandler acpi_pad nfs_acl lockd grace sunrpc binfmt_misc xfs mgag200 sr_mod syscopyarea sysfillrect cdrom sysimgblt i2c_algo_bit drm_kms_helper
[ 1308.900250] ttm drm sd_mod ixgbe ahci tg3 mdio libahci dca ptp libata megaraid_sas i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
[ 1308.915284] CPU: 24 PID: 30158 Comm: nfsd Tainted: G W OE 4.1.3_7 #1
[ 1308.924255] Hardware name: Dell Inc. PowerEdge R720/0X3D66, BIOS 2.2.2 01/16/2014
[ 1308.933531] ffffffffa067558e ffff883fce893bc8 ffffffff81657c67 0000000000000001
[ 1308.942954] 0000000000000000 ffff883fce893c08 ffffffff8107853a ffff883fce893c18
[ 1308.952290] ffff881fed9203e0 0000000018270000 ffff881f40eec948 ffff881f36a02af8
[ 1308.961644] Call Trace:
[ 1308.965298] [<ffffffff81657c67>] dump_stack+0x45/0x57
[ 1308.971913] [<ffffffff8107853a>] warn_slowpath_common+0x8a/0xc0
[ 1308.979547] [<ffffffff8107862a>] warn_slowpath_null+0x1a/0x20
[ 1308.986924] [<ffffffffa0667881>] nfsd4_process_open2+0x1261/0x1270 [nfsd]
[ 1308.995447] [<ffffffffa0666382>] ? nfsd4_process_open1+0x152/0x3f0 [nfsd]
[ 1309.003978] [<ffffffffa06562a2>] nfsd4_open+0x532/0x830 [nfsd]
[ 1309.011414] [<ffffffffa0656a77>] nfsd4_proc_compound+0x4d7/0x7e0 [nfsd]
[ 1309.019704] [<ffffffffa0642f83>] nfsd_dispatch+0xc3/0x210 [nfsd]
[ 1309.027368] [<ffffffffa018cca2>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc]
[ 1309.036209] [<ffffffffa018bbb0>] svc_process_common+0x440/0x6d0 [sunrpc]
[ 1309.044589] [<ffffffffa018bf53>] svc_process+0x113/0x1b0 [sunrpc]
[ 1309.052266] [<ffffffffa064298f>] nfsd+0xff/0x170 [nfsd]
[ 1309.059143] [<ffffffffa0642890>] ? nfsd_destroy+0x80/0x80 [nfsd]
[ 1309.066775] [<ffffffff810968b9>] kthread+0xc9/0xe0
[ 1309.073100] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
[ 1309.081303] [<ffffffff8165f2a2>] ret_from_fork+0x42/0x70
[ 1309.088088] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
[ 1309.096314] ---[ end trace ae15837291fe646c ]---
[ 1309.102259] ------------[ cut here ]------------
[ 1309.108271] WARNING: CPU: 24 PID: 30158 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
[ 1309.118421] list_del corruption, ffff881f36a02b28->next is LIST_POISON1 (dead000000100100)
[ 1309.128541] Modules linked in: gfs2 dlm sctp drbd(OE) cts rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q garp mrp stp llc bonding nf_log_ipv4 nf_log_common xt_LOG xt_conntrack iptable_filter nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ip_tables x86_pkg_temp_thermal coretemp dm_multipath kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel nfsd lrw gf128mul ipmi_devintf glue_helper iTCO_wdt ablk_helper cryptd iTCO_vendor_support dcdbas auth_rpcgss ipmi_si pcspkr mei_me sb_edac acpi_power_meter mei shpchp wmi lpc_ich edac_core mfd_core ipmi_msghandler acpi_pad nfs_acl lockd grace sunrpc binfmt_misc xfs mgag200 sr_mod syscopyarea sysfillrect cdrom sysimgblt i2c_algo_bit drm_kms_helper
[ 1309.214812] ttm drm sd_mod ixgbe ahci tg3 mdio libahci dca ptp libata megaraid_sas i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
[ 1309.229585] CPU: 24 PID: 30158 Comm: nfsd Tainted: G W OE 4.1.3_7 #1
[ 1309.238657] Hardware name: Dell Inc. PowerEdge R720/0X3D66, BIOS 2.2.2 01/16/2014
[ 1309.248021] ffffffff818c30d0 ffff883fce893b28 ffffffff81657c67 0000000000000001
[ 1309.257325] ffff883fce893b78 ffff883fce893b68 ffffffff8107853a ffffffffa0667881
[ 1309.266608] ffff881f36a02b28 ffff881f40eec94c ffff881f40eec948 ffff881f36a02af8
[ 1309.275896] Call Trace:
[ 1309.279613] [<ffffffff81657c67>] dump_stack+0x45/0x57
[ 1309.286319] [<ffffffff8107853a>] warn_slowpath_common+0x8a/0xc0
[ 1309.293968] [<ffffffffa0667881>] ? nfsd4_process_open2+0x1261/0x1270 [nfsd]
[ 1309.302753] [<ffffffff810785b6>] warn_slowpath_fmt+0x46/0x50
[ 1309.310052] [<ffffffff81313bb3>] __list_del_entry+0x63/0xd0
[ 1309.317229] [<ffffffff81313c31>] list_del+0x11/0x40
[ 1309.323613] [<ffffffffa06611f8>] unhash_ol_stateid+0x28/0x40 [nfsd]
[ 1309.331547] [<ffffffffa0666e76>] nfsd4_process_open2+0x856/0x1270 [nfsd]
[ 1309.339956] [<ffffffffa0666382>] ? nfsd4_process_open1+0x152/0x3f0 [nfsd]
[ 1309.348496] [<ffffffffa06562a2>] nfsd4_open+0x532/0x830 [nfsd]
[ 1309.355943] [<ffffffffa0656a77>] nfsd4_proc_compound+0x4d7/0x7e0 [nfsd]
[ 1309.364257] [<ffffffffa0642f83>] nfsd_dispatch+0xc3/0x210 [nfsd]
[ 1309.371915] [<ffffffffa018cca2>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc]
[ 1309.380609] [<ffffffffa018bbb0>] svc_process_common+0x440/0x6d0 [sunrpc]
[ 1309.389008] [<ffffffffa018bf53>] svc_process+0x113/0x1b0 [sunrpc]
[ 1309.396761] [<ffffffffa064298f>] nfsd+0xff/0x170 [nfsd]
[ 1309.403516] [<ffffffffa0642890>] ? nfsd_destroy+0x80/0x80 [nfsd]
[ 1309.411179] [<ffffffff810968b9>] kthread+0xc9/0xe0
[ 1309.417487] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
[ 1309.425586] [<ffffffff8165f2a2>] ret_from_fork+0x42/0x70
[ 1309.432505] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
[ 1309.440602] ---[ end trace ae15837291fe646d ]---
[ 1309.446563] ------------[ cut here ]------------
[ 1309.452536] WARNING: CPU: 24 PID: 30158 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
[ 1309.462706] list_del corruption, ffff881f36a02b38->next is LIST_POISON1 (dead000000100100)
[ 1309.472780] Modules linked in: gfs2 dlm sctp drbd(OE) cts rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q garp mrp stp llc bonding nf_log_ipv4 nf_log_common xt_LOG xt_conntrack iptable_filter nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ip_tables x86_pkg_temp_thermal coretemp dm_multipath kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel nfsd lrw gf128mul ipmi_devintf glue_helper iTCO_wdt ablk_helper cryptd iTCO_vendor_support dcdbas auth_rpcgss ipmi_si pcspkr mei_me sb_edac acpi_power_meter mei shpchp wmi lpc_ich edac_core mfd_core ipmi_msghandler acpi_pad nfs_acl lockd grace sunrpc binfmt_misc xfs mgag200 sr_mod syscopyarea sysfillrect cdrom sysimgblt i2c_algo_bit drm_kms_helper
[ 1309.559071] ttm drm sd_mod ixgbe ahci tg3 mdio libahci dca ptp libata megaraid_sas i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
[ 1309.573829] CPU: 24 PID: 30158 Comm: nfsd Tainted: G W OE 4.1.3_7 #1
[ 1309.582892] Hardware name: Dell Inc. PowerEdge R720/0X3D66, BIOS 2.2.2 01/16/2014
[ 1309.592198] ffffffff818c30d0 ffff883fce893b28 ffffffff81657c67 0000000000000001
[ 1309.601476] ffff883fce893b78 ffff883fce893b68 ffffffff8107853a ffffffffa0667881
[ 1309.610680] ffff881f36a02b38 ffff881f40eec94c ffff881f40eec948 ffff881f36a02af8
[ 1309.619877] Call Trace:
[ 1309.623476] [<ffffffff81657c67>] dump_stack+0x45/0x57
[ 1309.630108] [<ffffffff8107853a>] warn_slowpath_common+0x8a/0xc0
[ 1309.637666] [<ffffffffa0667881>] ? nfsd4_process_open2+0x1261/0x1270 [nfsd]
[ 1309.646410] [<ffffffff810785b6>] warn_slowpath_fmt+0x46/0x50
[ 1309.653674] [<ffffffff81313bb3>] __list_del_entry+0x63/0xd0
[ 1309.660854] [<ffffffff81313c31>] list_del+0x11/0x40
[ 1309.667228] [<ffffffffa0661209>] unhash_ol_stateid+0x39/0x40 [nfsd]
[ 1309.675185] [<ffffffffa0666e76>] nfsd4_process_open2+0x856/0x1270 [nfsd]
[ 1309.683646] [<ffffffffa0666382>] ? nfsd4_process_open1+0x152/0x3f0 [nfsd]
[ 1309.692153] [<ffffffffa06562a2>] nfsd4_open+0x532/0x830 [nfsd]
[ 1309.699604] [<ffffffffa0656a77>] nfsd4_proc_compound+0x4d7/0x7e0 [nfsd]
[ 1309.707941] [<ffffffffa0642f83>] nfsd_dispatch+0xc3/0x210 [nfsd]
[ 1309.715573] [<ffffffffa018cca2>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc]
[ 1309.724270] [<ffffffffa018bbb0>] svc_process_common+0x440/0x6d0 [sunrpc]
[ 1309.732680] [<ffffffffa018bf53>] svc_process+0x113/0x1b0 [sunrpc]
[ 1309.740395] [<ffffffffa064298f>] nfsd+0xff/0x170 [nfsd]
[ 1309.747180] [<ffffffffa0642890>] ? nfsd_destroy+0x80/0x80 [nfsd]
[ 1309.754811] [<ffffffff810968b9>] kthread+0xc9/0xe0
[ 1309.761065] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
[ 1309.769166] [<ffffffff8165f2a2>] ret_from_fork+0x42/0x70
[ 1309.776005] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
[ 1309.784130] ---[ end trace ae15837291fe646e ]---
[ 1309.790143] ------------[ cut here ]------------
[ 1309.796195] WARNING: CPU: 30 PID: 30158 at fs/nfsd/nfs4state.c:737 nfs4_put_stid+0xa8/0xc0 [nfsd]()
[ 1309.807211] Modules linked in: gfs2 dlm sctp drbd(OE) cts rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q garp mrp stp llc bonding nf_log_ipv4 nf_log_common xt_LOG xt_conntrack iptable_filter nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ip_tables x86_pkg_temp_thermal coretemp dm_multipath kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel nfsd lrw gf128mul ipmi_devintf glue_helper iTCO_wdt ablk_helper cryptd iTCO_vendor_support dcdbas auth_rpcgss ipmi_si pcspkr mei_me sb_edac acpi_power_meter mei shpchp wmi lpc_ich edac_core mfd_core ipmi_msghandler acpi_pad nfs_acl lockd grace sunrpc binfmt_misc xfs mgag200 sr_mod syscopyarea sysfillrect cdrom sysimgblt i2c_algo_bit drm_kms_helper
[ 1309.893143] ttm drm sd_mod ixgbe ahci tg3 mdio libahci dca ptp libata megaraid_sas i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
[ 1309.907929] CPU: 30 PID: 30158 Comm: nfsd Tainted: G W OE 4.1.3_7 #1
[ 1309.916955] Hardware name: Dell Inc. PowerEdge R720/0X3D66, BIOS 2.2.2 01/16/2014
[ 1309.926279] ffffffffa067558e ffff883fce893b98 ffffffff81657c67 0000000000000001
[ 1309.935530] 0000000000000000 ffff883fce893bd8 ffffffff8107853a ffff883fce893be8
[ 1309.944766] ffff881f36a02af8 0000000018270000 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b
[ 1309.953970] Call Trace:
[ 1309.957513] [<ffffffff81657c67>] dump_stack+0x45/0x57
[ 1309.964086] [<ffffffff8107853a>] warn_slowpath_common+0x8a/0xc0
[ 1309.971594] [<ffffffff8107862a>] warn_slowpath_null+0x1a/0x20
[ 1309.978884] [<ffffffffa0663a58>] nfs4_put_stid+0xa8/0xc0 [nfsd]
[ 1309.986374] [<ffffffffa0666a90>] nfsd4_process_open2+0x470/0x1270 [nfsd]
[ 1309.994754] [<ffffffffa0666382>] ? nfsd4_process_open1+0x152/0x3f0 [nfsd]
[ 1310.003188] [<ffffffffa06562a2>] nfsd4_open+0x532/0x830 [nfsd]
[ 1310.010558] [<ffffffffa0656a77>] nfsd4_proc_compound+0x4d7/0x7e0 [nfsd]
[ 1310.018835] [<ffffffffa0642f83>] nfsd_dispatch+0xc3/0x210 [nfsd]
[ 1310.026392] [<ffffffffa018cca2>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc]
[ 1310.035000] [<ffffffffa018bbb0>] svc_process_common+0x440/0x6d0 [sunrpc]
[ 1310.043332] [<ffffffffa018bf53>] svc_process+0x113/0x1b0 [sunrpc]
[ 1310.050996] [<ffffffffa064298f>] nfsd+0xff/0x170 [nfsd]
[ 1310.057686] [<ffffffffa0642890>] ? nfsd_destroy+0x80/0x80 [nfsd]
[ 1310.065198] [<ffffffff810968b9>] kthread+0xc9/0xe0
[ 1310.071380] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
[ 1310.079371] [<ffffffff8165f2a2>] ret_from_fork+0x42/0x70
[ 1310.086137] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
[ 1310.094164] ---[ end trace ae15837291fe646f ]---
[ 1310.236482] =============================================================================
[ 1310.246335] BUG nfsd4_stateids (Tainted: G W OE ): Poison overwritten
[ 1310.255124] -----------------------------------------------------------------------------

[ 1310.267511] Disabling lock debugging due to kernel taint
[ 1310.274197] INFO: 0xffff881f36a02af8-0xffff881f36a02af8. First byte 0x6a instead of 0x6b
[ 1310.284073] INFO: Allocated in nfs4_alloc_stid+0x29/0xc0 [nfsd] age=430 cpu=2 pid=30156
[ 1310.293768] __slab_alloc+0x3f6/0x477
[ 1310.298649] kmem_cache_alloc+0x123/0x160
[ 1310.303884] nfs4_alloc_stid+0x29/0xc0 [nfsd]
[ 1310.309506] nfsd4_process_open1+0x152/0x3f0 [nfsd]
[ 1310.315708] nfsd4_open+0xc9/0x830 [nfsd]
[ 1310.320933] nfsd4_proc_compound+0x4d7/0x7e0 [nfsd]
[ 1310.327269] nfsd_dispatch+0xc3/0x210 [nfsd]
[ 1310.332778] svc_process_common+0x440/0x6d0 [sunrpc]
[ 1310.339046] svc_process+0x113/0x1b0 [sunrpc]
[ 1310.344646] nfsd+0xff/0x170 [nfsd]
[ 1310.349235] kthread+0xc9/0xe0
[ 1310.353296] ret_from_fork+0x42/0x70
[ 1310.357923] INFO: Freed in nfs4_free_ol_stateid+0x45/0x70 [nfsd] age=486 cpu=4 pid=30155
[ 1310.367621] __slab_free+0x39/0x1df
[ 1310.372124] kmem_cache_free+0x123/0x150
[ 1310.377098] nfs4_free_ol_stateid+0x45/0x70 [nfsd]
[ 1310.383026] nfs4_put_stid+0x51/0xc0 [nfsd]
[ 1310.388311] nfsd4_close+0x21b/0x290 [nfsd]
[ 1310.393548] nfsd4_proc_compound+0x4d7/0x7e0 [nfsd]
[ 1310.399560] nfsd_dispatch+0xc3/0x210 [nfsd]
[ 1310.404888] svc_process_common+0x440/0x6d0 [sunrpc]
[ 1310.410986] svc_process+0x113/0x1b0 [sunrpc]
[ 1310.416399] nfsd+0xff/0x170 [nfsd]
[ 1310.420836] kthread+0xc9/0xe0
[ 1310.424803] ret_from_fork+0x42/0x70
[ 1310.429335] INFO: Slab 0xffffea007cda8000 objects=37 used=37 fp=0x (null) flags=0x2fffff80004080
[ 1310.440578] INFO: Object 0xffff881f36a02af8 @offset=11000 fp=0xffff881f36a03020

[ 1310.451557] Bytes b4 ffff881f36a02ae8: 18 00 00 00 ce 75 00 00 40 22 0f 00 01 00 00 00 .....u..@"......
[ 1310.462788] Object ffff881f36a02af8: 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b jkkkkkkkkkkkkkkk
[ 1310.473775] Object ffff881f36a02b08: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[ 1310.484769] Object ffff881f36a02b18: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[ 1310.495734] Object ffff881f36a02b28: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[ 1310.506693] Object ffff881f36a02b38: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[ 1310.517662] Object ffff881f36a02b48: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[ 1310.528604] Object ffff881f36a02b58: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
[ 1310.539576] Object ffff881f36a02b68: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
[ 1310.550523] CPU: 24 PID: 30158 Comm: nfsd Tainted: G B W OE 4.1.3_7 #1
[ 1310.559151] Hardware name: Dell Inc. PowerEdge R720/0X3D66, BIOS 2.2.2 01/16/2014
[ 1310.568074] ffff881f36a02af8 ffff883fce893a28 ffffffff81657c67 00000000000001b8
[ 1310.576934] ffff881ff1536100 ffff883fce893a68 ffffffff811ca65d 0000000000000080
[ 1310.585768] ffff881f00000001 ffff881f36a02af9 ffff881ff1536100 000000000000006b
[ 1310.594593] Call Trace:
[ 1310.597855] [<ffffffff81657c67>] dump_stack+0x45/0x57
[ 1310.604131] [<ffffffff811ca65d>] print_trailer+0x14d/0x200
[ 1310.610952] [<ffffffff811ca7df>] check_bytes_and_report+0xcf/0x110
[ 1310.618458] [<ffffffff811cb547>] check_object+0x1d7/0x250
[ 1310.625179] [<ffffffffa0663919>] ? nfs4_alloc_stid+0x29/0xc0 [nfsd]
[ 1310.632799] [<ffffffff81655418>] alloc_debug_processing+0x76/0x118
[ 1310.640325] [<ffffffff81655f44>] __slab_alloc+0x3f6/0x477
[ 1310.646945] [<ffffffffa0663919>] ? nfs4_alloc_stid+0x29/0xc0 [nfsd]
[ 1310.654535] [<ffffffff8109aab4>] ? groups_alloc+0x34/0x110
[ 1310.661300] [<ffffffffa0663919>] ? nfs4_alloc_stid+0x29/0xc0 [nfsd]
[ 1310.668894] [<ffffffff811cc0e3>] kmem_cache_alloc+0x123/0x160
[ 1310.675928] [<ffffffffa0663919>] nfs4_alloc_stid+0x29/0xc0 [nfsd]
[ 1310.683463] [<ffffffffa0666382>] nfsd4_process_open1+0x152/0x3f0 [nfsd]
[ 1310.691472] [<ffffffffa0655e39>] nfsd4_open+0xc9/0x830 [nfsd]
[ 1310.698493] [<ffffffffa0656a77>] nfsd4_proc_compound+0x4d7/0x7e0 [nfsd]
[ 1310.706522] [<ffffffffa0642f83>] nfsd_dispatch+0xc3/0x210 [nfsd]
[ 1310.713844] [<ffffffffa018bbb0>] svc_process_common+0x440/0x6d0 [sunrpc]
[ 1310.721937] [<ffffffffa018bf53>] svc_process+0x113/0x1b0 [sunrpc]
[ 1310.729338] [<ffffffffa064298f>] nfsd+0xff/0x170 [nfsd]
[ 1310.735782] [<ffffffffa0642890>] ? nfsd_destroy+0x80/0x80 [nfsd]
[ 1310.743085] [<ffffffff810968b9>] kthread+0xc9/0xe0
[ 1310.749034] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
[ 1310.756845] [<ffffffff8165f2a2>] ret_from_fork+0x42/0x70
[ 1310.763397] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
[ 1310.771195] FIX nfsd4_stateids: Restoring 0xffff881f36a02af8-0xffff881f36a02af8=0x6b

[ 1310.782626] FIX nfsd4_stateids: Marking all objects used

--
Andrew W. Elble
[email protected]
Infrastructure Engineer, Communications Technical Lead
Rochester Institute of Technology
PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912

2015-07-30 12:57:28

by Jeff Layton

[permalink] [raw]
Subject: Re: list_del corruption / unhash_ol_stateid()

On Thu, 30 Jul 2015 07:11:37 -0400
Andrew W Elble <[email protected]> wrote:

>
> Update: can get to fail with slub_debug=FPU,nfsd4_stateids
>
> The results seem to be in line with Jeff's theory.
>
> The patch applied to test:
>
> =========================
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 039f9c8a95e8..0b3a9637d199 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -734,6 +734,8 @@ nfs4_put_stid(struct nfs4_stid *s)
>
>  	might_lock(&clp->cl_lock);
>
> +	WARN_ON_ONCE(s->sc_type == 0x6b);
> +
>  	if (!atomic_dec_and_lock(&s->sc_count, &clp->cl_lock)) {
>  		wake_up_all(&close_wq);
>  		return;
> @@ -939,6 +941,23 @@ static int nfs4_access_to_omode(u32 access)
>  	return O_RDONLY;
>  }
>
> +static int warn_on_nofile(struct nfs4_file *fp, struct nfsd4_open *open)
> +{
> +	struct file *filp = NULL;
> +	int oflag = nfs4_access_to_omode(open->op_share_access);
> +
> +	spin_lock(&fp->fi_lock);
> +	filp = fp->fi_fds[oflag];
> +	spin_unlock(&fp->fi_lock);
> +

nit: probably don't need the spinlock just to do a NULL check.

> +	if (filp)
> +		return 1;
> +
> +	return 0;
> +}
> +
> +
> +
>  /*
>   * A stateid that had a deny mode associated with it is being released
>   * or downgraded. Recalculate the deny mode on the file.
> @@ -1024,6 +1043,7 @@ static void nfs4_free_ol_stateid(struct nfs4_stid *stid)
>  	release_all_access(stp);
>  	if (stp->st_stateowner)
>  		nfs4_put_stateowner(stp->st_stateowner);
> +	WARN_ON_ONCE(atomic_read(&stid->sc_count) > 0);
>  	kmem_cache_free(stateid_slab, stid);
>  }
>
> @@ -1054,6 +1074,8 @@ static void put_ol_stateid_locked(struct nfs4_ol_stateid *stp,
>
>  	WARN_ON_ONCE(!list_empty(&stp->st_locks));
>
> +	WARN_ON_ONCE(s->sc_type == 0x6b);
> +
>  	if (!atomic_dec_and_test(&s->sc_count)) {
>  		wake_up_all(&close_wq);
>  		return;
> @@ -3866,6 +3888,8 @@ nfs4_upgrade_open(struct svc_rqst *rqstp, struct nfs4_file *fp, struct svc_fh *c
>  	if (!test_access(open->op_share_access, stp))
>  		return nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, open);
>
> +	WARN_ON_ONCE(!warn_on_nofile(fp, open));
> +

Ok, it looks like the first warning came from here. So what we should
probably aim for is to make this caller wait on the construction of the
filp before proceeding.

Or we could just break this block out of nfs4_get_vfs_file and call
it from here (with suitable cleanup to the spinlocking):

------------[snip]-------------
	if (!fp->fi_fds[oflag]) {
		spin_unlock(&fp->fi_lock);
		status = nfsd_open(rqstp, cur_fh, S_IFREG, access, &filp);
		if (status)
			goto out_put_access;
		spin_lock(&fp->fi_lock);
		if (!fp->fi_fds[oflag]) {
			fp->fi_fds[oflag] = filp;
			filp = NULL;
		}
	}
------------[snip]-------------

That should be safe enough, I'd think. You might end up with multiple
tasks racing to open the file, but they'll just tear down the filp they
got if someone else gets there first.

FWIW, I'll be posting a patchset soon that adds a new open file cache
for nfsv2/3. It should be possible to make the nfs4_file cache also use
it. That could make this a lot cleaner as it already has some support
for handling multiple threads that want a filp at the same time like
this.

In any case, I think this explains where the "no readable file" warning
is coming from, but I'm not sure yet about the mem corruption...

>  	/* test and set deny mode */
>  	spin_lock(&fp->fi_lock);
>  	status = nfs4_file_check_deny(fp, open->op_share_deny);
> @@ -3935,6 +3959,7 @@ static int nfs4_setlease(struct nfs4_delegation *dp)
>  	if (!filp) {
>  		/* We should always have a readable file here */
>  		WARN_ON_ONCE(1);
> +		locks_free_lock(fl);
>  		return -EBADF;
>  	}
>  	fl->fl_file = filp;
> @@ -4171,10 +4196,13 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
>  	init_open_stateid(stp, fp, open);
>  	status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open);
>  	if (status) {
> +		WARN_ON_ONCE(!warn_on_nofile(fp, open));
>  		release_open_stateid(stp);
>  		goto out;
>  	}
>
> +	WARN_ON_ONCE(!warn_on_nofile(fp, open));
> +
>  	stp->st_clnt_odstate = find_or_hash_clnt_odstate(fp,
> 						open->op_odstate);
>  	if (stp->st_clnt_odstate == open->op_odstate)
>
>
>
> =========================
>
> The results:
>
> [ 1185.524196] ------------[ cut here ]------------
> [ 1185.529940] WARNING: CPU: 10 PID: 30157 at fs/nfsd/nfs4state.c:3891 nfsd4_process_open2+0xec8/0x1270 [nfsd]()
> [ 1185.541634] Modules linked in: gfs2 dlm sctp drbd(OE) cts rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q garp mrp stp llc bonding nf_log_ipv4 nf_log_common xt_LOG xt_conntrack iptable_filter nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ip_tables x86_pkg_temp_thermal coretemp dm_multipath kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel nfsd lrw gf128mul ipmi_devintf glue_helper iTCO_wdt ablk_helper cryptd iTCO_vendor_support dcdbas auth_rpcgss ipmi_si pcspkr mei_me sb_edac acpi_power_meter mei shpchp wmi lpc_ich edac_core mfd_core ipmi_msghandler acpi_pad nfs_acl lockd grace sunrpc binfmt_misc xfs mgag200 sr_mod syscopyarea sysfillrect cdrom sysimgblt i2c_algo_bit drm_kms_helper
> [ 1185.625780] ttm drm sd_mod ixgbe ahci tg3 mdio libahci dca ptp libata megaraid_sas i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
> [ 1185.639956] CPU: 10 PID: 30157 Comm: nfsd Tainted: G OE 4.1.3_7 #1
> [ 1185.648711] Hardware name: Dell Inc. PowerEdge R720/0X3D66, BIOS 2.2.2 01/16/2014
> [ 1185.657864] ffffffffa067558e ffff883fcaa8fbc8 ffffffff81657c67 0000000000000001
> [ 1185.666854] 0000000000000000 ffff883fcaa8fc08 ffffffff8107853a ffff883fcaa8fc18
> [ 1185.675992] ffff881fed9263e0 ffff881f30480540 ffff881f30480528 ffff881f36a00000
> [ 1185.685007] Call Trace:
> [ 1185.688536] [<ffffffff81657c67>] dump_stack+0x45/0x57
> [ 1185.695026] [<ffffffff8107853a>] warn_slowpath_common+0x8a/0xc0
> [ 1185.702464] [<ffffffff8107862a>] warn_slowpath_null+0x1a/0x20
> [ 1185.709815] [<ffffffffa06674e8>] nfsd4_process_open2+0xec8/0x1270 [nfsd]
> [ 1185.718139] [<ffffffffa0666382>] ? nfsd4_process_open1+0x152/0x3f0 [nfsd]
> [ 1185.726595] [<ffffffffa06562a2>] nfsd4_open+0x532/0x830 [nfsd]
> [ 1185.733965] [<ffffffffa0656a77>] nfsd4_proc_compound+0x4d7/0x7e0 [nfsd]
> [ 1185.742185] [<ffffffffa0642f83>] nfsd_dispatch+0xc3/0x210 [nfsd]
> [ 1185.749733] [<ffffffffa018cca2>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc]
> [ 1185.758348] [<ffffffffa018bbb0>] svc_process_common+0x440/0x6d0 [sunrpc]
> [ 1185.766675] [<ffffffffa018bf53>] svc_process+0x113/0x1b0 [sunrpc]
> [ 1185.774298] [<ffffffffa064298f>] nfsd+0xff/0x170 [nfsd]
> [ 1185.781001] [<ffffffffa0642890>] ? nfsd_destroy+0x80/0x80 [nfsd]
> [ 1185.788538] [<ffffffff810968b9>] kthread+0xc9/0xe0
> [ 1185.794734] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
> [ 1185.802733] [<ffffffff8165f2a2>] ret_from_fork+0x42/0x70
> [ 1185.809469] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
> [ 1185.817458] ---[ end trace ae15837291fe646a ]---
> [ 1186.313360] ------------[ cut here ]------------
> [ 1186.319334] WARNING: CPU: 6 PID: 30156 at fs/nfsd/nfs4state.c:3961 nfsd4_process_open2+0xe6a/0x1270 [nfsd]()
> [ 1186.331134] Modules linked in: gfs2 dlm sctp drbd(OE) cts rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q garp mrp stp llc bonding nf_log_ipv4 nf_log_common xt_LOG xt_conntrack iptable_filter nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ip_tables x86_pkg_temp_thermal coretemp dm_multipath kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel nfsd lrw gf128mul ipmi_devintf glue_helper iTCO_wdt ablk_helper cryptd iTCO_vendor_support dcdbas auth_rpcgss ipmi_si pcspkr mei_me sb_edac acpi_power_meter mei shpchp wmi lpc_ich edac_core mfd_core ipmi_msghandler acpi_pad nfs_acl lockd grace sunrpc binfmt_misc xfs mgag200 sr_mod syscopyarea sysfillrect cdrom sysimgblt i2c_algo_bit drm_kms_helper
> [ 1186.417242] ttm drm sd_mod ixgbe ahci tg3 mdio libahci dca ptp libata megaraid_sas i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
> [ 1186.431744] CPU: 6 PID: 30156 Comm: nfsd Tainted: G W OE 4.1.3_7 #1
> [ 1186.440610] Hardware name: Dell Inc. PowerEdge R720/0X3D66, BIOS 2.2.2 01/16/2014
> [ 1186.449869] ffffffffa067558e ffff883fceef3bc8 ffffffff81657c67 0000000000000001
> [ 1186.459097] 0000000000000000 ffff883fceef3c08 ffffffff8107853a ffff881f6e414540
> [ 1186.468328] ffff881feee3e3e0 ffff881f6e414540 ffff881f3912e000 ffff881f3c3c0000
> [ 1186.477507] Call Trace:
> [ 1186.481105] [<ffffffff81657c67>] dump_stack+0x45/0x57
> [ 1186.487731] [<ffffffff8107853a>] warn_slowpath_common+0x8a/0xc0
> [ 1186.495273] [<ffffffff8107862a>] warn_slowpath_null+0x1a/0x20
> [ 1186.502640] [<ffffffffa066748a>] nfsd4_process_open2+0xe6a/0x1270 [nfsd]
> [ 1186.511023] [<ffffffffa06562a2>] nfsd4_open+0x532/0x830 [nfsd]
> [ 1186.518431] [<ffffffffa0656a77>] nfsd4_proc_compound+0x4d7/0x7e0 [nfsd]
> [ 1186.526697] [<ffffffffa0642f83>] nfsd_dispatch+0xc3/0x210 [nfsd]
> [ 1186.534252] [<ffffffffa018cca2>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc]
> [ 1186.542870] [<ffffffffa018bbb0>] svc_process_common+0x440/0x6d0 [sunrpc]
> [ 1186.551210] [<ffffffffa018bf53>] svc_process+0x113/0x1b0 [sunrpc]
> [ 1186.558839] [<ffffffffa064298f>] nfsd+0xff/0x170 [nfsd]
> [ 1186.565517] [<ffffffffa0642890>] ? nfsd_destroy+0x80/0x80 [nfsd]
> [ 1186.573028] [<ffffffff810968b9>] kthread+0xc9/0xe0
> [ 1186.579186] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
> [ 1186.587220] [<ffffffff8165f2a2>] ret_from_fork+0x42/0x70
> [ 1186.593951] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
> [ 1186.601996] ---[ end trace ae15837291fe646b ]---
> [ 1308.795050] ------------[ cut here ]------------
> [ 1308.801044] WARNING: CPU: 24 PID: 30158 at fs/nfsd/nfs4state.c:4199 nfsd4_process_open2+0x1261/0x1270 [nfsd]()
> [ 1308.812986] Modules linked in: gfs2 dlm sctp drbd(OE) cts rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q garp mrp stp llc bonding nf_log_ipv4 nf_log_common xt_LOG xt_conntrack iptable_filter nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ip_tables x86_pkg_temp_thermal coretemp dm_multipath kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel nfsd lrw gf128mul ipmi_devintf glue_helper iTCO_wdt ablk_helper cryptd iTCO_vendor_support dcdbas auth_rpcgss ipmi_si pcspkr mei_me sb_edac acpi_power_meter mei shpchp wmi lpc_ich edac_core mfd_core ipmi_msghandler acpi_pad nfs_acl lockd grace sunrpc binfmt_misc xfs mgag200 sr_mod syscopyarea sysfillrect cdrom sysimgblt i2c_algo_bit drm_kms_helper
> [ 1308.900250] ttm drm sd_mod ixgbe ahci tg3 mdio libahci dca ptp libata megaraid_sas i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
> [ 1308.915284] CPU: 24 PID: 30158 Comm: nfsd Tainted: G W OE 4.1.3_7 #1
> [ 1308.924255] Hardware name: Dell Inc. PowerEdge R720/0X3D66, BIOS 2.2.2 01/16/2014
> [ 1308.933531] ffffffffa067558e ffff883fce893bc8 ffffffff81657c67 0000000000000001
> [ 1308.942954] 0000000000000000 ffff883fce893c08 ffffffff8107853a ffff883fce893c18
> [ 1308.952290] ffff881fed9203e0 0000000018270000 ffff881f40eec948 ffff881f36a02af8
> [ 1308.961644] Call Trace:
> [ 1308.965298] [<ffffffff81657c67>] dump_stack+0x45/0x57
> [ 1308.971913] [<ffffffff8107853a>] warn_slowpath_common+0x8a/0xc0
> [ 1308.979547] [<ffffffff8107862a>] warn_slowpath_null+0x1a/0x20
> [ 1308.986924] [<ffffffffa0667881>] nfsd4_process_open2+0x1261/0x1270 [nfsd]
> [ 1308.995447] [<ffffffffa0666382>] ? nfsd4_process_open1+0x152/0x3f0 [nfsd]
> [ 1309.003978] [<ffffffffa06562a2>] nfsd4_open+0x532/0x830 [nfsd]
> [ 1309.011414] [<ffffffffa0656a77>] nfsd4_proc_compound+0x4d7/0x7e0 [nfsd]
> [ 1309.019704] [<ffffffffa0642f83>] nfsd_dispatch+0xc3/0x210 [nfsd]
> [ 1309.027368] [<ffffffffa018cca2>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc]
> [ 1309.036209] [<ffffffffa018bbb0>] svc_process_common+0x440/0x6d0 [sunrpc]
> [ 1309.044589] [<ffffffffa018bf53>] svc_process+0x113/0x1b0 [sunrpc]
> [ 1309.052266] [<ffffffffa064298f>] nfsd+0xff/0x170 [nfsd]
> [ 1309.059143] [<ffffffffa0642890>] ? nfsd_destroy+0x80/0x80 [nfsd]
> [ 1309.066775] [<ffffffff810968b9>] kthread+0xc9/0xe0
> [ 1309.073100] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
> [ 1309.081303] [<ffffffff8165f2a2>] ret_from_fork+0x42/0x70
> [ 1309.088088] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
> [ 1309.096314] ---[ end trace ae15837291fe646c ]---
> [ 1309.102259] ------------[ cut here ]------------
> [ 1309.108271] WARNING: CPU: 24 PID: 30158 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
> [ 1309.118421] list_del corruption, ffff881f36a02b28->next is LIST_POISON1 (dead000000100100)
> [ 1309.128541] Modules linked in: gfs2 dlm sctp drbd(OE) cts rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q garp mrp stp llc bonding nf_log_ipv4 nf_log_common xt_LOG xt_conntrack iptable_filter nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ip_tables x86_pkg_temp_thermal coretemp dm_multipath kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel nfsd lrw gf128mul ipmi_devintf glue_helper iTCO_wdt ablk_helper cryptd iTCO_vendor_support dcdbas auth_rpcgss ipmi_si pcspkr mei_me sb_edac acpi_power_meter mei shpchp wmi lpc_ich edac_core mfd_core ipmi_msghandler acpi_pad nfs_acl lockd grace sunrpc binfmt_misc xfs mgag200 sr_mod syscopyarea sysfillrect cdrom sysimgblt i2c_algo_bit drm_kms_helper
> [ 1309.214812] ttm drm sd_mod ixgbe ahci tg3 mdio libahci dca ptp libata megaraid_sas i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
> [ 1309.229585] CPU: 24 PID: 30158 Comm: nfsd Tainted: G W OE 4.1.3_7 #1
> [ 1309.238657] Hardware name: Dell Inc. PowerEdge R720/0X3D66, BIOS 2.2.2 01/16/2014
> [ 1309.248021] ffffffff818c30d0 ffff883fce893b28 ffffffff81657c67 0000000000000001
> [ 1309.257325] ffff883fce893b78 ffff883fce893b68 ffffffff8107853a ffffffffa0667881
> [ 1309.266608] ffff881f36a02b28 ffff881f40eec94c ffff881f40eec948 ffff881f36a02af8
> [ 1309.275896] Call Trace:
> [ 1309.279613] [<ffffffff81657c67>] dump_stack+0x45/0x57
> [ 1309.286319] [<ffffffff8107853a>] warn_slowpath_common+0x8a/0xc0
> [ 1309.293968] [<ffffffffa0667881>] ? nfsd4_process_open2+0x1261/0x1270 [nfsd]
> [ 1309.302753] [<ffffffff810785b6>] warn_slowpath_fmt+0x46/0x50
> [ 1309.310052] [<ffffffff81313bb3>] __list_del_entry+0x63/0xd0
> [ 1309.317229] [<ffffffff81313c31>] list_del+0x11/0x40
> [ 1309.323613] [<ffffffffa06611f8>] unhash_ol_stateid+0x28/0x40 [nfsd]
> [ 1309.331547] [<ffffffffa0666e76>] nfsd4_process_open2+0x856/0x1270 [nfsd]
> [ 1309.339956] [<ffffffffa0666382>] ? nfsd4_process_open1+0x152/0x3f0 [nfsd]
> [ 1309.348496] [<ffffffffa06562a2>] nfsd4_open+0x532/0x830 [nfsd]
> [ 1309.355943] [<ffffffffa0656a77>] nfsd4_proc_compound+0x4d7/0x7e0 [nfsd]
> [ 1309.364257] [<ffffffffa0642f83>] nfsd_dispatch+0xc3/0x210 [nfsd]
> [ 1309.371915] [<ffffffffa018cca2>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc]
> [ 1309.380609] [<ffffffffa018bbb0>] svc_process_common+0x440/0x6d0 [sunrpc]
> [ 1309.389008] [<ffffffffa018bf53>] svc_process+0x113/0x1b0 [sunrpc]
> [ 1309.396761] [<ffffffffa064298f>] nfsd+0xff/0x170 [nfsd]
> [ 1309.403516] [<ffffffffa0642890>] ? nfsd_destroy+0x80/0x80 [nfsd]
> [ 1309.411179] [<ffffffff810968b9>] kthread+0xc9/0xe0
> [ 1309.417487] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
> [ 1309.425586] [<ffffffff8165f2a2>] ret_from_fork+0x42/0x70
> [ 1309.432505] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
> [ 1309.440602] ---[ end trace ae15837291fe646d ]---
> [ 1309.446563] ------------[ cut here ]------------
> [ 1309.452536] WARNING: CPU: 24 PID: 30158 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
> [ 1309.462706] list_del corruption, ffff881f36a02b38->next is LIST_POISON1 (dead000000100100)
> [ 1309.472780] Modules linked in: gfs2 dlm sctp drbd(OE) cts rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q garp mrp stp llc bonding nf_log_ipv4 nf_log_common xt_LOG xt_conntrack iptable_filter nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ip_tables x86_pkg_temp_thermal coretemp dm_multipath kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel nfsd lrw gf128mul ipmi_devintf glue_helper iTCO_wdt ablk_helper cryptd iTCO_vendor_support dcdbas auth_rpcgss ipmi_si pcspkr mei_me sb_edac acpi_power_meter mei shpchp wmi lpc_ich edac_core mfd_core ipmi_msghandler acpi_pad nfs_acl lockd grace sunrpc binfmt_misc xfs mgag200 sr_mod syscopyarea sysfillrect cdrom sysimgblt i2c_algo_bit drm_kms_helper
> [ 1309.559071] ttm drm sd_mod ixgbe ahci tg3 mdio libahci dca ptp libata megaraid_sas i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
> [ 1309.573829] CPU: 24 PID: 30158 Comm: nfsd Tainted: G W OE 4.1.3_7 #1
> [ 1309.582892] Hardware name: Dell Inc. PowerEdge R720/0X3D66, BIOS 2.2.2 01/16/2014
> [ 1309.592198] ffffffff818c30d0 ffff883fce893b28 ffffffff81657c67 0000000000000001
> [ 1309.601476] ffff883fce893b78 ffff883fce893b68 ffffffff8107853a ffffffffa0667881
> [ 1309.610680] ffff881f36a02b38 ffff881f40eec94c ffff881f40eec948 ffff881f36a02af8
> [ 1309.619877] Call Trace:
> [ 1309.623476] [<ffffffff81657c67>] dump_stack+0x45/0x57
> [ 1309.630108] [<ffffffff8107853a>] warn_slowpath_common+0x8a/0xc0
> [ 1309.637666] [<ffffffffa0667881>] ? nfsd4_process_open2+0x1261/0x1270 [nfsd]
> [ 1309.646410] [<ffffffff810785b6>] warn_slowpath_fmt+0x46/0x50
> [ 1309.653674] [<ffffffff81313bb3>] __list_del_entry+0x63/0xd0
> [ 1309.660854] [<ffffffff81313c31>] list_del+0x11/0x40
> [ 1309.667228] [<ffffffffa0661209>] unhash_ol_stateid+0x39/0x40 [nfsd]
> [ 1309.675185] [<ffffffffa0666e76>] nfsd4_process_open2+0x856/0x1270 [nfsd]
> [ 1309.683646] [<ffffffffa0666382>] ? nfsd4_process_open1+0x152/0x3f0 [nfsd]
> [ 1309.692153] [<ffffffffa06562a2>] nfsd4_open+0x532/0x830 [nfsd]
> [ 1309.699604] [<ffffffffa0656a77>] nfsd4_proc_compound+0x4d7/0x7e0 [nfsd]
> [ 1309.707941] [<ffffffffa0642f83>] nfsd_dispatch+0xc3/0x210 [nfsd]
> [ 1309.715573] [<ffffffffa018cca2>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc]
> [ 1309.724270] [<ffffffffa018bbb0>] svc_process_common+0x440/0x6d0 [sunrpc]
> [ 1309.732680] [<ffffffffa018bf53>] svc_process+0x113/0x1b0 [sunrpc]
> [ 1309.740395] [<ffffffffa064298f>] nfsd+0xff/0x170 [nfsd]
> [ 1309.747180] [<ffffffffa0642890>] ? nfsd_destroy+0x80/0x80 [nfsd]
> [ 1309.754811] [<ffffffff810968b9>] kthread+0xc9/0xe0
> [ 1309.761065] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
> [ 1309.769166] [<ffffffff8165f2a2>] ret_from_fork+0x42/0x70
> [ 1309.776005] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
> [ 1309.784130] ---[ end trace ae15837291fe646e ]---
> [ 1309.790143] ------------[ cut here ]------------
> [ 1309.796195] WARNING: CPU: 30 PID: 30158 at fs/nfsd/nfs4state.c:737 nfs4_put_stid+0xa8/0xc0 [nfsd]()
> [ 1309.807211] Modules linked in: gfs2 dlm sctp drbd(OE) cts rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q garp mrp stp llc bonding nf_log_ipv4 nf_log_common xt_LOG xt_conntrack iptable_filter nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ip_tables x86_pkg_temp_thermal coretemp dm_multipath kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel nfsd lrw gf128mul ipmi_devintf glue_helper iTCO_wdt ablk_helper cryptd iTCO_vendor_support dcdbas auth_rpcgss ipmi_si pcspkr mei_me sb_edac acpi_power_meter mei shpchp wmi lpc_ich edac_core mfd_core ipmi_msghandler acpi_pad nfs_acl lockd grace sunrpc binfmt_misc xfs mgag200 sr_mod syscopyarea sysfillrect cdrom sysimgblt i2c_algo_bit drm_kms_helper
> [ 1309.893143] ttm drm sd_mod ixgbe ahci tg3 mdio libahci dca ptp libata megaraid_sas i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod
> [ 1309.907929] CPU: 30 PID: 30158 Comm: nfsd Tainted: G W OE 4.1.3_7 #1
> [ 1309.916955] Hardware name: Dell Inc. PowerEdge R720/0X3D66, BIOS 2.2.2 01/16/2014
> [ 1309.926279] ffffffffa067558e ffff883fce893b98 ffffffff81657c67 0000000000000001
> [ 1309.935530] 0000000000000000 ffff883fce893bd8 ffffffff8107853a ffff883fce893be8
> [ 1309.944766] ffff881f36a02af8 0000000018270000 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b
> [ 1309.953970] Call Trace:
> [ 1309.957513] [<ffffffff81657c67>] dump_stack+0x45/0x57
> [ 1309.964086] [<ffffffff8107853a>] warn_slowpath_common+0x8a/0xc0
> [ 1309.971594] [<ffffffff8107862a>] warn_slowpath_null+0x1a/0x20
> [ 1309.978884] [<ffffffffa0663a58>] nfs4_put_stid+0xa8/0xc0 [nfsd]
> [ 1309.986374] [<ffffffffa0666a90>] nfsd4_process_open2+0x470/0x1270 [nfsd]
> [ 1309.994754] [<ffffffffa0666382>] ? nfsd4_process_open1+0x152/0x3f0 [nfsd]
> [ 1310.003188] [<ffffffffa06562a2>] nfsd4_open+0x532/0x830 [nfsd]
> [ 1310.010558] [<ffffffffa0656a77>] nfsd4_proc_compound+0x4d7/0x7e0 [nfsd]
> [ 1310.018835] [<ffffffffa0642f83>] nfsd_dispatch+0xc3/0x210 [nfsd]
> [ 1310.026392] [<ffffffffa018cca2>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc]
> [ 1310.035000] [<ffffffffa018bbb0>] svc_process_common+0x440/0x6d0 [sunrpc]
> [ 1310.043332] [<ffffffffa018bf53>] svc_process+0x113/0x1b0 [sunrpc]
> [ 1310.050996] [<ffffffffa064298f>] nfsd+0xff/0x170 [nfsd]
> [ 1310.057686] [<ffffffffa0642890>] ? nfsd_destroy+0x80/0x80 [nfsd]
> [ 1310.065198] [<ffffffff810968b9>] kthread+0xc9/0xe0
> [ 1310.071380] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
> [ 1310.079371] [<ffffffff8165f2a2>] ret_from_fork+0x42/0x70
> [ 1310.086137] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
> [ 1310.094164] ---[ end trace ae15837291fe646f ]---
> [ 1310.236482] =============================================================================
> [ 1310.246335] BUG nfsd4_stateids (Tainted: G W OE ): Poison overwritten
> [ 1310.255124] -----------------------------------------------------------------------------
>
> [ 1310.267511] Disabling lock debugging due to kernel taint
> [ 1310.274197] INFO: 0xffff881f36a02af8-0xffff881f36a02af8. First byte 0x6a instead of 0x6b
> [ 1310.284073] INFO: Allocated in nfs4_alloc_stid+0x29/0xc0 [nfsd] age=430 cpu=2 pid=30156
> [ 1310.293768] __slab_alloc+0x3f6/0x477
> [ 1310.298649] kmem_cache_alloc+0x123/0x160
> [ 1310.303884] nfs4_alloc_stid+0x29/0xc0 [nfsd]
> [ 1310.309506] nfsd4_process_open1+0x152/0x3f0 [nfsd]
> [ 1310.315708] nfsd4_open+0xc9/0x830 [nfsd]
> [ 1310.320933] nfsd4_proc_compound+0x4d7/0x7e0 [nfsd]
> [ 1310.327269] nfsd_dispatch+0xc3/0x210 [nfsd]
> [ 1310.332778] svc_process_common+0x440/0x6d0 [sunrpc]
> [ 1310.339046] svc_process+0x113/0x1b0 [sunrpc]
> [ 1310.344646] nfsd+0xff/0x170 [nfsd]
> [ 1310.349235] kthread+0xc9/0xe0
> [ 1310.353296] ret_from_fork+0x42/0x70
> [ 1310.357923] INFO: Freed in nfs4_free_ol_stateid+0x45/0x70 [nfsd] age=486 cpu=4 pid=30155
> [ 1310.367621] __slab_free+0x39/0x1df
> [ 1310.372124] kmem_cache_free+0x123/0x150
> [ 1310.377098] nfs4_free_ol_stateid+0x45/0x70 [nfsd]
> [ 1310.383026] nfs4_put_stid+0x51/0xc0 [nfsd]
> [ 1310.388311] nfsd4_close+0x21b/0x290 [nfsd]
> [ 1310.393548] nfsd4_proc_compound+0x4d7/0x7e0 [nfsd]
> [ 1310.399560] nfsd_dispatch+0xc3/0x210 [nfsd]
> [ 1310.404888] svc_process_common+0x440/0x6d0 [sunrpc]
> [ 1310.410986] svc_process+0x113/0x1b0 [sunrpc]
> [ 1310.416399] nfsd+0xff/0x170 [nfsd]
> [ 1310.420836] kthread+0xc9/0xe0
> [ 1310.424803] ret_from_fork+0x42/0x70
> [ 1310.429335] INFO: Slab 0xffffea007cda8000 objects=37 used=37 fp=0x (null) flags=0x2fffff80004080
> [ 1310.440578] INFO: Object 0xffff881f36a02af8 @offset=11000 fp=0xffff881f36a03020
>
> [ 1310.451557] Bytes b4 ffff881f36a02ae8: 18 00 00 00 ce 75 00 00 40 22 0f 00 01 00 00 00 .....u..@"......
> [ 1310.462788] Object ffff881f36a02af8: 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b jkkkkkkkkkkkkkkk
> [ 1310.473775] Object ffff881f36a02b08: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 1310.484769] Object ffff881f36a02b18: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 1310.495734] Object ffff881f36a02b28: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 1310.506693] Object ffff881f36a02b38: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 1310.517662] Object ffff881f36a02b48: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 1310.528604] Object ffff881f36a02b58: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> [ 1310.539576] Object ffff881f36a02b68: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
> [ 1310.550523] CPU: 24 PID: 30158 Comm: nfsd Tainted: G B W OE 4.1.3_7 #1
> [ 1310.559151] Hardware name: Dell Inc. PowerEdge R720/0X3D66, BIOS 2.2.2 01/16/2014
> [ 1310.568074] ffff881f36a02af8 ffff883fce893a28 ffffffff81657c67 00000000000001b8
> [ 1310.576934] ffff881ff1536100 ffff883fce893a68 ffffffff811ca65d 0000000000000080
> [ 1310.585768] ffff881f00000001 ffff881f36a02af9 ffff881ff1536100 000000000000006b
> [ 1310.594593] Call Trace:
> [ 1310.597855] [<ffffffff81657c67>] dump_stack+0x45/0x57
> [ 1310.604131] [<ffffffff811ca65d>] print_trailer+0x14d/0x200
> [ 1310.610952] [<ffffffff811ca7df>] check_bytes_and_report+0xcf/0x110
> [ 1310.618458] [<ffffffff811cb547>] check_object+0x1d7/0x250
> [ 1310.625179] [<ffffffffa0663919>] ? nfs4_alloc_stid+0x29/0xc0 [nfsd]
> [ 1310.632799] [<ffffffff81655418>] alloc_debug_processing+0x76/0x118
> [ 1310.640325] [<ffffffff81655f44>] __slab_alloc+0x3f6/0x477
> [ 1310.646945] [<ffffffffa0663919>] ? nfs4_alloc_stid+0x29/0xc0 [nfsd]
> [ 1310.654535] [<ffffffff8109aab4>] ? groups_alloc+0x34/0x110
> [ 1310.661300] [<ffffffffa0663919>] ? nfs4_alloc_stid+0x29/0xc0 [nfsd]
> [ 1310.668894] [<ffffffff811cc0e3>] kmem_cache_alloc+0x123/0x160
> [ 1310.675928] [<ffffffffa0663919>] nfs4_alloc_stid+0x29/0xc0 [nfsd]
> [ 1310.683463] [<ffffffffa0666382>] nfsd4_process_open1+0x152/0x3f0 [nfsd]
> [ 1310.691472] [<ffffffffa0655e39>] nfsd4_open+0xc9/0x830 [nfsd]
> [ 1310.698493] [<ffffffffa0656a77>] nfsd4_proc_compound+0x4d7/0x7e0 [nfsd]
> [ 1310.706522] [<ffffffffa0642f83>] nfsd_dispatch+0xc3/0x210 [nfsd]
> [ 1310.713844] [<ffffffffa018bbb0>] svc_process_common+0x440/0x6d0 [sunrpc]
> [ 1310.721937] [<ffffffffa018bf53>] svc_process+0x113/0x1b0 [sunrpc]
> [ 1310.729338] [<ffffffffa064298f>] nfsd+0xff/0x170 [nfsd]
> [ 1310.735782] [<ffffffffa0642890>] ? nfsd_destroy+0x80/0x80 [nfsd]
> [ 1310.743085] [<ffffffff810968b9>] kthread+0xc9/0xe0
> [ 1310.749034] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
> [ 1310.756845] [<ffffffff8165f2a2>] ret_from_fork+0x42/0x70
> [ 1310.763397] [<ffffffff810967f0>] ? kthread_create_on_node+0x180/0x180
> [ 1310.771195] FIX nfsd4_stateids: Restoring 0xffff881f36a02af8-0xffff881f36a02af8=0x6b
>
> [ 1310.782626] FIX nfsd4_stateids: Marking all objects used
>


--
Jeff Layton <[email protected]>

2015-08-04 20:18:11

by Andrew W Elble

[permalink] [raw]
Subject: Re: list_del corruption / unhash_ol_stateid()


> In any case, I think this explains where the "no readable file" warning
> is coming from, but I'm not sure yet about the mem corruption...

Forgive my shorthand, but I think this is what we're seeing:

open2                              close
-----                              -----
create                1 (idr)
init                  2 (hashed)

                                   preprocess_seqid     3 (local ref in nfsd4_close)
                                   close_open_stateid   2 -> unhashed (unhashed)

release_open_stateid  1 -> list_del corruption
                          (because unhashed already -> should still be refcount 2?)

                                   nfs4_put_stid        0 -> destroyed

nfs4_put_stid         0 -> use after free

This also explains the '6a' as the first byte: the final nfs4_put_stid
decrements sc_count, which sits at the start of the (already freed and
poisoned) object. There are other permutations.

Also, the return-with-status from nfs4_get_vfs_file() appears to be break_lease()
(much further down) returning -EWOULDBLOCK (in both cases, memory
corruption and the simple warning case)

Thanks,

Andy
--
Andrew W. Elble
[email protected]
Infrastructure Engineer, Communications Technical Lead
Rochester Institute of Technology
PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912

2015-08-05 15:11:36

by Jeff Layton

Subject: Re: list_del corruption / unhash_ol_stateid()

On Tue, 04 Aug 2015 16:18:10 -0400
Andrew W Elble <[email protected]> wrote:

>
> > In any case, I think this explains where the "no readable file" warning
> > is coming from, but I'm not sure yet about the mem corruption...
>
> Forgive my shorthand, but I think this is what we're seeing:
>
> open2 close
> create 1 (idr)
> init 2 (hashed)
>

I'm not sure I'm following your shorthand correctly, but, I'm guessing
the above is that you're creating a stateid and then calling
init_open_stateid on it to hash it?

If you can flesh out the description of the race, it might be more
clear.

> close preprocess_seqid 3 (local ref in nfsd4_close)
> close_open_stateid 2 -> unhashed (unhashed)
>

Ok, so here we have the client sending a simultaneous close for the
same stateid? How did it get it in the first place? We won't have sent
the reply yet to the OPEN that this CLOSE is racing with.

Unless...the initial task that created the stateid got _really_
stalled, and another OPEN raced in, found that stateid and was able to
reply quickly. Then the client could send a CLOSE for that second OPEN
before the first opener ever finished.

In any case, we really do need to make subsequent openers of a nfs4_file
wait until it's fully constructed before using it.

> release_open_stateid 1 -> list_del corruption (because unhashed already
> -> should still be refcount 2?)
>

...regardless, I do agree that the release_open_stateid call here is
probably making some assumptions that it shouldn't, and that is probably
where the list corruption is coming in.

If you had multiple racing OPENs then one could find the stateid that
was previously hashed. A simple fix might be to just use list_del_init
calls in unhash_ol_stateid. That should make any second list_del's a
no-op.

Whether that's sufficient to fix the mem corruption, I'm not sure...

> nfs4_put_stid 0 -> destroyed
>
> nfs4_put_stid 0 -> use after free
>
> This also explains the '6a' as the first byte, as the final
> nfs4_put_stid will decrement sc_count first. There are other permutations.
>
> Also, the return-with-status from nfs4_get_vfs_file() appears to be break_lease()
> (much further down) returning -EWOULDBLOCK (in both cases, memory
> corruption and the simple warning case)
>
> Thanks,
>
> Andy


--
Jeff Layton <[email protected]>

2015-08-05 16:33:17

by Andrew W Elble

Subject: Re: list_del corruption / unhash_ol_stateid()


> I'm not sure I'm following your shorthand correctly, but, I'm guessing
> the above is that you're creating a stateid and then calling
> init_open_stateid on it to hash it?
>
> If you can flesh out the description of the race, it might be more
> clear.

Sorry, knew that was a bit terse.

nfsd4_process_open2                     nfsd4_close

stp = open->op_stp;
nfs4_alloc_stid, refcount is 1

init_open_stateid(stp, fp, open);
is hashed, refcount is 2

                                        nfs4_preprocess_seqid_op()
                                        finds the same stateid,
                                        refcount is 3

                                        nfsd4_close_open_stateid()
                                        unhashes stateid,
                                        put_ol_stateid_locked() drops refcount,
                                        refcount is 2

nfs4_get_vfs_file() returns
with status due to:
  nfsd_open()
    nfsd_open_break_lease()
      break_lease() returning
      -EWOULDBLOCK

release_open_stateid()
calls unhash_open_stateid(),
attempts to unhash,
WARN generated for list_del

proceeds to call put_ol_stateid_locked(),
decrements refcount inappropriately,
refcount is now 1

                                        nfs4_put_stid()
                                        refcount is 0, memory is freed

nfs4_put_stid()
calls atomic_dec_and_lock(),
use after free;
first corrupt byte is 6a
because of the atomic decrement
with slub_debug on

> Unless...the initial task that created the stateid got _really_
> stalled, and another OPEN raced in, found that stateid and was able to
> reply quickly. Then the client could send a CLOSE for that second OPEN
> before the first opener ever finished.

Sounds plausible given the environment. Would take more effort to
directly prove. I have directly observed the same stateid being used in
the above race.

> If you had multiple racing OPENs then one could find the stateid that
> was previously hashed. A simple fix might be to just use list_del_init
> calls in unhash_ol_stateid. That should make any second list_del's a
> no-op.
>
> Whether that's sufficient to fix the mem corruption, I'm not sure...

It's sc_count decrementing when the unhashing "fails" or is going to
fail. For instance, this kind of construct will prevent one of the
possible use-after-free scenarios.

static void release_open_stateid(struct nfs4_ol_stateid *stp)
{
	LIST_HEAD(reaplist);

	spin_lock(&stp->st_stid.sc_client->cl_lock);
	/* already unhashed? list_del() leaves LIST_POISON1 in ->next */
	if (stp->st_perfile.next != LIST_POISON1) {
		unhash_open_stateid(stp, &reaplist);
		put_ol_stateid_locked(stp, &reaplist);
	}
	spin_unlock(&stp->st_stid.sc_client->cl_lock);
	free_ol_stateid_reaplist(&reaplist);
}

Thanks,

Andy

--
Andrew W. Elble
[email protected]
Infrastructure Engineer, Communications Technical Lead
Rochester Institute of Technology
PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912

2015-08-05 17:12:16

by Jeff Layton

Subject: Re: list_del corruption / unhash_ol_stateid()

On Wed, 05 Aug 2015 12:33:16 -0400
Andrew W Elble <[email protected]> wrote:

>
> > I'm not sure I'm following your shorthand correctly, but, I'm guessing
> > the above is that you're creating a stateid and then calling
> > init_open_stateid on it to hash it?
> >
> > If you can flesh out the description of the race, it might be more
> > clear.
>
> Sorry, knew that was a bit terse.
>
> nfsd4_process_open2 nfsd4_close
>
> stp = open->op_stp;
> nfs4_alloc_stid, refcount is 1
>
> init_open_stateid(stp, fp, open);
> is hashed, refcount is 2
>
> nfs4_preprocess_seqid_op()
> finds the same stateid,
> refcount is 3
>
> nfsd4_close_open_stateid()
> unhashes stateid,
> put_ol_stateid_locked() drops refcount
> refcount is 2
> nfs4_get_vfs_file() returns
> with status due to:
> nfsd_open()
> nfsd_open_break_lease()
> break_lease() returning
> -EWOULDBLOCK
>
> release_open_stateid()
> calls unhash_open_stateid()
> attempts to unhash,
> WARN generated for list_del
>
> proceeds to call put_ol_stateid_locked()
> decrements refcount inappropriately
> refcount is now 1.
>
> nfs4_put_stid()
> refcount is 0, memory is freed
>
> nfs4_put_stid
> calls atomic_dec_and_lock()
> use after free
> first corrupt byte is 6a
> because of atomic decrement
> with slub_debug on
>
> > Unless...the initial task that created the stateid got _really_
> > stalled, and another OPEN raced in, found that stateid and was able to
> > reply quickly. Then the client could send a CLOSE for that second OPEN
> > before the first opener ever finished.
>
> Sounds plausible given the environment. Would take more effort to
> directly prove. I have directly observed the same stateid being used in
> the above race.
>
> > If you had multiple racing OPENs then one could find the stateid that
> > was previously hashed. A simple fix might be to just use list_del_init
> > calls in unhash_ol_stateid. That should make any second list_del's a
> > no-op.
> >
> > Whether that's sufficient to fix the mem corruption, I'm not sure...
>
> It's sc_count decrementing when the unhashing "fails" or is going to
> fail. For instance, this kind of construct will prevent one of the
> possible use-after-free scenarios.
>
> static void release_open_stateid(struct nfs4_ol_stateid *stp)
> {
> LIST_HEAD(reaplist);
>
> spin_lock(&stp->st_stid.sc_client->cl_lock);
> if (stp->st_perfile.next != LIST_POISON1) {
> unhash_open_stateid(stp, &reaplist);
> put_ol_stateid_locked(stp, &reaplist);
> }
> spin_unlock(&stp->st_stid.sc_client->cl_lock);
> free_ol_stateid_reaplist(&reaplist);
> }
>
> Thanks,
>
> Andy
>

Ok, I get it now. The real problem then is a dispute over who should
put the "hash reference" for this stateid. We need to ensure that only
one caller dequeues it from the lists, and only that caller puts that
reference.

There are a couple of different ways to do that. The best one in this
case is probably to use list_del_init in unhash_ol_stateid and check
whether one of the list_heads is already empty before attempting to
unhash it and put the final reference.

We still do have the problem (I suspect) with the nfs4_file not being
fully instantiated before allowing other callers to use it, but that's
really a separate problem.

--
Jeff Layton <[email protected]>