2021-09-27 06:22:00

by Salvatore Bonaccorso

[permalink] [raw]
Subject: nfsd4_process_open2 failed to open newly-created file! status=10008 ; warning at fs/nfsd/nfs4proc.c for nfsd4_open

Hi

We recently got the following traces on a NFS server, but I'm not sure
how to further debug this, any hints?

[5746893.904448] ------------[ cut here ]------------
[5746893.910050] nfsd4_process_open2 failed to open newly-created file! status=10008
[5746893.918488] WARNING: CPU: 16 PID: 1316 at fs/nfsd/nfs4proc.c:456 nfsd4_open+0x4e0/0x6f0 [nfsd]
[5746893.927833] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag binfmt_misc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache bonding quota_v2 quota_tree intel_rapl ipmi_ssif skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ast ghash_clmulni_intel intel_cstate ttm drm_kms_helper drm intel_uncore pcspkr intel_rapl_perf mei_me ioatdma iTCO_wdt evdev joydev pcc_cpufreq sg i2c_algo_bit iTCO_vendor_support mei dca ipmi_si wmi ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad button nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1
[5746894.000592] raid0 multipath linear md_mod hid_generic usbhid hid sd_mod crc32c_intel xhci_pci ahci xhci_hcd libahci aesni_intel aes_x86_64 crypto_simd libata arcmsr usbcore cryptd i40e scsi_mod glue_helper lpc_ich i2c_i801 mfd_core usb_common
[5746894.023137] CPU: 16 PID: 1316 Comm: nfsd Not tainted 4.19.0-17-amd64 #1 Debian 4.19.194-3
[5746894.031916] Hardware name: DALCO AG S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[5746894.042198] RIP: 0010:nfsd4_open+0x4e0/0x6f0 [nfsd]
[5746894.047704] Code: 80 88 a8 01 00 00 01 e9 52 fe ff ff 80 bb 15 01 00 00 00 0f 84 ef fe ff ff 44 89 fe 48 c7 c7 f0 10 58 c0 0f ce e8 89 70 3a d0 <0f> 0b e9 d7 fe ff ff 48 8b 83 18 01 00 00 8b 55 00 48 8d 75 04 89
[5746894.067573] RSP: 0018:ffffa5228948fda8 EFLAGS: 00010286
[5746894.073487] RAX: 0000000000000000 RBX: ffff97c0669ae3e0 RCX: 0000000000000000
[5746894.081317] RDX: ffff97cc8f91efc0 RSI: ffff97cc8f9166b8 RDI: ffff97cc8f9166b8
[5746894.089148] RBP: ffff97c0665b0068 R08: 00000000000005e8 R09: 0000000000aaaaaa
[5746894.096993] R10: 0000000000000000 R11: 0000000000000001 R12: ffff97c0cc1cb400
[5746894.104848] R13: ffff97c066aa8000 R14: ffff97c0665a9600 R15: 0000000018270000
[5746894.112717] FS: 0000000000000000(0000) GS:ffff97cc8f900000(0000) knlGS:0000000000000000
[5746894.121558] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[5746894.128035] CR2: 00007f83d25a68a0 CR3: 00000015b900a002 CR4: 00000000007606e0
[5746894.135899] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[5746894.143764] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[5746894.151647] PKRU: 55555554
[5746894.155117] Call Trace:
[5746894.158373] nfsd4_proc_compound+0x342/0x660 [nfsd]
[5746894.164084] nfsd_dispatch+0x9e/0x210 [nfsd]
[5746894.169206] svc_process_common+0x345/0x750 [sunrpc]
[5746894.175017] ? nfsd_destroy+0x50/0x50 [nfsd]
[5746894.180155] svc_process+0xb7/0xf0 [sunrpc]
[5746894.185194] nfsd+0xe3/0x140 [nfsd]
[5746894.189545] kthread+0x112/0x130
[5746894.193618] ? kthread_bind+0x30/0x30
[5746894.198072] ret_from_fork+0x35/0x40
[5746894.202439] ---[ end trace c6400532dff968eb ]---
[5768733.993671] ------------[ cut here ]------------
[5768733.999711] nfsd4_process_open2 failed to open newly-created file! status=10008
[5768734.008148] WARNING: CPU: 9 PID: 1316 at fs/nfsd/nfs4proc.c:456 nfsd4_open+0x4e0/0x6f0 [nfsd]
[5768734.017448] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag binfmt_misc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache bonding quota_v2 quota_tree intel_rapl ipmi_ssif skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ast ghash_clmulni_intel intel_cstate ttm drm_kms_helper drm intel_uncore pcspkr intel_rapl_perf mei_me ioatdma iTCO_wdt evdev joydev pcc_cpufreq sg i2c_algo_bit iTCO_vendor_support mei dca ipmi_si wmi ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad button nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1
[5768734.092781] raid0 multipath linear md_mod hid_generic usbhid hid sd_mod crc32c_intel xhci_pci ahci xhci_hcd libahci aesni_intel aes_x86_64 crypto_simd libata arcmsr usbcore cryptd i40e scsi_mod glue_helper lpc_ich i2c_i801 mfd_core usb_common
[5768734.115967] CPU: 9 PID: 1316 Comm: nfsd Tainted: G W 4.19.0-17-amd64 #1 Debian 4.19.194-3
[5768734.126331] Hardware name: DALCO AG S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[5768734.136855] RIP: 0010:nfsd4_open+0x4e0/0x6f0 [nfsd]
[5768734.142623] Code: 80 88 a8 01 00 00 01 e9 52 fe ff ff 80 bb 15 01 00 00 00 0f 84 ef fe ff ff 44 89 fe 48 c7 c7 f0 10 58 c0 0f ce e8 89 70 3a d0 <0f> 0b e9 d7 fe ff ff 48 8b 83 18 01 00 00 8b 55 00 48 8d 75 04 89
[5768734.162998] RSP: 0018:ffffa5228948fda8 EFLAGS: 00010286
[5768734.169071] RAX: 0000000000000000 RBX: ffff97c0669ae3e0 RCX: 0000000000000000
[5768734.177036] RDX: ffff97c090e9efc0 RSI: ffff97c090e966b8 RDI: ffff97c090e966b8
[5768734.184996] RBP: ffff97c0665b0068 R08: 0000000000000608 R09: 0000000000aaaaaa
[5768734.192945] R10: 0000000000000000 R11: 0000000000000001 R12: ffff97c08698b000
[5768734.200885] R13: ffff97c066aa8000 R14: ffff97c0665a9600 R15: 0000000018270000
[5768734.208815] FS: 0000000000000000(0000) GS:ffff97c090e80000(0000) knlGS:0000000000000000
[5768734.217702] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[5768734.224236] CR2: 00007f9b4006df10 CR3: 00000015b900a002 CR4: 00000000007606e0
[5768734.232157] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[5768734.240077] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[5768734.248017] PKRU: 55555554
[5768734.251553] Call Trace:
[5768734.254867] nfsd4_proc_compound+0x342/0x660 [nfsd]
[5768734.260517] nfsd_dispatch+0x9e/0x210 [nfsd]
[5768734.265570] svc_process_common+0x345/0x750 [sunrpc]
[5768734.271373] ? nfsd_destroy+0x50/0x50 [nfsd]
[5768734.276414] svc_process+0xb7/0xf0 [sunrpc]
[5768734.281370] nfsd+0xe3/0x140 [nfsd]
[5768734.285624] kthread+0x112/0x130
[5768734.289610] ? kthread_bind+0x30/0x30
[5768734.294030] ret_from_fork+0x35/0x40
[5768734.298396] ---[ end trace c6400532dff968ec ]---
[5795002.037239] ------------[ cut here ]------------
[5795002.044280] nfsd4_process_open2 failed to open newly-created file! status=10008
[5795002.053358] WARNING: CPU: 1 PID: 1315 at fs/nfsd/nfs4proc.c:456 nfsd4_open+0x4e0/0x6f0 [nfsd]
[5795002.063065] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag binfmt_misc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache bonding quota_v2 quota_tree intel_rapl ipmi_ssif skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ast ghash_clmulni_intel intel_cstate ttm drm_kms_helper drm intel_uncore pcspkr intel_rapl_perf mei_me ioatdma iTCO_wdt evdev joydev pcc_cpufreq sg i2c_algo_bit iTCO_vendor_support mei dca ipmi_si wmi ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad button nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1
[5795002.138537] raid0 multipath linear md_mod hid_generic usbhid hid sd_mod crc32c_intel xhci_pci ahci xhci_hcd libahci aesni_intel aes_x86_64 crypto_simd libata arcmsr usbcore cryptd i40e scsi_mod glue_helper lpc_ich i2c_i801 mfd_core usb_common
[5795002.161747] CPU: 1 PID: 1315 Comm: nfsd Tainted: G W 4.19.0-17-amd64 #1 Debian 4.19.194-3
[5795002.172089] Hardware name: DALCO AG S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[5795002.182668] RIP: 0010:nfsd4_open+0x4e0/0x6f0 [nfsd]
[5795002.188400] Code: 80 88 a8 01 00 00 01 e9 52 fe ff ff 80 bb 15 01 00 00 00 0f 84 ef fe ff ff 44 89 fe 48 c7 c7 f0 10 58 c0 0f ce e8 89 70 3a d0 <0f> 0b e9 d7 fe ff ff 48 8b 83 18 01 00 00 8b 55 00 48 8d 75 04 89
[5795002.208706] RSP: 0018:ffffa52289487da8 EFLAGS: 00010286
[5795002.214775] RAX: 0000000000000000 RBX: ffff97c0669a83e0 RCX: 0000000000000000
[5795002.222743] RDX: ffff97c090a9efc0 RSI: ffff97c090a966b8 RDI: ffff97c090a966b8
[5795002.230710] RBP: ffff97c0669aa068 R08: 0000000000000628 R09: 0000000000aaaaaa
[5795002.238684] R10: 0000000000000000 R11: 0000000000000001 R12: ffff97c21bb64200
[5795002.246670] R13: ffff97c066aae000 R14: ffff97c0665a8000 R15: 0000000018270000
[5795002.254611] FS: 0000000000000000(0000) GS:ffff97c090a80000(0000) knlGS:0000000000000000
[5795002.263492] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[5795002.270097] CR2: 00007fba9f588590 CR3: 00000015b900a002 CR4: 00000000007606e0
[5795002.278100] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[5795002.286091] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[5795002.294078] PKRU: 55555554
[5795002.297634] Call Trace:
[5795002.300910] nfsd4_proc_compound+0x342/0x660 [nfsd]
[5795002.306572] nfsd_dispatch+0x9e/0x210 [nfsd]
[5795002.311617] svc_process_common+0x345/0x750 [sunrpc]
[5795002.317421] ? nfsd_destroy+0x50/0x50 [nfsd]
[5795002.322480] svc_process+0xb7/0xf0 [sunrpc]
[5795002.327433] nfsd+0xe3/0x140 [nfsd]
[5795002.331728] kthread+0x112/0x130
[5795002.335718] ? kthread_bind+0x30/0x30
[5795002.340144] ret_from_fork+0x35/0x40
[5795002.344475] ---[ end trace c6400532dff968ed ]---

But I have as well no reproducing recipe to trigger it.

The kernel is the current Debian buster distro kernel, based on
4.19.194.

This initally looked similar than
https://bugzilla.kernel.org/show_bug.cgi?id=195725 but the user did
get there status=5, so EIO, so seems different issue.

Regards,
Salvatore


2021-09-27 15:54:40

by J. Bruce Fields

[permalink] [raw]
Subject: Re: nfsd4_process_open2 failed to open newly-created file! status=10008 ; warning at fs/nfsd/nfs4proc.c for nfsd4_open

On Mon, Sep 27, 2021 at 08:10:31AM +0200, Salvatore Bonaccorso wrote:
> We recently got the following traces on a NFS server, but I'm not sure
> how to further debug this, any hints?

The server creates and opens a file in two steps, though it should
really be a single atomic operation.

That means there's a small possibility somebody could intervene and do
something like change the permissions:

>
> [5746893.904448] ------------[ cut here ]------------
> [5746893.910050] nfsd4_process_open2 failed to open newly-created file! status=10008

10008 is NFS4ERR_DELAY, so maybe somebody managed to get a delegation
before we finished opening?

We should be able to prevent that....

In your setup are there processes quickly opening new files created by
others?

--b.

> [5746893.918488] WARNING: CPU: 16 PID: 1316 at fs/nfsd/nfs4proc.c:456 nfsd4_open+0x4e0/0x6f0 [nfsd]
> [5746893.927833] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag binfmt_misc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache bonding quota_v2 quota_tree intel_rapl ipmi_ssif skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ast ghash_clmulni_intel intel_cstate ttm drm_kms_helper drm intel_uncore pcspkr intel_rapl_perf mei_me ioatdma iTCO_wdt evdev joydev pcc_cpufreq sg i2c_algo_bit iTCO_vendor_support mei dca ipmi_si wmi ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad button nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1
> [5746894.000592] raid0 multipath linear md_mod hid_generic usbhid hid sd_mod crc32c_intel xhci_pci ahci xhci_hcd libahci aesni_intel aes_x86_64 crypto_simd libata arcmsr usbcore cryptd i40e scsi_mod glue_helper lpc_ich i2c_i801 mfd_core usb_common
> [5746894.023137] CPU: 16 PID: 1316 Comm: nfsd Not tainted 4.19.0-17-amd64 #1 Debian 4.19.194-3
> [5746894.031916] Hardware name: DALCO AG S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
> [5746894.042198] RIP: 0010:nfsd4_open+0x4e0/0x6f0 [nfsd]
> [5746894.047704] Code: 80 88 a8 01 00 00 01 e9 52 fe ff ff 80 bb 15 01 00 00 00 0f 84 ef fe ff ff 44 89 fe 48 c7 c7 f0 10 58 c0 0f ce e8 89 70 3a d0 <0f> 0b e9 d7 fe ff ff 48 8b 83 18 01 00 00 8b 55 00 48 8d 75 04 89
> [5746894.067573] RSP: 0018:ffffa5228948fda8 EFLAGS: 00010286
> [5746894.073487] RAX: 0000000000000000 RBX: ffff97c0669ae3e0 RCX: 0000000000000000
> [5746894.081317] RDX: ffff97cc8f91efc0 RSI: ffff97cc8f9166b8 RDI: ffff97cc8f9166b8
> [5746894.089148] RBP: ffff97c0665b0068 R08: 00000000000005e8 R09: 0000000000aaaaaa
> [5746894.096993] R10: 0000000000000000 R11: 0000000000000001 R12: ffff97c0cc1cb400
> [5746894.104848] R13: ffff97c066aa8000 R14: ffff97c0665a9600 R15: 0000000018270000
> [5746894.112717] FS: 0000000000000000(0000) GS:ffff97cc8f900000(0000) knlGS:0000000000000000
> [5746894.121558] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [5746894.128035] CR2: 00007f83d25a68a0 CR3: 00000015b900a002 CR4: 00000000007606e0
> [5746894.135899] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [5746894.143764] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [5746894.151647] PKRU: 55555554
> [5746894.155117] Call Trace:
> [5746894.158373] nfsd4_proc_compound+0x342/0x660 [nfsd]
> [5746894.164084] nfsd_dispatch+0x9e/0x210 [nfsd]
> [5746894.169206] svc_process_common+0x345/0x750 [sunrpc]
> [5746894.175017] ? nfsd_destroy+0x50/0x50 [nfsd]
> [5746894.180155] svc_process+0xb7/0xf0 [sunrpc]
> [5746894.185194] nfsd+0xe3/0x140 [nfsd]
> [5746894.189545] kthread+0x112/0x130
> [5746894.193618] ? kthread_bind+0x30/0x30
> [5746894.198072] ret_from_fork+0x35/0x40
> [5746894.202439] ---[ end trace c6400532dff968eb ]---
> [5768733.993671] ------------[ cut here ]------------
> [5768733.999711] nfsd4_process_open2 failed to open newly-created file! status=10008
> [5768734.008148] WARNING: CPU: 9 PID: 1316 at fs/nfsd/nfs4proc.c:456 nfsd4_open+0x4e0/0x6f0 [nfsd]
> [5768734.017448] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag binfmt_misc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache bonding quota_v2 quota_tree intel_rapl ipmi_ssif skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ast ghash_clmulni_intel intel_cstate ttm drm_kms_helper drm intel_uncore pcspkr intel_rapl_perf mei_me ioatdma iTCO_wdt evdev joydev pcc_cpufreq sg i2c_algo_bit iTCO_vendor_support mei dca ipmi_si wmi ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad button nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1
> [5768734.092781] raid0 multipath linear md_mod hid_generic usbhid hid sd_mod crc32c_intel xhci_pci ahci xhci_hcd libahci aesni_intel aes_x86_64 crypto_simd libata arcmsr usbcore cryptd i40e scsi_mod glue_helper lpc_ich i2c_i801 mfd_core usb_common
> [5768734.115967] CPU: 9 PID: 1316 Comm: nfsd Tainted: G W 4.19.0-17-amd64 #1 Debian 4.19.194-3
> [5768734.126331] Hardware name: DALCO AG S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
> [5768734.136855] RIP: 0010:nfsd4_open+0x4e0/0x6f0 [nfsd]
> [5768734.142623] Code: 80 88 a8 01 00 00 01 e9 52 fe ff ff 80 bb 15 01 00 00 00 0f 84 ef fe ff ff 44 89 fe 48 c7 c7 f0 10 58 c0 0f ce e8 89 70 3a d0 <0f> 0b e9 d7 fe ff ff 48 8b 83 18 01 00 00 8b 55 00 48 8d 75 04 89
> [5768734.162998] RSP: 0018:ffffa5228948fda8 EFLAGS: 00010286
> [5768734.169071] RAX: 0000000000000000 RBX: ffff97c0669ae3e0 RCX: 0000000000000000
> [5768734.177036] RDX: ffff97c090e9efc0 RSI: ffff97c090e966b8 RDI: ffff97c090e966b8
> [5768734.184996] RBP: ffff97c0665b0068 R08: 0000000000000608 R09: 0000000000aaaaaa
> [5768734.192945] R10: 0000000000000000 R11: 0000000000000001 R12: ffff97c08698b000
> [5768734.200885] R13: ffff97c066aa8000 R14: ffff97c0665a9600 R15: 0000000018270000
> [5768734.208815] FS: 0000000000000000(0000) GS:ffff97c090e80000(0000) knlGS:0000000000000000
> [5768734.217702] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [5768734.224236] CR2: 00007f9b4006df10 CR3: 00000015b900a002 CR4: 00000000007606e0
> [5768734.232157] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [5768734.240077] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [5768734.248017] PKRU: 55555554
> [5768734.251553] Call Trace:
> [5768734.254867] nfsd4_proc_compound+0x342/0x660 [nfsd]
> [5768734.260517] nfsd_dispatch+0x9e/0x210 [nfsd]
> [5768734.265570] svc_process_common+0x345/0x750 [sunrpc]
> [5768734.271373] ? nfsd_destroy+0x50/0x50 [nfsd]
> [5768734.276414] svc_process+0xb7/0xf0 [sunrpc]
> [5768734.281370] nfsd+0xe3/0x140 [nfsd]
> [5768734.285624] kthread+0x112/0x130
> [5768734.289610] ? kthread_bind+0x30/0x30
> [5768734.294030] ret_from_fork+0x35/0x40
> [5768734.298396] ---[ end trace c6400532dff968ec ]---
> [5795002.037239] ------------[ cut here ]------------
> [5795002.044280] nfsd4_process_open2 failed to open newly-created file! status=10008
> [5795002.053358] WARNING: CPU: 1 PID: 1315 at fs/nfsd/nfs4proc.c:456 nfsd4_open+0x4e0/0x6f0 [nfsd]
> [5795002.063065] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag binfmt_misc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache bonding quota_v2 quota_tree intel_rapl ipmi_ssif skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ast ghash_clmulni_intel intel_cstate ttm drm_kms_helper drm intel_uncore pcspkr intel_rapl_perf mei_me ioatdma iTCO_wdt evdev joydev pcc_cpufreq sg i2c_algo_bit iTCO_vendor_support mei dca ipmi_si wmi ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad button nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1
> [5795002.138537] raid0 multipath linear md_mod hid_generic usbhid hid sd_mod crc32c_intel xhci_pci ahci xhci_hcd libahci aesni_intel aes_x86_64 crypto_simd libata arcmsr usbcore cryptd i40e scsi_mod glue_helper lpc_ich i2c_i801 mfd_core usb_common
> [5795002.161747] CPU: 1 PID: 1315 Comm: nfsd Tainted: G W 4.19.0-17-amd64 #1 Debian 4.19.194-3
> [5795002.172089] Hardware name: DALCO AG S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
> [5795002.182668] RIP: 0010:nfsd4_open+0x4e0/0x6f0 [nfsd]
> [5795002.188400] Code: 80 88 a8 01 00 00 01 e9 52 fe ff ff 80 bb 15 01 00 00 00 0f 84 ef fe ff ff 44 89 fe 48 c7 c7 f0 10 58 c0 0f ce e8 89 70 3a d0 <0f> 0b e9 d7 fe ff ff 48 8b 83 18 01 00 00 8b 55 00 48 8d 75 04 89
> [5795002.208706] RSP: 0018:ffffa52289487da8 EFLAGS: 00010286
> [5795002.214775] RAX: 0000000000000000 RBX: ffff97c0669a83e0 RCX: 0000000000000000
> [5795002.222743] RDX: ffff97c090a9efc0 RSI: ffff97c090a966b8 RDI: ffff97c090a966b8
> [5795002.230710] RBP: ffff97c0669aa068 R08: 0000000000000628 R09: 0000000000aaaaaa
> [5795002.238684] R10: 0000000000000000 R11: 0000000000000001 R12: ffff97c21bb64200
> [5795002.246670] R13: ffff97c066aae000 R14: ffff97c0665a8000 R15: 0000000018270000
> [5795002.254611] FS: 0000000000000000(0000) GS:ffff97c090a80000(0000) knlGS:0000000000000000
> [5795002.263492] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [5795002.270097] CR2: 00007fba9f588590 CR3: 00000015b900a002 CR4: 00000000007606e0
> [5795002.278100] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [5795002.286091] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [5795002.294078] PKRU: 55555554
> [5795002.297634] Call Trace:
> [5795002.300910] nfsd4_proc_compound+0x342/0x660 [nfsd]
> [5795002.306572] nfsd_dispatch+0x9e/0x210 [nfsd]
> [5795002.311617] svc_process_common+0x345/0x750 [sunrpc]
> [5795002.317421] ? nfsd_destroy+0x50/0x50 [nfsd]
> [5795002.322480] svc_process+0xb7/0xf0 [sunrpc]
> [5795002.327433] nfsd+0xe3/0x140 [nfsd]
> [5795002.331728] kthread+0x112/0x130
> [5795002.335718] ? kthread_bind+0x30/0x30
> [5795002.340144] ret_from_fork+0x35/0x40
> [5795002.344475] ---[ end trace c6400532dff968ed ]---
>
> But I have as well no reproducing recipe to trigger it.
>
> The kernel is the current Debian buster distro kernel, based on
> 4.19.194.
>
> This initally looked similar than
> https://bugzilla.kernel.org/show_bug.cgi?id=195725 but the user did
> get there status=5, so EIO, so seems different issue.
>
> Regards,
> Salvatore

2021-09-28 05:18:13

by Salvatore Bonaccorso

[permalink] [raw]
Subject: Re: nfsd4_process_open2 failed to open newly-created file! status=10008 ; warning at fs/nfsd/nfs4proc.c for nfsd4_open

Hi Bruce,

On 27.09.2021 17:53, J. Bruce Fields wrote:
> On Mon, Sep 27, 2021 at 08:10:31AM +0200, Salvatore Bonaccorso wrote:
>> We recently got the following traces on a NFS server, but I'm not sure
>> how to further debug this, any hints?
>
> The server creates and opens a file in two steps, though it should
> really be a single atomic operation.
>
> That means there's a small possibility somebody could intervene and do
> something like change the permissions:
>
>>
>> [5746893.904448] ------------[ cut here ]------------
>> [5746893.910050] nfsd4_process_open2 failed to open newly-created
>> file! status=10008
>
> 10008 is NFS4ERR_DELAY, so maybe somebody managed to get a delegation
> before we finished opening?
>
> We should be able to prevent that....
>
> In your setup are there processes quickly opening new files created by
> others?

This is very possible. The NFS server is used as a "scratch" place
accessible from
compute cluster where people can have multiple jobs simultaneously
running through
Slurm and accessing the data. So it is possible that user create new
files from
one running instance and accessing it quickly from the other nodes.

I'm so far was unable to arificially trigger the issue but is there
anything I
can try out to get more information useful for you?

Regards,
Salvatore

2021-09-28 19:41:32

by J. Bruce Fields

[permalink] [raw]
Subject: Re: nfsd4_process_open2 failed to open newly-created file! status=10008 ; warning at fs/nfsd/nfs4proc.c for nfsd4_open

On Tue, Sep 28, 2021 at 07:11:41AM +0200, Salvatore Bonaccorso wrote:
> Hi Bruce,
>
> On 27.09.2021 17:53, J. Bruce Fields wrote:
> >On Mon, Sep 27, 2021 at 08:10:31AM +0200, Salvatore Bonaccorso wrote:
> >>We recently got the following traces on a NFS server, but I'm not sure
> >>how to further debug this, any hints?
> >
> >The server creates and opens a file in two steps, though it should
> >really be a single atomic operation.
> >
> >That means there's a small possibility somebody could intervene and do
> >something like change the permissions:
> >
> >>
> >>[5746893.904448] ------------[ cut here ]------------
> >>[5746893.910050] nfsd4_process_open2 failed to open
> >>newly-created file! status=10008
> >
> >10008 is NFS4ERR_DELAY, so maybe somebody managed to get a delegation
> >before we finished opening?
> >
> >We should be able to prevent that....
> >
> >In your setup are there processes quickly opening new files created by
> >others?
>
> This is very possible. The NFS server is used as a "scratch" place
> accessible from
> compute cluster where people can have multiple jobs simultaneously
> running through
> Slurm and accessing the data. So it is possible that user create new
> files from
> one running instance and accessing it quickly from the other nodes.
>
> I'm so far was unable to arificially trigger the issue but is there
> anything I
> can try out to get more information useful for you?

I think the problem's pretty obvious. I'm not sure what the fix should
be.

You can work around it for now by turning off delegations (echo 0
>/proc/sys/fs/leases_enable before starting nfsd).

--b.

2021-09-29 04:18:03

by Salvatore Bonaccorso

[permalink] [raw]
Subject: Re: nfsd4_process_open2 failed to open newly-created file! status=10008 ; warning at fs/nfsd/nfs4proc.c for nfsd4_open

Hi Bruce,

On 28.09.2021 21:39, J. Bruce Fields wrote:
> On Tue, Sep 28, 2021 at 07:11:41AM +0200, Salvatore Bonaccorso wrote:
>> Hi Bruce,
>>
>> On 27.09.2021 17:53, J. Bruce Fields wrote:
>> >On Mon, Sep 27, 2021 at 08:10:31AM +0200, Salvatore Bonaccorso wrote:
>> >>We recently got the following traces on a NFS server, but I'm not sure
>> >>how to further debug this, any hints?
>> >
>> >The server creates and opens a file in two steps, though it should
>> >really be a single atomic operation.
>> >
>> >That means there's a small possibility somebody could intervene and do
>> >something like change the permissions:
>> >
>> >>
>> >>[5746893.904448] ------------[ cut here ]------------
>> >>[5746893.910050] nfsd4_process_open2 failed to open
>> >>newly-created file! status=10008
>> >
>> >10008 is NFS4ERR_DELAY, so maybe somebody managed to get a delegation
>> >before we finished opening?
>> >
>> >We should be able to prevent that....
>> >
>> >In your setup are there processes quickly opening new files created by
>> >others?
>>
>> This is very possible. The NFS server is used as a "scratch" place
>> accessible from
>> compute cluster where people can have multiple jobs simultaneously
>> running through
>> Slurm and accessing the data. So it is possible that user create new
>> files from
>> one running instance and accessing it quickly from the other nodes.
>>
>> I'm so far was unable to arificially trigger the issue but is there
>> anything I
>> can try out to get more information useful for you?
>
> I think the problem's pretty obvious. I'm not sure what the fix should
> be.

Hope you come up with the right solution! Thanks for looking into it,
much appreciated.

> You can work around it for now by turning off delegations (echo 0
>> /proc/sys/fs/leases_enable before starting nfsd).

Ack thank you.

Regards,
Salvatore