2024-03-11 19:07:38

by Rik Theys

Subject: nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning

Hi,

For the past few weeks, our Rocky Linux 9 NFS server has periodically
logged hung nfsd tasks. The initial effect was that some clients could
no longer access the NFS server. This got worse and worse (probably as
more nfsd threads got blocked) and we had to restart the server.
Restarting the server also failed, as the NFS server service could no
longer be stopped.

The initial kernel we noticed this behavior on was
kernel-5.14.0-362.18.1.el9_3.x86_64. Since then we've installed
kernel-5.14.0-419.el9.x86_64 from CentOS Stream 9. The same issue
happened again on this newer kernel version:

[Mon Mar 11 14:10:08 2024]       Not tainted 5.14.0-419.el9.x86_64 #1
[Mon Mar 11 14:10:08 2024] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Mon Mar 11 14:10:08 2024] task:nfsd            state:D stack:0
    pid:8865  ppid:2      flags:0x00004000
[Mon Mar 11 14:10:08 2024] Call Trace:
[Mon Mar 11 14:10:08 2024]  <TASK>
[Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
[Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
[Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
[Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
[Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
[Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
[Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
[Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
[Mon Mar 11 14:10:08 2024]  nfsd4_shutdown_callback+0x49/0x120 [nfsd]
[Mon Mar 11 14:10:08 2024]  ? nfsd4_cld_remove+0x54/0x1d0 [nfsd]
[Mon Mar 11 14:10:08 2024]  ? nfsd4_return_all_client_layouts+0xc4/0xf0
[nfsd]
[Mon Mar 11 14:10:08 2024]  ? nfsd4_shutdown_copy+0x68/0xc0 [nfsd]
[Mon Mar 11 14:10:08 2024]  __destroy_client+0x1f3/0x290 [nfsd]
[Mon Mar 11 14:10:08 2024]  nfsd4_exchange_id+0x75f/0x770 [nfsd]
[Mon Mar 11 14:10:08 2024]  ? nfsd4_decode_opaque+0x3a/0x90 [nfsd]
[Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
[Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
[Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660 [sunrpc]
[Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
[Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
[Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
[Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
[Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
[Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
[Mon Mar 11 14:10:08 2024]  </TASK>
[Mon Mar 11 14:10:08 2024] INFO: task nfsd:8866 blocked for more than
122 seconds.
[Mon Mar 11 14:10:08 2024]       Not tainted 5.14.0-419.el9.x86_64 #1
[Mon Mar 11 14:10:08 2024] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Mon Mar 11 14:10:08 2024] task:nfsd            state:D stack:0
    pid:8866  ppid:2      flags:0x00004000
[Mon Mar 11 14:10:08 2024] Call Trace:
[Mon Mar 11 14:10:08 2024]  <TASK>
[Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
[Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
[Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
[Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
[Mon Mar 11 14:10:08 2024]  ? tcp_recvmsg+0x196/0x210
[Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
[Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
[Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
[Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
[Mon Mar 11 14:10:08 2024]  nfsd4_destroy_session+0x1a4/0x240 [nfsd]
[Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
[Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
[Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660 [sunrpc]
[Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
[Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
[Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
[Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
[Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
[Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
[Mon Mar 11 14:10:08 2024]  </TASK>

The above is repeated a few times, and then this warning is also logged:

[Mon Mar 11 14:12:04 2024] ------------[ cut here ]------------
[Mon Mar 11 14:12:04 2024] WARNING: CPU: 39 PID: 8844 at
fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
[Mon Mar 11 14:12:05 2024] Modules linked in: nfsv4 dns_resolver nfs
fscache netfs rpcsec_gss_krb5 rpcrdma rdma_cm iw_cm ib_cm ib_core
binfmt_misc bonding tls rfkill nft_counter nft_ct nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet nf_reject_ipv4
nf_reject_ipv6 nft_reject nf_tables nfnetlink vfat fat dm_thin_pool
dm_persistent_data dm_bio_prison dm_bufio libcrc32c dm_service_time
dm_multipath intel_rapl_msr intel_rapl_common intel_uncore_frequency
intel_uncore_frequency_common isst_if_common skx_edac nfit libnvdimm
ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
irqbypass dcdbas rapl intel_cstate mgag200 i2c_algo_bit
drm_shmem_helper dell_smbios drm_kms_helper dell_wmi_descriptor
wmi_bmof intel_uncore syscopyarea pcspkr sysfillrect mei_me sysimgblt
acpi_ipmi mei fb_sys_fops i2c_i801 ipmi_si intel_pch_thermal lpc_ich
ipmi_devintf i2c_smbus ipmi_msghandler joydev acpi_power_meter nfsd
auth_rpcgss nfs_acl drm lockd grace fuse sunrpc ext4 mbcache jbd2
sd_mod sg lpfc
[Mon Mar 11 14:12:05 2024]  nvmet_fc nvmet nvme_fc nvme_fabrics
crct10dif_pclmul ahci libahci crc32_pclmul nvme_core crc32c_intel
ixgbe megaraid_sas libata nvme_common ghash_clmulni_intel t10_pi
wdat_wdt scsi_transport_fc mdio wmi dca dm_mirror dm_region_hash
dm_log dm_mod
[Mon Mar 11 14:12:05 2024] CPU: 39 PID: 8844 Comm: nfsd Not tainted
5.14.0-419.el9.x86_64 #1
[Mon Mar 11 14:12:05 2024] Hardware name: Dell Inc. PowerEdge
R740/00WGD1, BIOS 2.20.1 09/13/2023
[Mon Mar 11 14:12:05 2024] RIP: 0010:nfsd_break_deleg_cb+0x170/0x190 [nfsd]
[Mon Mar 11 14:12:05 2024] Code: a6 95 c5 f3 e9 ff fe ff ff 48 89 df be
01 00 00 00 e8 34 b5 13 f4 48 8d bb 98 00 00 00 e8 c8 f9 00 00 84 c0 0f
85 2e ff ff ff <0f> 0b e9 27 ff ff ff be
02 00 00 00 48 89 df e8 0c b5 13 f4 e9 01
[Mon Mar 11 14:12:05 2024] RSP: 0018:ffff9929e0bb7b80 EFLAGS: 00010246
[Mon Mar 11 14:12:05 2024] RAX: 0000000000000000 RBX: ffff8ada51930900
RCX: 0000000000000024
[Mon Mar 11 14:12:05 2024] RDX: ffff8ada519309c8 RSI: ffff8ad582933c00
RDI: 0000000000002000
[Mon Mar 11 14:12:05 2024] RBP: ffff8ad46bf21574 R08: ffff9929e0bb7b48
R09: 0000000000000000
[Mon Mar 11 14:12:05 2024] R10: ffff8aec859a2948 R11: 0000000000000000
R12: ffff8ad6f497c360
[Mon Mar 11 14:12:05 2024] R13: ffff8ad46bf21560 R14: ffff8ae5942e0b10
R15: ffff8ad6f497c360
[Mon Mar 11 14:12:05 2024] FS:  0000000000000000(0000)
GS:ffff8b031fcc0000(0000) knlGS:0000000000000000
[Mon Mar 11 14:12:05 2024] CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[Mon Mar 11 14:12:05 2024] CR2: 00007fafe2060744 CR3: 00000018e58de006
CR4: 00000000007706e0
[Mon Mar 11 14:12:05 2024] DR0: 0000000000000000 DR1: 0000000000000000
DR2: 0000000000000000
[Mon Mar 11 14:12:05 2024] DR3: 0000000000000000 DR6: 00000000fffe0ff0
DR7: 0000000000000400
[Mon Mar 11 14:12:05 2024] PKRU: 55555554
[Mon Mar 11 14:12:05 2024] Call Trace:
[Mon Mar 11 14:12:05 2024]  <TASK>
[Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
[Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
[Mon Mar 11 14:12:05 2024]  ? __break_lease+0x16f/0x5f0
[Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
[Mon Mar 11 14:12:05 2024]  ? __warn+0x81/0x110
[Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
[Mon Mar 11 14:12:05 2024]  ? report_bug+0x10a/0x140
[Mon Mar 11 14:12:05 2024]  ? handle_bug+0x3c/0x70
[Mon Mar 11 14:12:05 2024]  ? exc_invalid_op+0x14/0x70
[Mon Mar 11 14:12:05 2024]  ? asm_exc_invalid_op+0x16/0x20
[Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
[Mon Mar 11 14:12:05 2024]  __break_lease+0x16f/0x5f0
[Mon Mar 11 14:12:05 2024]  ? nfsd_file_lookup_locked+0x117/0x160 [nfsd]
[Mon Mar 11 14:12:05 2024]  ? list_lru_del+0x101/0x150
[Mon Mar 11 14:12:05 2024]  nfsd_file_do_acquire+0x790/0x830 [nfsd]
[Mon Mar 11 14:12:05 2024]  nfs4_get_vfs_file+0x315/0x3a0 [nfsd]
[Mon Mar 11 14:12:05 2024]  nfsd4_process_open2+0x430/0xa30 [nfsd]
[Mon Mar 11 14:12:05 2024]  ? fh_verify+0x297/0x2f0 [nfsd]
[Mon Mar 11 14:12:05 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
[Mon Mar 11 14:12:05 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
[Mon Mar 11 14:12:05 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
[Mon Mar 11 14:12:05 2024]  svc_process_common+0x2ec/0x660 [sunrpc]
[Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
[Mon Mar 11 14:12:05 2024]  svc_process+0x12d/0x170 [sunrpc]
[Mon Mar 11 14:12:05 2024]  nfsd+0x84/0xb0 [nfsd]
[Mon Mar 11 14:12:05 2024]  kthread+0xdd/0x100
[Mon Mar 11 14:12:05 2024]  ? __pfx_kthread+0x10/0x10
[Mon Mar 11 14:12:05 2024]  ret_from_fork+0x29/0x50
[Mon Mar 11 14:12:05 2024]  </TASK>
[Mon Mar 11 14:12:05 2024] ---[ end trace 7a039e17443dc651 ]---

Could this be the same issue as described here:
https://lore.kernel.org/linux-nfs/[email protected]/ ?

As described in that thread, I've tried to obtain the requested information.

The attached workqueue_info.txt file contains the dmesg output after
running 'echo t > /proc/sysrq-trigger'. It's possibly truncated :-(.

I'm also attaching rpc_tasks.txt, collected on the server, and
nfs_threads.txt, collected on one of the clients that fails to mount
the server when the issue occurs.


Is it possible this is the issue that was fixed by the patches described
here?
https://lore.kernel.org/linux-nfs/2024022054-cause-suffering-eae8@gregkh/

Regards,

Rik


--
Rik Theys
System Engineer
KU Leuven - Dept. Elektrotechniek (ESAT)
Kasteelpark Arenberg 10 bus 2440 - B-3001 Leuven-Heverlee
+32(0)16/32.11.07
----------------------------------------------------------------
<<Any errors in spelling, tact or fact are transmission errors>>


Attachments:
workqueue_info.txt (1.22 MB)
rpc_tasks.txt (111.00 B)
nfs_threads.txt (919.00 B)

2024-03-12 11:23:09

by Jeffrey Layton

Subject: Re: nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning

On Mon, 2024-03-11 at 19:43 +0100, Rik Theys wrote:
> Hi,
>
> For the past few weeks, our Rocky Linux 9 NFS server has periodically logged hung nfsd tasks. The initial effect was that some clients could no longer access the NFS server. This got worse and worse (probably as more nfsd threads got blocked) and we had to restart the server. Restarting the server also failed, as the NFS server service could no longer be stopped.
>
> The initial kernel we noticed this behavior on was kernel-5.14.0-362.18.1.el9_3.x86_64. Since then we've installed kernel-5.14.0-419.el9.x86_64 from CentOS Stream 9. The same issue happened again on this newer kernel version:
>
> [Mon Mar 11 14:10:08 2024] task:nfsd            state:D stack:0     pid:8865  ppid:2      flags:0x00004000
> [Mon Mar 11 14:10:08 2024] Call Trace:
> [...]
> [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
> [Mon Mar 11 14:10:08 2024]  nfsd4_shutdown_callback+0x49/0x120 [nfsd]
> [Mon Mar 11 14:10:08 2024]  __destroy_client+0x1f3/0x290 [nfsd]
> [Mon Mar 11 14:10:08 2024]  nfsd4_exchange_id+0x75f/0x770 [nfsd]
> [...]
> [Mon Mar 11 14:10:08 2024] INFO: task nfsd:8866 blocked for more than 122 seconds.
> [Mon Mar 11 14:10:08 2024] task:nfsd            state:D stack:0     pid:8866  ppid:2      flags:0x00004000
> [Mon Mar 11 14:10:08 2024] Call Trace:
> [...]
> [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
> [Mon Mar 11 14:10:08 2024]  nfsd4_destroy_session+0x1a4/0x240 [nfsd]
> [...]

The above threads are trying to flush the workqueue, so that probably
means that they are stuck waiting on a workqueue job to finish.
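
For reference, and assuming the el9 code matches upstream here, both of
the blocked paths above boil down to a flush of the nfsd4_callbacks
workqueue, which cannot return until every callback work item that was
already queued has finished running. Roughly (sketch only, not the
exact RHEL source):

	flush_workqueue(callback_wq);	/* blocks until all queued nfsd4_run_cb_work items complete */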
>
> The above is repeated a few times, and then this warning is also logged:
>
> [Mon Mar 11 14:12:04 2024] ------------[ cut here ]------------
> [Mon Mar 11 14:12:04 2024] WARNING: CPU: 39 PID: 8844 at fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
> [...]
> [Mon Mar 11 14:12:05 2024] Call Trace:
> [...]
> [Mon Mar 11 14:12:05 2024]  __break_lease+0x16f/0x5f0
> [Mon Mar 11 14:12:05 2024]  nfsd_file_do_acquire+0x790/0x830 [nfsd]
> [Mon Mar 11 14:12:05 2024]  nfs4_get_vfs_file+0x315/0x3a0 [nfsd]
> [Mon Mar 11 14:12:05 2024]  nfsd4_process_open2+0x430/0xa30 [nfsd]
> [Mon Mar 11 14:12:05 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
> [...]
> [Mon Mar 11 14:12:05 2024] ---[ end trace 7a039e17443dc651 ]---

This is probably this WARN in nfsd_break_one_deleg:

WARN_ON_ONCE(!nfsd4_run_cb(&dp->dl_recall));

It means that a delegation break callback to the client couldn't be
queued to the workqueue, and so it didn't run.
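
In the upstream code of roughly this vintage, that path looks more or
less like the sketch below (assumption: the el9 backport matches in the
relevant details). queue_work() returns false when the work item is
still pending, so the WARN fires when the previous recall for that
delegation was queued but never got to run:

	/* sketch, upstream-ish */
	static bool nfsd4_queue_cb(struct nfsd4_callback *cb)
	{
		/* false if cb->cb_work is already queued and hasn't run yet */
		return queue_work(callback_wq, &cb->cb_work);
	}

	bool nfsd4_run_cb(struct nfsd4_callback *cb)
	{
		struct nfs4_client *clp = cb->cb_clp;
		bool queued;

		nfsd41_cb_inflight_begin(clp);
		queued = nfsd4_queue_cb(cb);
		if (!queued)
			nfsd41_cb_inflight_end(clp);
		return queued;
	}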

>
> Could this be the same issue as described here:
> https://lore.kernel.org/linux-nfs/[email protected]/ ?
>

Yes, most likely the same problem.


> As described in that thread, I've tried to obtain the requested information.
>

> Is it possible this is the issue that was fixed by the patches described here? https://lore.kernel.org/linux-nfs/2024022054-cause-suffering-eae8@gregkh/
>

Doubtful. Those are targeted toward a different set of issues.

If you're willing, I do have some patches queued up for CentOS here that
fix some backchannel problems that could be related. I'm mainly waiting
on Chuck to send these to Linus and then we'll likely merge them into
CentOS soon afterward:

https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3689

--
Jeff Layton <[email protected]>

2024-03-12 11:37:53

by Jeffrey Layton

Subject: Re: nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning

On Mon, 2024-03-11 at 19:43 +0100, Rik Theys wrote:
> Hi,
>
> For the past few weeks, our Rocky Linux 9 NFS server has periodically logged hung nfsd tasks. The initial effect was that some clients could no longer access the NFS server. This got worse and worse (probably as more nfsd threads got blocked) and we had to restart the server. Restarting the server also failed, as the NFS server service could no longer be stopped.
>
> The initial kernel we noticed this behavior on was kernel-5.14.0-362.18.1.el9_3.x86_64. Since then we've installed kernel-5.14.0-419.el9.x86_64 from CentOS Stream 9. The same issue happened again on this newer kernel version:
>
> [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
> [Mon Mar 11 14:10:08 2024]  nfsd4_shutdown_callback+0x49/0x120 [nfsd]
> [...]
> [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
> [Mon Mar 11 14:10:08 2024]  nfsd4_destroy_session+0x1a4/0x240 [nfsd]
> [...]
> [Mon Mar 11 14:12:04 2024] WARNING: CPU: 39 PID: 8844 at fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
> [...]
> [Mon Mar 11 14:12:05 2024] ---[ end trace 7a039e17443dc651 ]---
>

[Mon Mar 11 14:29:16 2024] task:kworker/u96:3 state:D stack:0 pid:2451130 ppid:2 flags:0x00004000
[Mon Mar 11 14:29:16 2024] Workqueue: nfsd4_callbacks nfsd4_run_cb_work [nfsd]
[Mon Mar 11 14:29:16 2024] Call Trace:
[Mon Mar 11 14:29:16 2024] <TASK>
[Mon Mar 11 14:29:16 2024] __schedule+0x21b/0x550
[Mon Mar 11 14:29:16 2024] schedule+0x2d/0x70
[Mon Mar 11 14:29:16 2024] schedule_timeout+0x88/0x160
[Mon Mar 11 14:29:16 2024] ? __pfx_process_timeout+0x10/0x10
[Mon Mar 11 14:29:16 2024] rpc_shutdown_client+0xb3/0x150 [sunrpc]
[Mon Mar 11 14:29:16 2024] ? __pfx_autoremove_wake_function+0x10/0x10
[Mon Mar 11 14:29:16 2024] nfsd4_process_cb_update+0x3e/0x260 [nfsd]
[Mon Mar 11 14:29:16 2024] ? sched_clock+0xc/0x30
[Mon Mar 11 14:29:16 2024] ? raw_spin_rq_lock_nested+0x19/0x80
[Mon Mar 11 14:29:16 2024] ? newidle_balance+0x26e/0x400
[Mon Mar 11 14:29:16 2024] ? pick_next_task_fair+0x41/0x500
[Mon Mar 11 14:29:16 2024] ? put_prev_task_fair+0x1e/0x40
[Mon Mar 11 14:29:16 2024] ? pick_next_task+0x861/0x950
[Mon Mar 11 14:29:16 2024] ? __update_idle_core+0x23/0xc0
[Mon Mar 11 14:29:16 2024] ? __switch_to_asm+0x3a/0x80
[Mon Mar 11 14:29:16 2024] ? finish_task_switch.isra.0+0x8c/0x2a0
[Mon Mar 11 14:29:16 2024] nfsd4_run_cb_work+0x9f/0x150 [nfsd]
[Mon Mar 11 14:29:16 2024] process_one_work+0x1e2/0x3b0
[Mon Mar 11 14:29:16 2024] worker_thread+0x50/0x3a0
[Mon Mar 11 14:29:16 2024] ? __pfx_worker_thread+0x10/0x10
[Mon Mar 11 14:29:16 2024] kthread+0xdd/0x100
[Mon Mar 11 14:29:16 2024] ? __pfx_kthread+0x10/0x10
[Mon Mar 11 14:29:16 2024] ret_from_fork+0x29/0x50
[Mon Mar 11 14:29:16 2024] </TASK>

The above is the main task that I see in the cb workqueue. It's trying to call rpc_shutdown_client, which is waiting for this:

	wait_event_timeout(destroy_wait,
		list_empty(&clnt->cl_tasks), 1*HZ);

...so basically waiting for the cl_tasks list to go empty. It repeatedly
does a rpc_killall_tasks though, so possibly trying to kill this task?

18423 2281 0 0x18 0x0 1354 nfsd4_cb_ops [nfsd] nfs4_cbv1 CB_RECALL_ANY a:call_start [sunrpc] q:delayq

Callbacks are soft RPC tasks though, so they should be easily killable.
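
For reference, that wait sits in a retry loop; roughly, from the
upstream sunrpc code of this era (sketch, the el9 source may differ in
detail):

	void rpc_shutdown_client(struct rpc_clnt *clnt)
	{
		might_sleep();

		while (!list_empty(&clnt->cl_tasks)) {
			/* ask every outstanding task on this client to exit... */
			rpc_killall_tasks(clnt);
			/* ...then wait up to a second for cl_tasks to drain */
			wait_event_timeout(destroy_wait,
				list_empty(&clnt->cl_tasks), 1*HZ);
		}

		rpc_release_client(clnt);
	}

so the shutdown loops forever if a task on cl_tasks never goes away
even after being killed.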
--
Jeff Layton <[email protected]>

2024-03-12 12:25:30

by Rik Theys

[permalink] [raw]
Subject: Re: nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning

Hi Jeff,

On 3/12/24 12:22, Jeff Layton wrote:
> On Mon, 2024-03-11 at 19:43 +0100, Rik Theys wrote:
>> For the past few weeks, our Rocky Linux 9 NFS server has periodically logged hung nfsd tasks. The initial effect was that some clients could no longer access the NFS server. This got worse and worse (probably as more nfsd threads got blocked) and we had to restart the server. Restarting the server also failed, as the NFS server service could no longer be stopped.
>>
>>
>> The initial kernel we noticed this behavior on was kernel-5.14.0-362.18.1.el9_3.x86_64. Since then we've installed kernel-5.14.0-419.el9.x86_64 from CentOS Stream 9. The same issue happened again on this newer kernel version:
>>
>>  [Mon Mar 11 14:10:08 2024] task:nfsd            state:D stack:0     pid:8865  ppid:2      flags:0x00004000
>>  [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>  [Mon Mar 11 14:10:08 2024]  nfsd4_shutdown_callback+0x49/0x120 [nfsd]
>>  [Mon Mar 11 14:10:08 2024]  __destroy_client+0x1f3/0x290 [nfsd]
>>  [Mon Mar 11 14:10:08 2024]  nfsd4_exchange_id+0x75f/0x770 [nfsd]
>>  [...]
>>  [Mon Mar 11 14:10:08 2024] INFO: task nfsd:8866 blocked for more than 122 seconds.
>>  [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>  [Mon Mar 11 14:10:08 2024]  nfsd4_destroy_session+0x1a4/0x240 [nfsd]
>>  [...]
>>
> The above threads are trying to flush the workqueue, so that probably
> means that they are stuck waiting on a workqueue job to finish.
>>  The above is repeated a few times, and then this warning is also logged:
>>
>> [Mon Mar 11 14:12:04 2024] ------------[ cut here ]------------
>>  [Mon Mar 11 14:12:04 2024] WARNING: CPU: 39 PID: 8844 at fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>  [...]
>>  [Mon Mar 11 14:12:05 2024]  __break_lease+0x16f/0x5f0
>>  [Mon Mar 11 14:12:05 2024]  nfsd_file_do_acquire+0x790/0x830 [nfsd]
>>  [Mon Mar 11 14:12:05 2024]  nfsd4_process_open2+0x430/0xa30 [nfsd]
>>  [Mon Mar 11 14:12:05 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
>>  [...]
>>  [Mon Mar 11 14:12:05 2024] ---[ end trace 7a039e17443dc651 ]---
> This is probably this WARN in nfsd_break_one_deleg:
>
> WARN_ON_ONCE(!nfsd4_run_cb(&dp->dl_recall));
>
> It means that a delegation break callback to the client couldn't be
> queued to the workqueue, and so it didn't run.
>
>> Could this be the same issue as described here: https://lore.kernel.org/linux-nfs/[email protected]/ ?
>>
> Yes, most likely the same problem.
If I read that thread correctly, this issue was introduced between
6.1.63 and 6.6.3? Is it possible the EL9 5.14.0-362.18.1.el9_3
backported these changes, or were we hitting some other bug with that
version? It seems the 6.1.x kernel is not affected? If so, that would be
the recommended kernel to run?
>
>
>> As described in that thread, I've tried to obtain the requested information.
>>
>>
>> Is it possible this is the issue that was fixed by the patches described here? https://lore.kernel.org/linux-nfs/2024022054-cause-suffering-eae8@gregkh/
>>
> Doubtful. Those are targeted toward a different set of issues.
>
> If you're willing, I do have some patches queued up for CentOS here that
> fix some backchannel problems that could be related. I'm mainly waiting
> on Chuck to send these to Linus and then we'll likely merge them into
> CentOS soon afterward:
>
> https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3689
>
If you can send me a patch file, I can rebuild the C9S kernel with that
patch and run it. It can take a while for the bug to trigger, as it
seems to be very workload dependent (we ran stably for months and now
hit this bug every other week).

So these patches are not yet upstream?

Regards,

Rik


--
Rik Theys
System Engineer
KU Leuven - Dept. Elektrotechniek (ESAT)
Kasteelpark Arenberg 10 bus 2440 - B-3001 Leuven-Heverlee
+32(0)16/32.11.07
----------------------------------------------------------------
<<Any errors in spelling, tact or fact are transmission errors>>


2024-03-12 12:47:15

by Jeffrey Layton

Subject: Re: nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning

On Tue, 2024-03-12 at 13:24 +0100, Rik Theys wrote:
> Hi Jeff,
>
> On 3/12/24 12:22, Jeff Layton wrote:
> > On Mon, 2024-03-11 at 19:43 +0100, Rik Theys wrote:
> > > For the past few weeks, our Rocky Linux 9 NFS server has periodically logged hung nfsd tasks. The initial effect was that some clients could no longer access the NFS server. This got worse and worse (probably as more nfsd threads got blocked) and we had to restart the server. Restarting the server also failed, as the NFS server service could no longer be stopped.
> > >
> > >
> > > The initial kernel we noticed this behavior on was kernel-5.14.0-362.18.1.el9_3.x86_64. Since then we've installed kernel-5.14.0-419.el9.x86_64 from CentOS Stream 9. The same issue happened again on this newer kernel version:
> > >

419 is fairly up to date with nfsd changes. There are some known bugs
around callbacks, and there is a draft MR in flight to fix them.

What kernel were you on prior to 5.14.0-362.18.1.el9_3.x86_64? If we
can bracket the changes around a particular version, that might help
identify the problem.

> > > [Mon Mar 11 14:10:08 2024] task:nfsd            state:D stack:0     pid:8865  ppid:2      flags:0x00004000
> > > [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
> > > [Mon Mar 11 14:10:08 2024]  nfsd4_shutdown_callback+0x49/0x120 [nfsd]
> > > [...]
> > > [Mon Mar 11 14:10:08 2024] INFO: task nfsd:8866 blocked for more than 122 seconds.
> > > [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
> > > [Mon Mar 11 14:10:08 2024]  nfsd4_destroy_session+0x1a4/0x240 [nfsd]
> > > [...]
> > The above threads are trying to flush the workqueue, so that probably
> > means that they are stuck waiting on a workqueue job to finish.
> > > The above is repeated a few times, and then this warning is also logged:
> > >
> > > [Mon Mar 11 14:12:04 2024] WARNING: CPU: 39 PID: 8844 at fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
> > > [...]
> > > [Mon Mar 11 14:12:05 2024] ---[ end trace 7a039e17443dc651 ]---
> > This is probably this WARN in nfsd_break_one_deleg:
> >
> > WARN_ON_ONCE(!nfsd4_run_cb(&dp->dl_recall));
> >
> > It means that a delegation break callback to the client couldn't be
> > queued to the workqueue, and so it didn't run.
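For context, the code around that WARN is roughly this (pared down from
fs/nfsd/nfs4state.c; the el9 backport may differ in detail):

	static void nfsd_break_one_deleg(struct nfs4_delegation *dp)
	{
		/* Queue a CB_RECALL for this delegation.  nfsd4_run_cb()
		 * essentially boils down to queue_work() on the callback
		 * workqueue, and queue_work() returns false if the work
		 * item is still pending, i.e. an earlier recall for this
		 * delegation was queued but never ran. */
		WARN_ON_ONCE(!nfsd4_run_cb(&dp->dl_recall));
	}

So the warning firing together with the stuck __flush_workqueue traces
points at a recall work item that was queued long ago and never completed.
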
> >
> > > Could this be the same issue as described here: https://lore.kernel.org/linux-nfs/[email protected]/ ?
> > >
> > Yes, most likely the same problem.
> If I read that thread correctly, this issue was introduced between
> 6.1.63 and 6.6.3? Is it possible the EL9 5.14.0-362.18.1.el9_3
> backported these changes, or were we hitting some other bug with that
> version? It seems the 6.1.x kernel is not affected? If so, that would be
> the recommended kernel to run?

Anything is possible. We have to identify the problem first.
>


> >
> > > As described in that thread, I've tried to obtain the requested information.
> > >
> > >
> > > Is it possible this is the issue that was fixed by the patches described here? https://lore.kernel.org/linux-nfs/2024022054-cause-suffering-eae8@gregkh/
> > >
> > Doubtful. Those are targeted toward a different set of issues.
> >
> > If you're willing, I do have some patches queued up for CentOS here that
> > fix some backchannel problems that could be related. I'm mainly waiting
> > on Chuck to send these to Linus and then we'll likely merge them into
> > CentOS soon afterward:
> >
> > https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3689
> >
> If you can send me a patch file, I can rebuild the C9S kernel with that
> patch and run it. It can take a while for the bug to trigger as I
> believe it seems to be very workload dependent (we were running very
> stable for months and now hit this bug every other week).
>
>

It's probably simpler to just pull down the build artifacts for that MR.
You have to drill down through the CI for it, but they are here:

https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/index.html?prefix=trusted-artifacts/1194300175/publish_x86_64/6278921877/artifacts/

There's even a repo file you can install on the box to pull them down.
--
Jeff Layton <[email protected]>

2024-03-12 13:30:27

by Rik Theys

[permalink] [raw]
Subject: Re: nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning

Hi,

On 3/12/24 13:47, Jeff Layton wrote:
> On Tue, 2024-03-12 at 13:24 +0100, Rik Theys wrote:
>> On 3/12/24 12:22, Jeff Layton wrote:
>>> On Mon, 2024-03-11 at 19:43 +0100, Rik Theys wrote:
>>>> Since a few weeks our Rocky Linux 9 NFS server has periodically logged hung nfsd tasks. The initial effect was that some clients could no longer access the NFS server. This got worse and worse (probably as more nfsd threads got blocked) and we had to restart the server. Restarting the server also failed as the NFS server service could no longer be stopped.
>>>>
>>>>
>>>> The initial kernel we noticed this behavior on was kernel-5.14.0-362.18.1.el9_3.x86_64. Since then we've installed kernel-5.14.0-419.el9.x86_64 from CentOS Stream 9. The same issue happened again on this newer kernel version:
>>>>
> 419 is fairly up to date with nfsd changes. There are some known bugs
> around callbacks, and there is a draft MR in flight to fix them.
>
> What kernel were you on prior to 5.14.0-362.18.1.el9_3.x86_64 ? If we
> can bracket the changes around a particular version, then that might
> help identify the problem.
The server on which we are experiencing this the most was upgraded from
CentOS 7 recently and only ran 5.14.0-362.18.1 (and now 419). Another
server on which we (less frequently) have this issue has been running
EL9 for much longer (kernels 5.14.0-162.23.1, 5.14.0-284.11.1,
5.14.0-284.18.1, 5.14.0-284.30.1, 5.14.0-362.8.1), but we only started
to experience the issue on 5.14.0-362.18.1. It could be that the bug was
also present in older versions and that we never triggered it there.
>
>>>> [Mon Mar 11 14:10:08 2024]       Not tainted 5.14.0-419.el9.x86_64 #1
>>>>  [Mon Mar 11 14:10:08 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>  [Mon Mar 11 14:10:08 2024] task:nfsd            state:D stack:0     pid:8865  ppid:2      flags:0x00004000
>>>>  [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>  [Mon Mar 11 14:10:08 2024]  <TASK>
>>>>  [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>>  [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>>  [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>>  [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
>>>>  [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>>  [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>>  [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
>>>>  [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>>  [Mon Mar 11 14:10:08 2024]  nfsd4_shutdown_callback+0x49/0x120 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  ? nfsd4_cld_remove+0x54/0x1d0 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  ? nfsd4_return_all_client_layouts+0xc4/0xf0 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  ? nfsd4_shutdown_copy+0x68/0xc0 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  __destroy_client+0x1f3/0x290 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  nfsd4_exchange_id+0x75f/0x770 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  ? nfsd4_decode_opaque+0x3a/0x90 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660 [sunrpc]
>>>>  [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>  [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>>  [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>>  [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>>  [Mon Mar 11 14:10:08 2024]  </TASK>
>>>>  [Mon Mar 11 14:10:08 2024] INFO: task nfsd:8866 blocked for more than 122 seconds.
>>>>  [Mon Mar 11 14:10:08 2024]       Not tainted 5.14.0-419.el9.x86_64 #1
>>>>  [Mon Mar 11 14:10:08 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>  [Mon Mar 11 14:10:08 2024] task:nfsd            state:D stack:0     pid:8866  ppid:2      flags:0x00004000
>>>>  [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>  [Mon Mar 11 14:10:08 2024]  <TASK>
>>>>  [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>>  [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>>  [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>>  [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
>>>>  [Mon Mar 11 14:10:08 2024]  ? tcp_recvmsg+0x196/0x210
>>>>  [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>>  [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>>  [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
>>>>  [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>>  [Mon Mar 11 14:10:08 2024]  nfsd4_destroy_session+0x1a4/0x240 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660 [sunrpc]
>>>>  [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>  [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>>  [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>>  [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>>  [Mon Mar 11 14:10:08 2024]  </TASK>
>>>>
>>> The above threads are trying to flush the workqueue, so that probably
>>> means that they are stuck waiting on a workqueue job to finish.
>>>>  The above is repeated a few times, and then this warning is also logged:
>>>>
>>>> [Mon Mar 11 14:12:04 2024] ------------[ cut here ]------------
>>>>  [Mon Mar 11 14:12:04 2024] WARNING: CPU: 39 PID: 8844 at fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024] Modules linked in: nfsv4 dns_resolver nfs fscache netfs rpcsec_gss_krb5 rpcrdma rdma_cm iw_cm ib_cm ib_core binfmt_misc bonding tls rfkill nft_counter nft_ct
>>>>  nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables nfnetlink vfat fat dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio l
>>>>  ibcrc32c dm_service_time dm_multipath intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common skx_edac nfit libnvdimm ipmi_ssif x86_pkg_temp
>>>>  _thermal intel_powerclamp coretemp kvm_intel kvm irqbypass dcdbas rapl intel_cstate mgag200 i2c_algo_bit drm_shmem_helper dell_smbios drm_kms_helper dell_wmi_descriptor wmi_bmof intel_u
>>>>  ncore syscopyarea pcspkr sysfillrect mei_me sysimgblt acpi_ipmi mei fb_sys_fops i2c_i801 ipmi_si intel_pch_thermal lpc_ich ipmi_devintf i2c_smbus ipmi_msghandler joydev acpi_power_meter
>>>>  nfsd auth_rpcgss nfs_acl drm lockd grace fuse sunrpc ext4 mbcache jbd2 sd_mod sg lpfc
>>>>  [Mon Mar 11 14:12:05 2024]  nvmet_fc nvmet nvme_fc nvme_fabrics crct10dif_pclmul ahci libahci crc32_pclmul nvme_core crc32c_intel ixgbe megaraid_sas libata nvme_common ghash_clmulni_int
>>>>  el t10_pi wdat_wdt scsi_transport_fc mdio wmi dca dm_mirror dm_region_hash dm_log dm_mod
>>>>  [Mon Mar 11 14:12:05 2024] CPU: 39 PID: 8844 Comm: nfsd Not tainted 5.14.0-419.el9.x86_64 #1
>>>>  [Mon Mar 11 14:12:05 2024] Hardware name: Dell Inc. PowerEdge R740/00WGD1, BIOS 2.20.1 09/13/2023
>>>>  [Mon Mar 11 14:12:05 2024] RIP: 0010:nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024] Code: a6 95 c5 f3 e9 ff fe ff ff 48 89 df be 01 00 00 00 e8 34 b5 13 f4 48 8d bb 98 00 00 00 e8 c8 f9 00 00 84 c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be
>>>>  02 00 00 00 48 89 df e8 0c b5 13 f4 e9 01
>>>>  [Mon Mar 11 14:12:05 2024] RSP: 0018:ffff9929e0bb7b80 EFLAGS: 00010246
>>>>  [Mon Mar 11 14:12:05 2024] RAX: 0000000000000000 RBX: ffff8ada51930900 RCX: 0000000000000024
>>>>  [Mon Mar 11 14:12:05 2024] RDX: ffff8ada519309c8 RSI: ffff8ad582933c00 RDI: 0000000000002000
>>>>  [Mon Mar 11 14:12:05 2024] RBP: ffff8ad46bf21574 R08: ffff9929e0bb7b48 R09: 0000000000000000
>>>>  [Mon Mar 11 14:12:05 2024] R10: ffff8aec859a2948 R11: 0000000000000000 R12: ffff8ad6f497c360
>>>>  [Mon Mar 11 14:12:05 2024] R13: ffff8ad46bf21560 R14: ffff8ae5942e0b10 R15: ffff8ad6f497c360
>>>>  [Mon Mar 11 14:12:05 2024] FS:  0000000000000000(0000) GS:ffff8b031fcc0000(0000) knlGS:0000000000000000
>>>>  [Mon Mar 11 14:12:05 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>  [Mon Mar 11 14:12:05 2024] CR2: 00007fafe2060744 CR3: 00000018e58de006 CR4: 00000000007706e0
>>>>  [Mon Mar 11 14:12:05 2024] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>  [Mon Mar 11 14:12:05 2024] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>  [Mon Mar 11 14:12:05 2024] PKRU: 55555554
>>>>  [Mon Mar 11 14:12:05 2024] Call Trace:
>>>>  [Mon Mar 11 14:12:05 2024]  <TASK>
>>>>  [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>>>  [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>>>  [Mon Mar 11 14:12:05 2024]  ? __break_lease+0x16f/0x5f0
>>>>  [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  ? __warn+0x81/0x110
>>>>  [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  ? report_bug+0x10a/0x140
>>>>  [Mon Mar 11 14:12:05 2024]  ? handle_bug+0x3c/0x70
>>>>  [Mon Mar 11 14:12:05 2024]  ? exc_invalid_op+0x14/0x70
>>>>  [Mon Mar 11 14:12:05 2024]  ? asm_exc_invalid_op+0x16/0x20
>>>>  [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  __break_lease+0x16f/0x5f0
>>>>  [Mon Mar 11 14:12:05 2024]  ? nfsd_file_lookup_locked+0x117/0x160 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  ? list_lru_del+0x101/0x150
>>>>  [Mon Mar 11 14:12:05 2024]  nfsd_file_do_acquire+0x790/0x830 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  nfs4_get_vfs_file+0x315/0x3a0 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  nfsd4_process_open2+0x430/0xa30 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  ? fh_verify+0x297/0x2f0 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  svc_process_common+0x2ec/0x660 [sunrpc]
>>>>  [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>  [Mon Mar 11 14:12:05 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  kthread+0xdd/0x100
>>>>  [Mon Mar 11 14:12:05 2024]  ? __pfx_kthread+0x10/0x10
>>>>  [Mon Mar 11 14:12:05 2024]  ret_from_fork+0x29/0x50
>>>>  [Mon Mar 11 14:12:05 2024]  </TASK>
>>>>  [Mon Mar 11 14:12:05 2024] ---[ end trace 7a039e17443dc651 ]---
>>> This is probably this WARN in nfsd_break_one_deleg:
>>>
>>> WARN_ON_ONCE(!nfsd4_run_cb(&dp->dl_recall));
>>>
>>> It means that a delegation break callback to the client couldn't be
>>> queued to the workqueue, and so it didn't run.
>>>
>>>> Could this be the same issue as described here:https://lore.kernel.org/linux-nfs/[email protected]/ ?
>>>>
>>> Yes, most likely the same problem.
>> If I read that thread correctly, this issue was introduced between
>> 6.1.63 and 6.6.3? Is it possible the EL9 5.14.0-362.18.1.el9_3
>> backported these changes, or were we hitting some other bug with that
>> version? It seems the 6.1.x kernel is not affected? If so, that would be
>> the recommended kernel to run?
> Anything is possible. We have to identify the problem first.
>
>>>> As described in that thread, I've tried to obtain the requested information.
>>>>
>>>>
>>>> Is it possible this is the issue that was fixed by the patches described here? https://lore.kernel.org/linux-nfs/2024022054-cause-suffering-eae8@gregkh/
>>>>
>>> Doubtful. Those are targeted toward a different set of issues.
>>>
>>> If you're willing, I do have some patches queued up for CentOS here that
>>> fix some backchannel problems that could be related. I'm mainly waiting
>>> on Chuck to send these to Linus and then we'll likely merge them into
>>> CentOS soon afterward:
>>>
>>> https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3689
>>>
>> If you can send me a patch file, I can rebuild the C9S kernel with that
>> patch and run it. It can take a while for the bug to trigger as I
>> believe it seems to be very workload dependent (we were running very
>> stable for months and now hit this bug every other week).
>>
>>
> It's probably simpler to just pull down the build artifacts for that MR.
> You have to drill down through the CI for it, but they are here:
>
> https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/index.html?prefix=trusted-artifacts/1194300175/publish_x86_64/6278921877/artifacts/
>
> There's even a repo file you can install on the box to pull them down.

Ok, I will try these instead.

Regards,

Rik


--
Rik Theys
System Engineer
KU Leuven - Dept. Elektrotechniek (ESAT)
Kasteelpark Arenberg 10 bus 2440 - B-3001 Leuven-Heverlee
+32(0)16/32.11.07
----------------------------------------------------------------
<<Any errors in spelling, tact or fact are transmission errors>>


2024-03-12 16:46:38

by Dai Ngo

[permalink] [raw]
Subject: Re: nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning


On 3/12/24 4:37 AM, Jeff Layton wrote:
> On Mon, 2024-03-11 at 19:43 +0100, Rik Theys wrote:
>>
>>
>>
>>
>> Hi,
>>
>>
>>
>>
>> Since a few weeks our Rocky Linux 9 NFS server has periodically logged hung nfsd tasks. The initial effect was that some clients could no longer access the NFS server. This got worse and worse (probably as more nfsd threads got blocked) and we had to restart the server. Restarting the server also failed as the NFS server service could no longer be stopped.
>>
>>
>>
>>
>> The initial kernel we noticed this behavior on was kernel-5.14.0-362.18.1.el9_3.x86_64. Since then we've installed kernel-5.14.0-419.el9.x86_64 from CentOS Stream 9. The same issue happened again on this newer kernel version:
>>
>>
>>
>>
>> [Mon Mar 11 14:10:08 2024]       Not tainted 5.14.0-419.el9.x86_64 #1
>>  [Mon Mar 11 14:10:08 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>  [Mon Mar 11 14:10:08 2024] task:nfsd            state:D stack:0     pid:8865  ppid:2      flags:0x00004000
>>  [Mon Mar 11 14:10:08 2024] Call Trace:
>>  [Mon Mar 11 14:10:08 2024]  <TASK>
>>  [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>  [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>  [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>  [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
>>  [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>  [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>  [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
>>  [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>  [Mon Mar 11 14:10:08 2024]  nfsd4_shutdown_callback+0x49/0x120 [nfsd]
>>  [Mon Mar 11 14:10:08 2024]  ? nfsd4_cld_remove+0x54/0x1d0 [nfsd]
>>  [Mon Mar 11 14:10:08 2024]  ? nfsd4_return_all_client_layouts+0xc4/0xf0 [nfsd]
>>  [Mon Mar 11 14:10:08 2024]  ? nfsd4_shutdown_copy+0x68/0xc0 [nfsd]
>>  [Mon Mar 11 14:10:08 2024]  __destroy_client+0x1f3/0x290 [nfsd]
>>  [Mon Mar 11 14:10:08 2024]  nfsd4_exchange_id+0x75f/0x770 [nfsd]
>>  [Mon Mar 11 14:10:08 2024]  ? nfsd4_decode_opaque+0x3a/0x90 [nfsd]
>>  [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>  [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>  [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660 [sunrpc]
>>  [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>  [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>  [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
>>  [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>  [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>  [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>  [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>  [Mon Mar 11 14:10:08 2024]  </TASK>
>>  [Mon Mar 11 14:10:08 2024] INFO: task nfsd:8866 blocked for more than 122 seconds.
>>  [Mon Mar 11 14:10:08 2024]       Not tainted 5.14.0-419.el9.x86_64 #1
>>  [Mon Mar 11 14:10:08 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>  [Mon Mar 11 14:10:08 2024] task:nfsd            state:D stack:0     pid:8866  ppid:2      flags:0x00004000
>>  [Mon Mar 11 14:10:08 2024] Call Trace:
>>  [Mon Mar 11 14:10:08 2024]  <TASK>
>>  [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>  [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>  [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>  [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
>>  [Mon Mar 11 14:10:08 2024]  ? tcp_recvmsg+0x196/0x210
>>  [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>  [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>  [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
>>  [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>  [Mon Mar 11 14:10:08 2024]  nfsd4_destroy_session+0x1a4/0x240 [nfsd]
>>  [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>  [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>  [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660 [sunrpc]
>>  [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>  [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>  [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
>>  [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>  [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>  [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>  [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>  [Mon Mar 11 14:10:08 2024]  </TASK>
>>
>>
>>
>>
>>  The above is repeated a few times, and then this warning is also logged:
>>
>>
>>
>>
>> [Mon Mar 11 14:12:04 2024] ------------[ cut here ]------------
>>  [Mon Mar 11 14:12:04 2024] WARNING: CPU: 39 PID: 8844 at fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>  [Mon Mar 11 14:12:05 2024] Modules linked in: nfsv4 dns_resolver nfs fscache netfs rpcsec_gss_krb5 rpcrdma rdma_cm iw_cm ib_cm ib_core binfmt_misc bonding tls rfkill nft_counter nft_ct
>>  nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables nfnetlink vfat fat dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio l
>>  ibcrc32c dm_service_time dm_multipath intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common skx_edac nfit libnvdimm ipmi_ssif x86_pkg_temp
>>  _thermal intel_powerclamp coretemp kvm_intel kvm irqbypass dcdbas rapl intel_cstate mgag200 i2c_algo_bit drm_shmem_helper dell_smbios drm_kms_helper dell_wmi_descriptor wmi_bmof intel_u
>>  ncore syscopyarea pcspkr sysfillrect mei_me sysimgblt acpi_ipmi mei fb_sys_fops i2c_i801 ipmi_si intel_pch_thermal lpc_ich ipmi_devintf i2c_smbus ipmi_msghandler joydev acpi_power_meter
>>  nfsd auth_rpcgss nfs_acl drm lockd grace fuse sunrpc ext4 mbcache jbd2 sd_mod sg lpfc
>>  [Mon Mar 11 14:12:05 2024]  nvmet_fc nvmet nvme_fc nvme_fabrics crct10dif_pclmul ahci libahci crc32_pclmul nvme_core crc32c_intel ixgbe megaraid_sas libata nvme_common ghash_clmulni_int
>>  el t10_pi wdat_wdt scsi_transport_fc mdio wmi dca dm_mirror dm_region_hash dm_log dm_mod
>>  [Mon Mar 11 14:12:05 2024] CPU: 39 PID: 8844 Comm: nfsd Not tainted 5.14.0-419.el9.x86_64 #1
>>  [Mon Mar 11 14:12:05 2024] Hardware name: Dell Inc. PowerEdge R740/00WGD1, BIOS 2.20.1 09/13/2023
>>  [Mon Mar 11 14:12:05 2024] RIP: 0010:nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>  [Mon Mar 11 14:12:05 2024] Code: a6 95 c5 f3 e9 ff fe ff ff 48 89 df be 01 00 00 00 e8 34 b5 13 f4 48 8d bb 98 00 00 00 e8 c8 f9 00 00 84 c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be
>>  02 00 00 00 48 89 df e8 0c b5 13 f4 e9 01
>>  [Mon Mar 11 14:12:05 2024] RSP: 0018:ffff9929e0bb7b80 EFLAGS: 00010246
>>  [Mon Mar 11 14:12:05 2024] RAX: 0000000000000000 RBX: ffff8ada51930900 RCX: 0000000000000024
>>  [Mon Mar 11 14:12:05 2024] RDX: ffff8ada519309c8 RSI: ffff8ad582933c00 RDI: 0000000000002000
>>  [Mon Mar 11 14:12:05 2024] RBP: ffff8ad46bf21574 R08: ffff9929e0bb7b48 R09: 0000000000000000
>>  [Mon Mar 11 14:12:05 2024] R10: ffff8aec859a2948 R11: 0000000000000000 R12: ffff8ad6f497c360
>>  [Mon Mar 11 14:12:05 2024] R13: ffff8ad46bf21560 R14: ffff8ae5942e0b10 R15: ffff8ad6f497c360
>>  [Mon Mar 11 14:12:05 2024] FS:  0000000000000000(0000) GS:ffff8b031fcc0000(0000) knlGS:0000000000000000
>>  [Mon Mar 11 14:12:05 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>  [Mon Mar 11 14:12:05 2024] CR2: 00007fafe2060744 CR3: 00000018e58de006 CR4: 00000000007706e0
>>  [Mon Mar 11 14:12:05 2024] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>  [Mon Mar 11 14:12:05 2024] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>  [Mon Mar 11 14:12:05 2024] PKRU: 55555554
>>  [Mon Mar 11 14:12:05 2024] Call Trace:
>>  [Mon Mar 11 14:12:05 2024]  <TASK>
>>  [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>  [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>  [Mon Mar 11 14:12:05 2024]  ? __break_lease+0x16f/0x5f0
>>  [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>  [Mon Mar 11 14:12:05 2024]  ? __warn+0x81/0x110
>>  [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>  [Mon Mar 11 14:12:05 2024]  ? report_bug+0x10a/0x140
>>  [Mon Mar 11 14:12:05 2024]  ? handle_bug+0x3c/0x70
>>  [Mon Mar 11 14:12:05 2024]  ? exc_invalid_op+0x14/0x70
>>  [Mon Mar 11 14:12:05 2024]  ? asm_exc_invalid_op+0x16/0x20
>>  [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>  [Mon Mar 11 14:12:05 2024]  __break_lease+0x16f/0x5f0
>>  [Mon Mar 11 14:12:05 2024]  ? nfsd_file_lookup_locked+0x117/0x160 [nfsd]
>>  [Mon Mar 11 14:12:05 2024]  ? list_lru_del+0x101/0x150
>>  [Mon Mar 11 14:12:05 2024]  nfsd_file_do_acquire+0x790/0x830 [nfsd]
>>  [Mon Mar 11 14:12:05 2024]  nfs4_get_vfs_file+0x315/0x3a0 [nfsd]
>>  [Mon Mar 11 14:12:05 2024]  nfsd4_process_open2+0x430/0xa30 [nfsd]
>>  [Mon Mar 11 14:12:05 2024]  ? fh_verify+0x297/0x2f0 [nfsd]
>>  [Mon Mar 11 14:12:05 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
>>  [Mon Mar 11 14:12:05 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>  [Mon Mar 11 14:12:05 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>  [Mon Mar 11 14:12:05 2024]  svc_process_common+0x2ec/0x660 [sunrpc]
>>  [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>  [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>  [Mon Mar 11 14:12:05 2024]  svc_process+0x12d/0x170 [sunrpc]
>>  [Mon Mar 11 14:12:05 2024]  nfsd+0x84/0xb0 [nfsd]
>>  [Mon Mar 11 14:12:05 2024]  kthread+0xdd/0x100
>>  [Mon Mar 11 14:12:05 2024]  ? __pfx_kthread+0x10/0x10
>>  [Mon Mar 11 14:12:05 2024]  ret_from_fork+0x29/0x50
>>  [Mon Mar 11 14:12:05 2024]  </TASK>
>>  [Mon Mar 11 14:12:05 2024] ---[ end trace 7a039e17443dc651 ]---
>>
> [Mon Mar 11 14:29:16 2024] task:kworker/u96:3 state:D stack:0 pid:2451130 ppid:2 flags:0x00004000
> [Mon Mar 11 14:29:16 2024] Workqueue: nfsd4_callbacks nfsd4_run_cb_work [nfsd]
> [Mon Mar 11 14:29:16 2024] Call Trace:
> [Mon Mar 11 14:29:16 2024] <TASK>
> [Mon Mar 11 14:29:16 2024] __schedule+0x21b/0x550
> [Mon Mar 11 14:29:16 2024] schedule+0x2d/0x70
> [Mon Mar 11 14:29:16 2024] schedule_timeout+0x88/0x160
> [Mon Mar 11 14:29:16 2024] ? __pfx_process_timeout+0x10/0x10
> [Mon Mar 11 14:29:16 2024] rpc_shutdown_client+0xb3/0x150 [sunrpc]
> [Mon Mar 11 14:29:16 2024] ? __pfx_autoremove_wake_function+0x10/0x10
> [Mon Mar 11 14:29:16 2024] nfsd4_process_cb_update+0x3e/0x260 [nfsd]
> [Mon Mar 11 14:29:16 2024] ? sched_clock+0xc/0x30
> [Mon Mar 11 14:29:16 2024] ? raw_spin_rq_lock_nested+0x19/0x80
> [Mon Mar 11 14:29:16 2024] ? newidle_balance+0x26e/0x400
> [Mon Mar 11 14:29:16 2024] ? pick_next_task_fair+0x41/0x500
> [Mon Mar 11 14:29:16 2024] ? put_prev_task_fair+0x1e/0x40
> [Mon Mar 11 14:29:16 2024] ? pick_next_task+0x861/0x950
> [Mon Mar 11 14:29:16 2024] ? __update_idle_core+0x23/0xc0
> [Mon Mar 11 14:29:16 2024] ? __switch_to_asm+0x3a/0x80
> [Mon Mar 11 14:29:16 2024] ? finish_task_switch.isra.0+0x8c/0x2a0
> [Mon Mar 11 14:29:16 2024] nfsd4_run_cb_work+0x9f/0x150 [nfsd]
> [Mon Mar 11 14:29:16 2024] process_one_work+0x1e2/0x3b0
> [Mon Mar 11 14:29:16 2024] worker_thread+0x50/0x3a0
> [Mon Mar 11 14:29:16 2024] ? __pfx_worker_thread+0x10/0x10
> [Mon Mar 11 14:29:16 2024] kthread+0xdd/0x100
> [Mon Mar 11 14:29:16 2024] ? __pfx_kthread+0x10/0x10
> [Mon Mar 11 14:29:16 2024] ret_from_fork+0x29/0x50
> [Mon Mar 11 14:29:16 2024] </TASK>
>
> The above is the main task that I see in the cb workqueue. It's trying to call rpc_shutdown_client, which is waiting for this:
>
>                  wait_event_timeout(destroy_wait,
>                          list_empty(&clnt->cl_tasks), 1*HZ);
>
> ...so basically waiting for the cl_tasks list to go empty. It repeatedly
> does a rpc_killall_tasks though, so possibly trying to kill this task?
>
> 18423 2281 0 0x18 0x0 1354 nfsd4_cb_ops [nfsd] nfs4_cbv1 CB_RECALL_ANY a:call_start [sunrpc] q:delayq
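
For reference, the wait above sits inside the shutdown loop, roughly
(paraphrased from net/sunrpc/clnt.c):

	void rpc_shutdown_client(struct rpc_clnt *clnt)
	{
		/* Keep killing outstanding tasks and re-checking once a
		 * second until the client's task list drains. */
		while (!list_empty(&clnt->cl_tasks)) {
			rpc_killall_tasks(clnt);
			wait_event_timeout(destroy_wait,
					   list_empty(&clnt->cl_tasks), 1*HZ);
		}
		rpc_release_client(clnt);
	}

So the kworker keeps cycling kill/wait every second for as long as that
CB_RECALL_ANY task stays on cl_tasks.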

I wonder why this task is on the delayq. Could it be related to a memory
shortage, or to connection-related problems?
Output of /proc/meminfo on the NFS server at the time of the problem
would shed some light.

Currently only one active task is allowed on the nfsd callback
workqueue at a time. If for some reason a callback task is stuck in
the workqueue, it will block all other callback tasks, which can affect
multiple clients.
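
(The single-task behaviour comes from the callback workqueue being an
ordered one. Assuming the el9 code matches upstream here, it is created
along these lines in fs/nfsd/nfs4state.c; the wrapper function name below
is illustrative only:)

	static struct workqueue_struct *callback_wq;

	static int nfsd_setup_callback_wq(void)
	{
		/* An ordered workqueue executes at most one work item at
		 * a time, so one stuck callback job serializes every
		 * other client's callbacks behind it. */
		callback_wq = alloc_ordered_workqueue("nfsd4_callbacks", 0);
		return callback_wq ? 0 : -ENOMEM;
	}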

-Dai

>
> Callbacks are soft RPC tasks though, so they should be easily killable.

2024-03-12 18:23:41

by Rik Theys

[permalink] [raw]
Subject: Re: nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning

Hi,

On 3/12/24 17:43, Dai Ngo wrote:
>
> On 3/12/24 4:37 AM, Jeff Layton wrote:
>> On Mon, 2024-03-11 at 19:43 +0100, Rik Theys wrote:
>>>
>>>
>>>
>>> Hi,
>>>
>>>
>>>
>>> Since a few weeks our Rocky Linux 9 NFS server has periodically
>>> logged hung nfsd tasks. The initial effect was that some clients
>>> could no longer access the NFS server. This got worse and worse
>>> (probably as more nfsd threads got blocked) and we had to restart
>>> the server. Restarting the server also failed as the NFS server
>>> service could no longer be stopped.
>>>
>>>
>>>
>>> The initial kernel we noticed this behavior on was
>>> kernel-5.14.0-362.18.1.el9_3.x86_64. Since then we've installed
>>> kernel-5.14.0-419.el9.x86_64 from CentOS Stream 9. The same issue
>>> happened again on this newer kernel version:
>>>
>>>
>>>
>>> [Mon Mar 11 14:10:08 2024]       Not tainted 5.14.0-419.el9.x86_64 #1
>>>   [Mon Mar 11 14:10:08 2024] "echo 0 >
>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>   [Mon Mar 11 14:10:08 2024] task:nfsd            state:D stack:0
>>>     pid:8865  ppid:2      flags:0x00004000
>>>   [Mon Mar 11 14:10:08 2024] Call Trace:
>>>   [Mon Mar 11 14:10:08 2024]  <TASK>
>>>   [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>   [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>   [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>   [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
>>>   [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>   [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
>>>   [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>   [Mon Mar 11 14:10:08 2024]  nfsd4_shutdown_callback+0x49/0x120 [nfsd]
>>>   [Mon Mar 11 14:10:08 2024]  ? nfsd4_cld_remove+0x54/0x1d0 [nfsd]
>>>   [Mon Mar 11 14:10:08 2024]  ?
>>> nfsd4_return_all_client_layouts+0xc4/0xf0 [nfsd]
>>>   [Mon Mar 11 14:10:08 2024]  ? nfsd4_shutdown_copy+0x68/0xc0 [nfsd]
>>>   [Mon Mar 11 14:10:08 2024]  __destroy_client+0x1f3/0x290 [nfsd]
>>>   [Mon Mar 11 14:10:08 2024]  nfsd4_exchange_id+0x75f/0x770 [nfsd]
>>>   [Mon Mar 11 14:10:08 2024]  ? nfsd4_decode_opaque+0x3a/0x90 [nfsd]
>>>   [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>   [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>   [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660 [sunrpc]
>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>   [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>   [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>   [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>   [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>   [Mon Mar 11 14:10:08 2024]  </TASK>
>>>   [Mon Mar 11 14:10:08 2024] INFO: task nfsd:8866 blocked for more
>>> than 122 seconds.
>>>   [Mon Mar 11 14:10:08 2024]       Not tainted 5.14.0-419.el9.x86_64 #1
>>>   [Mon Mar 11 14:10:08 2024] "echo 0 >
>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>   [Mon Mar 11 14:10:08 2024] task:nfsd            state:D stack:0
>>>     pid:8866  ppid:2      flags:0x00004000
>>>   [Mon Mar 11 14:10:08 2024] Call Trace:
>>>   [Mon Mar 11 14:10:08 2024]  <TASK>
>>>   [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>   [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>   [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>   [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
>>>   [Mon Mar 11 14:10:08 2024]  ? tcp_recvmsg+0x196/0x210
>>>   [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>   [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
>>>   [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>   [Mon Mar 11 14:10:08 2024]  nfsd4_destroy_session+0x1a4/0x240 [nfsd]
>>>   [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>   [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>   [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660 [sunrpc]
>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>   [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>   [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>   [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>   [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>   [Mon Mar 11 14:10:08 2024]  </TASK>
>>>
>>>
>>>
>>>   The above is repeated a few times, and then this warning is also
>>> logged:
>>>
>>>
>>>
>>> [Mon Mar 11 14:12:04 2024] ------------[ cut here ]------------
>>>   [Mon Mar 11 14:12:04 2024] WARNING: CPU: 39 PID: 8844 at
>>> fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>   [Mon Mar 11 14:12:05 2024] Modules linked in: nfsv4 dns_resolver
>>> nfs fscache netfs rpcsec_gss_krb5 rpcrdma rdma_cm iw_cm ib_cm
>>> ib_core binfmt_misc bonding tls rfkill nft_counter nft_ct
>>>   nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet
>>> nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables nfnetlink vfat
>>> fat dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio l
>>>   ibcrc32c dm_service_time dm_multipath intel_rapl_msr
>>> intel_rapl_common intel_uncore_frequency
>>> intel_uncore_frequency_common isst_if_common skx_edac nfit libnvdimm
>>> ipmi_ssif x86_pkg_temp
>>>   _thermal intel_powerclamp coretemp kvm_intel kvm irqbypass dcdbas
>>> rapl intel_cstate mgag200 i2c_algo_bit drm_shmem_helper dell_smbios
>>> drm_kms_helper dell_wmi_descriptor wmi_bmof intel_u
>>>   ncore syscopyarea pcspkr sysfillrect mei_me sysimgblt acpi_ipmi
>>> mei fb_sys_fops i2c_i801 ipmi_si intel_pch_thermal lpc_ich
>>> ipmi_devintf i2c_smbus ipmi_msghandler joydev acpi_power_meter
>>>   nfsd auth_rpcgss nfs_acl drm lockd grace fuse sunrpc ext4 mbcache
>>> jbd2 sd_mod sg lpfc
>>>   [Mon Mar 11 14:12:05 2024]  nvmet_fc nvmet nvme_fc nvme_fabrics
>>> crct10dif_pclmul ahci libahci crc32_pclmul nvme_core crc32c_intel
>>> ixgbe megaraid_sas libata nvme_common ghash_clmulni_int
>>>   el t10_pi wdat_wdt scsi_transport_fc mdio wmi dca dm_mirror
>>> dm_region_hash dm_log dm_mod
>>>   [Mon Mar 11 14:12:05 2024] CPU: 39 PID: 8844 Comm: nfsd Not
>>> tainted 5.14.0-419.el9.x86_64 #1
>>>   [Mon Mar 11 14:12:05 2024] Hardware name: Dell Inc. PowerEdge
>>> R740/00WGD1, BIOS 2.20.1 09/13/2023
>>>   [Mon Mar 11 14:12:05 2024] RIP:
>>> 0010:nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>   [Mon Mar 11 14:12:05 2024] Code: a6 95 c5 f3 e9 ff fe ff ff 48 89
>>> df be 01 00 00 00 e8 34 b5 13 f4 48 8d bb 98 00 00 00 e8 c8 f9 00 00
>>> 84 c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be
>>>   02 00 00 00 48 89 df e8 0c b5 13 f4 e9 01
>>>   [Mon Mar 11 14:12:05 2024] RSP: 0018:ffff9929e0bb7b80 EFLAGS:
>>> 00010246
>>>   [Mon Mar 11 14:12:05 2024] RAX: 0000000000000000 RBX:
>>> ffff8ada51930900 RCX: 0000000000000024
>>>   [Mon Mar 11 14:12:05 2024] RDX: ffff8ada519309c8 RSI:
>>> ffff8ad582933c00 RDI: 0000000000002000
>>>   [Mon Mar 11 14:12:05 2024] RBP: ffff8ad46bf21574 R08:
>>> ffff9929e0bb7b48 R09: 0000000000000000
>>>   [Mon Mar 11 14:12:05 2024] R10: ffff8aec859a2948 R11:
>>> 0000000000000000 R12: ffff8ad6f497c360
>>>   [Mon Mar 11 14:12:05 2024] R13: ffff8ad46bf21560 R14:
>>> ffff8ae5942e0b10 R15: ffff8ad6f497c360
>>>   [Mon Mar 11 14:12:05 2024] FS:  0000000000000000(0000)
>>> GS:ffff8b031fcc0000(0000) knlGS:0000000000000000
>>>   [Mon Mar 11 14:12:05 2024] CS:  0010 DS: 0000 ES: 0000 CR0:
>>> 0000000080050033
>>>   [Mon Mar 11 14:12:05 2024] CR2: 00007fafe2060744 CR3:
>>> 00000018e58de006 CR4: 00000000007706e0
>>>   [Mon Mar 11 14:12:05 2024] DR0: 0000000000000000 DR1:
>>> 0000000000000000 DR2: 0000000000000000
>>>   [Mon Mar 11 14:12:05 2024] DR3: 0000000000000000 DR6:
>>> 00000000fffe0ff0 DR7: 0000000000000400
>>>   [Mon Mar 11 14:12:05 2024] PKRU: 55555554
>>>   [Mon Mar 11 14:12:05 2024] Call Trace:
>>>   [Mon Mar 11 14:12:05 2024]  <TASK>
>>>   [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>>   [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>>   [Mon Mar 11 14:12:05 2024]  ? __break_lease+0x16f/0x5f0
>>>   [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>   [Mon Mar 11 14:12:05 2024]  ? __warn+0x81/0x110
>>>   [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>   [Mon Mar 11 14:12:05 2024]  ? report_bug+0x10a/0x140
>>>   [Mon Mar 11 14:12:05 2024]  ? handle_bug+0x3c/0x70
>>>   [Mon Mar 11 14:12:05 2024]  ? exc_invalid_op+0x14/0x70
>>>   [Mon Mar 11 14:12:05 2024]  ? asm_exc_invalid_op+0x16/0x20
>>>   [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>   [Mon Mar 11 14:12:05 2024]  __break_lease+0x16f/0x5f0
>>>   [Mon Mar 11 14:12:05 2024]  ? nfsd_file_lookup_locked+0x117/0x160
>>> [nfsd]
>>>   [Mon Mar 11 14:12:05 2024]  ? list_lru_del+0x101/0x150
>>>   [Mon Mar 11 14:12:05 2024]  nfsd_file_do_acquire+0x790/0x830 [nfsd]
>>>   [Mon Mar 11 14:12:05 2024]  nfs4_get_vfs_file+0x315/0x3a0 [nfsd]
>>>   [Mon Mar 11 14:12:05 2024]  nfsd4_process_open2+0x430/0xa30 [nfsd]
>>>   [Mon Mar 11 14:12:05 2024]  ? fh_verify+0x297/0x2f0 [nfsd]
>>>   [Mon Mar 11 14:12:05 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
>>>   [Mon Mar 11 14:12:05 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>   [Mon Mar 11 14:12:05 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>   [Mon Mar 11 14:12:05 2024]  svc_process_common+0x2ec/0x660 [sunrpc]
>>>   [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>   [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>   [Mon Mar 11 14:12:05 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>   [Mon Mar 11 14:12:05 2024]  nfsd+0x84/0xb0 [nfsd]
>>>   [Mon Mar 11 14:12:05 2024]  kthread+0xdd/0x100
>>>   [Mon Mar 11 14:12:05 2024]  ? __pfx_kthread+0x10/0x10
>>>   [Mon Mar 11 14:12:05 2024]  ret_from_fork+0x29/0x50
>>>   [Mon Mar 11 14:12:05 2024]  </TASK>
>>>   [Mon Mar 11 14:12:05 2024] ---[ end trace 7a039e17443dc651 ]---
>> [Mon Mar 11 14:29:16 2024] task:kworker/u96:3   state:D stack:0    
>> pid:2451130 ppid:2      flags:0x00004000
>> [Mon Mar 11 14:29:16 2024] Workqueue: nfsd4_callbacks
>> nfsd4_run_cb_work [nfsd]
>> [Mon Mar 11 14:29:16 2024] Call Trace:
>> [Mon Mar 11 14:29:16 2024]  <TASK>
>> [Mon Mar 11 14:29:16 2024]  __schedule+0x21b/0x550
>> [Mon Mar 11 14:29:16 2024]  schedule+0x2d/0x70
>> [Mon Mar 11 14:29:16 2024]  schedule_timeout+0x88/0x160
>> [Mon Mar 11 14:29:16 2024]  ? __pfx_process_timeout+0x10/0x10
>> [Mon Mar 11 14:29:16 2024]  rpc_shutdown_client+0xb3/0x150 [sunrpc]
>> [Mon Mar 11 14:29:16 2024]  ? __pfx_autoremove_wake_function+0x10/0x10
>> [Mon Mar 11 14:29:16 2024]  nfsd4_process_cb_update+0x3e/0x260 [nfsd]
>> [Mon Mar 11 14:29:16 2024]  ? sched_clock+0xc/0x30
>> [Mon Mar 11 14:29:16 2024]  ? raw_spin_rq_lock_nested+0x19/0x80
>> [Mon Mar 11 14:29:16 2024]  ? newidle_balance+0x26e/0x400
>> [Mon Mar 11 14:29:16 2024]  ? pick_next_task_fair+0x41/0x500
>> [Mon Mar 11 14:29:16 2024]  ? put_prev_task_fair+0x1e/0x40
>> [Mon Mar 11 14:29:16 2024]  ? pick_next_task+0x861/0x950
>> [Mon Mar 11 14:29:16 2024]  ? __update_idle_core+0x23/0xc0
>> [Mon Mar 11 14:29:16 2024]  ? __switch_to_asm+0x3a/0x80
>> [Mon Mar 11 14:29:16 2024]  ? finish_task_switch.isra.0+0x8c/0x2a0
>> [Mon Mar 11 14:29:16 2024]  nfsd4_run_cb_work+0x9f/0x150 [nfsd]
>> [Mon Mar 11 14:29:16 2024]  process_one_work+0x1e2/0x3b0
>> [Mon Mar 11 14:29:16 2024]  worker_thread+0x50/0x3a0
>> [Mon Mar 11 14:29:16 2024]  ? __pfx_worker_thread+0x10/0x10
>> [Mon Mar 11 14:29:16 2024]  kthread+0xdd/0x100
>> [Mon Mar 11 14:29:16 2024]  ? __pfx_kthread+0x10/0x10
>> [Mon Mar 11 14:29:16 2024]  ret_from_fork+0x29/0x50
>> [Mon Mar 11 14:29:16 2024]  </TASK>
>>
>> The above is the main task that I see in the cb workqueue. It's
>> trying to call rpc_shutdown_client, which is waiting for this:
>>
>>                  wait_event_timeout(destroy_wait,
>>                          list_empty(&clnt->cl_tasks), 1*HZ);
>>
>> ...so basically waiting for the cl_tasks list to go empty. It repeatedly
>> does a rpc_killall_tasks though, so possibly trying to kill this task?
>>
>>      18423 2281      0 0x18 0x0     1354 nfsd4_cb_ops [nfsd]
>> nfs4_cbv1 CB_RECALL_ANY a:call_start [sunrpc] q:delayq
>
> I wonder why this task is on delayq. Could it be related to memory
> shortage issue, or connection related problems?
> Output of /proc/meminfo on the nfs server at time of the problem
> would shed some light.

We don't have that anymore. I can check our monitoring host more closely
for more fine-grained stats tomorrow, but when I look at the sar
statistics (see attachment) nothing special was going on memory- or
network-wise.

We start to get the CPU stall messages and the system load starts to
rise (starting around 2:10 PM). At 3:00 PM we restart the server as our
users can no longer work.

Looking at the stats, the CPUs were ~idle. The only thing that may be
related is that around 11:30 AM the write load (rx packets) starts to
get a lot higher than the read load (tx packets). This goes on for hours
(even after the server was restarted), and the workload was later
identified: a job that was constantly rewriting a statistics file.

Regards,

Rik


>
> Currently there is only 1 active task allowed for the nfsd callback
> workqueue at a time. If for some reasons a callback task is stuck in
> the workqueue it will block all other callback tasks which can effect
> multiple clients.
>
> -Dai
>
>>
>> Callbacks are soft RPC tasks though, so they should be easily killable.

--
Rik Theys
System Engineer
KU Leuven - Dept. Elektrotechniek (ESAT)
Kasteelpark Arenberg 10 bus 2440 - B-3001 Leuven-Heverlee
+32(0)16/32.11.07
----------------------------------------------------------------
<<Any errors in spelling, tact or fact are transmission errors>>


Attachments:
sarstats.txt (16.42 kB)

2024-03-13 19:16:19

by Dai Ngo

[permalink] [raw]
Subject: Re: nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning


On 3/12/24 11:23 AM, Rik Theys wrote:
> Hi,
>
> On 3/12/24 17:43, Dai Ngo wrote:
>>
>> On 3/12/24 4:37 AM, Jeff Layton wrote:
>>> On Mon, 2024-03-11 at 19:43 +0100, Rik Theys wrote:
>>>>
>>>>
>>>>
>>>> Hi,
>>>>
>>>>
>>>>
>>>> Since a few weeks our Rocky Linux 9 NFS server has periodically
>>>> logged hung nfsd tasks. The initial effect was that some clients
>>>> could no longer access the NFS server. This got worse and worse
>>>> (probably as more nfsd threads got blocked) and we had to restart
>>>> the server. Restarting the server also failed as the NFS server
>>>> service could no longer be stopped.
>>>>
>>>>
>>>>
>>>> The initial kernel we noticed this behavior on was
>>>> kernel-5.14.0-362.18.1.el9_3.x86_64. Since then we've installed
>>>> kernel-5.14.0-419.el9.x86_64 from CentOS Stream 9. The same issue
>>>> happened again on this newer kernel version:
>>>>
>>>>
>>>>
>>>> [Mon Mar 11 14:10:08 2024]       Not tainted 5.14.0-419.el9.x86_64 #1
>>>>   [Mon Mar 11 14:10:08 2024] "echo 0 >
>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>   [Mon Mar 11 14:10:08 2024] task:nfsd            state:D stack:0
>>>>     pid:8865  ppid:2      flags:0x00004000
>>>>   [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>   [Mon Mar 11 14:10:08 2024]  <TASK>
>>>>   [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>>   [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>>   [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>>   [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
>>>>   [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>>   [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
>>>>   [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>>   [Mon Mar 11 14:10:08 2024]  nfsd4_shutdown_callback+0x49/0x120
>>>> [nfsd]
>>>>   [Mon Mar 11 14:10:08 2024]  ? nfsd4_cld_remove+0x54/0x1d0 [nfsd]
>>>>   [Mon Mar 11 14:10:08 2024]  ?
>>>> nfsd4_return_all_client_layouts+0xc4/0xf0 [nfsd]
>>>>   [Mon Mar 11 14:10:08 2024]  ? nfsd4_shutdown_copy+0x68/0xc0 [nfsd]
>>>>   [Mon Mar 11 14:10:08 2024]  __destroy_client+0x1f3/0x290 [nfsd]
>>>>   [Mon Mar 11 14:10:08 2024]  nfsd4_exchange_id+0x75f/0x770 [nfsd]
>>>>   [Mon Mar 11 14:10:08 2024]  ? nfsd4_decode_opaque+0x3a/0x90 [nfsd]
>>>>   [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>   [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>   [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660 [sunrpc]
>>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>   [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>   [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>   [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>>   [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>>   [Mon Mar 11 14:10:08 2024]  </TASK>
>>>>   [Mon Mar 11 14:10:08 2024] INFO: task nfsd:8866 blocked for more
>>>> than 122 seconds.
>>>>   [Mon Mar 11 14:10:08 2024]       Not tainted
>>>> 5.14.0-419.el9.x86_64 #1
>>>>   [Mon Mar 11 14:10:08 2024] "echo 0 >
>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>   [Mon Mar 11 14:10:08 2024] task:nfsd            state:D stack:0
>>>>     pid:8866  ppid:2      flags:0x00004000
>>>>   [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>   [Mon Mar 11 14:10:08 2024]  <TASK>
>>>>   [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>>   [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>>   [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>>   [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
>>>>   [Mon Mar 11 14:10:08 2024]  ? tcp_recvmsg+0x196/0x210
>>>>   [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>>   [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
>>>>   [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>>   [Mon Mar 11 14:10:08 2024]  nfsd4_destroy_session+0x1a4/0x240 [nfsd]
>>>>   [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>   [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>   [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660 [sunrpc]
>>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>   [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>   [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>   [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>>   [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>>   [Mon Mar 11 14:10:08 2024]  </TASK>
>>>>
>>>>
>>>>
>>>>   The above is repeated a few times, and then this warning is also
>>>> logged:
>>>>
>>>>
>>>>
>>>> [Mon Mar 11 14:12:04 2024] ------------[ cut here ]------------
>>>>   [Mon Mar 11 14:12:04 2024] WARNING: CPU: 39 PID: 8844 at
>>>> fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>   [Mon Mar 11 14:12:05 2024] Modules linked in: nfsv4 dns_resolver
>>>> nfs fscache netfs rpcsec_gss_krb5 rpcrdma rdma_cm iw_cm ib_cm
>>>> ib_core binfmt_misc bonding tls rfkill nft_counter nft_ct
>>>>   nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet
>>>> nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables nfnetlink vfat
>>>> fat dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio l
>>>>   ibcrc32c dm_service_time dm_multipath intel_rapl_msr
>>>> intel_rapl_common intel_uncore_frequency
>>>> intel_uncore_frequency_common isst_if_common skx_edac nfit
>>>> libnvdimm ipmi_ssif x86_pkg_temp
>>>>   _thermal intel_powerclamp coretemp kvm_intel kvm irqbypass dcdbas
>>>> rapl intel_cstate mgag200 i2c_algo_bit drm_shmem_helper dell_smbios
>>>> drm_kms_helper dell_wmi_descriptor wmi_bmof intel_u
>>>>   ncore syscopyarea pcspkr sysfillrect mei_me sysimgblt acpi_ipmi
>>>> mei fb_sys_fops i2c_i801 ipmi_si intel_pch_thermal lpc_ich
>>>> ipmi_devintf i2c_smbus ipmi_msghandler joydev acpi_power_meter
>>>>   nfsd auth_rpcgss nfs_acl drm lockd grace fuse sunrpc ext4 mbcache
>>>> jbd2 sd_mod sg lpfc
>>>>   [Mon Mar 11 14:12:05 2024]  nvmet_fc nvmet nvme_fc nvme_fabrics
>>>> crct10dif_pclmul ahci libahci crc32_pclmul nvme_core crc32c_intel
>>>> ixgbe megaraid_sas libata nvme_common ghash_clmulni_int
>>>>   el t10_pi wdat_wdt scsi_transport_fc mdio wmi dca dm_mirror
>>>> dm_region_hash dm_log dm_mod
>>>>   [Mon Mar 11 14:12:05 2024] CPU: 39 PID: 8844 Comm: nfsd Not
>>>> tainted 5.14.0-419.el9.x86_64 #1
>>>>   [Mon Mar 11 14:12:05 2024] Hardware name: Dell Inc. PowerEdge
>>>> R740/00WGD1, BIOS 2.20.1 09/13/2023
>>>>   [Mon Mar 11 14:12:05 2024] RIP:
>>>> 0010:nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>   [Mon Mar 11 14:12:05 2024] Code: a6 95 c5 f3 e9 ff fe ff ff 48 89
>>>> df be 01 00 00 00 e8 34 b5 13 f4 48 8d bb 98 00 00 00 e8 c8 f9 00
>>>> 00 84 c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be
>>>>   02 00 00 00 48 89 df e8 0c b5 13 f4 e9 01
>>>>   [Mon Mar 11 14:12:05 2024] RSP: 0018:ffff9929e0bb7b80 EFLAGS:
>>>> 00010246
>>>>   [Mon Mar 11 14:12:05 2024] RAX: 0000000000000000 RBX:
>>>> ffff8ada51930900 RCX: 0000000000000024
>>>>   [Mon Mar 11 14:12:05 2024] RDX: ffff8ada519309c8 RSI:
>>>> ffff8ad582933c00 RDI: 0000000000002000
>>>>   [Mon Mar 11 14:12:05 2024] RBP: ffff8ad46bf21574 R08:
>>>> ffff9929e0bb7b48 R09: 0000000000000000
>>>>   [Mon Mar 11 14:12:05 2024] R10: ffff8aec859a2948 R11:
>>>> 0000000000000000 R12: ffff8ad6f497c360
>>>>   [Mon Mar 11 14:12:05 2024] R13: ffff8ad46bf21560 R14:
>>>> ffff8ae5942e0b10 R15: ffff8ad6f497c360
>>>>   [Mon Mar 11 14:12:05 2024] FS:  0000000000000000(0000)
>>>> GS:ffff8b031fcc0000(0000) knlGS:0000000000000000
>>>>   [Mon Mar 11 14:12:05 2024] CS:  0010 DS: 0000 ES: 0000 CR0:
>>>> 0000000080050033
>>>>   [Mon Mar 11 14:12:05 2024] CR2: 00007fafe2060744 CR3:
>>>> 00000018e58de006 CR4: 00000000007706e0
>>>>   [Mon Mar 11 14:12:05 2024] DR0: 0000000000000000 DR1:
>>>> 0000000000000000 DR2: 0000000000000000
>>>>   [Mon Mar 11 14:12:05 2024] DR3: 0000000000000000 DR6:
>>>> 00000000fffe0ff0 DR7: 0000000000000400
>>>>   [Mon Mar 11 14:12:05 2024] PKRU: 55555554
>>>>   [Mon Mar 11 14:12:05 2024] Call Trace:
>>>>   [Mon Mar 11 14:12:05 2024]  <TASK>
>>>>   [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>>>   [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>>>   [Mon Mar 11 14:12:05 2024]  ? __break_lease+0x16f/0x5f0
>>>>   [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>   [Mon Mar 11 14:12:05 2024]  ? __warn+0x81/0x110
>>>>   [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>   [Mon Mar 11 14:12:05 2024]  ? report_bug+0x10a/0x140
>>>>   [Mon Mar 11 14:12:05 2024]  ? handle_bug+0x3c/0x70
>>>>   [Mon Mar 11 14:12:05 2024]  ? exc_invalid_op+0x14/0x70
>>>>   [Mon Mar 11 14:12:05 2024]  ? asm_exc_invalid_op+0x16/0x20
>>>>   [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>   [Mon Mar 11 14:12:05 2024]  __break_lease+0x16f/0x5f0
>>>>   [Mon Mar 11 14:12:05 2024]  ? nfsd_file_lookup_locked+0x117/0x160
>>>> [nfsd]
>>>>   [Mon Mar 11 14:12:05 2024]  ? list_lru_del+0x101/0x150
>>>>   [Mon Mar 11 14:12:05 2024]  nfsd_file_do_acquire+0x790/0x830 [nfsd]
>>>>   [Mon Mar 11 14:12:05 2024]  nfs4_get_vfs_file+0x315/0x3a0 [nfsd]
>>>>   [Mon Mar 11 14:12:05 2024]  nfsd4_process_open2+0x430/0xa30 [nfsd]
>>>>   [Mon Mar 11 14:12:05 2024]  ? fh_verify+0x297/0x2f0 [nfsd]
>>>>   [Mon Mar 11 14:12:05 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
>>>>   [Mon Mar 11 14:12:05 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>   [Mon Mar 11 14:12:05 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>   [Mon Mar 11 14:12:05 2024]  svc_process_common+0x2ec/0x660 [sunrpc]
>>>>   [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>   [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>   [Mon Mar 11 14:12:05 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>   [Mon Mar 11 14:12:05 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>   [Mon Mar 11 14:12:05 2024]  kthread+0xdd/0x100
>>>>   [Mon Mar 11 14:12:05 2024]  ? __pfx_kthread+0x10/0x10
>>>>   [Mon Mar 11 14:12:05 2024]  ret_from_fork+0x29/0x50
>>>>   [Mon Mar 11 14:12:05 2024]  </TASK>
>>>>   [Mon Mar 11 14:12:05 2024] ---[ end trace 7a039e17443dc651 ]---
>>> [Mon Mar 11 14:29:16 2024] task:kworker/u96:3   state:D stack:0    
>>> pid:2451130 ppid:2      flags:0x00004000
>>> [Mon Mar 11 14:29:16 2024] Workqueue: nfsd4_callbacks
>>> nfsd4_run_cb_work [nfsd]
>>> [Mon Mar 11 14:29:16 2024] Call Trace:
>>> [Mon Mar 11 14:29:16 2024]  <TASK>
>>> [Mon Mar 11 14:29:16 2024]  __schedule+0x21b/0x550
>>> [Mon Mar 11 14:29:16 2024]  schedule+0x2d/0x70
>>> [Mon Mar 11 14:29:16 2024]  schedule_timeout+0x88/0x160
>>> [Mon Mar 11 14:29:16 2024]  ? __pfx_process_timeout+0x10/0x10
>>> [Mon Mar 11 14:29:16 2024]  rpc_shutdown_client+0xb3/0x150 [sunrpc]
>>> [Mon Mar 11 14:29:16 2024]  ? __pfx_autoremove_wake_function+0x10/0x10
>>> [Mon Mar 11 14:29:16 2024]  nfsd4_process_cb_update+0x3e/0x260 [nfsd]
>>> [Mon Mar 11 14:29:16 2024]  ? sched_clock+0xc/0x30
>>> [Mon Mar 11 14:29:16 2024]  ? raw_spin_rq_lock_nested+0x19/0x80
>>> [Mon Mar 11 14:29:16 2024]  ? newidle_balance+0x26e/0x400
>>> [Mon Mar 11 14:29:16 2024]  ? pick_next_task_fair+0x41/0x500
>>> [Mon Mar 11 14:29:16 2024]  ? put_prev_task_fair+0x1e/0x40
>>> [Mon Mar 11 14:29:16 2024]  ? pick_next_task+0x861/0x950
>>> [Mon Mar 11 14:29:16 2024]  ? __update_idle_core+0x23/0xc0
>>> [Mon Mar 11 14:29:16 2024]  ? __switch_to_asm+0x3a/0x80
>>> [Mon Mar 11 14:29:16 2024]  ? finish_task_switch.isra.0+0x8c/0x2a0
>>> [Mon Mar 11 14:29:16 2024]  nfsd4_run_cb_work+0x9f/0x150 [nfsd]
>>> [Mon Mar 11 14:29:16 2024]  process_one_work+0x1e2/0x3b0
>>> [Mon Mar 11 14:29:16 2024]  worker_thread+0x50/0x3a0
>>> [Mon Mar 11 14:29:16 2024]  ? __pfx_worker_thread+0x10/0x10
>>> [Mon Mar 11 14:29:16 2024]  kthread+0xdd/0x100
>>> [Mon Mar 11 14:29:16 2024]  ? __pfx_kthread+0x10/0x10
>>> [Mon Mar 11 14:29:16 2024]  ret_from_fork+0x29/0x50
>>> [Mon Mar 11 14:29:16 2024]  </TASK>
>>>
>>> The above is the main task that I see in the cb workqueue. It's
>>> trying to call rpc_shutdown_client, which is waiting for this:
>>>
>>>                  wait_event_timeout(destroy_wait,
>>>                          list_empty(&clnt->cl_tasks), 1*HZ);
>>>
>>> ...so basically waiting for the cl_tasks list to go empty. It
>>> repeatedly
>>> does a rpc_killall_tasks though, so possibly trying to kill this task?
>>>
>>>      18423 2281      0 0x18 0x0     1354 nfsd4_cb_ops [nfsd]
>>> nfs4_cbv1 CB_RECALL_ANY a:call_start [sunrpc] q:delayq
>>
>> I wonder why this task is on delayq. Could it be related to memory
>> shortage issue, or connection related problems?
>> Output of /proc/meminfo on the nfs server at time of the problem
>> would shed some light.
>
> We don't have that anymore. I can check our monitoring host more
> closely for more fine grained stats tomorrow, but when I look at the
> sar statistics (see attachment) nothing special was going on memory or
> network wise.

Thanks Rik for the info.

At 2:10 PM the sar statistics show:
kbmemfree: 1014880
kbavail: 170836368
kbmemused: 2160028
%memused: 1.10
kbcached: 140151288

Paging stats:
              pgpgin/s pgpgout/s   fault/s  majflt/s   pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
02:10:00 PM    2577.67 491251.09   2247.01      0.00 2415029.61  75131.80      0.00 150276.28    200.02

The kbmemfree is pretty low and the caches consume a large amount of memory.
The paging statistics also show a lot of paging activity: 150276.28 pgsteal/s.

In the previous rpc_tasks.txt, it shows an RPC task is on the delayq waiting
to send the CB_RECALL_ANY. With this version of the kernel, the only time
CB_RECALL_ANY is sent is when the system is under memory pressure and the
nfsd shrinker task runs to free unused delegations.

Next time this problem happens, you can try to reclaim some
memory from the caches to see if it helps:

# echo 3 > /proc/sys/vm/drop_caches
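
It would also help to snapshot the memory and delegation state right before
and after, something along these lines (a rough sketch; the per-client states
files should be present on this kernel, but please verify the paths first):

    # snapshot memory and nfsd delegation state, drop caches, snapshot again
    cat /proc/meminfo > /tmp/meminfo.before
    grep -c deleg /proc/fs/nfsd/clients/*/states > /tmp/delegs.before 2>/dev/null
    sync
    echo 3 > /proc/sys/vm/drop_caches
    cat /proc/meminfo > /tmp/meminfo.after
    grep -c deleg /proc/fs/nfsd/clients/*/states > /tmp/delegs.after 2>/dev/null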

-Dai




>
> We start to get the cpu stall messages and the system load starts to
> rise (starts around 2:10 PM). At 3:00 PM we restart the server as our
> users can no longer work.
>
> Looking at the stats, the cpu's were ~idle. The only thing that may be
> related is that around 11:30 AM the write load (rx packets) starts to
> get a lot higher than the read load (tx packets). This goes on for
> hours (even after the server was restarted) and that workload was
> later identified. It was a workload that was constantly rewriting a
> statistics file.
>
> Regards,
>
> Rik
>
>
>>
>> Currently there is only 1 active task allowed for the nfsd callback
>> workqueue at a time. If for some reasons a callback task is stuck in
>> the workqueue it will block all other callback tasks which can effect
>> multiple clients.
>>
>> -Dai
>>
>>>
>>> Callbacks are soft RPC tasks though, so they should be easily killable.
>

2024-03-13 19:50:20

by Rik Theys

[permalink] [raw]
Subject: Re: nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning

Hi,

On 3/13/24 19:44, Dai Ngo wrote:
>
> On 3/12/24 11:23 AM, Rik Theys wrote:
>> Hi,
>>
>> On 3/12/24 17:43, Dai Ngo wrote:
>>>
>>> On 3/12/24 4:37 AM, Jeff Layton wrote:
>>>> On Mon, 2024-03-11 at 19:43 +0100, Rik Theys wrote:
>>>>>
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>>
>>>>> Since a few weeks our Rocky Linux 9 NFS server has periodically
>>>>> logged hung nfsd tasks. The initial effect was that some clients
>>>>> could no longer access the NFS server. This got worse and worse
>>>>> (probably as more nfsd threads got blocked) and we had to restart
>>>>> the server. Restarting the server also failed as the NFS server
>>>>> service could no longer be stopped.
>>>>>
>>>>>
>>>>>
>>>>> The initial kernel we noticed this behavior on was
>>>>> kernel-5.14.0-362.18.1.el9_3.x86_64. Since then we've installed
>>>>> kernel-5.14.0-419.el9.x86_64 from CentOS Stream 9. The same issue
>>>>> happened again on this newer kernel version:
>>>>>
>>>>>
>>>>>
>>>>> [Mon Mar 11 14:10:08 2024]       Not tainted 5.14.0-419.el9.x86_64 #1
>>>>>   [Mon Mar 11 14:10:08 2024] "echo 0 >
>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>   [Mon Mar 11 14:10:08 2024] task:nfsd            state:D stack:0
>>>>>     pid:8865  ppid:2      flags:0x00004000
>>>>>   [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>>   [Mon Mar 11 14:10:08 2024]  <TASK>
>>>>>   [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>>>   [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>>>   [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>>>   [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
>>>>>   [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>>>   [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
>>>>>   [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>>>   [Mon Mar 11 14:10:08 2024]  nfsd4_shutdown_callback+0x49/0x120
>>>>> [nfsd]
>>>>>   [Mon Mar 11 14:10:08 2024]  ? nfsd4_cld_remove+0x54/0x1d0 [nfsd]
>>>>>   [Mon Mar 11 14:10:08 2024]  ?
>>>>> nfsd4_return_all_client_layouts+0xc4/0xf0 [nfsd]
>>>>>   [Mon Mar 11 14:10:08 2024]  ? nfsd4_shutdown_copy+0x68/0xc0 [nfsd]
>>>>>   [Mon Mar 11 14:10:08 2024]  __destroy_client+0x1f3/0x290 [nfsd]
>>>>>   [Mon Mar 11 14:10:08 2024]  nfsd4_exchange_id+0x75f/0x770 [nfsd]
>>>>>   [Mon Mar 11 14:10:08 2024]  ? nfsd4_decode_opaque+0x3a/0x90 [nfsd]
>>>>>   [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>   [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>   [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660 [sunrpc]
>>>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>   [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>>   [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>   [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>>>   [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>>>   [Mon Mar 11 14:10:08 2024]  </TASK>
>>>>>   [Mon Mar 11 14:10:08 2024] INFO: task nfsd:8866 blocked for more
>>>>> than 122 seconds.
>>>>>   [Mon Mar 11 14:10:08 2024]       Not tainted
>>>>> 5.14.0-419.el9.x86_64 #1
>>>>>   [Mon Mar 11 14:10:08 2024] "echo 0 >
>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>   [Mon Mar 11 14:10:08 2024] task:nfsd            state:D stack:0
>>>>>     pid:8866  ppid:2      flags:0x00004000
>>>>>   [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>>   [Mon Mar 11 14:10:08 2024]  <TASK>
>>>>>   [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>>>   [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>>>   [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>>>   [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
>>>>>   [Mon Mar 11 14:10:08 2024]  ? tcp_recvmsg+0x196/0x210
>>>>>   [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>>>   [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
>>>>>   [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>>>   [Mon Mar 11 14:10:08 2024]  nfsd4_destroy_session+0x1a4/0x240
>>>>> [nfsd]
>>>>>   [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>   [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>   [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660 [sunrpc]
>>>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>   [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>>   [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>   [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>>>   [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>>>   [Mon Mar 11 14:10:08 2024]  </TASK>
>>>>>
>>>>>
>>>>>
>>>>>   The above is repeated a few times, and then this warning is also
>>>>> logged:
>>>>>
>>>>>
>>>>>
>>>>> [Mon Mar 11 14:12:04 2024] ------------[ cut here ]------------
>>>>>   [Mon Mar 11 14:12:04 2024] WARNING: CPU: 39 PID: 8844 at
>>>>> fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>>   [Mon Mar 11 14:12:05 2024] Modules linked in: nfsv4 dns_resolver
>>>>> nfs fscache netfs rpcsec_gss_krb5 rpcrdma rdma_cm iw_cm ib_cm
>>>>> ib_core binfmt_misc bonding tls rfkill nft_counter nft_ct
>>>>>   nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet
>>>>> nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables nfnetlink vfat
>>>>> fat dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio l
>>>>>   ibcrc32c dm_service_time dm_multipath intel_rapl_msr
>>>>> intel_rapl_common intel_uncore_frequency
>>>>> intel_uncore_frequency_common isst_if_common skx_edac nfit
>>>>> libnvdimm ipmi_ssif x86_pkg_temp
>>>>>   _thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
>>>>> dcdbas rapl intel_cstate mgag200 i2c_algo_bit drm_shmem_helper
>>>>> dell_smbios drm_kms_helper dell_wmi_descriptor wmi_bmof intel_u
>>>>>   ncore syscopyarea pcspkr sysfillrect mei_me sysimgblt acpi_ipmi
>>>>> mei fb_sys_fops i2c_i801 ipmi_si intel_pch_thermal lpc_ich
>>>>> ipmi_devintf i2c_smbus ipmi_msghandler joydev acpi_power_meter
>>>>>   nfsd auth_rpcgss nfs_acl drm lockd grace fuse sunrpc ext4
>>>>> mbcache jbd2 sd_mod sg lpfc
>>>>>   [Mon Mar 11 14:12:05 2024]  nvmet_fc nvmet nvme_fc nvme_fabrics
>>>>> crct10dif_pclmul ahci libahci crc32_pclmul nvme_core crc32c_intel
>>>>> ixgbe megaraid_sas libata nvme_common ghash_clmulni_int
>>>>>   el t10_pi wdat_wdt scsi_transport_fc mdio wmi dca dm_mirror
>>>>> dm_region_hash dm_log dm_mod
>>>>>   [Mon Mar 11 14:12:05 2024] CPU: 39 PID: 8844 Comm: nfsd Not
>>>>> tainted 5.14.0-419.el9.x86_64 #1
>>>>>   [Mon Mar 11 14:12:05 2024] Hardware name: Dell Inc. PowerEdge
>>>>> R740/00WGD1, BIOS 2.20.1 09/13/2023
>>>>>   [Mon Mar 11 14:12:05 2024] RIP:
>>>>> 0010:nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>>   [Mon Mar 11 14:12:05 2024] Code: a6 95 c5 f3 e9 ff fe ff ff 48
>>>>> 89 df be 01 00 00 00 e8 34 b5 13 f4 48 8d bb 98 00 00 00 e8 c8 f9
>>>>> 00 00 84 c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be
>>>>>   02 00 00 00 48 89 df e8 0c b5 13 f4 e9 01
>>>>>   [Mon Mar 11 14:12:05 2024] RSP: 0018:ffff9929e0bb7b80 EFLAGS:
>>>>> 00010246
>>>>>   [Mon Mar 11 14:12:05 2024] RAX: 0000000000000000 RBX:
>>>>> ffff8ada51930900 RCX: 0000000000000024
>>>>>   [Mon Mar 11 14:12:05 2024] RDX: ffff8ada519309c8 RSI:
>>>>> ffff8ad582933c00 RDI: 0000000000002000
>>>>>   [Mon Mar 11 14:12:05 2024] RBP: ffff8ad46bf21574 R08:
>>>>> ffff9929e0bb7b48 R09: 0000000000000000
>>>>>   [Mon Mar 11 14:12:05 2024] R10: ffff8aec859a2948 R11:
>>>>> 0000000000000000 R12: ffff8ad6f497c360
>>>>>   [Mon Mar 11 14:12:05 2024] R13: ffff8ad46bf21560 R14:
>>>>> ffff8ae5942e0b10 R15: ffff8ad6f497c360
>>>>>   [Mon Mar 11 14:12:05 2024] FS:  0000000000000000(0000)
>>>>> GS:ffff8b031fcc0000(0000) knlGS:0000000000000000
>>>>>   [Mon Mar 11 14:12:05 2024] CS:  0010 DS: 0000 ES: 0000 CR0:
>>>>> 0000000080050033
>>>>>   [Mon Mar 11 14:12:05 2024] CR2: 00007fafe2060744 CR3:
>>>>> 00000018e58de006 CR4: 00000000007706e0
>>>>>   [Mon Mar 11 14:12:05 2024] DR0: 0000000000000000 DR1:
>>>>> 0000000000000000 DR2: 0000000000000000
>>>>>   [Mon Mar 11 14:12:05 2024] DR3: 0000000000000000 DR6:
>>>>> 00000000fffe0ff0 DR7: 0000000000000400
>>>>>   [Mon Mar 11 14:12:05 2024] PKRU: 55555554
>>>>>   [Mon Mar 11 14:12:05 2024] Call Trace:
>>>>>   [Mon Mar 11 14:12:05 2024]  <TASK>
>>>>>   [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>>>>   [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>>>>   [Mon Mar 11 14:12:05 2024]  ? __break_lease+0x16f/0x5f0
>>>>>   [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
>>>>> [nfsd]
>>>>>   [Mon Mar 11 14:12:05 2024]  ? __warn+0x81/0x110
>>>>>   [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
>>>>> [nfsd]
>>>>>   [Mon Mar 11 14:12:05 2024]  ? report_bug+0x10a/0x140
>>>>>   [Mon Mar 11 14:12:05 2024]  ? handle_bug+0x3c/0x70
>>>>>   [Mon Mar 11 14:12:05 2024]  ? exc_invalid_op+0x14/0x70
>>>>>   [Mon Mar 11 14:12:05 2024]  ? asm_exc_invalid_op+0x16/0x20
>>>>>   [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
>>>>> [nfsd]
>>>>>   [Mon Mar 11 14:12:05 2024]  __break_lease+0x16f/0x5f0
>>>>>   [Mon Mar 11 14:12:05 2024]  ?
>>>>> nfsd_file_lookup_locked+0x117/0x160 [nfsd]
>>>>>   [Mon Mar 11 14:12:05 2024]  ? list_lru_del+0x101/0x150
>>>>>   [Mon Mar 11 14:12:05 2024]  nfsd_file_do_acquire+0x790/0x830 [nfsd]
>>>>>   [Mon Mar 11 14:12:05 2024]  nfs4_get_vfs_file+0x315/0x3a0 [nfsd]
>>>>>   [Mon Mar 11 14:12:05 2024]  nfsd4_process_open2+0x430/0xa30 [nfsd]
>>>>>   [Mon Mar 11 14:12:05 2024]  ? fh_verify+0x297/0x2f0 [nfsd]
>>>>>   [Mon Mar 11 14:12:05 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
>>>>>   [Mon Mar 11 14:12:05 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>   [Mon Mar 11 14:12:05 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>   [Mon Mar 11 14:12:05 2024]  svc_process_common+0x2ec/0x660 [sunrpc]
>>>>>   [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>   [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>   [Mon Mar 11 14:12:05 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>>   [Mon Mar 11 14:12:05 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>   [Mon Mar 11 14:12:05 2024]  kthread+0xdd/0x100
>>>>>   [Mon Mar 11 14:12:05 2024]  ? __pfx_kthread+0x10/0x10
>>>>>   [Mon Mar 11 14:12:05 2024]  ret_from_fork+0x29/0x50
>>>>>   [Mon Mar 11 14:12:05 2024]  </TASK>
>>>>>   [Mon Mar 11 14:12:05 2024] ---[ end trace 7a039e17443dc651 ]---
>>>> [Mon Mar 11 14:29:16 2024] task:kworker/u96:3   state:D stack:0    
>>>> pid:2451130 ppid:2      flags:0x00004000
>>>> [Mon Mar 11 14:29:16 2024] Workqueue: nfsd4_callbacks
>>>> nfsd4_run_cb_work [nfsd]
>>>> [Mon Mar 11 14:29:16 2024] Call Trace:
>>>> [Mon Mar 11 14:29:16 2024]  <TASK>
>>>> [Mon Mar 11 14:29:16 2024]  __schedule+0x21b/0x550
>>>> [Mon Mar 11 14:29:16 2024]  schedule+0x2d/0x70
>>>> [Mon Mar 11 14:29:16 2024]  schedule_timeout+0x88/0x160
>>>> [Mon Mar 11 14:29:16 2024]  ? __pfx_process_timeout+0x10/0x10
>>>> [Mon Mar 11 14:29:16 2024]  rpc_shutdown_client+0xb3/0x150 [sunrpc]
>>>> [Mon Mar 11 14:29:16 2024]  ? __pfx_autoremove_wake_function+0x10/0x10
>>>> [Mon Mar 11 14:29:16 2024] nfsd4_process_cb_update+0x3e/0x260 [nfsd]
>>>> [Mon Mar 11 14:29:16 2024]  ? sched_clock+0xc/0x30
>>>> [Mon Mar 11 14:29:16 2024]  ? raw_spin_rq_lock_nested+0x19/0x80
>>>> [Mon Mar 11 14:29:16 2024]  ? newidle_balance+0x26e/0x400
>>>> [Mon Mar 11 14:29:16 2024]  ? pick_next_task_fair+0x41/0x500
>>>> [Mon Mar 11 14:29:16 2024]  ? put_prev_task_fair+0x1e/0x40
>>>> [Mon Mar 11 14:29:16 2024]  ? pick_next_task+0x861/0x950
>>>> [Mon Mar 11 14:29:16 2024]  ? __update_idle_core+0x23/0xc0
>>>> [Mon Mar 11 14:29:16 2024]  ? __switch_to_asm+0x3a/0x80
>>>> [Mon Mar 11 14:29:16 2024]  ? finish_task_switch.isra.0+0x8c/0x2a0
>>>> [Mon Mar 11 14:29:16 2024]  nfsd4_run_cb_work+0x9f/0x150 [nfsd]
>>>> [Mon Mar 11 14:29:16 2024]  process_one_work+0x1e2/0x3b0
>>>> [Mon Mar 11 14:29:16 2024]  worker_thread+0x50/0x3a0
>>>> [Mon Mar 11 14:29:16 2024]  ? __pfx_worker_thread+0x10/0x10
>>>> [Mon Mar 11 14:29:16 2024]  kthread+0xdd/0x100
>>>> [Mon Mar 11 14:29:16 2024]  ? __pfx_kthread+0x10/0x10
>>>> [Mon Mar 11 14:29:16 2024]  ret_from_fork+0x29/0x50
>>>> [Mon Mar 11 14:29:16 2024]  </TASK>
>>>>
>>>> The above is the main task that I see in the cb workqueue. It's
>>>> trying to call rpc_shutdown_client, which is waiting for this:
>>>>
>>>>                  wait_event_timeout(destroy_wait,
>>>>                          list_empty(&clnt->cl_tasks), 1*HZ);
>>>>
>>>> ...so basically waiting for the cl_tasks list to go empty. It
>>>> repeatedly
>>>> does a rpc_killall_tasks though, so possibly trying to kill this task?
>>>>
>>>>      18423 2281      0 0x18 0x0     1354 nfsd4_cb_ops [nfsd]
>>>> nfs4_cbv1 CB_RECALL_ANY a:call_start [sunrpc] q:delayq
>>>
>>> I wonder why this task is on delayq. Could it be related to memory
>>> shortage issue, or connection related problems?
>>> Output of /proc/meminfo on the nfs server at time of the problem
>>> would shed some light.
>>
>> We don't have that anymore. I can check our monitoring host more
>> closely for more fine grained stats tomorrow, but when I look at the
>> sar statistics (see attachment) nothing special was going on memory
>> or network wise.
>
> Thanks Rik for the info.
>
> At 2:10 PM sar statistics shows:
> kbmemfree:  1014880
> kbavail:    170836368
> kbmemused:  2160028
> %memused:   1.10
> kbcached:   140151288
>
> Paging stats:
>               pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s
> pgscank/s pgscand/s pgsteal/s    %vmeff
> 02:10:00 PM   2577.67 491251.09   2247.01      0.00 2415029.61
> 75131.80      0.00 150276.28    200.02
>
> The kbmemfree is pretty low and the caches consume large amount of
> memory.
> The paging statistics also show lots of paging activities, 150276.28/s.
>
The workload at that time was write-heavy. The writes were probably all
going to memory until the buffers filled up (or the ext4 commit interval
expired?) and were then written out to disk. The process then repeated
itself, as a large part of the writes were rewriting the same files over
and over again.
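
If that's what happened, the dirty/writeback counters should show it; next
time I'll try to watch something like this while that workload runs (just a
sketch):

    # watch dirty and writeback pages alongside free/cached memory
    watch -n 5 'grep -E "^(Dirty|Writeback|MemFree|Cached):" /proc/meminfo'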
> In the previous rpc_tasks.txt, it shows a RPC task is on the delayq
> waiting
> to send the CB_RECALL_ANY. With this version of the kernel, the only time
> CB_RECALL_ANY is sent is when the system is under memory pressure and the
> nfsd shrinker task runs to free unused delegations.

Would it help to increase /proc/sys/vm/min_free_kbytes in this case?


> Next time when this problem happens again, you can try to reclaim some
> memory from the caches to see if it helps:
>
> # echo 3 > /proc/sys/vm/drop_caches

Do you think this could help the system recover at that point?

Regards,

Rik

>
> -Dai
>
>
>
>
>>
>> We start to get the cpu stall messages and the system load starts to
>> rise (starts around 2:10 PM). At 3:00 PM we restart the server as our
>> users can no longer work.
>>
>> Looking at the stats, the cpu's were ~idle. The only thing that may
>> be related is that around 11:30 AM the write load (rx packets) starts
>> to get a lot higher than the read load (tx packets). This goes on for
>> hours (even after the server was restarted) and that workload was
>> later identified. It was a workload that was constantly rewriting a
>> statistics file.
>>
>> Regards,
>>
>> Rik
>>
>>
>>>
>>> Currently there is only 1 active task allowed for the nfsd callback
>>> workqueue at a time. If for some reasons a callback task is stuck in
>>> the workqueue it will block all other callback tasks which can effect
>>> multiple clients.
>>>
>>> -Dai
>>>
>>>>
>>>> Callbacks are soft RPC tasks though, so they should be easily
>>>> killable.
>>
--
Rik Theys
System Engineer
KU Leuven - Dept. Elektrotechniek (ESAT)
Kasteelpark Arenberg 10 bus 2440 - B-3001 Leuven-Heverlee
+32(0)16/32.11.07
----------------------------------------------------------------
<<Any errors in spelling, tact or fact are transmission errors>>


2024-03-13 22:48:04

by Dai Ngo

[permalink] [raw]
Subject: Re: nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning


On 3/13/24 12:50 PM, Rik Theys wrote:
> Hi,
>
> On 3/13/24 19:44, Dai Ngo wrote:
>>
>> On 3/12/24 11:23 AM, Rik Theys wrote:
>>> Hi,
>>>
>>> On 3/12/24 17:43, Dai Ngo wrote:
>>>>
>>>> On 3/12/24 4:37 AM, Jeff Layton wrote:
>>>>> On Mon, 2024-03-11 at 19:43 +0100, Rik Theys wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>>
>>>>>> Since a few weeks our Rocky Linux 9 NFS server has periodically
>>>>>> logged hung nfsd tasks. The initial effect was that some clients
>>>>>> could no longer access the NFS server. This got worse and worse
>>>>>> (probably as more nfsd threads got blocked) and we had to restart
>>>>>> the server. Restarting the server also failed as the NFS server
>>>>>> service could no longer be stopped.
>>>>>>
>>>>>>
>>>>>>
>>>>>> The initial kernel we noticed this behavior on was
>>>>>> kernel-5.14.0-362.18.1.el9_3.x86_64. Since then we've installed
>>>>>> kernel-5.14.0-419.el9.x86_64 from CentOS Stream 9. The same issue
>>>>>> happened again on this newer kernel version:
>>>>>>
>>>>>>
>>>>>>
>>>>>> [Mon Mar 11 14:10:08 2024]       Not tainted
>>>>>> 5.14.0-419.el9.x86_64 #1
>>>>>>   [Mon Mar 11 14:10:08 2024] "echo 0 >
>>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>>   [Mon Mar 11 14:10:08 2024] task:nfsd            state:D stack:0
>>>>>>     pid:8865  ppid:2      flags:0x00004000
>>>>>>   [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>>>   [Mon Mar 11 14:10:08 2024]  <TASK>
>>>>>>   [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>>>>   [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>>>>   [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>>>>   [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
>>>>>>   [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>>>>   [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
>>>>>>   [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>>>>   [Mon Mar 11 14:10:08 2024]  nfsd4_shutdown_callback+0x49/0x120
>>>>>> [nfsd]
>>>>>>   [Mon Mar 11 14:10:08 2024]  ? nfsd4_cld_remove+0x54/0x1d0 [nfsd]
>>>>>>   [Mon Mar 11 14:10:08 2024]  ?
>>>>>> nfsd4_return_all_client_layouts+0xc4/0xf0 [nfsd]
>>>>>>   [Mon Mar 11 14:10:08 2024]  ? nfsd4_shutdown_copy+0x68/0xc0 [nfsd]
>>>>>>   [Mon Mar 11 14:10:08 2024]  __destroy_client+0x1f3/0x290 [nfsd]
>>>>>>   [Mon Mar 11 14:10:08 2024]  nfsd4_exchange_id+0x75f/0x770 [nfsd]
>>>>>>   [Mon Mar 11 14:10:08 2024]  ? nfsd4_decode_opaque+0x3a/0x90 [nfsd]
>>>>>>   [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>>   [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>   [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
>>>>>> [sunrpc]
>>>>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>   [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>>>   [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>>   [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>>>>   [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>>>>   [Mon Mar 11 14:10:08 2024]  </TASK>
>>>>>>   [Mon Mar 11 14:10:08 2024] INFO: task nfsd:8866 blocked for
>>>>>> more than 122 seconds.
>>>>>>   [Mon Mar 11 14:10:08 2024]       Not tainted
>>>>>> 5.14.0-419.el9.x86_64 #1
>>>>>>   [Mon Mar 11 14:10:08 2024] "echo 0 >
>>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>>   [Mon Mar 11 14:10:08 2024] task:nfsd            state:D stack:0
>>>>>>     pid:8866  ppid:2      flags:0x00004000
>>>>>>   [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>>>   [Mon Mar 11 14:10:08 2024]  <TASK>
>>>>>>   [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>>>>   [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>>>>   [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>>>>   [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
>>>>>>   [Mon Mar 11 14:10:08 2024]  ? tcp_recvmsg+0x196/0x210
>>>>>>   [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>>>>   [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
>>>>>>   [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>>>>   [Mon Mar 11 14:10:08 2024]  nfsd4_destroy_session+0x1a4/0x240
>>>>>> [nfsd]
>>>>>>   [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>>   [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>   [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
>>>>>> [sunrpc]
>>>>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>   [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>>>   [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>>   [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>>>>   [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>>>>   [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>>>>   [Mon Mar 11 14:10:08 2024]  </TASK>
>>>>>>
>>>>>>
>>>>>>
>>>>>>   The above is repeated a few times, and then this warning is
>>>>>> also logged:
>>>>>>
>>>>>>
>>>>>>
>>>>>> [Mon Mar 11 14:12:04 2024] ------------[ cut here ]------------
>>>>>>   [Mon Mar 11 14:12:04 2024] WARNING: CPU: 39 PID: 8844 at
>>>>>> fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>>>   [Mon Mar 11 14:12:05 2024] Modules linked in: nfsv4
>>>>>> dns_resolver nfs fscache netfs rpcsec_gss_krb5 rpcrdma rdma_cm
>>>>>> iw_cm ib_cm ib_core binfmt_misc bonding tls rfkill nft_counter
>>>>>> nft_ct
>>>>>>   nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet
>>>>>> nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables nfnetlink vfat
>>>>>> fat dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio l
>>>>>>   ibcrc32c dm_service_time dm_multipath intel_rapl_msr
>>>>>> intel_rapl_common intel_uncore_frequency
>>>>>> intel_uncore_frequency_common isst_if_common skx_edac nfit
>>>>>> libnvdimm ipmi_ssif x86_pkg_temp
>>>>>>   _thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
>>>>>> dcdbas rapl intel_cstate mgag200 i2c_algo_bit drm_shmem_helper
>>>>>> dell_smbios drm_kms_helper dell_wmi_descriptor wmi_bmof intel_u
>>>>>>   ncore syscopyarea pcspkr sysfillrect mei_me sysimgblt acpi_ipmi
>>>>>> mei fb_sys_fops i2c_i801 ipmi_si intel_pch_thermal lpc_ich
>>>>>> ipmi_devintf i2c_smbus ipmi_msghandler joydev acpi_power_meter
>>>>>>   nfsd auth_rpcgss nfs_acl drm lockd grace fuse sunrpc ext4
>>>>>> mbcache jbd2 sd_mod sg lpfc
>>>>>>   [Mon Mar 11 14:12:05 2024]  nvmet_fc nvmet nvme_fc nvme_fabrics
>>>>>> crct10dif_pclmul ahci libahci crc32_pclmul nvme_core crc32c_intel
>>>>>> ixgbe megaraid_sas libata nvme_common ghash_clmulni_int
>>>>>>   el t10_pi wdat_wdt scsi_transport_fc mdio wmi dca dm_mirror
>>>>>> dm_region_hash dm_log dm_mod
>>>>>>   [Mon Mar 11 14:12:05 2024] CPU: 39 PID: 8844 Comm: nfsd Not
>>>>>> tainted 5.14.0-419.el9.x86_64 #1
>>>>>>   [Mon Mar 11 14:12:05 2024] Hardware name: Dell Inc. PowerEdge
>>>>>> R740/00WGD1, BIOS 2.20.1 09/13/2023
>>>>>>   [Mon Mar 11 14:12:05 2024] RIP:
>>>>>> 0010:nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>>>   [Mon Mar 11 14:12:05 2024] Code: a6 95 c5 f3 e9 ff fe ff ff 48
>>>>>> 89 df be 01 00 00 00 e8 34 b5 13 f4 48 8d bb 98 00 00 00 e8 c8 f9
>>>>>> 00 00 84 c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be
>>>>>>   02 00 00 00 48 89 df e8 0c b5 13 f4 e9 01
>>>>>>   [Mon Mar 11 14:12:05 2024] RSP: 0018:ffff9929e0bb7b80 EFLAGS:
>>>>>> 00010246
>>>>>>   [Mon Mar 11 14:12:05 2024] RAX: 0000000000000000 RBX:
>>>>>> ffff8ada51930900 RCX: 0000000000000024
>>>>>>   [Mon Mar 11 14:12:05 2024] RDX: ffff8ada519309c8 RSI:
>>>>>> ffff8ad582933c00 RDI: 0000000000002000
>>>>>>   [Mon Mar 11 14:12:05 2024] RBP: ffff8ad46bf21574 R08:
>>>>>> ffff9929e0bb7b48 R09: 0000000000000000
>>>>>>   [Mon Mar 11 14:12:05 2024] R10: ffff8aec859a2948 R11:
>>>>>> 0000000000000000 R12: ffff8ad6f497c360
>>>>>>   [Mon Mar 11 14:12:05 2024] R13: ffff8ad46bf21560 R14:
>>>>>> ffff8ae5942e0b10 R15: ffff8ad6f497c360
>>>>>>   [Mon Mar 11 14:12:05 2024] FS:  0000000000000000(0000)
>>>>>> GS:ffff8b031fcc0000(0000) knlGS:0000000000000000
>>>>>>   [Mon Mar 11 14:12:05 2024] CS:  0010 DS: 0000 ES: 0000 CR0:
>>>>>> 0000000080050033
>>>>>>   [Mon Mar 11 14:12:05 2024] CR2: 00007fafe2060744 CR3:
>>>>>> 00000018e58de006 CR4: 00000000007706e0
>>>>>>   [Mon Mar 11 14:12:05 2024] DR0: 0000000000000000 DR1:
>>>>>> 0000000000000000 DR2: 0000000000000000
>>>>>>   [Mon Mar 11 14:12:05 2024] DR3: 0000000000000000 DR6:
>>>>>> 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>   [Mon Mar 11 14:12:05 2024] PKRU: 55555554
>>>>>>   [Mon Mar 11 14:12:05 2024] Call Trace:
>>>>>>   [Mon Mar 11 14:12:05 2024]  <TASK>
>>>>>>   [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>>>>>   [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>>>>>   [Mon Mar 11 14:12:05 2024]  ? __break_lease+0x16f/0x5f0
>>>>>>   [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
>>>>>> [nfsd]
>>>>>>   [Mon Mar 11 14:12:05 2024]  ? __warn+0x81/0x110
>>>>>>   [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
>>>>>> [nfsd]
>>>>>>   [Mon Mar 11 14:12:05 2024]  ? report_bug+0x10a/0x140
>>>>>>   [Mon Mar 11 14:12:05 2024]  ? handle_bug+0x3c/0x70
>>>>>>   [Mon Mar 11 14:12:05 2024]  ? exc_invalid_op+0x14/0x70
>>>>>>   [Mon Mar 11 14:12:05 2024]  ? asm_exc_invalid_op+0x16/0x20
>>>>>>   [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
>>>>>> [nfsd]
>>>>>>   [Mon Mar 11 14:12:05 2024]  __break_lease+0x16f/0x5f0
>>>>>>   [Mon Mar 11 14:12:05 2024]  ?
>>>>>> nfsd_file_lookup_locked+0x117/0x160 [nfsd]
>>>>>>   [Mon Mar 11 14:12:05 2024]  ? list_lru_del+0x101/0x150
>>>>>>   [Mon Mar 11 14:12:05 2024]  nfsd_file_do_acquire+0x790/0x830
>>>>>> [nfsd]
>>>>>>   [Mon Mar 11 14:12:05 2024]  nfs4_get_vfs_file+0x315/0x3a0 [nfsd]
>>>>>>   [Mon Mar 11 14:12:05 2024]  nfsd4_process_open2+0x430/0xa30 [nfsd]
>>>>>>   [Mon Mar 11 14:12:05 2024]  ? fh_verify+0x297/0x2f0 [nfsd]
>>>>>>   [Mon Mar 11 14:12:05 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
>>>>>>   [Mon Mar 11 14:12:05 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>>   [Mon Mar 11 14:12:05 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>   [Mon Mar 11 14:12:05 2024]  svc_process_common+0x2ec/0x660
>>>>>> [sunrpc]
>>>>>>   [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>>   [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>   [Mon Mar 11 14:12:05 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>>>   [Mon Mar 11 14:12:05 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>>   [Mon Mar 11 14:12:05 2024]  kthread+0xdd/0x100
>>>>>>   [Mon Mar 11 14:12:05 2024]  ? __pfx_kthread+0x10/0x10
>>>>>>   [Mon Mar 11 14:12:05 2024]  ret_from_fork+0x29/0x50
>>>>>>   [Mon Mar 11 14:12:05 2024]  </TASK>
>>>>>>   [Mon Mar 11 14:12:05 2024] ---[ end trace 7a039e17443dc651 ]---
>>>>> [Mon Mar 11 14:29:16 2024] task:kworker/u96:3   state:D
>>>>> stack:0     pid:2451130 ppid:2      flags:0x00004000
>>>>> [Mon Mar 11 14:29:16 2024] Workqueue: nfsd4_callbacks
>>>>> nfsd4_run_cb_work [nfsd]
>>>>> [Mon Mar 11 14:29:16 2024] Call Trace:
>>>>> [Mon Mar 11 14:29:16 2024]  <TASK>
>>>>> [Mon Mar 11 14:29:16 2024]  __schedule+0x21b/0x550
>>>>> [Mon Mar 11 14:29:16 2024]  schedule+0x2d/0x70
>>>>> [Mon Mar 11 14:29:16 2024]  schedule_timeout+0x88/0x160
>>>>> [Mon Mar 11 14:29:16 2024]  ? __pfx_process_timeout+0x10/0x10
>>>>> [Mon Mar 11 14:29:16 2024]  rpc_shutdown_client+0xb3/0x150 [sunrpc]
>>>>> [Mon Mar 11 14:29:16 2024]  ?
>>>>> __pfx_autoremove_wake_function+0x10/0x10
>>>>> [Mon Mar 11 14:29:16 2024] nfsd4_process_cb_update+0x3e/0x260 [nfsd]
>>>>> [Mon Mar 11 14:29:16 2024]  ? sched_clock+0xc/0x30
>>>>> [Mon Mar 11 14:29:16 2024]  ? raw_spin_rq_lock_nested+0x19/0x80
>>>>> [Mon Mar 11 14:29:16 2024]  ? newidle_balance+0x26e/0x400
>>>>> [Mon Mar 11 14:29:16 2024]  ? pick_next_task_fair+0x41/0x500
>>>>> [Mon Mar 11 14:29:16 2024]  ? put_prev_task_fair+0x1e/0x40
>>>>> [Mon Mar 11 14:29:16 2024]  ? pick_next_task+0x861/0x950
>>>>> [Mon Mar 11 14:29:16 2024]  ? __update_idle_core+0x23/0xc0
>>>>> [Mon Mar 11 14:29:16 2024]  ? __switch_to_asm+0x3a/0x80
>>>>> [Mon Mar 11 14:29:16 2024]  ? finish_task_switch.isra.0+0x8c/0x2a0
>>>>> [Mon Mar 11 14:29:16 2024]  nfsd4_run_cb_work+0x9f/0x150 [nfsd]
>>>>> [Mon Mar 11 14:29:16 2024]  process_one_work+0x1e2/0x3b0
>>>>> [Mon Mar 11 14:29:16 2024]  worker_thread+0x50/0x3a0
>>>>> [Mon Mar 11 14:29:16 2024]  ? __pfx_worker_thread+0x10/0x10
>>>>> [Mon Mar 11 14:29:16 2024]  kthread+0xdd/0x100
>>>>> [Mon Mar 11 14:29:16 2024]  ? __pfx_kthread+0x10/0x10
>>>>> [Mon Mar 11 14:29:16 2024]  ret_from_fork+0x29/0x50
>>>>> [Mon Mar 11 14:29:16 2024]  </TASK>
>>>>>
>>>>> The above is the main task that I see in the cb workqueue. It's
>>>>> trying to call rpc_shutdown_client, which is waiting for this:
>>>>>
>>>>>                  wait_event_timeout(destroy_wait,
>>>>> list_empty(&clnt->cl_tasks), 1*HZ);
>>>>>
>>>>> ...so basically waiting for the cl_tasks list to go empty. It
>>>>> repeatedly
>>>>> does a rpc_killall_tasks though, so possibly trying to kill this
>>>>> task?
>>>>>
>>>>>      18423 2281      0 0x18 0x0     1354 nfsd4_cb_ops [nfsd]
>>>>> nfs4_cbv1 CB_RECALL_ANY a:call_start [sunrpc] q:delayq
>>>>
>>>> I wonder why this task is on delayq. Could it be related to memory
>>>> shortage issue, or connection related problems?
>>>> Output of /proc/meminfo on the nfs server at time of the problem
>>>> would shed some light.
>>>
>>> We don't have that anymore. I can check our monitoring host more
>>> closely for more fine grained stats tomorrow, but when I look at the
>>> sar statistics (see attachment) nothing special was going on memory
>>> or network wise.
>>
>> Thanks Rik for the info.
>>
>> At 2:10 PM sar statistics shows:
>> kbmemfree:  1014880
>> kbavail:    170836368
>> kbmemused:  2160028
>> %memused:   1.10
>> kbcached:   140151288
>>
>> Paging stats:
>>               pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s
>> pgscank/s pgscand/s pgsteal/s    %vmeff
>> 02:10:00 PM   2577.67 491251.09   2247.01      0.00 2415029.61
>> 75131.80      0.00 150276.28    200.02
>>
>> The kbmemfree is pretty low and the caches consume large amount of
>> memory.
>> The paging statistics also show lots of paging activities, 150276.28/s.
>>
> The workload at that time was a write-heavy workload. The writes were
> probably all going to memory until the buffers filled up (or the ext4
> commit interval?) and was then written out to disk. And then the
> process repeated itself as a large part of the writes were rewriting
> the same files over and over again.
>> In the previous rpc_tasks.txt, it shows a RPC task is on the delayq
>> waiting
>> to send the CB_RECALL_ANY. With this version of the kernel, the only
>> time
>> CB_RECALL_ANY is sent is when the system is under memory pressure and
>> the
>> nfsd shrinker task runs to free unused delegations.
>
> Would it help to increase /proc/sys/vm/min_free_kbytes in this case?

I'm not a VM subsystem expert, but I've seen recommendations to set this
value to 1% of total memory.
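
As a rough sketch of that rule of thumb (the value is in kB, the same unit
as MemTotal; please double-check before changing it on a production server):

    # set min_free_kbytes to roughly 1% of MemTotal
    echo $(( $(awk '/^MemTotal:/ {print $2}' /proc/meminfo) / 100 )) > /proc/sys/vm/min_free_kbytes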

>
>
>> Next time when this problem happens again, you can try to reclaim some
>> memory from the caches to see if it helps:
>>
>> # echo 3 > /proc/sys/vm/drop_caches
>
> Do you think this could help the system recover at that point?

There is no guarantee but it's worth a try.

-Dai

>
> Regards,
>
> Rik
>
>>
>> -Dai
>>
>>
>>
>>
>>>
>>> We start to get the cpu stall messages and the system load starts to
>>> rise (starts around 2:10 PM). At 3:00 PM we restart the server as
>>> our users can no longer work.
>>>
>>> Looking at the stats, the cpu's were ~idle. The only thing that may
>>> be related is that around 11:30 AM the write load (rx packets)
>>> starts to get a lot higher than the read load (tx packets). This
>>> goes on for hours (even after the server was restarted) and that
>>> workload was later identified. It was a workload that was constantly
>>> rewriting a statistics file.
>>>
>>> Regards,
>>>
>>> Rik
>>>
>>>
>>>>
>>>> Currently there is only 1 active task allowed for the nfsd callback
>>>> workqueue at a time. If for some reasons a callback task is stuck in
>>>> the workqueue it will block all other callback tasks which can effect
>>>> multiple clients.
>>>>
>>>> -Dai
>>>>
>>>>>
>>>>> Callbacks are soft RPC tasks though, so they should be easily
>>>>> killable.
>>>

2024-03-18 20:22:27

by Rik Theys

[permalink] [raw]
Subject: Re: nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning

Hi Jeff,

On 3/12/24 13:47, Jeff Layton wrote:
> On Tue, 2024-03-12 at 13:24 +0100, Rik Theys wrote:
>> Hi Jeff,
>>
>> On 3/12/24 12:22, Jeff Layton wrote:
>>> On Mon, 2024-03-11 at 19:43 +0100, Rik Theys wrote:
>>>> Since a few weeks our Rocky Linux 9 NFS server has periodically logged hung nfsd tasks. The initial effect was that some clients could no longer access the NFS server. This got worse and worse (probably as more nfsd threads got blocked) and we had to restart the server. Restarting the server also failed as the NFS server service could no longer be stopped.
>>>>
>>>>
>>>> The initial kernel we noticed this behavior on was kernel-5.14.0-362.18.1.el9_3.x86_64. Since then we've installed kernel-5.14.0-419.el9.x86_64 from CentOS Stream 9. The same issue happened again on this newer kernel version:
>>>>
> 419 is fairly up to date with nfsd changes. There are some known bugs
> around callbacks, and there is a draft MR in flight to fix it.
>
> What kernel were you on prior to 5.14.0-362.18.1.el9_3.x86_64 ? If we
> can bracket the changes around a particular version, then that might
> help identify the problem.
>
>>>> [Mon Mar 11 14:10:08 2024]       Not tainted 5.14.0-419.el9.x86_64 #1
>>>>  [Mon Mar 11 14:10:08 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>  [Mon Mar 11 14:10:08 2024]task:nfsd            state:D stack:0     pid:8865  ppid:2      flags:0x00004000
>>>>  [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>  [Mon Mar 11 14:10:08 2024]  <TASK>
>>>>  [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>>  [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>>  [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>>  [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
>>>>  [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>>  [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>>  [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
>>>>  [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>>  [Mon Mar 11 14:10:08 2024]  nfsd4_shutdown_callback+0x49/0x120 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  ? nfsd4_cld_remove+0x54/0x1d0 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  ? nfsd4_return_all_client_layouts+0xc4/0xf0 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  ? nfsd4_shutdown_copy+0x68/0xc0 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  __destroy_client+0x1f3/0x290 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  nfsd4_exchange_id+0x75f/0x770 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  ? nfsd4_decode_opaque+0x3a/0x90 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660 [sunrpc]
>>>>  [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>  [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>>  [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>>  [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>>  [Mon Mar 11 14:10:08 2024]  </TASK>
>>>>  [Mon Mar 11 14:10:08 2024] INFO: task nfsd:8866 blocked for more than 122 seconds.
>>>>  [Mon Mar 11 14:10:08 2024]       Not tainted 5.14.0-419.el9.x86_64 #1
>>>>  [Mon Mar 11 14:10:08 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>  [Mon Mar 11 14:10:08 2024]task:nfsd            state:D stack:0     pid:8866  ppid:2      flags:0x00004000
>>>>  [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>  [Mon Mar 11 14:10:08 2024]  <TASK>
>>>>  [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>>  [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>>  [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>>  [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
>>>>  [Mon Mar 11 14:10:08 2024]  ? tcp_recvmsg+0x196/0x210
>>>>  [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>>  [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>>  [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
>>>>  [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>>  [Mon Mar 11 14:10:08 2024]  nfsd4_destroy_session+0x1a4/0x240 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660 [sunrpc]
>>>>  [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>  [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>  [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>>  [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>>  [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>>  [Mon Mar 11 14:10:08 2024]  </TASK>
>>>>
>>> The above threads are trying to flush the workqueue, so that probably
>>> means that they are stuck waiting on a workqueue job to finish.
>>>>  The above is repeated a few times, and then this warning is also logged:
>>>>
>>>> [Mon Mar 11 14:12:04 2024] ------------[ cut here ]------------
>>>>  [Mon Mar 11 14:12:04 2024] WARNING: CPU: 39 PID: 8844 at fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024] Modules linked in: nfsv4 dns_resolver nfs fscache netfs rpcsec_gss_krb5 rpcrdma rdma_cm iw_cm ib_cm ib_core binfmt_misc bonding tls rfkill nft_counter nft_ct
>>>>  nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables nfnetlink vfat fat dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio l
>>>>  ibcrc32c dm_service_time dm_multipath intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common skx_edac nfit libnvdimm ipmi_ssif x86_pkg_temp
>>>>  _thermal intel_powerclamp coretemp kvm_intel kvm irqbypass dcdbas rapl intel_cstate mgag200 i2c_algo_bit drm_shmem_helper dell_smbios drm_kms_helper dell_wmi_descriptor wmi_bmof intel_u
>>>>  ncore syscopyarea pcspkr sysfillrect mei_me sysimgblt acpi_ipmi mei fb_sys_fops i2c_i801 ipmi_si intel_pch_thermal lpc_ich ipmi_devintf i2c_smbus ipmi_msghandler joydev acpi_power_meter
>>>>  nfsd auth_rpcgss nfs_acl drm lockd grace fuse sunrpc ext4 mbcache jbd2 sd_mod sg lpfc
>>>>  [Mon Mar 11 14:12:05 2024]  nvmet_fc nvmet nvme_fc nvme_fabrics crct10dif_pclmul ahci libahci crc32_pclmul nvme_core crc32c_intel ixgbe megaraid_sas libata nvme_common ghash_clmulni_int
>>>>  el t10_pi wdat_wdt scsi_transport_fc mdio wmi dca dm_mirror dm_region_hash dm_log dm_mod
>>>>  [Mon Mar 11 14:12:05 2024] CPU: 39 PID: 8844 Comm: nfsd Not tainted 5.14.0-419.el9.x86_64 #1
>>>>  [Mon Mar 11 14:12:05 2024] Hardware name: Dell Inc. PowerEdge R740/00WGD1, BIOS 2.20.1 09/13/2023
>>>>  [Mon Mar 11 14:12:05 2024] RIP: 0010:nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024] Code: a6 95 c5 f3 e9 ff fe ff ff 48 89 df be 01 00 00 00 e8 34 b5 13 f4 48 8d bb 98 00 00 00 e8 c8 f9 00 00 84 c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be
>>>>  02 00 00 00 48 89 df e8 0c b5 13 f4 e9 01
>>>>  [Mon Mar 11 14:12:05 2024] RSP: 0018:ffff9929e0bb7b80 EFLAGS: 00010246
>>>>  [Mon Mar 11 14:12:05 2024] RAX: 0000000000000000 RBX: ffff8ada51930900 RCX: 0000000000000024
>>>>  [Mon Mar 11 14:12:05 2024] RDX: ffff8ada519309c8 RSI: ffff8ad582933c00 RDI: 0000000000002000
>>>>  [Mon Mar 11 14:12:05 2024] RBP: ffff8ad46bf21574 R08: ffff9929e0bb7b48 R09: 0000000000000000
>>>>  [Mon Mar 11 14:12:05 2024] R10: ffff8aec859a2948 R11: 0000000000000000 R12: ffff8ad6f497c360
>>>>  [Mon Mar 11 14:12:05 2024] R13: ffff8ad46bf21560 R14: ffff8ae5942e0b10 R15: ffff8ad6f497c360
>>>>  [Mon Mar 11 14:12:05 2024] FS:  0000000000000000(0000) GS:ffff8b031fcc0000(0000) knlGS:0000000000000000
>>>>  [Mon Mar 11 14:12:05 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>  [Mon Mar 11 14:12:05 2024] CR2: 00007fafe2060744 CR3: 00000018e58de006 CR4: 00000000007706e0
>>>>  [Mon Mar 11 14:12:05 2024] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>  [Mon Mar 11 14:12:05 2024] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>  [Mon Mar 11 14:12:05 2024] PKRU: 55555554
>>>>  [Mon Mar 11 14:12:05 2024] Call Trace:
>>>>  [Mon Mar 11 14:12:05 2024]  <TASK>
>>>>  [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>>>  [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>>>  [Mon Mar 11 14:12:05 2024]  ? __break_lease+0x16f/0x5f0
>>>>  [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  ? __warn+0x81/0x110
>>>>  [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  ? report_bug+0x10a/0x140
>>>>  [Mon Mar 11 14:12:05 2024]  ? handle_bug+0x3c/0x70
>>>>  [Mon Mar 11 14:12:05 2024]  ? exc_invalid_op+0x14/0x70
>>>>  [Mon Mar 11 14:12:05 2024]  ? asm_exc_invalid_op+0x16/0x20
>>>>  [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  __break_lease+0x16f/0x5f0
>>>>  [Mon Mar 11 14:12:05 2024]  ? nfsd_file_lookup_locked+0x117/0x160 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  ? list_lru_del+0x101/0x150
>>>>  [Mon Mar 11 14:12:05 2024]  nfsd_file_do_acquire+0x790/0x830 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  nfs4_get_vfs_file+0x315/0x3a0 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  nfsd4_process_open2+0x430/0xa30 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  ? fh_verify+0x297/0x2f0 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  svc_process_common+0x2ec/0x660 [sunrpc]
>>>>  [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>  [Mon Mar 11 14:12:05 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>  [Mon Mar 11 14:12:05 2024]  kthread+0xdd/0x100
>>>>  [Mon Mar 11 14:12:05 2024]  ? __pfx_kthread+0x10/0x10
>>>>  [Mon Mar 11 14:12:05 2024]  ret_from_fork+0x29/0x50
>>>>  [Mon Mar 11 14:12:05 2024]  </TASK>
>>>>  [Mon Mar 11 14:12:05 2024] ---[ end trace 7a039e17443dc651 ]---
>>> This is probably this WARN in nfsd_break_one_deleg:
>>>
>>> WARN_ON_ONCE(!nfsd4_run_cb(&dp->dl_recall));
>>>
>>> It means that a delegation break callback to the client couldn't be
>>> queued to the workqueue, and so it didn't run.
>>>
>>>> Could this be the same issue as described here:https://lore.kernel.org/linux-nfs/[email protected]/ ?
>>>>
>>> Yes, most likely the same problem.
>> If I read that thread correctly, this issue was introduced between
>> 6.1.63 and 6.6.3? Is it possible the EL9 5.14.0-362.18.1.el9_3
>> backported these changes, or were we hitting some other bug with that
>> version? It seems the 6.1.x kernel is not affected? If so, that would be
>> the recommended kernel to run?
> Anything is possible. We have to identify the problem first.
>>>> As described in that thread, I've tried to obtain the requested information.
>>>>
>>>>
>>>> Is it possible this is the issue that was fixed by the patches described here?https://lore.kernel.org/linux-nfs/2024022054-cause-suffering-eae8@gregkh/
>>>>
>>> Doubtful. Those are targeted toward a different set of issues.
>>>
>>> If you're willing, I do have some patches queued up for CentOS here that
>>> fix some backchannel problems that could be related. I'm mainly waiting
>>> on Chuck to send these to Linus and then we'll likely merge them into
>>> CentOS soon afterward:
>>>
>>> https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3689
>>>
>> If you can send me a patch file, I can rebuild the C9S kernel with that
>> patch and run it. It can take a while for the bug to trigger as I
>> believe it seems to be very workload dependent (we were running very
>> stable for months and now hit this bug every other week).
>>
>>
> It's probably simpler to just pull down the build artifacts for that MR.
> You have to drill down through the CI for it, but they are here:
>
> https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/index.html?prefix=trusted-artifacts/1194300175/publish_x86_64/6278921877/artifacts/
>
> There's even a repo file you can install on the box to pull them down.

We installed this kernel on the server 3 days ago. Today, a user
informed us that their screen was black after logging in. Similar to
other occurrences of this issue, the mount command on the client was
hung. But in contrast to the other times, there were no messages in the
kernel logs on the server. Even restarting the client does not resolve the
issue.

Something still seems to be wrong on the server though. When I look at
the directories under /proc/fs/nfsd/clients, there's still a directory
for the specific client, even though it's no longer running:

# cat 155/info
clientid: 0xc8edb7f65f4a9ad
address: "10.87.31.152:819"
status: confirmed
seconds from last renew: 33163
name: "Linux NFSv4.2 bersalis.esat.kuleuven.be"
minor version: 2
Implementation domain: "kernel.org"
Implementation name: "Linux 5.14.0-362.18.1.el9_3.0.1.x86_64 #1 SMP
PREEMPT_DYNAMIC Sun Feb 11 13:49:23 UTC 2024 x86_64"
Implementation time: [0, 0]
callback state: DOWN
callback address: 10.87.31.152:0


The system seems to have identified that the client is no longer
reachable, but the client entry does not go away. When a mount was
hanging on the client, there would be two directories under
/proc/fs/nfsd/clients for the same client. Killing the mount command
clears up the second entry.
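
The callback state of all current client entries can be checked in one go
with:

    # show the callback state reported for each client entry
    grep -H "callback state" /proc/fs/nfsd/clients/*/info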

Even after running conntrack -D on the server to remove the TCP
connection from the conntrack table, the entry doesn't go away and the
client still cannot mount anything from the server.

A tcpdump on the client while a mount was running logged the following
messages over and over again:

request:

Frame 1: 378 bytes on wire (3024 bits), 378 bytes captured (3024 bits)
Ethernet II, Src: HP_19:7d:4b (e0:73:e7:19:7d:4b), Dst:
ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00)
Internet Protocol Version 4, Src: 10.87.31.152, Dst: 10.86.18.14
Transmission Control Protocol, Src Port: 932, Dst Port: 2049, Seq: 1,
Ack: 1, Len: 312
Remote Procedure Call, Type:Call XID:0x1d3220c4
Network File System
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    GSS Data, Ops(1): CREATE_SESSION
        Length: 152
        GSS Sequence Number: 76
        Tag: <EMPTY>
        minorversion: 2
        Operations (count: 1): CREATE_SESSION
        [Main Opcode: CREATE_SESSION (43)]
    GSS Checksum:
00000028040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb2…
        GSS Token Length: 40
        GSS-API Generic Security Service Application Program Interface
            krb5_blob:
040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb23fa080b0…

response:

Frame 2: 206 bytes on wire (1648 bits), 206 bytes captured (1648 bits)
Ethernet II, Src: ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00), Dst:
HP_19:7d:4b (e0:73:e7:19:7d:4b)
Internet Protocol Version 4, Src: 10.86.18.14, Dst: 10.87.31.152
Transmission Control Protocol, Src Port: 2049, Dst Port: 932, Seq: 1,
Ack: 313, Len: 140
Remote Procedure Call, Type:Reply XID:0x1d3220c4
Network File System
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    GSS Data, Ops(1): CREATE_SESSION(NFS4ERR_DELAY)
        Length: 24
        GSS Sequence Number: 76
        Status: NFS4ERR_DELAY (10008)
        Tag: <EMPTY>
        Operations (count: 1)
        [Main Opcode: CREATE_SESSION (43)]
    GSS Checksum:
00000028040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f6274222…
        GSS Token Length: 40
        GSS-API Generic Security Service Application Program Interface
            krb5_blob:
040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f627422226d74923…
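
For reference, this kind of capture can be reproduced with something along
these lines (the interface name is a placeholder) and then dissected in
Wireshark:

    # capture NFSv4 traffic between this client and the server
    tcpdump -i eth0 host 10.86.18.14 and port 2049 -w nfs4-create-session.pcap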

I was hoping that giving the client a different IP address would resolve
the issue for this client, but it didn't. Even though the client had a
new IP address (hostname was kept the same), it failed to mount anything
from the server.

I created another dump of the workqueues and worker pools on the server:

[Mon Mar 18 14:59:33 2024] Showing busy workqueues and worker pools:
[Mon Mar 18 14:59:33 2024] workqueue events: flags=0x0
[Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
active=1/256 refcnt=2
[Mon Mar 18 14:59:33 2024]     pending: drm_fb_helper_damage_work
[drm_kms_helper]
[Mon Mar 18 14:59:33 2024] workqueue events_power_efficient: flags=0x80
[Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
active=1/256 refcnt=2
[Mon Mar 18 14:59:33 2024]     pending: fb_flashcursor
[Mon Mar 18 14:59:33 2024] workqueue mm_percpu_wq: flags=0x8
[Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
active=1/256 refcnt=3
[Mon Mar 18 14:59:33 2024]     pending: lru_add_drain_per_cpu BAR(362)
[Mon Mar 18 14:59:33 2024] workqueue kblockd: flags=0x18
[Mon Mar 18 14:59:33 2024]   pwq 55: cpus=27 node=1 flags=0x0 nice=-20
active=1/256 refcnt=2
[Mon Mar 18 14:59:33 2024]     pending: blk_mq_timeout_work


In contrast to last time, it doesn't show anything related to nfs.

I also tried the suggestion from Dai Ngo (echo 3 >
/proc/sys/vm/drop_caches), but that didn't seem to make any difference.

We haven't restarted the server yet, as the impact seems to affect fewer
clients than before. Is there anything we can run on the server to debug
this further?

In the past, the situation deteriorated rapidly and resulted in problems
for almost all clients after about 20 minutes. This time the impact seems
smaller, but it's not gone.

How can we force the NFS server to forget about a specific client? I
haven't tried to restart the nfs service yet as I'm afraid it will fail
to stop as before.
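
One thing I haven't tried is force-expiring the stale entry through the
per-client ctl file, assuming this kernel exposes it next to the info file:

    # locate the stale client entry and force the server to expire it (untested here)
    grep -l bersalis /proc/fs/nfsd/clients/*/info
    echo expire > /proc/fs/nfsd/clients/155/ctl

Would that be safe to try in this state?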


Regards,

Rik

--
Rik Theys
System Engineer
KU Leuven - Dept. Elektrotechniek (ESAT)
Kasteelpark Arenberg 10 bus 2440 - B-3001 Leuven-Heverlee
+32(0)16/32.11.07
----------------------------------------------------------------
<<Any errors in spelling, tact or fact are transmission errors>>


2024-03-18 21:15:29

by Rik Theys

[permalink] [raw]
Subject: Re: nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning

Hi,

On 3/18/24 21:21, Rik Theys wrote:
> Hi Jeff,
>
> On 3/12/24 13:47, Jeff Layton wrote:
>> On Tue, 2024-03-12 at 13:24 +0100, Rik Theys wrote:
>>> Hi Jeff,
>>>
>>> On 3/12/24 12:22, Jeff Layton wrote:
>>>> On Mon, 2024-03-11 at 19:43 +0100, Rik Theys wrote:
>>>>> Since a few weeks our Rocky Linux 9 NFS server has periodically
>>>>> logged hung nfsd tasks. The initial effect was that some clients
>>>>> could no longer access the NFS server. This got worse and worse
>>>>> (probably as more nfsd threads got blocked) and we had to restart
>>>>> the server. Restarting the server also failed as the NFS server
>>>>> service could no longer be stopped.
>>>>>
>>>>> The initial kernel we noticed this behavior on was
>>>>> kernel-5.14.0-362.18.1.el9_3.x86_64. Since then we've installed
>>>>> kernel-5.14.0-419.el9.x86_64 from CentOS Stream 9. The same issue
>>>>> happened again on this newer kernel version:
>> 419 is fairly up to date with nfsd changes. There are some known bugs
>> around callbacks, and there is a draft MR in flight to fix it.
>>
>> What kernel were you on prior to 5.14.0-362.18.1.el9_3.x86_64 ? If we
>> can bracket the changes around a particular version, then that might
>> help identify the problem.
>>
>>>>> [Mon Mar 11 14:10:08 2024]       Not tainted 5.14.0-419.el9.x86_64 #1
>>>>>    [Mon Mar 11 14:10:08 2024] "echo 0 >
>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>    [Mon Mar 11 14:10:08 2024]task:nfsd             state:D stack:0
>>>>>     pid:8865  ppid:2      flags:0x00004000
>>>>>    [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>>    [Mon Mar 11 14:10:08 2024]  <TASK>
>>>>>    [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>>>    [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>>>    [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>>>    [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
>>>>>    [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>>>    [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>>>    [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
>>>>>    [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>>>    [Mon Mar 11 14:10:08 2024]  nfsd4_shutdown_callback+0x49/0x120
>>>>> [nfsd]
>>>>>    [Mon Mar 11 14:10:08 2024]  ? nfsd4_cld_remove+0x54/0x1d0 [nfsd]
>>>>>    [Mon Mar 11 14:10:08 2024]  ?
>>>>> nfsd4_return_all_client_layouts+0xc4/0xf0 [nfsd]
>>>>>    [Mon Mar 11 14:10:08 2024]  ? nfsd4_shutdown_copy+0x68/0xc0 [nfsd]
>>>>>    [Mon Mar 11 14:10:08 2024]  __destroy_client+0x1f3/0x290 [nfsd]
>>>>>    [Mon Mar 11 14:10:08 2024]  nfsd4_exchange_id+0x75f/0x770 [nfsd]
>>>>>    [Mon Mar 11 14:10:08 2024]  ? nfsd4_decode_opaque+0x3a/0x90 [nfsd]
>>>>>    [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>    [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>    [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
>>>>> [sunrpc]
>>>>>    [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>    [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>    [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>>    [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>    [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>>>    [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>>>    [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>>>    [Mon Mar 11 14:10:08 2024]  </TASK>
>>>>>    [Mon Mar 11 14:10:08 2024] INFO: task nfsd:8866 blocked for
>>>>> more than 122 seconds.
>>>>>    [Mon Mar 11 14:10:08 2024]       Not tainted
>>>>> 5.14.0-419.el9.x86_64 #1
>>>>>    [Mon Mar 11 14:10:08 2024] "echo 0 >
>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>    [Mon Mar 11 14:10:08 2024]task:nfsd             state:D stack:0
>>>>>     pid:8866  ppid:2      flags:0x00004000
>>>>>    [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>>    [Mon Mar 11 14:10:08 2024]  <TASK>
>>>>>    [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>>>    [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>>>    [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>>>    [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
>>>>>    [Mon Mar 11 14:10:08 2024]  ? tcp_recvmsg+0x196/0x210
>>>>>    [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>>>    [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>>>    [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
>>>>>    [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>>>    [Mon Mar 11 14:10:08 2024]  nfsd4_destroy_session+0x1a4/0x240
>>>>> [nfsd]
>>>>>    [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>    [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>    [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
>>>>> [sunrpc]
>>>>>    [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>    [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>    [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>>    [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>    [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>>>    [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>>>    [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>>>    [Mon Mar 11 14:10:08 2024]  </TASK>
>>>>>
>>>> The above threads are trying to flush the workqueue, so that probably
>>>> means that they are stuck waiting on a workqueue job to finish.
>>>>>    The above is repeated a few times, and then this warning is
>>>>> also logged:
>>>>>    [Mon Mar 11 14:12:04 2024] ------------[ cut here ]------------
>>>>>    [Mon Mar 11 14:12:04 2024] WARNING: CPU: 39 PID: 8844 at
>>>>> fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>>    [Mon Mar 11 14:12:05 2024] Modules linked in: nfsv4
>>>>> dns_resolver nfs fscache netfs rpcsec_gss_krb5 rpcrdma rdma_cm
>>>>> iw_cm ib_cm ib_core binfmt_misc bonding tls rfkill nft_counter nft_ct
>>>>>    nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet
>>>>> nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables nfnetlink vfat
>>>>> fat dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio l
>>>>>    ibcrc32c dm_service_time dm_multipath intel_rapl_msr
>>>>> intel_rapl_common intel_uncore_frequency
>>>>> intel_uncore_frequency_common isst_if_common skx_edac nfit
>>>>> libnvdimm ipmi_ssif x86_pkg_temp
>>>>>    _thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
>>>>> dcdbas rapl intel_cstate mgag200 i2c_algo_bit drm_shmem_helper
>>>>> dell_smbios drm_kms_helper dell_wmi_descriptor wmi_bmof intel_u
>>>>>    ncore syscopyarea pcspkr sysfillrect mei_me sysimgblt acpi_ipmi
>>>>> mei fb_sys_fops i2c_i801 ipmi_si intel_pch_thermal lpc_ich
>>>>> ipmi_devintf i2c_smbus ipmi_msghandler joydev acpi_power_meter
>>>>>    nfsd auth_rpcgss nfs_acl drm lockd grace fuse sunrpc ext4
>>>>> mbcache jbd2 sd_mod sg lpfc
>>>>>    [Mon Mar 11 14:12:05 2024]  nvmet_fc nvmet nvme_fc nvme_fabrics
>>>>> crct10dif_pclmul ahci libahci crc32_pclmul nvme_core crc32c_intel
>>>>> ixgbe megaraid_sas libata nvme_common ghash_clmulni_int
>>>>>    el t10_pi wdat_wdt scsi_transport_fc mdio wmi dca dm_mirror
>>>>> dm_region_hash dm_log dm_mod
>>>>>    [Mon Mar 11 14:12:05 2024] CPU: 39 PID: 8844 Comm: nfsd Not
>>>>> tainted 5.14.0-419.el9.x86_64 #1
>>>>>    [Mon Mar 11 14:12:05 2024] Hardware name: Dell Inc. PowerEdge
>>>>> R740/00WGD1, BIOS 2.20.1 09/13/2023
>>>>>    [Mon Mar 11 14:12:05 2024] RIP:
>>>>> 0010:nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>>    [Mon Mar 11 14:12:05 2024] Code: a6 95 c5 f3 e9 ff fe ff ff 48
>>>>> 89 df be 01 00 00 00 e8 34 b5 13 f4 48 8d bb 98 00 00 00 e8 c8 f9
>>>>> 00 00 84 c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be
>>>>>    02 00 00 00 48 89 df e8 0c b5 13 f4 e9 01
>>>>>    [Mon Mar 11 14:12:05 2024] RSP: 0018:ffff9929e0bb7b80 EFLAGS:
>>>>> 00010246
>>>>>    [Mon Mar 11 14:12:05 2024] RAX: 0000000000000000 RBX:
>>>>> ffff8ada51930900 RCX: 0000000000000024
>>>>>    [Mon Mar 11 14:12:05 2024] RDX: ffff8ada519309c8 RSI:
>>>>> ffff8ad582933c00 RDI: 0000000000002000
>>>>>    [Mon Mar 11 14:12:05 2024] RBP: ffff8ad46bf21574 R08:
>>>>> ffff9929e0bb7b48 R09: 0000000000000000
>>>>>    [Mon Mar 11 14:12:05 2024] R10: ffff8aec859a2948 R11:
>>>>> 0000000000000000 R12: ffff8ad6f497c360
>>>>>    [Mon Mar 11 14:12:05 2024] R13: ffff8ad46bf21560 R14:
>>>>> ffff8ae5942e0b10 R15: ffff8ad6f497c360
>>>>>    [Mon Mar 11 14:12:05 2024] FS:  0000000000000000(0000)
>>>>> GS:ffff8b031fcc0000(0000) knlGS:0000000000000000
>>>>>    [Mon Mar 11 14:12:05 2024] CS:  0010 DS: 0000 ES: 0000 CR0:
>>>>> 0000000080050033
>>>>>    [Mon Mar 11 14:12:05 2024] CR2: 00007fafe2060744 CR3:
>>>>> 00000018e58de006 CR4: 00000000007706e0
>>>>>    [Mon Mar 11 14:12:05 2024] DR0: 0000000000000000 DR1:
>>>>> 0000000000000000 DR2: 0000000000000000
>>>>>    [Mon Mar 11 14:12:05 2024] DR3: 0000000000000000 DR6:
>>>>> 00000000fffe0ff0 DR7: 0000000000000400
>>>>>    [Mon Mar 11 14:12:05 2024] PKRU: 55555554
>>>>>    [Mon Mar 11 14:12:05 2024] Call Trace:
>>>>>    [Mon Mar 11 14:12:05 2024]  <TASK>
>>>>>    [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>>>>    [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>>>>    [Mon Mar 11 14:12:05 2024]  ? __break_lease+0x16f/0x5f0
>>>>>    [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
>>>>> [nfsd]
>>>>>    [Mon Mar 11 14:12:05 2024]  ? __warn+0x81/0x110
>>>>>    [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
>>>>> [nfsd]
>>>>>    [Mon Mar 11 14:12:05 2024]  ? report_bug+0x10a/0x140
>>>>>    [Mon Mar 11 14:12:05 2024]  ? handle_bug+0x3c/0x70
>>>>>    [Mon Mar 11 14:12:05 2024]  ? exc_invalid_op+0x14/0x70
>>>>>    [Mon Mar 11 14:12:05 2024]  ? asm_exc_invalid_op+0x16/0x20
>>>>>    [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
>>>>> [nfsd]
>>>>>    [Mon Mar 11 14:12:05 2024]  __break_lease+0x16f/0x5f0
>>>>>    [Mon Mar 11 14:12:05 2024]  ?
>>>>> nfsd_file_lookup_locked+0x117/0x160 [nfsd]
>>>>>    [Mon Mar 11 14:12:05 2024]  ? list_lru_del+0x101/0x150
>>>>>    [Mon Mar 11 14:12:05 2024]  nfsd_file_do_acquire+0x790/0x830
>>>>> [nfsd]
>>>>>    [Mon Mar 11 14:12:05 2024]  nfs4_get_vfs_file+0x315/0x3a0 [nfsd]
>>>>>    [Mon Mar 11 14:12:05 2024]  nfsd4_process_open2+0x430/0xa30 [nfsd]
>>>>>    [Mon Mar 11 14:12:05 2024]  ? fh_verify+0x297/0x2f0 [nfsd]
>>>>>    [Mon Mar 11 14:12:05 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
>>>>>    [Mon Mar 11 14:12:05 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>    [Mon Mar 11 14:12:05 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>    [Mon Mar 11 14:12:05 2024]  svc_process_common+0x2ec/0x660
>>>>> [sunrpc]
>>>>>    [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>    [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>    [Mon Mar 11 14:12:05 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>>    [Mon Mar 11 14:12:05 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>    [Mon Mar 11 14:12:05 2024]  kthread+0xdd/0x100
>>>>>    [Mon Mar 11 14:12:05 2024]  ? __pfx_kthread+0x10/0x10
>>>>>    [Mon Mar 11 14:12:05 2024]  ret_from_fork+0x29/0x50
>>>>>    [Mon Mar 11 14:12:05 2024]  </TASK>
>>>>>    [Mon Mar 11 14:12:05 2024] ---[ end trace 7a039e17443dc651 ]---
>>>> This is probably this WARN in nfsd_break_one_deleg:
>>>>
>>>> WARN_ON_ONCE(!nfsd4_run_cb(&dp->dl_recall));
>>>>
>>>> It means that a delegation break callback to the client couldn't be
>>>> queued to the workqueue, and so it didn't run.
>>>>
>>>>> Could this be the same issue as described
>>>>> here:https://lore.kernel.org/linux-nfs/[email protected]/
>>>>> ?
>>>> Yes, most likely the same problem.
>>> If I read that thread correctly, this issue was introduced between
>>> 6.1.63 and 6.6.3? Is it possible the EL9 5.14.0-362.18.1.el9_3
>>> backported these changes, or were we hitting some other bug with that
>>> version? It seems the 6.1.x kernel is not affected? If so, that
>>> would be
>>> the recommended kernel to run?
>> Anything is possible. We have to identify the problem first.
>>>>> As described in that thread, I've tried to obtain the requested
>>>>> information.
>>>>>
>>>>> Is it possible this is the issue that was fixed by the patches
>>>>> described
>>>>> here?https://lore.kernel.org/linux-nfs/2024022054-cause-suffering-eae8@gregkh/
>>>>>
>>>> Doubtful. Those are targeted toward a different set of issues.
>>>>
>>>> If you're willing, I do have some patches queued up for CentOS here
>>>> that
>>>> fix some backchannel problems that could be related. I'm mainly
>>>> waiting
>>>> on Chuck to send these to Linus and then we'll likely merge them into
>>>> CentOS soon afterward:
>>>>
>>>> https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3689
>>>>
>>>>
>>> If you can send me a patch file, I can rebuild the C9S kernel with that
>>> patch and run it. It can take a while for the bug to trigger as I
>>> believe it seems to be very workload dependent (we were running very
>>> stable for months and now hit this bug every other week).
>>>
>>>
>> It's probably simpler to just pull down the build artifacts for that MR.
>> You have to drill down through the CI for it, but they are here:
>>
>> https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/index.html?prefix=trusted-artifacts/1194300175/publish_x86_64/6278921877/artifacts/
>>
>>
>> There's even a repo file you can install on the box to pull them down.
>
> We installed this kernel on the server 3 days ago. Today, a user
> informed us that their screen was black after logging in. Similar to
> other occurrences of this issue, the mount command on the client was
> hung. But in contrast to the other times, there were no messages in
> the logs kernel logs on the server. Even restarting the client does
> not resolve the issue.
>
> Something still seems to be wrong on the server though. When I look at
> the directories under /proc/fs/nfsd/clients, there's still a directory
> for the specific client, even though it's no longer running:
>
> # cat 155/info
> clientid: 0xc8edb7f65f4a9ad
> address: "10.87.31.152:819"
> status: confirmed
> seconds from last renew: 33163
> name: "Linux NFSv4.2 bersalis.esat.kuleuven.be"
> minor version: 2
> Implementation domain: "kernel.org"
> Implementation name: "Linux 5.14.0-362.18.1.el9_3.0.1.x86_64 #1 SMP
> PREEMPT_DYNAMIC Sun Feb 11 13:49:23 UTC 2024 x86_64"
> Implementation time: [0, 0]
> callback state: DOWN
> callback address: 10.87.31.152:0
>
The nfsdclnts command for this client shows the following delegations:

# nfsdclnts -f 155/states -t all
Inode number | Type   | Access | Deny | ip address            | Filename
169346743    | open   | r-     | --   | 10.87.31.152:819      | disconnected dentry
169346743    | deleg  | r      |      | 10.87.31.152:819      | disconnected dentry
169346746    | open   | r-     | --   | 10.87.31.152:819      | disconnected dentry
169346746    | deleg  | r      |      | 10.87.31.152:819      | disconnected dentry

I see a lot of recent patches regarding directory delegations. Could
those be related to this issue?

Will a 5.14.0-362.18.1.el9_3.0.1 kernel try to use a directory delegation?

Regards,

Rik


>
> The system seems to have identified that the client is no longer
> reachable, but the client entry does not go away. When a mount was
> hanging on the client, there would be two directories in clients for
> the same client. Killing the mount command clears up the second entry.
>
> Even after running conntrack -D on the server to remove the tcp
> connection from the conntrack table, the entry doesn't go away and the
> client still can not mount anything from the server.
>
> A tcpdump on the client while a mount was running logged the following
> messages over and over again:
>
> request:
>
> Frame 1: 378 bytes on wire (3024 bits), 378 bytes captured (3024 bits)
> Ethernet II, Src: HP_19:7d:4b (e0:73:e7:19:7d:4b), Dst:
> ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00)
> Internet Protocol Version 4, Src: 10.87.31.152, Dst: 10.86.18.14
> Transmission Control Protocol, Src Port: 932, Dst Port: 2049, Seq: 1,
> Ack: 1, Len: 312
> Remote Procedure Call, Type:Call XID:0x1d3220c4
> Network File System
>     [Program Version: 4]
>     [V4 Procedure: COMPOUND (1)]
>     GSS Data, Ops(1): CREATE_SESSION
>         Length: 152
>         GSS Sequence Number: 76
>         Tag: <EMPTY>
>         minorversion: 2
>         Operations (count: 1): CREATE_SESSION
>         [Main Opcode: CREATE_SESSION (43)]
>     GSS Checksum:
> 00000028040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb2…
>         GSS Token Length: 40
>         GSS-API Generic Security Service Application Program Interface
>             krb5_blob:
> 040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb23fa080b0…
>
> response:
>
> Frame 2: 206 bytes on wire (1648 bits), 206 bytes captured (1648 bits)
> Ethernet II, Src: ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00), Dst:
> HP_19:7d:4b (e0:73:e7:19:7d:4b)
> Internet Protocol Version 4, Src: 10.86.18.14, Dst: 10.87.31.152
> Transmission Control Protocol, Src Port: 2049, Dst Port: 932, Seq: 1,
> Ack: 313, Len: 140
> Remote Procedure Call, Type:Reply XID:0x1d3220c4
> Network File System
>     [Program Version: 4]
>     [V4 Procedure: COMPOUND (1)]
>     GSS Data, Ops(1): CREATE_SESSION(NFS4ERR_DELAY)
>         Length: 24
>         GSS Sequence Number: 76
>         Status: NFS4ERR_DELAY (10008)
>         Tag: <EMPTY>
>         Operations (count: 1)
>         [Main Opcode: CREATE_SESSION (43)]
>     GSS Checksum:
> 00000028040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f6274222…
>         GSS Token Length: 40
>         GSS-API Generic Security Service Application Program Interface
>             krb5_blob:
> 040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f627422226d74923…
>
> I was hoping that giving the client a different IP address would
> resolve the issue for this client, but it didn't. Even though the
> client had a new IP address (hostname was kept the same), it failed to
> mount anything from the server.
>
> I created another dump of the workqueues and worker pools on the server:
>
> [Mon Mar 18 14:59:33 2024] Showing busy workqueues and worker pools:
> [Mon Mar 18 14:59:33 2024] workqueue events: flags=0x0
> [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
> active=1/256 refcnt=2
> [Mon Mar 18 14:59:33 2024]     pending: drm_fb_helper_damage_work
> [drm_kms_helper]
> [Mon Mar 18 14:59:33 2024] workqueue events_power_efficient: flags=0x80
> [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
> active=1/256 refcnt=2
> [Mon Mar 18 14:59:33 2024]     pending: fb_flashcursor
> [Mon Mar 18 14:59:33 2024] workqueue mm_percpu_wq: flags=0x8
> [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
> active=1/256 refcnt=3
> [Mon Mar 18 14:59:33 2024]     pending: lru_add_drain_per_cpu BAR(362)
> [Mon Mar 18 14:59:33 2024] workqueue kblockd: flags=0x18
> [Mon Mar 18 14:59:33 2024]   pwq 55: cpus=27 node=1 flags=0x0 nice=-20
> active=1/256 refcnt=2
> [Mon Mar 18 14:59:33 2024]     pending: blk_mq_timeout_work
>
>
> In contrast to last time, it doesn't show anything regarding nfs this
> time.
>
> I also tried the suggestion from Dai Ngo (echo 3 >
> /proc/sys/vm/drop_caches), but that didn't seem to make any difference.
>
> We haven't restarted the server yet as the impact seems to affect
> fewer clients than before. Is there anything we can run on the
> server to further debug this?
>
> In the past, the issue seemed to deteriorate rapidly and resulted in
> issues for almost all clients after about 20 minutes. This time the
> impact seems to be less, but it's not gone.
>
> How can we force the NFS server to forget about a specific client? I
> haven't tried to restart the nfs service yet as I'm afraid it will
> fail to stop as before.
>
>
> Regards,
>
> Rik
>
--
Rik Theys
System Engineer
KU Leuven - Dept. Elektrotechniek (ESAT)
Kasteelpark Arenberg 10 bus 2440 - B-3001 Leuven-Heverlee
+32(0)16/32.11.07
----------------------------------------------------------------
<<Any errors in spelling, tact or fact are transmission errors>>


2024-03-18 21:54:10

by Jeffrey Layton

[permalink] [raw]
Subject: Re: nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning

On Mon, 2024-03-18 at 22:15 +0100, Rik Theys wrote:
> Hi,
>
> On 3/18/24 21:21, Rik Theys wrote:
> > Hi Jeff,
> >
> > On 3/12/24 13:47, Jeff Layton wrote:
> > > On Tue, 2024-03-12 at 13:24 +0100, Rik Theys wrote:
> > > > Hi Jeff,
> > > >
> > > > On 3/12/24 12:22, Jeff Layton wrote:
> > > > > On Mon, 2024-03-11 at 19:43 +0100, Rik Theys wrote:
> > > > > > Since a few weeks our Rocky Linux 9 NFS server has periodically
> > > > > > logged hung nfsd tasks. The initial effect was that some clients
> > > > > > could no longer access the NFS server. This got worse and worse
> > > > > > (probably as more nfsd threads got blocked) and we had to restart
> > > > > > the server. Restarting the server also failed as the NFS server
> > > > > > service could no longer be stopped.
> > > > > >
> > > > > > The initial kernel we noticed this behavior on was
> > > > > > kernel-5.14.0-362.18.1.el9_3.x86_64. Since then we've installed
> > > > > > kernel-5.14.0-419.el9.x86_64 from CentOS Stream 9. The same issue
> > > > > > happened again on this newer kernel version:
> > > 419 is fairly up to date with nfsd changes. There are some known bugs
> > > around callbacks, and there is a draft MR in flight to fix it.
> > >
> > > What kernel were you on prior to 5.14.0-362.18.1.el9_3.x86_64 ? If we
> > > can bracket the changes around a particular version, then that might
> > > help identify the problem.
> > >
> > > > > > [Mon Mar 11 14:10:08 2024]       Not tainted 5.14.0-419.el9.x86_64 #1
> > > > > >    [Mon Mar 11 14:10:08 2024] "echo 0 >
> > > > > > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > > >    [Mon Mar 11 14:10:08 2024]task:nfsd             state:D stack:0
> > > > > >     pid:8865  ppid:2      flags:0x00004000
> > > > > >    [Mon Mar 11 14:10:08 2024] Call Trace:
> > > > > >    [Mon Mar 11 14:10:08 2024]  <TASK>
> > > > > >    [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
> > > > > >    [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
> > > > > >    [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
> > > > > >    [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
> > > > > >    [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
> > > > > >    [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
> > > > > >    [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
> > > > > >    [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
> > > > > >    [Mon Mar 11 14:10:08 2024]  nfsd4_shutdown_callback+0x49/0x120
> > > > > > [nfsd]
> > > > > >    [Mon Mar 11 14:10:08 2024]  ? nfsd4_cld_remove+0x54/0x1d0 [nfsd]
> > > > > >    [Mon Mar 11 14:10:08 2024]  ?
> > > > > > nfsd4_return_all_client_layouts+0xc4/0xf0 [nfsd]
> > > > > >    [Mon Mar 11 14:10:08 2024]  ? nfsd4_shutdown_copy+0x68/0xc0 [nfsd]
> > > > > >    [Mon Mar 11 14:10:08 2024]  __destroy_client+0x1f3/0x290 [nfsd]
> > > > > >    [Mon Mar 11 14:10:08 2024]  nfsd4_exchange_id+0x75f/0x770 [nfsd]
> > > > > >    [Mon Mar 11 14:10:08 2024]  ? nfsd4_decode_opaque+0x3a/0x90 [nfsd]
> > > > > >    [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
> > > > > >    [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
> > > > > >    [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
> > > > > > [sunrpc]
> > > > > >    [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> > > > > >    [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
> > > > > >    [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
> > > > > >    [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
> > > > > >    [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
> > > > > >    [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
> > > > > >    [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
> > > > > >    [Mon Mar 11 14:10:08 2024]  </TASK>
> > > > > >    [Mon Mar 11 14:10:08 2024] INFO: task nfsd:8866 blocked for
> > > > > > more than 122 seconds.
> > > > > >    [Mon Mar 11 14:10:08 2024]       Not tainted
> > > > > > 5.14.0-419.el9.x86_64 #1
> > > > > >    [Mon Mar 11 14:10:08 2024] "echo 0 >
> > > > > > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > > >    [Mon Mar 11 14:10:08 2024]task:nfsd             state:D stack:0
> > > > > >     pid:8866  ppid:2      flags:0x00004000
> > > > > >    [Mon Mar 11 14:10:08 2024] Call Trace:
> > > > > >    [Mon Mar 11 14:10:08 2024]  <TASK>
> > > > > >    [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
> > > > > >    [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
> > > > > >    [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
> > > > > >    [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
> > > > > >    [Mon Mar 11 14:10:08 2024]  ? tcp_recvmsg+0x196/0x210
> > > > > >    [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
> > > > > >    [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
> > > > > >    [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
> > > > > >    [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
> > > > > >    [Mon Mar 11 14:10:08 2024]  nfsd4_destroy_session+0x1a4/0x240
> > > > > > [nfsd]
> > > > > >    [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
> > > > > >    [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
> > > > > >    [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
> > > > > > [sunrpc]
> > > > > >    [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> > > > > >    [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
> > > > > >    [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
> > > > > >    [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
> > > > > >    [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
> > > > > >    [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
> > > > > >    [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
> > > > > >    [Mon Mar 11 14:10:08 2024]  </TASK>
> > > > > >
> > > > > The above threads are trying to flush the workqueue, so that probably
> > > > > means that they are stuck waiting on a workqueue job to finish.
> > > > > >    The above is repeated a few times, and then this warning is
> > > > > > also logged:
> > > > > >    [Mon Mar 11 14:12:04 2024] ------------[ cut here ]------------
> > > > > >    [Mon Mar 11 14:12:04 2024] WARNING: CPU: 39 PID: 8844 at
> > > > > > fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
> > > > > >    [Mon Mar 11 14:12:05 2024] Modules linked in: nfsv4
> > > > > > dns_resolver nfs fscache netfs rpcsec_gss_krb5 rpcrdma rdma_cm
> > > > > > iw_cm ib_cm ib_core binfmt_misc bonding tls rfkill nft_counter nft_ct
> > > > > >    nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet
> > > > > > nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables nfnetlink vfat
> > > > > > fat dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio l
> > > > > >    ibcrc32c dm_service_time dm_multipath intel_rapl_msr
> > > > > > intel_rapl_common intel_uncore_frequency
> > > > > > intel_uncore_frequency_common isst_if_common skx_edac nfit
> > > > > > libnvdimm ipmi_ssif x86_pkg_temp
> > > > > >    _thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
> > > > > > dcdbas rapl intel_cstate mgag200 i2c_algo_bit drm_shmem_helper
> > > > > > dell_smbios drm_kms_helper dell_wmi_descriptor wmi_bmof intel_u
> > > > > >    ncore syscopyarea pcspkr sysfillrect mei_me sysimgblt acpi_ipmi
> > > > > > mei fb_sys_fops i2c_i801 ipmi_si intel_pch_thermal lpc_ich
> > > > > > ipmi_devintf i2c_smbus ipmi_msghandler joydev acpi_power_meter
> > > > > >    nfsd auth_rpcgss nfs_acl drm lockd grace fuse sunrpc ext4
> > > > > > mbcache jbd2 sd_mod sg lpfc
> > > > > >    [Mon Mar 11 14:12:05 2024]  nvmet_fc nvmet nvme_fc nvme_fabrics
> > > > > > crct10dif_pclmul ahci libahci crc32_pclmul nvme_core crc32c_intel
> > > > > > ixgbe megaraid_sas libata nvme_common ghash_clmulni_int
> > > > > >    el t10_pi wdat_wdt scsi_transport_fc mdio wmi dca dm_mirror
> > > > > > dm_region_hash dm_log dm_mod
> > > > > >    [Mon Mar 11 14:12:05 2024] CPU: 39 PID: 8844 Comm: nfsd Not
> > > > > > tainted 5.14.0-419.el9.x86_64 #1
> > > > > >    [Mon Mar 11 14:12:05 2024] Hardware name: Dell Inc. PowerEdge
> > > > > > R740/00WGD1, BIOS 2.20.1 09/13/2023
> > > > > >    [Mon Mar 11 14:12:05 2024] RIP:
> > > > > > 0010:nfsd_break_deleg_cb+0x170/0x190 [nfsd]
> > > > > >    [Mon Mar 11 14:12:05 2024] Code: a6 95 c5 f3 e9 ff fe ff ff 48
> > > > > > 89 df be 01 00 00 00 e8 34 b5 13 f4 48 8d bb 98 00 00 00 e8 c8 f9
> > > > > > 00 00 84 c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be
> > > > > >    02 00 00 00 48 89 df e8 0c b5 13 f4 e9 01
> > > > > >    [Mon Mar 11 14:12:05 2024] RSP: 0018:ffff9929e0bb7b80 EFLAGS:
> > > > > > 00010246
> > > > > >    [Mon Mar 11 14:12:05 2024] RAX: 0000000000000000 RBX:
> > > > > > ffff8ada51930900 RCX: 0000000000000024
> > > > > >    [Mon Mar 11 14:12:05 2024] RDX: ffff8ada519309c8 RSI:
> > > > > > ffff8ad582933c00 RDI: 0000000000002000
> > > > > >    [Mon Mar 11 14:12:05 2024] RBP: ffff8ad46bf21574 R08:
> > > > > > ffff9929e0bb7b48 R09: 0000000000000000
> > > > > >    [Mon Mar 11 14:12:05 2024] R10: ffff8aec859a2948 R11:
> > > > > > 0000000000000000 R12: ffff8ad6f497c360
> > > > > >    [Mon Mar 11 14:12:05 2024] R13: ffff8ad46bf21560 R14:
> > > > > > ffff8ae5942e0b10 R15: ffff8ad6f497c360
> > > > > >    [Mon Mar 11 14:12:05 2024] FS:  0000000000000000(0000)
> > > > > > GS:ffff8b031fcc0000(0000) knlGS:0000000000000000
> > > > > >    [Mon Mar 11 14:12:05 2024] CS:  0010 DS: 0000 ES: 0000 CR0:
> > > > > > 0000000080050033
> > > > > >    [Mon Mar 11 14:12:05 2024] CR2: 00007fafe2060744 CR3:
> > > > > > 00000018e58de006 CR4: 00000000007706e0
> > > > > >    [Mon Mar 11 14:12:05 2024] DR0: 0000000000000000 DR1:
> > > > > > 0000000000000000 DR2: 0000000000000000
> > > > > >    [Mon Mar 11 14:12:05 2024] DR3: 0000000000000000 DR6:
> > > > > > 00000000fffe0ff0 DR7: 0000000000000400
> > > > > >    [Mon Mar 11 14:12:05 2024] PKRU: 55555554
> > > > > >    [Mon Mar 11 14:12:05 2024] Call Trace:
> > > > > >    [Mon Mar 11 14:12:05 2024]  <TASK>
> > > > > >    [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
> > > > > >    [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
> > > > > >    [Mon Mar 11 14:12:05 2024]  ? __break_lease+0x16f/0x5f0
> > > > > >    [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
> > > > > > [nfsd]
> > > > > >    [Mon Mar 11 14:12:05 2024]  ? __warn+0x81/0x110
> > > > > >    [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
> > > > > > [nfsd]
> > > > > >    [Mon Mar 11 14:12:05 2024]  ? report_bug+0x10a/0x140
> > > > > >    [Mon Mar 11 14:12:05 2024]  ? handle_bug+0x3c/0x70
> > > > > >    [Mon Mar 11 14:12:05 2024]  ? exc_invalid_op+0x14/0x70
> > > > > >    [Mon Mar 11 14:12:05 2024]  ? asm_exc_invalid_op+0x16/0x20
> > > > > >    [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
> > > > > > [nfsd]
> > > > > >    [Mon Mar 11 14:12:05 2024]  __break_lease+0x16f/0x5f0
> > > > > >    [Mon Mar 11 14:12:05 2024]  ?
> > > > > > nfsd_file_lookup_locked+0x117/0x160 [nfsd]
> > > > > >    [Mon Mar 11 14:12:05 2024]  ? list_lru_del+0x101/0x150
> > > > > >    [Mon Mar 11 14:12:05 2024]  nfsd_file_do_acquire+0x790/0x830
> > > > > > [nfsd]
> > > > > >    [Mon Mar 11 14:12:05 2024]  nfs4_get_vfs_file+0x315/0x3a0 [nfsd]
> > > > > >    [Mon Mar 11 14:12:05 2024]  nfsd4_process_open2+0x430/0xa30 [nfsd]
> > > > > >    [Mon Mar 11 14:12:05 2024]  ? fh_verify+0x297/0x2f0 [nfsd]
> > > > > >    [Mon Mar 11 14:12:05 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
> > > > > >    [Mon Mar 11 14:12:05 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
> > > > > >    [Mon Mar 11 14:12:05 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
> > > > > >    [Mon Mar 11 14:12:05 2024]  svc_process_common+0x2ec/0x660
> > > > > > [sunrpc]
> > > > > >    [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> > > > > >    [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
> > > > > >    [Mon Mar 11 14:12:05 2024]  svc_process+0x12d/0x170 [sunrpc]
> > > > > >    [Mon Mar 11 14:12:05 2024]  nfsd+0x84/0xb0 [nfsd]
> > > > > >    [Mon Mar 11 14:12:05 2024]  kthread+0xdd/0x100
> > > > > >    [Mon Mar 11 14:12:05 2024]  ? __pfx_kthread+0x10/0x10
> > > > > >    [Mon Mar 11 14:12:05 2024]  ret_from_fork+0x29/0x50
> > > > > >    [Mon Mar 11 14:12:05 2024]  </TASK>
> > > > > >    [Mon Mar 11 14:12:05 2024] ---[ end trace 7a039e17443dc651 ]---
> > > > > This is probably this WARN in nfsd_break_one_deleg:
> > > > >
> > > > > WARN_ON_ONCE(!nfsd4_run_cb(&dp->dl_recall));
> > > > >
> > > > > It means that a delegation break callback to the client couldn't be
> > > > > queued to the workqueue, and so it didn't run.
> > > > >
> > > > > > Could this be the same issue as described
> > > > > > here:https://lore.kernel.org/linux-nfs/[email protected]/
> > > > > > ?
> > > > > Yes, most likely the same problem.
> > > > If I read that thread correctly, this issue was introduced between
> > > > 6.1.63 and 6.6.3? Is it possible the EL9 5.14.0-362.18.1.el9_3
> > > > backported these changes, or were we hitting some other bug with that
> > > > version? It seems the 6.1.x kernel is not affected? If so, that
> > > > would be
> > > > the recommended kernel to run?
> > > Anything is possible. We have to identify the problem first.
> > > > > > As described in that thread, I've tried to obtain the requested
> > > > > > information.
> > > > > >
> > > > > > Is it possible this is the issue that was fixed by the patches
> > > > > > described
> > > > > > here?https://lore.kernel.org/linux-nfs/2024022054-cause-suffering-eae8@gregkh/
> > > > > >
> > > > > Doubtful. Those are targeted toward a different set of issues.
> > > > >
> > > > > If you're willing, I do have some patches queued up for CentOS here
> > > > > that
> > > > > fix some backchannel problems that could be related. I'm mainly
> > > > > waiting
> > > > > on Chuck to send these to Linus and then we'll likely merge them into
> > > > > CentOS soon afterward:
> > > > >
> > > > > https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3689
> > > > >
> > > > >
> > > > If you can send me a patch file, I can rebuild the C9S kernel with that
> > > > patch and run it. It can take a while for the bug to trigger as I
> > > > believe it seems to be very workload dependent (we were running very
> > > > stable for months and now hit this bug every other week).
> > > >
> > > >
> > > It's probably simpler to just pull down the build artifacts for that MR.
> > > You have to drill down through the CI for it, but they are here:
> > >
> > > https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/index.html?prefix=trusted-artifacts/1194300175/publish_x86_64/6278921877/artifacts/
> > >
> > >
> > > There's even a repo file you can install on the box to pull them down.
> >
> > We installed this kernel on the server 3 days ago. Today, a user
> > informed us that their screen was black after logging in. Similar to
> > other occurrences of this issue, the mount command on the client was
> > hung. But in contrast to the other times, there were no messages in
> > the logs kernel logs on the server. Even restarting the client does
> > not resolve the issue.


Ok, so you rebooted the client and it's still unable to mount? That
sounds like a server problem if so.

Are both client and server running the same kernel?

> >
> > Something still seems to be wrong on the server though. When I look at
> > the directories under /proc/fs/nfsd/clients, there's still a directory
> > for the specific client, even though it's no longer running:
> >
> > # cat 155/info
> > clientid: 0xc8edb7f65f4a9ad
> > address: "10.87.31.152:819"
> > status: confirmed
> > seconds from last renew: 33163
> > name: "Linux NFSv4.2 bersalis.esat.kuleuven.be"
> > minor version: 2
> > Implementation domain: "kernel.org"
> > Implementation name: "Linux 5.14.0-362.18.1.el9_3.0.1.x86_64 #1 SMP
> > PREEMPT_DYNAMIC Sun Feb 11 13:49:23 UTC 2024 x86_64"
> > Implementation time: [0, 0]
> > callback state: DOWN
> > callback address: 10.87.31.152:0
> >

If you just shut down the client, the server won't immediately purge its
record. In fact, assuming you're running the same kernel on the server,
it won't purge the client record until there is a conflicting request
for its state.
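
For what it's worth, one way to keep an eye on that from the server
side is to periodically dump the client records (the same info files
you already looked at; the exact fields can differ a bit between
kernels):

# grep -H -E 'clientid|address|status|seconds from last renew' \
      /proc/fs/nfsd/clients/*/info

If the record for 10.87.31.152 stays "confirmed" and "seconds from
last renew" keeps growing, the server is still holding that state.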


> The nfsdclnts command for this client shows the following delegations:
>
> # nfsdclnts -f 155/states -t all
> Inode number | Type   | Access | Deny | ip address            | Filename
> 169346743    | open   | r-     | --   | 10.87.31.152:819      |
> disconnected dentry
> 169346743    | deleg  | r      |      | 10.87.31.152:819      |
> disconnected dentry
> 169346746    | open   | r-     | --   | 10.87.31.152:819      |
> disconnected dentry
> 169346746    | deleg  | r      |      | 10.87.31.152:819      |
> disconnected dentry
>
> I see a lot of recent patches regarding directory delegations. Could
> those be related to this issue?
>
> Will a 5.14.0-362.18.1.el9_3.0.1 kernel try to use a directory delegation?
>
>

No. Directory delegations are a new feature that's still under
development. They use some of the same machinery as file delegations,
but they wouldn't be a factor here.

>
> >
> > The system seems to have identified that the client is no longer
> > reachable, but the client entry does not go away. When a mount was
> > hanging on the client, there would be two directories in clients for
> > the same client. Killing the mount command clears up the second entry.
> >
> > Even after running conntrack -D on the server to remove the tcp
> > connection from the conntrack table, the entry doesn't go away and the
> > client still can not mount anything from the server.
> >
> > A tcpdump on the client while a mount was running logged the following
> > messages over and over again:
> >
> > request:
> >
> > Frame 1: 378 bytes on wire (3024 bits), 378 bytes captured (3024 bits)
> > Ethernet II, Src: HP_19:7d:4b (e0:73:e7:19:7d:4b), Dst:
> > ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00)
> > Internet Protocol Version 4, Src: 10.87.31.152, Dst: 10.86.18.14
> > Transmission Control Protocol, Src Port: 932, Dst Port: 2049, Seq: 1,
> > Ack: 1, Len: 312
> > Remote Procedure Call, Type:Call XID:0x1d3220c4
> > Network File System
> >     [Program Version: 4]
> >     [V4 Procedure: COMPOUND (1)]
> >     GSS Data, Ops(1): CREATE_SESSION
> >         Length: 152
> >         GSS Sequence Number: 76
> >         Tag: <EMPTY>
> >         minorversion: 2
> >         Operations (count: 1): CREATE_SESSION
> >         [Main Opcode: CREATE_SESSION (43)]
> >     GSS Checksum:
> > 00000028040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb2…
> >         GSS Token Length: 40
> >         GSS-API Generic Security Service Application Program Interface
> >             krb5_blob:
> > 040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb23fa080b0…
> >
> > response:
> >
> > Frame 2: 206 bytes on wire (1648 bits), 206 bytes captured (1648 bits)
> > Ethernet II, Src: ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00), Dst:
> > HP_19:7d:4b (e0:73:e7:19:7d:4b)
> > Internet Protocol Version 4, Src: 10.86.18.14, Dst: 10.87.31.152
> > Transmission Control Protocol, Src Port: 2049, Dst Port: 932, Seq: 1,
> > Ack: 313, Len: 140
> > Remote Procedure Call, Type:Reply XID:0x1d3220c4
> > Network File System
> >     [Program Version: 4]
> >     [V4 Procedure: COMPOUND (1)]
> >     GSS Data, Ops(1): CREATE_SESSION(NFS4ERR_DELAY)
> >         Length: 24
> >         GSS Sequence Number: 76
> >         Status: NFS4ERR_DELAY (10008)
> >         Tag: <EMPTY>
> >         Operations (count: 1)
> >         [Main Opcode: CREATE_SESSION (43)]
> >     GSS Checksum:
> > 00000028040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f6274222…
> >         GSS Token Length: 40
> >         GSS-API Generic Security Service Application Program Interface
> >             krb5_blob:
> > 040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f627422226d74923…
> >
> > I was hoping that giving the client a different IP address would
> > resolve the issue for this client, but it didn't. Even though the
> > client had a new IP address (hostname was kept the same), it failed to
> > mount anything from the server.
> >

Changing the IP address won't help. The client is probably using the
same long-form client id as before, so the server still identifies the
client even with the address change.
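
If you want to test that theory, you could give the client a genuinely
new identity: the uniform client string is built from the hostname
(that's the "Linux NFSv4.2 bersalis..." name in the info file) plus
the nfs4_unique_id module parameter. Something like (untested here,
file name is arbitrary)

# echo 'options nfs nfs4_unique_id=debug-identity-1' \
      > /etc/modprobe.d/nfs-identity.conf

on the client, followed by a reboot, should make it show up as a new
client. That only sidesteps the stale record, of course, it doesn't
explain it.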

Unfortunately, the cause of an NFS4ERR_DELAY error is tough to guess.
The client is expected to back off and retry, so if the server keeps
returning that repeatedly, then a hung mount command is expected.

The question is why the server would keep returning DELAY. A lot of
different problems ranging from memory allocation issues to protocol
problems can result in that error. You may want to check the NFS server
and see if anything was logged there.
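
If nothing useful shows up in dmesg, you could turn up nfsd's own
debugging while you reproduce the hung mount, roughly like this
(rpcdebug comes with nfs-utils; the trace-cmd part is optional and
just captures the nfsd and sunrpc tracepoints for a minute):

# rpcdebug -m nfsd -s proc
# trace-cmd record -e 'nfsd:*' -e 'sunrpc:*' sleep 60
# rpcdebug -m nfsd -c proc

The rpcdebug messages land in the kernel log, and "trace-cmd report"
should give an idea of which operation keeps replying with DELAY.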

This is on a CREATE_SESSION call, so I wonder if the record held by the
(courteous) server is somehow blocking the attempt to reestablish the
session?

Do you have a way to reproduce this? Since this is a centos kernel, you
could follow the page here to open a bug:

https://wiki.centos.org/ReportBugs.html


> > I created another dump of the workqueues and worker pools on the server:
> >
> > [Mon Mar 18 14:59:33 2024] Showing busy workqueues and worker pools:
> > [Mon Mar 18 14:59:33 2024] workqueue events: flags=0x0
> > [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
> > active=1/256 refcnt=2
> > [Mon Mar 18 14:59:33 2024]     pending: drm_fb_helper_damage_work
> > [drm_kms_helper]
> > [Mon Mar 18 14:59:33 2024] workqueue events_power_efficient: flags=0x80
> > [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
> > active=1/256 refcnt=2
> > [Mon Mar 18 14:59:33 2024]     pending: fb_flashcursor
> > [Mon Mar 18 14:59:33 2024] workqueue mm_percpu_wq: flags=0x8
> > [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
> > active=1/256 refcnt=3
> > [Mon Mar 18 14:59:33 2024]     pending: lru_add_drain_per_cpu BAR(362)
> > [Mon Mar 18 14:59:33 2024] workqueue kblockd: flags=0x18
> > [Mon Mar 18 14:59:33 2024]   pwq 55: cpus=27 node=1 flags=0x0 nice=-20
> > active=1/256 refcnt=2
> > [Mon Mar 18 14:59:33 2024]     pending: blk_mq_timeout_work
> >
> >
> > In contrast to last time, it doesn't show anything regarding nfs this
> > time.
> >
> > I also tried the suggestion from Dai Ngo (echo 3 >
> > /proc/sys/vm/drop_caches), but that didn't seem to make any difference.
> >
> > We haven't restarted the server yet as the impact seems to affect
> > fewer clients than before. Is there anything we can run on the
> > server to further debug this?
> >
> > In the past, the issue seemed to deteriorate rapidly and resulted in
> > issues for almost all clients after about 20 minutes. This time the
> > impact seems to be less, but it's not gone.
> >
> > How can we force the NFS server to forget about a specific client? I
> > haven't tried to restart the nfs service yet as I'm afraid it will
> > fail to stop as before.
> >

Not with that kernel. There are some new administrative interfaces that
might allow that in the future, but they were just merged upstream and
aren't in that kernel.

--
Jeff Layton <[email protected]>

2024-03-19 07:58:35

by Rik Theys

[permalink] [raw]
Subject: Re: nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning

Hi,

On 3/18/24 22:54, Jeff Layton wrote:
> On Mon, 2024-03-18 at 22:15 +0100, Rik Theys wrote:
>> Hi,
>>
>> On 3/18/24 21:21, Rik Theys wrote:
>>> Hi Jeff,
>>>
>>> On 3/12/24 13:47, Jeff Layton wrote:
>>>> On Tue, 2024-03-12 at 13:24 +0100, Rik Theys wrote:
>>>>> Hi Jeff,
>>>>>
>>>>> On 3/12/24 12:22, Jeff Layton wrote:
>>>>>> On Mon, 2024-03-11 at 19:43 +0100, Rik Theys wrote:
>>>>>>> Since a few weeks our Rocky Linux 9 NFS server has periodically
>>>>>>> logged hung nfsd tasks. The initial effect was that some clients
>>>>>>> could no longer access the NFS server. This got worse and worse
>>>>>>> (probably as more nfsd threads got blocked) and we had to restart
>>>>>>> the server. Restarting the server also failed as the NFS server
>>>>>>> service could no longer be stopped.
>>>>>>>
>>>>>>> The initial kernel we noticed this behavior on was
>>>>>>> kernel-5.14.0-362.18.1.el9_3.x86_64. Since then we've installed
>>>>>>> kernel-5.14.0-419.el9.x86_64 from CentOS Stream 9. The same issue
>>>>>>> happened again on this newer kernel version:
>>>> 419 is fairly up to date with nfsd changes. There are some known bugs
>>>> around callbacks, and there is a draft MR in flight to fix it.
>>>>
>>>> What kernel were you on prior to 5.14.0-362.18.1.el9_3.x86_64 ? If we
>>>> can bracket the changes around a particular version, then that might
>>>> help identify the problem.
>>>>
>>>>>>> [Mon Mar 11 14:10:08 2024]       Not tainted 5.14.0-419.el9.x86_64 #1
>>>>>>>    [Mon Mar 11 14:10:08 2024] "echo 0 >
>>>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>>>    [Mon Mar 11 14:10:08 2024]task:nfsd             state:D stack:0
>>>>>>>     pid:8865  ppid:2      flags:0x00004000
>>>>>>>    [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>>>>    [Mon Mar 11 14:10:08 2024]  <TASK>
>>>>>>>    [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>>>>>    [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>>>>>    [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>>>>>    [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
>>>>>>>    [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>>>>>    [Mon Mar 11 14:10:08 2024]  nfsd4_shutdown_callback+0x49/0x120
>>>>>>> [nfsd]
>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? nfsd4_cld_remove+0x54/0x1d0 [nfsd]
>>>>>>>    [Mon Mar 11 14:10:08 2024]  ?
>>>>>>> nfsd4_return_all_client_layouts+0xc4/0xf0 [nfsd]
>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? nfsd4_shutdown_copy+0x68/0xc0 [nfsd]
>>>>>>>    [Mon Mar 11 14:10:08 2024]  __destroy_client+0x1f3/0x290 [nfsd]
>>>>>>>    [Mon Mar 11 14:10:08 2024]  nfsd4_exchange_id+0x75f/0x770 [nfsd]
>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? nfsd4_decode_opaque+0x3a/0x90 [nfsd]
>>>>>>>    [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>>>    [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>>    [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
>>>>>>> [sunrpc]
>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>>    [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>>>>    [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>>>    [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>>>>>    [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>>>>>    [Mon Mar 11 14:10:08 2024]  </TASK>
>>>>>>>    [Mon Mar 11 14:10:08 2024] INFO: task nfsd:8866 blocked for
>>>>>>> more than 122 seconds.
>>>>>>>    [Mon Mar 11 14:10:08 2024]       Not tainted
>>>>>>> 5.14.0-419.el9.x86_64 #1
>>>>>>>    [Mon Mar 11 14:10:08 2024] "echo 0 >
>>>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>>>    [Mon Mar 11 14:10:08 2024]task:nfsd             state:D stack:0
>>>>>>>     pid:8866  ppid:2      flags:0x00004000
>>>>>>>    [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>>>>    [Mon Mar 11 14:10:08 2024]  <TASK>
>>>>>>>    [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>>>>>    [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>>>>>    [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? tcp_recvmsg+0x196/0x210
>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>>>>>    [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
>>>>>>>    [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>>>>>    [Mon Mar 11 14:10:08 2024]  nfsd4_destroy_session+0x1a4/0x240
>>>>>>> [nfsd]
>>>>>>>    [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>>>    [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>>    [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
>>>>>>> [sunrpc]
>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>>    [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>>>>    [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>>>    [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>>>>>    [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>>>>>    [Mon Mar 11 14:10:08 2024]  </TASK>
>>>>>>>
>>>>>> The above threads are trying to flush the workqueue, so that probably
>>>>>> means that they are stuck waiting on a workqueue job to finish.
>>>>>>>    The above is repeated a few times, and then this warning is
>>>>>>> also logged:
>>>>>>>    [Mon Mar 11 14:12:04 2024] ------------[ cut here ]------------
>>>>>>>    [Mon Mar 11 14:12:04 2024] WARNING: CPU: 39 PID: 8844 at
>>>>>>> fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>>>>    [Mon Mar 11 14:12:05 2024] Modules linked in: nfsv4
>>>>>>> dns_resolver nfs fscache netfs rpcsec_gss_krb5 rpcrdma rdma_cm
>>>>>>> iw_cm ib_cm ib_core binfmt_misc bonding tls rfkill nft_counter nft_ct
>>>>>>>    nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet
>>>>>>> nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables nfnetlink vfat
>>>>>>> fat dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio l
>>>>>>>    ibcrc32c dm_service_time dm_multipath intel_rapl_msr
>>>>>>> intel_rapl_common intel_uncore_frequency
>>>>>>> intel_uncore_frequency_common isst_if_common skx_edac nfit
>>>>>>> libnvdimm ipmi_ssif x86_pkg_temp
>>>>>>>    _thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
>>>>>>> dcdbas rapl intel_cstate mgag200 i2c_algo_bit drm_shmem_helper
>>>>>>> dell_smbios drm_kms_helper dell_wmi_descriptor wmi_bmof intel_u
>>>>>>>    ncore syscopyarea pcspkr sysfillrect mei_me sysimgblt acpi_ipmi
>>>>>>> mei fb_sys_fops i2c_i801 ipmi_si intel_pch_thermal lpc_ich
>>>>>>> ipmi_devintf i2c_smbus ipmi_msghandler joydev acpi_power_meter
>>>>>>>    nfsd auth_rpcgss nfs_acl drm lockd grace fuse sunrpc ext4
>>>>>>> mbcache jbd2 sd_mod sg lpfc
>>>>>>>    [Mon Mar 11 14:12:05 2024]  nvmet_fc nvmet nvme_fc nvme_fabrics
>>>>>>> crct10dif_pclmul ahci libahci crc32_pclmul nvme_core crc32c_intel
>>>>>>> ixgbe megaraid_sas libata nvme_common ghash_clmulni_int
>>>>>>>    el t10_pi wdat_wdt scsi_transport_fc mdio wmi dca dm_mirror
>>>>>>> dm_region_hash dm_log dm_mod
>>>>>>>    [Mon Mar 11 14:12:05 2024] CPU: 39 PID: 8844 Comm: nfsd Not
>>>>>>> tainted 5.14.0-419.el9.x86_64 #1
>>>>>>>    [Mon Mar 11 14:12:05 2024] Hardware name: Dell Inc. PowerEdge
>>>>>>> R740/00WGD1, BIOS 2.20.1 09/13/2023
>>>>>>>    [Mon Mar 11 14:12:05 2024] RIP:
>>>>>>> 0010:nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>>>>    [Mon Mar 11 14:12:05 2024] Code: a6 95 c5 f3 e9 ff fe ff ff 48
>>>>>>> 89 df be 01 00 00 00 e8 34 b5 13 f4 48 8d bb 98 00 00 00 e8 c8 f9
>>>>>>> 00 00 84 c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be
>>>>>>>    02 00 00 00 48 89 df e8 0c b5 13 f4 e9 01
>>>>>>>    [Mon Mar 11 14:12:05 2024] RSP: 0018:ffff9929e0bb7b80 EFLAGS:
>>>>>>> 00010246
>>>>>>>    [Mon Mar 11 14:12:05 2024] RAX: 0000000000000000 RBX:
>>>>>>> ffff8ada51930900 RCX: 0000000000000024
>>>>>>>    [Mon Mar 11 14:12:05 2024] RDX: ffff8ada519309c8 RSI:
>>>>>>> ffff8ad582933c00 RDI: 0000000000002000
>>>>>>>    [Mon Mar 11 14:12:05 2024] RBP: ffff8ad46bf21574 R08:
>>>>>>> ffff9929e0bb7b48 R09: 0000000000000000
>>>>>>>    [Mon Mar 11 14:12:05 2024] R10: ffff8aec859a2948 R11:
>>>>>>> 0000000000000000 R12: ffff8ad6f497c360
>>>>>>>    [Mon Mar 11 14:12:05 2024] R13: ffff8ad46bf21560 R14:
>>>>>>> ffff8ae5942e0b10 R15: ffff8ad6f497c360
>>>>>>>    [Mon Mar 11 14:12:05 2024] FS:  0000000000000000(0000)
>>>>>>> GS:ffff8b031fcc0000(0000) knlGS:0000000000000000
>>>>>>>    [Mon Mar 11 14:12:05 2024] CS:  0010 DS: 0000 ES: 0000 CR0:
>>>>>>> 0000000080050033
>>>>>>>    [Mon Mar 11 14:12:05 2024] CR2: 00007fafe2060744 CR3:
>>>>>>> 00000018e58de006 CR4: 00000000007706e0
>>>>>>>    [Mon Mar 11 14:12:05 2024] DR0: 0000000000000000 DR1:
>>>>>>> 0000000000000000 DR2: 0000000000000000
>>>>>>>    [Mon Mar 11 14:12:05 2024] DR3: 0000000000000000 DR6:
>>>>>>> 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>    [Mon Mar 11 14:12:05 2024] PKRU: 55555554
>>>>>>>    [Mon Mar 11 14:12:05 2024] Call Trace:
>>>>>>>    [Mon Mar 11 14:12:05 2024]  <TASK>
>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? __break_lease+0x16f/0x5f0
>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
>>>>>>> [nfsd]
>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? __warn+0x81/0x110
>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
>>>>>>> [nfsd]
>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? report_bug+0x10a/0x140
>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? handle_bug+0x3c/0x70
>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? exc_invalid_op+0x14/0x70
>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? asm_exc_invalid_op+0x16/0x20
>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
>>>>>>> [nfsd]
>>>>>>>    [Mon Mar 11 14:12:05 2024]  __break_lease+0x16f/0x5f0
>>>>>>>    [Mon Mar 11 14:12:05 2024]  ?
>>>>>>> nfsd_file_lookup_locked+0x117/0x160 [nfsd]
>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? list_lru_del+0x101/0x150
>>>>>>>    [Mon Mar 11 14:12:05 2024]  nfsd_file_do_acquire+0x790/0x830
>>>>>>> [nfsd]
>>>>>>>    [Mon Mar 11 14:12:05 2024]  nfs4_get_vfs_file+0x315/0x3a0 [nfsd]
>>>>>>>    [Mon Mar 11 14:12:05 2024]  nfsd4_process_open2+0x430/0xa30 [nfsd]
>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? fh_verify+0x297/0x2f0 [nfsd]
>>>>>>>    [Mon Mar 11 14:12:05 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
>>>>>>>    [Mon Mar 11 14:12:05 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>>>    [Mon Mar 11 14:12:05 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>>    [Mon Mar 11 14:12:05 2024]  svc_process_common+0x2ec/0x660
>>>>>>> [sunrpc]
>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>>    [Mon Mar 11 14:12:05 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>>>>    [Mon Mar 11 14:12:05 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>>>    [Mon Mar 11 14:12:05 2024]  kthread+0xdd/0x100
>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? __pfx_kthread+0x10/0x10
>>>>>>>    [Mon Mar 11 14:12:05 2024]  ret_from_fork+0x29/0x50
>>>>>>>    [Mon Mar 11 14:12:05 2024]  </TASK>
>>>>>>>    [Mon Mar 11 14:12:05 2024] ---[ end trace 7a039e17443dc651 ]---
>>>>>> This is probably this WARN in nfsd_break_one_deleg:
>>>>>>
>>>>>> WARN_ON_ONCE(!nfsd4_run_cb(&dp->dl_recall));
>>>>>>
>>>>>> It means that a delegation break callback to the client couldn't be
>>>>>> queued to the workqueue, and so it didn't run.
>>>>>>
>>>>>>> Could this be the same issue as described
>>>>>>> here:https://lore.kernel.org/linux-nfs/[email protected]/
>>>>>>> ?
>>>>>> Yes, most likely the same problem.
>>>>> If I read that thread correctly, this issue was introduced between
>>>>> 6.1.63 and 6.6.3? Is it possible the EL9 5.14.0-362.18.1.el9_3
>>>>> backported these changes, or were we hitting some other bug with that
>>>>> version? It seems the 6.1.x kernel is not affected? If so, that
>>>>> would be
>>>>> the recommended kernel to run?
>>>> Anything is possible. We have to identify the problem first.
>>>>>>> As described in that thread, I've tried to obtain the requested
>>>>>>> information.
>>>>>>>
>>>>>>> Is it possible this is the issue that was fixed by the patches
>>>>>>> described
>>>>>>> here?https://lore.kernel.org/linux-nfs/2024022054-cause-suffering-eae8@gregkh/
>>>>>>>
>>>>>> Doubtful. Those are targeted toward a different set of issues.
>>>>>>
>>>>>> If you're willing, I do have some patches queued up for CentOS here
>>>>>> that
>>>>>> fix some backchannel problems that could be related. I'm mainly
>>>>>> waiting
>>>>>> on Chuck to send these to Linus and then we'll likely merge them into
>>>>>> CentOS soon afterward:
>>>>>>
>>>>>> https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3689
>>>>>>
>>>>>>
>>>>> If you can send me a patch file, I can rebuild the C9S kernel with that
>>>>> patch and run it. It can take a while for the bug to trigger as I
>>>>> believe it seems to be very workload dependent (we were running very
>>>>> stable for months and now hit this bug every other week).
>>>>>
>>>>>
>>>> It's probably simpler to just pull down the build artifacts for that MR.
>>>> You have to drill down through the CI for it, but they are here:
>>>>
>>>> https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/index.html?prefix=trusted-artifacts/1194300175/publish_x86_64/6278921877/artifacts/
>>>>
>>>>
>>>> There's even a repo file you can install on the box to pull them down.
>>> We installed this kernel on the server 3 days ago. Today, a user
>>> informed us that their screen was black after logging in. Similar to
>>> other occurrences of this issue, the mount command on the client was
>>> hung. But in contrast to the other times, there were no messages in
>>> the logs kernel logs on the server. Even restarting the client does
>>> not resolve the issue.
>
> Ok, so you rebooted the client and it's still unable to mount? That
> sounds like a server problem if so.
>
> Are both client and server running the same kernel?
No, the server runs 5.14.0-427.3689_1194299994.el9 and the client
5.14.0-362.18.1.el9_3.
>
>>> Something still seems to be wrong on the server though. When I look at
>>> the directories under /proc/fs/nfsd/clients, there's still a directory
>>> for the specific client, even though it's no longer running:
>>>
>>> # cat 155/info
>>> clientid: 0xc8edb7f65f4a9ad
>>> address: "10.87.31.152:819"
>>> status: confirmed
>>> seconds from last renew: 33163
>>> name: "Linux NFSv4.2 bersalis.esat.kuleuven.be"
>>> minor version: 2
>>> Implementation domain: "kernel.org"
>>> Implementation name: "Linux 5.14.0-362.18.1.el9_3.0.1.x86_64 #1 SMP
>>> PREEMPT_DYNAMIC Sun Feb 11 13:49:23 UTC 2024 x86_64"
>>> Implementation time: [0, 0]
>>> callback state: DOWN
>>> callback address: 10.87.31.152:0
>>>
> If you just shut down the client, the server won't immediately purge its
> record. In fact, assuming you're running the same kernel on the server,
> it won't purge the client record until there is a conflicting request
> for its state.
Is there a way to force such a conflicting request (to get the client
record to purge)?
>
>
>> The nfsdclnts command for this client shows the following delegations:
>>
>> # nfsdclnts -f 155/states -t all
>> Inode number | Type   | Access | Deny | ip address            | Filename
>> 169346743    | open   | r-     | --   | 10.87.31.152:819      |
>> disconnected dentry
>> 169346743    | deleg  | r      |      | 10.87.31.152:819      |
>> disconnected dentry
>> 169346746    | open   | r-     | --   | 10.87.31.152:819      |
>> disconnected dentry
>> 169346746    | deleg  | r      |      | 10.87.31.152:819      |
>> disconnected dentry
>>
>> I see a lot of recent patches regarding directory delegations. Could
>> this be related to this?
>>
>> Will a 5.14.0-362.18.1.el9_3.0.1 kernel try to use a directory delegation?
>>
>>
> No. Directory delegations are a new feature that's still under
> development. They use some of the same machinery as file delegations,
> but they wouldn't be a factor here.
>
>>> The system seems to have identified that the client is no longer
>>> reachable, but the client entry does not go away. When a mount was
>>> hanging on the client, there would be two directories in clients for
>>> the same client. Killing the mount command clears up the second entry.
>>>
>>> Even after running conntrack -D on the server to remove the tcp
>>> connection from the conntrack table, the entry doesn't go away and the
>>> client still can not mount anything from the server.
>>>
>>> A tcpdump on the client while a mount was running logged the following
>>> messages over and over again:
>>>
>>> request:
>>>
>>> Frame 1: 378 bytes on wire (3024 bits), 378 bytes captured (3024 bits)
>>> Ethernet II, Src: HP_19:7d:4b (e0:73:e7:19:7d:4b), Dst:
>>> ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00)
>>> Internet Protocol Version 4, Src: 10.87.31.152, Dst: 10.86.18.14
>>> Transmission Control Protocol, Src Port: 932, Dst Port: 2049, Seq: 1,
>>> Ack: 1, Len: 312
>>> Remote Procedure Call, Type:Call XID:0x1d3220c4
>>> Network File System
>>>     [Program Version: 4]
>>>     [V4 Procedure: COMPOUND (1)]
>>>     GSS Data, Ops(1): CREATE_SESSION
>>>         Length: 152
>>>         GSS Sequence Number: 76
>>>         Tag: <EMPTY>
>>>         minorversion: 2
>>>         Operations (count: 1): CREATE_SESSION
>>>         [Main Opcode: CREATE_SESSION (43)]
>>>     GSS Checksum:
>>> 00000028040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb2…
>>>         GSS Token Length: 40
>>>         GSS-API Generic Security Service Application Program Interface
>>>             krb5_blob:
>>> 040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb23fa080b0…
>>>
>>> response:
>>>
>>> Frame 2: 206 bytes on wire (1648 bits), 206 bytes captured (1648 bits)
>>> Ethernet II, Src: ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00), Dst:
>>> HP_19:7d:4b (e0:73:e7:19:7d:4b)
>>> Internet Protocol Version 4, Src: 10.86.18.14, Dst: 10.87.31.152
>>> Transmission Control Protocol, Src Port: 2049, Dst Port: 932, Seq: 1,
>>> Ack: 313, Len: 140
>>> Remote Procedure Call, Type:Reply XID:0x1d3220c4
>>> Network File System
>>>     [Program Version: 4]
>>>     [V4 Procedure: COMPOUND (1)]
>>>     GSS Data, Ops(1): CREATE_SESSION(NFS4ERR_DELAY)
>>>         Length: 24
>>>         GSS Sequence Number: 76
>>>         Status: NFS4ERR_DELAY (10008)
>>>         Tag: <EMPTY>
>>>         Operations (count: 1)
>>>         [Main Opcode: CREATE_SESSION (43)]
>>>     GSS Checksum:
>>> 00000028040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f6274222…
>>>         GSS Token Length: 40
>>>         GSS-API Generic Security Service Application Program Interface
>>>             krb5_blob:
>>> 040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f627422226d74923…
>>>
>>> I was hoping that giving the client a different IP address would
>>> resolve the issue for this client, but it didn't. Even though the
>>> client had a new IP address (hostname was kept the same), it failed to
>>> mount anything from the server.
>>>
> Changing the IP address won't help. The client is probably using the
> same long-form client id as before, so the server still identifies the
> client even with the address change.
How is the client id determined? Will changing the hostname of the
client trigger a change of the client id?
>
> Unfortunately, the cause of an NFS4ERR_DELAY error is tough to guess.
> The client is expected to back off and retry, so if the server keeps
> returning that repeatedly, then a hung mount command is expected.
>
> The question is why the server would keep returning DELAY. A lot of
> different problems ranging from memory allocation issues to protocol
> problems can result in that error. You may want to check the NFS server
> and see if anything was logged there.
There are no messages in the system logs that indicate any sort of
memory issue. We also increased the vm.min_free_kbytes sysctl to 2G on the
server before we restarted it with the newer kernel.
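
(For reference, and assuming "2G" here means 2097152 kB, the change
amounts to something like

# sysctl -w vm.min_free_kbytes=2097152

on the server.)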
>
> This is on a CREATE_SESSION call, so I wonder if the record held by the
> (courteous) server is somehow blocking the attempt to reestablish the
> session?
>
> Do you have a way to reproduce this? Since this is a centos kernel, you
> could follow the page here to open a bug:

Unfortunately we haven't found a reliable way to reproduce it. But we do
seem to trigger it more and more lately.

Regards,

Rik

>
> https://wiki.centos.org/ReportBugs.html
>
>
>>> I created another dump of the workqueues and worker pools on the server:
>>>
>>> [Mon Mar 18 14:59:33 2024] Showing busy workqueues and worker pools:
>>> [Mon Mar 18 14:59:33 2024] workqueue events: flags=0x0
>>> [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
>>> active=1/256 refcnt=2
>>> [Mon Mar 18 14:59:33 2024]     pending: drm_fb_helper_damage_work
>>> [drm_kms_helper]
>>> [Mon Mar 18 14:59:33 2024] workqueue events_power_efficient: flags=0x80
>>> [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
>>> active=1/256 refcnt=2
>>> [Mon Mar 18 14:59:33 2024]     pending: fb_flashcursor
>>> [Mon Mar 18 14:59:33 2024] workqueue mm_percpu_wq: flags=0x8
>>> [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
>>> active=1/256 refcnt=3
>>> [Mon Mar 18 14:59:33 2024]     pending: lru_add_drain_per_cpu BAR(362)
>>> [Mon Mar 18 14:59:33 2024] workqueue kblockd: flags=0x18
>>> [Mon Mar 18 14:59:33 2024]   pwq 55: cpus=27 node=1 flags=0x0 nice=-20
>>> active=1/256 refcnt=2
>>> [Mon Mar 18 14:59:33 2024]     pending: blk_mq_timeout_work
>>>
>>>
>>> In contrast to last time, it doesn't show anything regarding nfs this
>>> time.
>>>
>>> I also tried the suggestion from Dai Ngo (echo 3 >
>>> /proc/sys/vm/drop_caches), but that didn't seem to make any difference.
>>>
>>> We haven't restarted the server yet as the impact seems to
>>> affect fewer clients than before. Is there anything we can run on the
>>> server to further debug this?
>>>
>>> In the past, the issue seemed to deteriorate rapidly and resulted in
>>> issues for almost all clients after about 20 minutes. This time the
>>> impact seems to be less, but it's not gone.
>>>
>>> How can we force the NFS server to forget about a specific client? I
>>> haven't tried to restart the nfs service yet as I'm afraid it will
>>> fail to stop as before.
>>>
> Not with that kernel. There are some new administrative interfaces that
> might allow that in the future, but they were just merged upstream and
> aren't in that kernel.
>
> --
> Jeff Layton <[email protected]>

--
Rik Theys
System Engineer
KU Leuven - Dept. Elektrotechniek (ESAT)
Kasteelpark Arenberg 10 bus 2440 - B-3001 Leuven-Heverlee
+32(0)16/32.11.07
----------------------------------------------------------------
<<Any errors in spelling, tact or fact are transmission errors>>


2024-03-19 10:39:38

by Jeffrey Layton

[permalink] [raw]
Subject: Re: nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning

On Tue, 2024-03-19 at 08:58 +0100, Rik Theys wrote:
> Hi,
>
> On 3/18/24 22:54, Jeff Layton wrote:
> > On Mon, 2024-03-18 at 22:15 +0100, Rik Theys wrote:
> > > Hi,
> > >
> > > On 3/18/24 21:21, Rik Theys wrote:
> > > > Hi Jeff,
> > > >
> > > > On 3/12/24 13:47, Jeff Layton wrote:
> > > > > On Tue, 2024-03-12 at 13:24 +0100, Rik Theys wrote:
> > > > > > Hi Jeff,
> > > > > >
> > > > > > On 3/12/24 12:22, Jeff Layton wrote:
> > > > > > > On Mon, 2024-03-11 at 19:43 +0100, Rik Theys wrote:
> > > > > > > > Since a few weeks our Rocky Linux 9 NFS server has periodically
> > > > > > > > logged hung nfsd tasks. The initial effect was that some clients
> > > > > > > > could no longer access the NFS server. This got worse and worse
> > > > > > > > (probably as more nfsd threads got blocked) and we had to restart
> > > > > > > > the server. Restarting the server also failed as the NFS server
> > > > > > > > service could no longer be stopped.
> > > > > > > >
> > > > > > > > The initial kernel we noticed this behavior on was
> > > > > > > > kernel-5.14.0-362.18.1.el9_3.x86_64. Since then we've installed
> > > > > > > > kernel-5.14.0-419.el9.x86_64 from CentOS Stream 9. The same issue
> > > > > > > > happened again on this newer kernel version:
> > > > > 419 is fairly up to date with nfsd changes. There are some known bugs
> > > > > around callbacks, and there is a draft MR in flight to fix it.
> > > > >
> > > > > What kernel were you on prior to 5.14.0-362.18.1.el9_3.x86_64 ? If we
> > > > > can bracket the changes around a particular version, then that might
> > > > > help identify the problem.
> > > > >
> > > > > > > > [Mon Mar 11 14:10:08 2024]       Not tainted 5.14.0-419.el9.x86_64 #1
> > > > > > > >    [Mon Mar 11 14:10:08 2024] "echo 0 >
> > > > > > > > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > > > > >    [Mon Mar 11 14:10:08 2024]task:nfsd             state:D stack:0
> > > > > > > >     pid:8865  ppid:2      flags:0x00004000
> > > > > > > >    [Mon Mar 11 14:10:08 2024] Call Trace:
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  <TASK>
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  nfsd4_shutdown_callback+0x49/0x120
> > > > > > > > [nfsd]
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? nfsd4_cld_remove+0x54/0x1d0 [nfsd]
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  ?
> > > > > > > > nfsd4_return_all_client_layouts+0xc4/0xf0 [nfsd]
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? nfsd4_shutdown_copy+0x68/0xc0 [nfsd]
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  __destroy_client+0x1f3/0x290 [nfsd]
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  nfsd4_exchange_id+0x75f/0x770 [nfsd]
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? nfsd4_decode_opaque+0x3a/0x90 [nfsd]
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
> > > > > > > > [sunrpc]
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  </TASK>
> > > > > > > >    [Mon Mar 11 14:10:08 2024] INFO: task nfsd:8866 blocked for
> > > > > > > > more than 122 seconds.
> > > > > > > >    [Mon Mar 11 14:10:08 2024]       Not tainted
> > > > > > > > 5.14.0-419.el9.x86_64 #1
> > > > > > > >    [Mon Mar 11 14:10:08 2024] "echo 0 >
> > > > > > > > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > > > > >    [Mon Mar 11 14:10:08 2024]task:nfsd             state:D stack:0
> > > > > > > >     pid:8866  ppid:2      flags:0x00004000
> > > > > > > >    [Mon Mar 11 14:10:08 2024] Call Trace:
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  <TASK>
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? tcp_recvmsg+0x196/0x210
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  nfsd4_destroy_session+0x1a4/0x240
> > > > > > > > [nfsd]
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
> > > > > > > > [sunrpc]
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
> > > > > > > >    [Mon Mar 11 14:10:08 2024]  </TASK>
> > > > > > > >
> > > > > > > The above threads are trying to flush the workqueue, so that probably
> > > > > > > means that they are stuck waiting on a workqueue job to finish.
> > > > > > > >    The above is repeated a few times, and then this warning is
> > > > > > > > also logged:
> > > > > > > >    [Mon Mar 11 14:12:04 2024] ------------[ cut here ]------------
> > > > > > > >    [Mon Mar 11 14:12:04 2024] WARNING: CPU: 39 PID: 8844 at
> > > > > > > > fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
> > > > > > > >    [Mon Mar 11 14:12:05 2024] Modules linked in: nfsv4
> > > > > > > > dns_resolver nfs fscache netfs rpcsec_gss_krb5 rpcrdma rdma_cm
> > > > > > > > iw_cm ib_cm ib_core binfmt_misc bonding tls rfkill nft_counter nft_ct
> > > > > > > >    nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet
> > > > > > > > nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables nfnetlink vfat
> > > > > > > > fat dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio l
> > > > > > > >    ibcrc32c dm_service_time dm_multipath intel_rapl_msr
> > > > > > > > intel_rapl_common intel_uncore_frequency
> > > > > > > > intel_uncore_frequency_common isst_if_common skx_edac nfit
> > > > > > > > libnvdimm ipmi_ssif x86_pkg_temp
> > > > > > > >    _thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
> > > > > > > > dcdbas rapl intel_cstate mgag200 i2c_algo_bit drm_shmem_helper
> > > > > > > > dell_smbios drm_kms_helper dell_wmi_descriptor wmi_bmof intel_u
> > > > > > > >    ncore syscopyarea pcspkr sysfillrect mei_me sysimgblt acpi_ipmi
> > > > > > > > mei fb_sys_fops i2c_i801 ipmi_si intel_pch_thermal lpc_ich
> > > > > > > > ipmi_devintf i2c_smbus ipmi_msghandler joydev acpi_power_meter
> > > > > > > >    nfsd auth_rpcgss nfs_acl drm lockd grace fuse sunrpc ext4
> > > > > > > > mbcache jbd2 sd_mod sg lpfc
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  nvmet_fc nvmet nvme_fc nvme_fabrics
> > > > > > > > crct10dif_pclmul ahci libahci crc32_pclmul nvme_core crc32c_intel
> > > > > > > > ixgbe megaraid_sas libata nvme_common ghash_clmulni_int
> > > > > > > >    el t10_pi wdat_wdt scsi_transport_fc mdio wmi dca dm_mirror
> > > > > > > > dm_region_hash dm_log dm_mod
> > > > > > > >    [Mon Mar 11 14:12:05 2024] CPU: 39 PID: 8844 Comm: nfsd Not
> > > > > > > > tainted 5.14.0-419.el9.x86_64 #1
> > > > > > > >    [Mon Mar 11 14:12:05 2024] Hardware name: Dell Inc. PowerEdge
> > > > > > > > R740/00WGD1, BIOS 2.20.1 09/13/2023
> > > > > > > >    [Mon Mar 11 14:12:05 2024] RIP:
> > > > > > > > 0010:nfsd_break_deleg_cb+0x170/0x190 [nfsd]
> > > > > > > >    [Mon Mar 11 14:12:05 2024] Code: a6 95 c5 f3 e9 ff fe ff ff 48
> > > > > > > > 89 df be 01 00 00 00 e8 34 b5 13 f4 48 8d bb 98 00 00 00 e8 c8 f9
> > > > > > > > 00 00 84 c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be
> > > > > > > >    02 00 00 00 48 89 df e8 0c b5 13 f4 e9 01
> > > > > > > >    [Mon Mar 11 14:12:05 2024] RSP: 0018:ffff9929e0bb7b80 EFLAGS:
> > > > > > > > 00010246
> > > > > > > >    [Mon Mar 11 14:12:05 2024] RAX: 0000000000000000 RBX:
> > > > > > > > ffff8ada51930900 RCX: 0000000000000024
> > > > > > > >    [Mon Mar 11 14:12:05 2024] RDX: ffff8ada519309c8 RSI:
> > > > > > > > ffff8ad582933c00 RDI: 0000000000002000
> > > > > > > >    [Mon Mar 11 14:12:05 2024] RBP: ffff8ad46bf21574 R08:
> > > > > > > > ffff9929e0bb7b48 R09: 0000000000000000
> > > > > > > >    [Mon Mar 11 14:12:05 2024] R10: ffff8aec859a2948 R11:
> > > > > > > > 0000000000000000 R12: ffff8ad6f497c360
> > > > > > > >    [Mon Mar 11 14:12:05 2024] R13: ffff8ad46bf21560 R14:
> > > > > > > > ffff8ae5942e0b10 R15: ffff8ad6f497c360
> > > > > > > >    [Mon Mar 11 14:12:05 2024] FS:  0000000000000000(0000)
> > > > > > > > GS:ffff8b031fcc0000(0000) knlGS:0000000000000000
> > > > > > > >    [Mon Mar 11 14:12:05 2024] CS:  0010 DS: 0000 ES: 0000 CR0:
> > > > > > > > 0000000080050033
> > > > > > > >    [Mon Mar 11 14:12:05 2024] CR2: 00007fafe2060744 CR3:
> > > > > > > > 00000018e58de006 CR4: 00000000007706e0
> > > > > > > >    [Mon Mar 11 14:12:05 2024] DR0: 0000000000000000 DR1:
> > > > > > > > 0000000000000000 DR2: 0000000000000000
> > > > > > > >    [Mon Mar 11 14:12:05 2024] DR3: 0000000000000000 DR6:
> > > > > > > > 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > > >    [Mon Mar 11 14:12:05 2024] PKRU: 55555554
> > > > > > > >    [Mon Mar 11 14:12:05 2024] Call Trace:
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  <TASK>
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? __break_lease+0x16f/0x5f0
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
> > > > > > > > [nfsd]
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? __warn+0x81/0x110
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
> > > > > > > > [nfsd]
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? report_bug+0x10a/0x140
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? handle_bug+0x3c/0x70
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? exc_invalid_op+0x14/0x70
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? asm_exc_invalid_op+0x16/0x20
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
> > > > > > > > [nfsd]
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  __break_lease+0x16f/0x5f0
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  ?
> > > > > > > > nfsd_file_lookup_locked+0x117/0x160 [nfsd]
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? list_lru_del+0x101/0x150
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  nfsd_file_do_acquire+0x790/0x830
> > > > > > > > [nfsd]
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  nfs4_get_vfs_file+0x315/0x3a0 [nfsd]
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  nfsd4_process_open2+0x430/0xa30 [nfsd]
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? fh_verify+0x297/0x2f0 [nfsd]
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  svc_process_common+0x2ec/0x660
> > > > > > > > [sunrpc]
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  svc_process+0x12d/0x170 [sunrpc]
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  nfsd+0x84/0xb0 [nfsd]
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  kthread+0xdd/0x100
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? __pfx_kthread+0x10/0x10
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  ret_from_fork+0x29/0x50
> > > > > > > >    [Mon Mar 11 14:12:05 2024]  </TASK>
> > > > > > > >    [Mon Mar 11 14:12:05 2024] ---[ end trace 7a039e17443dc651 ]---
> > > > > > > This is probably this WARN in nfsd_break_one_deleg:
> > > > > > >
> > > > > > > WARN_ON_ONCE(!nfsd4_run_cb(&dp->dl_recall));
> > > > > > >
> > > > > > > It means that a delegation break callback to the client couldn't be
> > > > > > > queued to the workqueue, and so it didn't run.
> > > > > > >
> > > > > > > > Could this be the same issue as described
> > > > > > > > > here: https://lore.kernel.org/linux-nfs/[email protected]/
> > > > > > > > ?
> > > > > > > Yes, most likely the same problem.
> > > > > > If I read that thread correctly, this issue was introduced between
> > > > > > 6.1.63 and 6.6.3? Is it possible the EL9 5.14.0-362.18.1.el9_3
> > > > > > backported these changes, or were we hitting some other bug with that
> > > > > > version? It seems the 6.1.x kernel is not affected? If so, that
> > > > > > would be
> > > > > > the recommended kernel to run?
> > > > > Anything is possible. We have to identify the problem first.
> > > > > > > > As described in that thread, I've tried to obtain the requested
> > > > > > > > information.
> > > > > > > >
> > > > > > > > Is it possible this is the issue that was fixed by the patches
> > > > > > > > described
> > > > > > > > > here? https://lore.kernel.org/linux-nfs/2024022054-cause-suffering-eae8@gregkh/
> > > > > > > >
> > > > > > > Doubtful. Those are targeted toward a different set of issues.
> > > > > > >
> > > > > > > If you're willing, I do have some patches queued up for CentOS here
> > > > > > > that
> > > > > > > fix some backchannel problems that could be related. I'm mainly
> > > > > > > waiting
> > > > > > > on Chuck to send these to Linus and then we'll likely merge them into
> > > > > > > CentOS soon afterward:
> > > > > > >
> > > > > > > https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3689
> > > > > > >
> > > > > > >
> > > > > > If you can send me a patch file, I can rebuild the C9S kernel with that
> > > > > > patch and run it. It can take a while for the bug to trigger as it
> > > > > > seems to be very workload dependent (we were running very
> > > > > > stable for months and now hit this bug every other week).
> > > > > >
> > > > > >
> > > > > It's probably simpler to just pull down the build artifacts for that MR.
> > > > > You have to drill down through the CI for it, but they are here:
> > > > >
> > > > > https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/index.html?prefix=trusted-artifacts/1194300175/publish_x86_64/6278921877/artifacts/
> > > > >
> > > > >
> > > > > There's even a repo file you can install on the box to pull them down.
> > > > We installed this kernel on the server 3 days ago. Today, a user
> > > > informed us that their screen was black after logging in. Similar to
> > > > other occurrences of this issue, the mount command on the client was
> > > > hung. But in contrast to the other times, there were no messages in
> > > > the kernel logs on the server. Even restarting the client does
> > > > not resolve the issue.
> >
> > Ok, so you rebooted the client and it's still unable to mount? That
> > sounds like a server problem if so.
> >
> > Are both client and server running the same kernel?
> No, the server runs 5.14.0-427.3689_1194299994.el9 and the client
> 5.14.0-362.18.1.el9_3.

Ok.

> >
> > > > Something still seems to be wrong on the server though. When I look at
> > > > the directories under /proc/fs/nfsd/clients, there's still a directory
> > > > for the specific client, even though it's no longer running:
> > > >
> > > > # cat 155/info
> > > > clientid: 0xc8edb7f65f4a9ad
> > > > address: "10.87.31.152:819"
> > > > status: confirmed
> > > > seconds from last renew: 33163
> > > > name: "Linux NFSv4.2 bersalis.esat.kuleuven.be"
> > > > minor version: 2
> > > > Implementation domain: "kernel.org"
> > > > Implementation name: "Linux 5.14.0-362.18.1.el9_3.0.1.x86_64 #1 SMP
> > > > PREEMPT_DYNAMIC Sun Feb 11 13:49:23 UTC 2024 x86_64"
> > > > Implementation time: [0, 0]
> > > > callback state: DOWN
> > > > callback address: 10.87.31.152:0
> > > >
> > If you just shut down the client, the server won't immediately purge its
> > record. In fact, assuming you're running the same kernel on the server,
> > it won't purge the client record until there is a conflicting request
> > for its state.
> Is there a way to force such a conflicting request (to get the client
> record to purge)?

From the server or a different client, you can try opening the inodes
that the stuck client is holding open. If you open them for write, that
may trigger the server to kick out the old client record.

The problem is that they are disconnected dentries, so finding them to
open via path may be difficult...
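
That said, the inode numbers from the states file give you something to
search for. A rough sketch, assuming the filesystem is exported from
/export (adjust the paths to match your layout): on the server, map the
inode number back to a path

# find /export -xdev -inum 169346743

and then, from another client, open that file for write without
truncating it, e.g.

# echo -n >> /mnt/nfs/path/to/that/file

A write open like that should conflict with the read delegations shown
in your states listing and give the server a reason to deal with the
stale client record.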

> >
> > > The nfsdclnts command for this client shows the following delegations:
> > >
> > > # nfsdclnts -f 155/states -t all
> > > Inode number | Type   | Access | Deny | ip address            | Filename
> > > 169346743    | open   | r-     | --   | 10.87.31.152:819      |
> > > disconnected dentry
> > > 169346743    | deleg  | r      |      | 10.87.31.152:819      |
> > > disconnected dentry
> > > 169346746    | open   | r-     | --   | 10.87.31.152:819      |
> > > disconnected dentry
> > > 169346746    | deleg  | r      |      | 10.87.31.152:819      |
> > > disconnected dentry
> > >
> > > I see a lot of recent patches regarding directory delegations. Could
> > > this be related to this?
> > >
> > > Will a 5.14.0-362.18.1.el9_3.0.1 kernel try to use a directory delegation?
> > >
> > >
> > No. Directory delegations are a new feature that's still under
> > development. They use some of the same machinery as file delegations,
> > but they wouldn't be a factor here.
> >
> > > > The system seems to have identified that the client is no longer
> > > > reachable, but the client entry does not go away. When a mount was
> > > > hanging on the client, there would be two directories in clients for
> > > > the same client. Killing the mount command clears up the second entry.
> > > >
> > > > Even after running conntrack -D on the server to remove the tcp
> > > > connection from the conntrack table, the entry doesn't go away and the
> > > > client still can not mount anything from the server.
> > > >
> > > > A tcpdump on the client while a mount was running logged the following
> > > > messages over and over again:
> > > >
> > > > request:
> > > >
> > > > Frame 1: 378 bytes on wire (3024 bits), 378 bytes captured (3024 bits)
> > > > Ethernet II, Src: HP_19:7d:4b (e0:73:e7:19:7d:4b), Dst:
> > > > ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00)
> > > > Internet Protocol Version 4, Src: 10.87.31.152, Dst: 10.86.18.14
> > > > Transmission Control Protocol, Src Port: 932, Dst Port: 2049, Seq: 1,
> > > > Ack: 1, Len: 312
> > > > Remote Procedure Call, Type:Call XID:0x1d3220c4
> > > > Network File System
> > > >     [Program Version: 4]
> > > >     [V4 Procedure: COMPOUND (1)]
> > > >     GSS Data, Ops(1): CREATE_SESSION
> > > >         Length: 152
> > > >         GSS Sequence Number: 76
> > > >         Tag: <EMPTY>
> > > >         minorversion: 2
> > > >         Operations (count: 1): CREATE_SESSION
> > > >         [Main Opcode: CREATE_SESSION (43)]
> > > >     GSS Checksum:
> > > > 00000028040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb2…
> > > >         GSS Token Length: 40
> > > >         GSS-API Generic Security Service Application Program Interface
> > > >             krb5_blob:
> > > > 040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb23fa080b0…
> > > >
> > > > response:
> > > >
> > > > Frame 2: 206 bytes on wire (1648 bits), 206 bytes captured (1648 bits)
> > > > Ethernet II, Src: ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00), Dst:
> > > > HP_19:7d:4b (e0:73:e7:19:7d:4b)
> > > > Internet Protocol Version 4, Src: 10.86.18.14, Dst: 10.87.31.152
> > > > Transmission Control Protocol, Src Port: 2049, Dst Port: 932, Seq: 1,
> > > > Ack: 313, Len: 140
> > > > Remote Procedure Call, Type:Reply XID:0x1d3220c4
> > > > Network File System
> > > >     [Program Version: 4]
> > > >     [V4 Procedure: COMPOUND (1)]
> > > >     GSS Data, Ops(1): CREATE_SESSION(NFS4ERR_DELAY)
> > > >         Length: 24
> > > >         GSS Sequence Number: 76
> > > >         Status: NFS4ERR_DELAY (10008)
> > > >         Tag: <EMPTY>
> > > >         Operations (count: 1)
> > > >         [Main Opcode: CREATE_SESSION (43)]
> > > >     GSS Checksum:
> > > > 00000028040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f6274222…
> > > >         GSS Token Length: 40
> > > >         GSS-API Generic Security Service Application Program Interface
> > > >             krb5_blob:
> > > > 040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f627422226d74923…
> > > >
> > > > I was hoping that giving the client a different IP address would
> > > > resolve the issue for this client, but it didn't. Even though the
> > > > client had a new IP address (hostname was kept the same), it failed to
> > > > mount anything from the server.
> > > >
> > Changing the IP address won't help. The client is probably using the
> > same long-form client id as before, so the server still identifies the
> > client even with the address change.
> How is the client id determined? Will changing the hostname of the
> client trigger a change of the client id?

In the client record you showed a bit above, there is a "name" field:

name: "Linux NFSv4.2 bersalis.esat.kuleuven.be"

That's the string the server uses to uniquely identify the client. So
yes, changing the hostname should change that string.
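
Note that some setups also mix a uniquifier into that string. As a quick
check on the client (the exact knobs vary a bit between kernel versions,
so treat this as a sketch):

# cat /sys/module/nfs/parameters/nfs4_unique_id
# cat /sys/fs/nfs/net/nfs_client/identifier

If both are empty, the id is derived from the hostname alone, which
matches the "Linux NFSv4.2 <hostname>" string above.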

> >
> > Unfortunately, the cause of an NFS4ERR_DELAY error is tough to guess.
> > The client is expected to back off and retry, so if the server keeps
> > returning that repeatedly, then a hung mount command is expected.
> >
> > The question is why the server would keep returning DELAY. A lot of
> > different problems ranging from memory allocation issues to protocol
> > problems can result in that error. You may want to check the NFS server
> > and see if anything was logged there.
> There are no messages in the system logs that indicate any sort of
> memory issue. We also increased the vm.min_free_kbytes sysctl to 2G on the
> server before we restarted it with the newer kernel.

Ok, I didn't expect to see anything like that, but it was a possibility.

> >
> > This is on a CREATE_SESSION call, so I wonder if the record held by the
> > (courteous) server is somehow blocking the attempt to reestablish the
> > session?
> >
> > Do you have a way to reproduce this? Since this is a centos kernel, you
> > could follow the page here to open a bug:
>
> Unfortunately we haven't found a reliable way to reproduce it. But we do
> seem to trigger it more and more lately.
>
>

Bummer, ok. Let us know if you figure out a way to reproduce it.

> > https://wiki.centos.org/ReportBugs.html
> >
> >
> > > > I created another dump of the workqueues and worker pools on the server:
> > > >
> > > > [Mon Mar 18 14:59:33 2024] Showing busy workqueues and worker pools:
> > > > [Mon Mar 18 14:59:33 2024] workqueue events: flags=0x0
> > > > [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
> > > > active=1/256 refcnt=2
> > > > [Mon Mar 18 14:59:33 2024]     pending: drm_fb_helper_damage_work
> > > > [drm_kms_helper]
> > > > [Mon Mar 18 14:59:33 2024] workqueue events_power_efficient: flags=0x80
> > > > [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
> > > > active=1/256 refcnt=2
> > > > [Mon Mar 18 14:59:33 2024]     pending: fb_flashcursor
> > > > [Mon Mar 18 14:59:33 2024] workqueue mm_percpu_wq: flags=0x8
> > > > [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
> > > > active=1/256 refcnt=3
> > > > [Mon Mar 18 14:59:33 2024]     pending: lru_add_drain_per_cpu BAR(362)
> > > > [Mon Mar 18 14:59:33 2024] workqueue kblockd: flags=0x18
> > > > [Mon Mar 18 14:59:33 2024]   pwq 55: cpus=27 node=1 flags=0x0 nice=-20
> > > > active=1/256 refcnt=2
> > > > [Mon Mar 18 14:59:33 2024]     pending: blk_mq_timeout_work
> > > >
> > > >
> > > > In contrast to last time, it doesn't show anything regarding nfs this
> > > > time.
> > > >
> > > > I also tried the suggestion from Dai Ngo (echo 3 >
> > > > /proc/sys/vm/drop_caches), but that didn't seem to make any difference.
> > > >
> > > > We haven't restarted the server yet as the impact seems to
> > > > affect fewer clients than before. Is there anything we can run on the
> > > > server to further debug this?
> > > >
> > > > In the past, the issue seemed to deteriorate rapidly and resulted in
> > > > issues for almost all clients after about 20 minutes. This time the
> > > > impact seems to be less, but it's not gone.
> > > >
> > > > How can we force the NFS server to forget about a specific client? I
> > > > haven't tried to restart the nfs service yet as I'm afraid it will
> > > > fail to stop as before.
> > > >
> > Not with that kernel. There are some new administrative interfaces that
> > might allow that in the future, but they were just merged upstream and
> > aren't in that kernel.
> >
> > --
> > Jeff Layton <[email protected]>
>

--
Jeff Layton <[email protected]>

2024-03-19 10:58:59

by Rik Theys

[permalink] [raw]
Subject: Re: nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning

Hi,

On 3/19/24 11:39, Jeff Layton wrote:
> On Tue, 2024-03-19 at 08:58 +0100, Rik Theys wrote:
>> Hi,
>>
>> On 3/18/24 22:54, Jeff Layton wrote:
>>> On Mon, 2024-03-18 at 22:15 +0100, Rik Theys wrote:
>>>> Hi,
>>>>
>>>> On 3/18/24 21:21, Rik Theys wrote:
>>>>> Hi Jeff,
>>>>>
>>>>> On 3/12/24 13:47, Jeff Layton wrote:
>>>>>> On Tue, 2024-03-12 at 13:24 +0100, Rik Theys wrote:
>>>>>>> Hi Jeff,
>>>>>>>
>>>>>>> On 3/12/24 12:22, Jeff Layton wrote:
>>>>>>>> On Mon, 2024-03-11 at 19:43 +0100, Rik Theys wrote:
>>>>>>>>> Since a few weeks our Rocky Linux 9 NFS server has periodically
>>>>>>>>> logged hung nfsd tasks. The initial effect was that some clients
>>>>>>>>> could no longer access the NFS server. This got worse and worse
>>>>>>>>> (probably as more nfsd threads got blocked) and we had to restart
>>>>>>>>> the server. Restarting the server also failed as the NFS server
>>>>>>>>> service could no longer be stopped.
>>>>>>>>>
>>>>>>>>> The initial kernel we noticed this behavior on was
>>>>>>>>> kernel-5.14.0-362.18.1.el9_3.x86_64. Since then we've installed
>>>>>>>>> kernel-5.14.0-419.el9.x86_64 from CentOS Stream 9. The same issue
>>>>>>>>> happened again on this newer kernel version:
>>>>>> 419 is fairly up to date with nfsd changes. There are some known bugs
>>>>>> around callbacks, and there is a draft MR in flight to fix it.
>>>>>>
>>>>>> What kernel were you on prior to 5.14.0-362.18.1.el9_3.x86_64 ? If we
>>>>>> can bracket the changes around a particular version, then that might
>>>>>> help identify the problem.
>>>>>>
>>>>>>>>> [Mon Mar 11 14:10:08 2024]       Not tainted 5.14.0-419.el9.x86_64 #1
>>>>>>>>>    [Mon Mar 11 14:10:08 2024] "echo 0 >
>>>>>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]task:nfsd             state:D stack:0
>>>>>>>>>     pid:8865  ppid:2      flags:0x00004000
>>>>>>>>>    [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  <TASK>
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  nfsd4_shutdown_callback+0x49/0x120
>>>>>>>>> [nfsd]
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? nfsd4_cld_remove+0x54/0x1d0 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>> nfsd4_return_all_client_layouts+0xc4/0xf0 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? nfsd4_shutdown_copy+0x68/0xc0 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  __destroy_client+0x1f3/0x290 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  nfsd4_exchange_id+0x75f/0x770 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? nfsd4_decode_opaque+0x3a/0x90 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
>>>>>>>>> [sunrpc]
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  </TASK>
>>>>>>>>>    [Mon Mar 11 14:10:08 2024] INFO: task nfsd:8866 blocked for
>>>>>>>>> more than 122 seconds.
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]       Not tainted
>>>>>>>>> 5.14.0-419.el9.x86_64 #1
>>>>>>>>>    [Mon Mar 11 14:10:08 2024] "echo 0 >
>>>>>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]task:nfsd             state:D stack:0
>>>>>>>>>     pid:8866  ppid:2      flags:0x00004000
>>>>>>>>>    [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  <TASK>
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? tcp_recvmsg+0x196/0x210
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  nfsd4_destroy_session+0x1a4/0x240
>>>>>>>>> [nfsd]
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
>>>>>>>>> [sunrpc]
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>>>>>>>    [Mon Mar 11 14:10:08 2024]  </TASK>
>>>>>>>>>
>>>>>>>> The above threads are trying to flush the workqueue, so that probably
>>>>>>>> means that they are stuck waiting on a workqueue job to finish.
>>>>>>>>>    The above is repeated a few times, and then this warning is
>>>>>>>>> also logged:
>>>>>>>>>    [Mon Mar 11 14:12:04 2024] ------------[ cut here ]------------
>>>>>>>>>    [Mon Mar 11 14:12:04 2024] WARNING: CPU: 39 PID: 8844 at
>>>>>>>>> fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:12:05 2024] Modules linked in: nfsv4
>>>>>>>>> dns_resolver nfs fscache netfs rpcsec_gss_krb5 rpcrdma rdma_cm
>>>>>>>>> iw_cm ib_cm ib_core binfmt_misc bonding tls rfkill nft_counter nft_ct
>>>>>>>>>    nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet
>>>>>>>>> nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables nfnetlink vfat
>>>>>>>>> fat dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio l
>>>>>>>>>    ibcrc32c dm_service_time dm_multipath intel_rapl_msr
>>>>>>>>> intel_rapl_common intel_uncore_frequency
>>>>>>>>> intel_uncore_frequency_common isst_if_common skx_edac nfit
>>>>>>>>> libnvdimm ipmi_ssif x86_pkg_temp
>>>>>>>>>    _thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
>>>>>>>>> dcdbas rapl intel_cstate mgag200 i2c_algo_bit drm_shmem_helper
>>>>>>>>> dell_smbios drm_kms_helper dell_wmi_descriptor wmi_bmof intel_u
>>>>>>>>>    ncore syscopyarea pcspkr sysfillrect mei_me sysimgblt acpi_ipmi
>>>>>>>>> mei fb_sys_fops i2c_i801 ipmi_si intel_pch_thermal lpc_ich
>>>>>>>>> ipmi_devintf i2c_smbus ipmi_msghandler joydev acpi_power_meter
>>>>>>>>>    nfsd auth_rpcgss nfs_acl drm lockd grace fuse sunrpc ext4
>>>>>>>>> mbcache jbd2 sd_mod sg lpfc
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  nvmet_fc nvmet nvme_fc nvme_fabrics
>>>>>>>>> crct10dif_pclmul ahci libahci crc32_pclmul nvme_core crc32c_intel
>>>>>>>>> ixgbe megaraid_sas libata nvme_common ghash_clmulni_int
>>>>>>>>>    el t10_pi wdat_wdt scsi_transport_fc mdio wmi dca dm_mirror
>>>>>>>>> dm_region_hash dm_log dm_mod
>>>>>>>>>    [Mon Mar 11 14:12:05 2024] CPU: 39 PID: 8844 Comm: nfsd Not
>>>>>>>>> tainted 5.14.0-419.el9.x86_64 #1
>>>>>>>>>    [Mon Mar 11 14:12:05 2024] Hardware name: Dell Inc. PowerEdge
>>>>>>>>> R740/00WGD1, BIOS 2.20.1 09/13/2023
>>>>>>>>>    [Mon Mar 11 14:12:05 2024] RIP:
>>>>>>>>> 0010:nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:12:05 2024] Code: a6 95 c5 f3 e9 ff fe ff ff 48
>>>>>>>>> 89 df be 01 00 00 00 e8 34 b5 13 f4 48 8d bb 98 00 00 00 e8 c8 f9
>>>>>>>>> 00 00 84 c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be
>>>>>>>>>    02 00 00 00 48 89 df e8 0c b5 13 f4 e9 01
>>>>>>>>>    [Mon Mar 11 14:12:05 2024] RSP: 0018:ffff9929e0bb7b80 EFLAGS:
>>>>>>>>> 00010246
>>>>>>>>>    [Mon Mar 11 14:12:05 2024] RAX: 0000000000000000 RBX:
>>>>>>>>> ffff8ada51930900 RCX: 0000000000000024
>>>>>>>>>    [Mon Mar 11 14:12:05 2024] RDX: ffff8ada519309c8 RSI:
>>>>>>>>> ffff8ad582933c00 RDI: 0000000000002000
>>>>>>>>>    [Mon Mar 11 14:12:05 2024] RBP: ffff8ad46bf21574 R08:
>>>>>>>>> ffff9929e0bb7b48 R09: 0000000000000000
>>>>>>>>>    [Mon Mar 11 14:12:05 2024] R10: ffff8aec859a2948 R11:
>>>>>>>>> 0000000000000000 R12: ffff8ad6f497c360
>>>>>>>>>    [Mon Mar 11 14:12:05 2024] R13: ffff8ad46bf21560 R14:
>>>>>>>>> ffff8ae5942e0b10 R15: ffff8ad6f497c360
>>>>>>>>>    [Mon Mar 11 14:12:05 2024] FS:  0000000000000000(0000)
>>>>>>>>> GS:ffff8b031fcc0000(0000) knlGS:0000000000000000
>>>>>>>>>    [Mon Mar 11 14:12:05 2024] CS:  0010 DS: 0000 ES: 0000 CR0:
>>>>>>>>> 0000000080050033
>>>>>>>>>    [Mon Mar 11 14:12:05 2024] CR2: 00007fafe2060744 CR3:
>>>>>>>>> 00000018e58de006 CR4: 00000000007706e0
>>>>>>>>>    [Mon Mar 11 14:12:05 2024] DR0: 0000000000000000 DR1:
>>>>>>>>> 0000000000000000 DR2: 0000000000000000
>>>>>>>>>    [Mon Mar 11 14:12:05 2024] DR3: 0000000000000000 DR6:
>>>>>>>>> 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>>>    [Mon Mar 11 14:12:05 2024] PKRU: 55555554
>>>>>>>>>    [Mon Mar 11 14:12:05 2024] Call Trace:
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  <TASK>
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? __break_lease+0x16f/0x5f0
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
>>>>>>>>> [nfsd]
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? __warn+0x81/0x110
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
>>>>>>>>> [nfsd]
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? report_bug+0x10a/0x140
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? handle_bug+0x3c/0x70
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? exc_invalid_op+0x14/0x70
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? asm_exc_invalid_op+0x16/0x20
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
>>>>>>>>> [nfsd]
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  __break_lease+0x16f/0x5f0
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  ?
>>>>>>>>> nfsd_file_lookup_locked+0x117/0x160 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? list_lru_del+0x101/0x150
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  nfsd_file_do_acquire+0x790/0x830
>>>>>>>>> [nfsd]
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  nfs4_get_vfs_file+0x315/0x3a0 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  nfsd4_process_open2+0x430/0xa30 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? fh_verify+0x297/0x2f0 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  svc_process_common+0x2ec/0x660
>>>>>>>>> [sunrpc]
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  kthread+0xdd/0x100
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  ? __pfx_kthread+0x10/0x10
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  ret_from_fork+0x29/0x50
>>>>>>>>>    [Mon Mar 11 14:12:05 2024]  </TASK>
>>>>>>>>>    [Mon Mar 11 14:12:05 2024] ---[ end trace 7a039e17443dc651 ]---
>>>>>>>> This is probably this WARN in nfsd_break_one_deleg:
>>>>>>>>
>>>>>>>> WARN_ON_ONCE(!nfsd4_run_cb(&dp->dl_recall));
>>>>>>>>
>>>>>>>> It means that a delegation break callback to the client couldn't be
>>>>>>>> queued to the workqueue, and so it didn't run.
>>>>>>>>
>>>>>>>>> Could this be the same issue as described
>>>>>>>>> here: https://lore.kernel.org/linux-nfs/[email protected]/
>>>>>>>>> ?
>>>>>>>> Yes, most likely the same problem.
>>>>>>> If I read that thread correctly, this issue was introduced between
>>>>>>> 6.1.63 and 6.6.3? Is it possible the EL9 5.14.0-362.18.1.el9_3
>>>>>>> backported these changes, or were we hitting some other bug with that
>>>>>>> version? It seems the 6.1.x kernel is not affected? If so, that
>>>>>>> would be
>>>>>>> the recommended kernel to run?
>>>>>> Anything is possible. We have to identify the problem first.
>>>>>>>>> As described in that thread, I've tried to obtain the requested
>>>>>>>>> information.
>>>>>>>>>
>>>>>>>>> Is it possible this is the issue that was fixed by the patches
>>>>>>>>> described
>>>>>>>>> here? https://lore.kernel.org/linux-nfs/2024022054-cause-suffering-eae8@gregkh/
>>>>>>>>>
>>>>>>>> Doubtful. Those are targeted toward a different set of issues.
>>>>>>>>
>>>>>>>> If you're willing, I do have some patches queued up for CentOS here
>>>>>>>> that
>>>>>>>> fix some backchannel problems that could be related. I'm mainly
>>>>>>>> waiting
>>>>>>>> on Chuck to send these to Linus and then we'll likely merge them into
>>>>>>>> CentOS soon afterward:
>>>>>>>>
>>>>>>>> https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3689
>>>>>>>>
>>>>>>>>
>>>>>>> If you can send me a patch file, I can rebuild the C9S kernel with that
>>>>>>> patch and run it. It can take a while for the bug to trigger as it
>>>>>>> seems to be very workload dependent (we were running very
>>>>>>> stable for months and now hit this bug every other week).
>>>>>>>
>>>>>>>
>>>>>> It's probably simpler to just pull down the build artifacts for that MR.
>>>>>> You have to drill down through the CI for it, but they are here:
>>>>>>
>>>>>> https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/index.html?prefix=trusted-artifacts/1194300175/publish_x86_64/6278921877/artifacts/
>>>>>>
>>>>>>
>>>>>> There's even a repo file you can install on the box to pull them down.
>>>>> We installed this kernel on the server 3 days ago. Today, a user
>>>>> informed us that their screen was black after logging in. Similar to
>>>>> other occurrences of this issue, the mount command on the client was
>>>>> hung. But in contrast to the other times, there were no messages in
>>>>> the kernel logs on the server. Even restarting the client does
>>>>> not resolve the issue.
>>> Ok, so you rebooted the client and it's still unable to mount? That
>>> sounds like a server problem if so.
>>>
>>> Are both client and server running the same kernel?
>> No, the server runs 5.14.0-427.3689_1194299994.el9 and the client
>> 5.14.0-362.18.1.el9_3.
> Ok.
>
>>>>> Something still seems to be wrong on the server though. When I look at
>>>>> the directories under /proc/fs/nfsd/clients, there's still a directory
>>>>> for the specific client, even though it's no longer running:
>>>>>
>>>>> # cat 155/info
>>>>> clientid: 0xc8edb7f65f4a9ad
>>>>> address: "10.87.31.152:819"
>>>>> status: confirmed
>>>>> seconds from last renew: 33163
>>>>> name: "Linux NFSv4.2 bersalis.esat.kuleuven.be"
>>>>> minor version: 2
>>>>> Implementation domain: "kernel.org"
>>>>> Implementation name: "Linux 5.14.0-362.18.1.el9_3.0.1.x86_64 #1 SMP
>>>>> PREEMPT_DYNAMIC Sun Feb 11 13:49:23 UTC 2024 x86_64"
>>>>> Implementation time: [0, 0]
>>>>> callback state: DOWN
>>>>> callback address: 10.87.31.152:0
>>>>>
>>> If you just shut down the client, the server won't immediately purge its
>>> record. In fact, assuming you're running the same kernel on the server,
>>> it won't purge the client record until there is a conflicting request
>>> for its state.
>> Is there a way to force such a conflicting request (to get the client
>> record to purge)?
> From the server or a different client, you can try opening the inodes
> that the stuck client is holding open. If you open them for write, that
> may trigger the server to kick out the old client record.
>
> The problem is that they are disconnected dentries, so finding them to
> open via path may be difficult...

I've located the file that matches one of these inodes. When I go to the
location of the file on another NFS client and touch the file, the touch
command just hangs.

So I assume the server is trying to recall the delegation from the
malfunctioning client?

When I run tcpdump on the malfunctioning client, I see the
CREATE_SESSION / NFS4ERR_DELAY messages, but nothing to indicate the
server wants to recall a delegation.
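
(That capture was taken with something along the lines of

# tcpdump -s 0 -w client.pcap host 10.86.18.14 and port 2049

and then inspected in wireshark. If I understand the session backchannel
correctly, a recall would arrive as a CB_SEQUENCE / CB_RECALL compound on
that same connection, so it should show up in such a trace if the server
ever sent one.)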

The entry for the malfunctioning client is not removed on the server.

Is there anything else I can do to provide more information about this
situation?

Regards,

Rik


>>>> The nfsdclnts command for this client shows the following delegations:
>>>>
>>>> # nfsdclnts -f 155/states -t all
>>>> Inode number | Type   | Access | Deny | ip address            | Filename
>>>> 169346743    | open   | r-     | --   | 10.87.31.152:819      |
>>>> disconnected dentry
>>>> 169346743    | deleg  | r      |      | 10.87.31.152:819      |
>>>> disconnected dentry
>>>> 169346746    | open   | r-     | --   | 10.87.31.152:819      |
>>>> disconnected dentry
>>>> 169346746    | deleg  | r      |      | 10.87.31.152:819      |
>>>> disconnected dentry
>>>>
>>>> I see a lot of recent patches regarding directory delegations. Could
>>>> this be related to this?
>>>>
>>>> Will a 5.14.0-362.18.1.el9_3.0.1 kernel try to use a directory delegation?
>>>>
>>>>
>>> No. Directory delegations are a new feature that's still under
>>> development. They use some of the same machinery as file delegations,
>>> but they wouldn't be a factor here.
>>>
>>>>> The system seems to have identified that the client is no longer
>>>>> reachable, but the client entry does not go away. When a mount was
>>>>> hanging on the client, there would be two directories in clients for
>>>>> the same client. Killing the mount command clears up the second entry.
>>>>>
>>>>> Even after running conntrack -D on the server to remove the tcp
>>>>> connection from the conntrack table, the entry doesn't go away and the
>>>>> client still can not mount anything from the server.
>>>>>
>>>>> A tcpdump on the client while a mount was running logged the following
>>>>> messages over and over again:
>>>>>
>>>>> request:
>>>>>
>>>>> Frame 1: 378 bytes on wire (3024 bits), 378 bytes captured (3024 bits)
>>>>> Ethernet II, Src: HP_19:7d:4b (e0:73:e7:19:7d:4b), Dst:
>>>>> ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00)
>>>>> Internet Protocol Version 4, Src: 10.87.31.152, Dst: 10.86.18.14
>>>>> Transmission Control Protocol, Src Port: 932, Dst Port: 2049, Seq: 1,
>>>>> Ack: 1, Len: 312
>>>>> Remote Procedure Call, Type:Call XID:0x1d3220c4
>>>>> Network File System
>>>>>     [Program Version: 4]
>>>>>     [V4 Procedure: COMPOUND (1)]
>>>>>     GSS Data, Ops(1): CREATE_SESSION
>>>>>         Length: 152
>>>>>         GSS Sequence Number: 76
>>>>>         Tag: <EMPTY>
>>>>>         minorversion: 2
>>>>>         Operations (count: 1): CREATE_SESSION
>>>>>         [Main Opcode: CREATE_SESSION (43)]
>>>>>     GSS Checksum:
>>>>> 00000028040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb2…
>>>>>         GSS Token Length: 40
>>>>>         GSS-API Generic Security Service Application Program Interface
>>>>>             krb5_blob:
>>>>> 040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb23fa080b0…
>>>>>
>>>>> response:
>>>>>
>>>>> Frame 2: 206 bytes on wire (1648 bits), 206 bytes captured (1648 bits)
>>>>> Ethernet II, Src: ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00), Dst:
>>>>> HP_19:7d:4b (e0:73:e7:19:7d:4b)
>>>>> Internet Protocol Version 4, Src: 10.86.18.14, Dst: 10.87.31.152
>>>>> Transmission Control Protocol, Src Port: 2049, Dst Port: 932, Seq: 1,
>>>>> Ack: 313, Len: 140
>>>>> Remote Procedure Call, Type:Reply XID:0x1d3220c4
>>>>> Network File System
>>>>>     [Program Version: 4]
>>>>>     [V4 Procedure: COMPOUND (1)]
>>>>>     GSS Data, Ops(1): CREATE_SESSION(NFS4ERR_DELAY)
>>>>>         Length: 24
>>>>>         GSS Sequence Number: 76
>>>>>         Status: NFS4ERR_DELAY (10008)
>>>>>         Tag: <EMPTY>
>>>>>         Operations (count: 1)
>>>>>         [Main Opcode: CREATE_SESSION (43)]
>>>>>     GSS Checksum:
>>>>> 00000028040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f6274222…
>>>>>         GSS Token Length: 40
>>>>>         GSS-API Generic Security Service Application Program Interface
>>>>>             krb5_blob:
>>>>> 040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f627422226d74923…
>>>>>
>>>>> I was hoping that giving the client a different IP address would
>>>>> resolve the issue for this client, but it didn't. Even though the
>>>>> client had a new IP address (hostname was kept the same), it failed to
>>>>> mount anything from the server.
>>>>>
>>> Changing the IP address won't help. The client is probably using the
>>> same long-form client id as before, so the server still identifies the
>>> client even with the address change.
>> How is the client id determined? Will changing the hostname of the
>> client trigger a change of the client id?
> In the client record you showed a bit above, there is a "name" field:
>
> name: "Linux NFSv4.2 bersalis.esat.kuleuven.be"
>
> That's the string the server uses to uniquely identify the client. So
> yes, changing the hostname should change that string.
>
>>> Unfortunately, the cause of an NFS4ERR_DELAY error is tough to guess.
>>> The client is expected to back off and retry, so if the server keeps
>>> returning that repeatedly, then a hung mount command is expected.
>>>
>>> The question is why the server would keep returning DELAY. A lot of
>>> different problems ranging from memory allocation issues to protocol
>>> problems can result in that error. You may want to check the NFS server
>>> and see if anything was logged there.
>> There are no messages in the system logs that indicate any sort of
>> memory issue. We also increased the min_free_kbytes sysctl to 2G on the
>> server before we restarted it with the newer kernel.
> Ok, I didn't expect to see anything like that, but it was a possibility.
>
>>> This is on a CREATE_SESSION call, so I wonder if the record held by the
>>> (courteous) server is somehow blocking the attempt to reestablish the
>>> session?
>>>
>>> Do you have a way to reproduce this? Since this is a centos kernel, you
>>> could follow the page here to open a bug:
>> Unfortunately we haven't found a reliable way to reproduce it. But we do
>> seem to trigger it more and more lately.
>>
>>
> Bummer, ok. Let us know if you figure out a way to reproduce it.
>
>>> https://wiki.centos.org/ReportBugs.html
>>>
>>>
>>>>> I created another dump of the workqueues and worker pools on the server:
>>>>>
>>>>> [Mon Mar 18 14:59:33 2024] Showing busy workqueues and worker pools:
>>>>> [Mon Mar 18 14:59:33 2024] workqueue events: flags=0x0
>>>>> [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
>>>>> active=1/256 refcnt=2
>>>>> [Mon Mar 18 14:59:33 2024]     pending: drm_fb_helper_damage_work
>>>>> [drm_kms_helper]
>>>>> [Mon Mar 18 14:59:33 2024] workqueue events_power_efficient: flags=0x80
>>>>> [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
>>>>> active=1/256 refcnt=2
>>>>> [Mon Mar 18 14:59:33 2024]     pending: fb_flashcursor
>>>>> [Mon Mar 18 14:59:33 2024] workqueue mm_percpu_wq: flags=0x8
>>>>> [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
>>>>> active=1/256 refcnt=3
>>>>> [Mon Mar 18 14:59:33 2024]     pending: lru_add_drain_per_cpu BAR(362)
>>>>> [Mon Mar 18 14:59:33 2024] workqueue kblockd: flags=0x18
>>>>> [Mon Mar 18 14:59:33 2024]   pwq 55: cpus=27 node=1 flags=0x0 nice=-20
>>>>> active=1/256 refcnt=2
>>>>> [Mon Mar 18 14:59:33 2024]     pending: blk_mq_timeout_work
>>>>>
>>>>>
>>>>> In contrast to last time, it doesn't show anything regarding nfs this
>>>>> time.
>>>>>
>>>>> I also tried the suggestion from Dai Ngo (echo 3 >
>>>>> /proc/sys/vm/drop_caches), but that didn't seem to make any difference.
>>>>>
>>>>> We haven't restarted the server yet as the impact seems to affect
>>>>> fewer clients than before. Is there anything we can run on the
>>>>> server to further debug this?
>>>>>
>>>>> In the past, the issue seemed to deteriorate rapidly and resulted in
>>>>> issues for almost all clients after about 20 minutes. This time the
>>>>> impact seems to be less, but it's not gone.
>>>>>
>>>>> How can we force the NFS server to forget about a specific client? I
>>>>> haven't tried to restart the nfs service yet as I'm afraid it will
>>>>> fail to stop as before.
>>>>>
>>> Not with that kernel. There are some new administrative interfaces that
>>> might allow that in the future, but they were just merged upstream and
>>> aren't in that kernel.
>>>
>>> --
>>> Jeff Layton <[email protected]>

--
Rik Theys
System Engineer
KU Leuven - Dept. Elektrotechniek (ESAT)
Kasteelpark Arenberg 10 bus 2440 - B-3001 Leuven-Heverlee
+32(0)16/32.11.07
----------------------------------------------------------------
<<Any errors in spelling, tact or fact are transmission errors>>


2024-03-19 11:36:26

by Jeffrey Layton

[permalink] [raw]
Subject: Re: nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning

On Tue, 2024-03-19 at 11:58 +0100, Rik Theys wrote:
> Hi,
>
> On 3/19/24 11:39, Jeff Layton wrote:
> > On Tue, 2024-03-19 at 08:58 +0100, Rik Theys wrote:
> > > Hi,
> > >
> > > On 3/18/24 22:54, Jeff Layton wrote:
> > > > On Mon, 2024-03-18 at 22:15 +0100, Rik Theys wrote:
> > > > > Hi,
> > > > >
> > > > > On 3/18/24 21:21, Rik Theys wrote:
> > > > > > Hi Jeff,
> > > > > >
> > > > > > On 3/12/24 13:47, Jeff Layton wrote:
> > > > > > > On Tue, 2024-03-12 at 13:24 +0100, Rik Theys wrote:
> > > > > > > > Hi Jeff,
> > > > > > > >
> > > > > > > > On 3/12/24 12:22, Jeff Layton wrote:
> > > > > > > > > On Mon, 2024-03-11 at 19:43 +0100, Rik Theys wrote:
> > > > > > > > > > Since a few weeks our Rocky Linux 9 NFS server has periodically
> > > > > > > > > > logged hung nfsd tasks. The initial effect was that some clients
> > > > > > > > > > could no longer access the NFS server. This got worse and worse
> > > > > > > > > > (probably as more nfsd threads got blocked) and we had to restart
> > > > > > > > > > the server. Restarting the server also failed as the NFS server
> > > > > > > > > > service could no longer be stopped.
> > > > > > > > > >
> > > > > > > > > > The initial kernel we noticed this behavior on was
> > > > > > > > > > kernel-5.14.0-362.18.1.el9_3.x86_64. Since then we've installed
> > > > > > > > > > kernel-5.14.0-419.el9.x86_64 from CentOS Stream 9. The same issue
> > > > > > > > > > happened again on this newer kernel version:
> > > > > > > 419 is fairly up to date with nfsd changes. There are some known bugs
> > > > > > > around callbacks, and there is a draft MR in flight to fix it.
> > > > > > >
> > > > > > > What kernel were you on prior to 5.14.0-362.18.1.el9_3.x86_64 ? If we
> > > > > > > can bracket the changes around a particular version, then that might
> > > > > > > help identify the problem.
> > > > > > >
> > > > > > > > > > [Mon Mar 11 14:10:08 2024]       Not tainted 5.14.0-419.el9.x86_64 #1
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024] "echo 0 >
> > > > > > > > > > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]task:nfsd             state:D stack:0
> > > > > > > > > >     pid:8865  ppid:2      flags:0x00004000
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024] Call Trace:
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  <TASK>
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  nfsd4_shutdown_callback+0x49/0x120
> > > > > > > > > > [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? nfsd4_cld_remove+0x54/0x1d0 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  ?
> > > > > > > > > > nfsd4_return_all_client_layouts+0xc4/0xf0 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? nfsd4_shutdown_copy+0x68/0xc0 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  __destroy_client+0x1f3/0x290 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  nfsd4_exchange_id+0x75f/0x770 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? nfsd4_decode_opaque+0x3a/0x90 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
> > > > > > > > > > [sunrpc]
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  </TASK>
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024] INFO: task nfsd:8866 blocked for
> > > > > > > > > > more than 122 seconds.
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]       Not tainted
> > > > > > > > > > 5.14.0-419.el9.x86_64 #1
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024] "echo 0 >
> > > > > > > > > > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]task:nfsd             state:D stack:0
> > > > > > > > > >     pid:8866  ppid:2      flags:0x00004000
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024] Call Trace:
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  <TASK>
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? tcp_recvmsg+0x196/0x210
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  nfsd4_destroy_session+0x1a4/0x240
> > > > > > > > > > [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
> > > > > > > > > > [sunrpc]
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
> > > > > > > > > >    [Mon Mar 11 14:10:08 2024]  </TASK>
> > > > > > > > > >
> > > > > > > > > The above threads are trying to flush the workqueue, so that probably
> > > > > > > > > means that they are stuck waiting on a workqueue job to finish.
> > > > > > > > > >    The above is repeated a few times, and then this warning is
> > > > > > > > > > also logged:
> > > > > > > > > >    [Mon Mar 11 14:12:04 2024] ------------[ cut here ]------------
> > > > > > > > > >    [Mon Mar 11 14:12:04 2024] WARNING: CPU: 39 PID: 8844 at
> > > > > > > > > > fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024] Modules linked in: nfsv4
> > > > > > > > > > dns_resolver nfs fscache netfs rpcsec_gss_krb5 rpcrdma rdma_cm
> > > > > > > > > > iw_cm ib_cm ib_core binfmt_misc bonding tls rfkill nft_counter nft_ct
> > > > > > > > > >    nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet
> > > > > > > > > > nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables nfnetlink vfat
> > > > > > > > > > fat dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio l
> > > > > > > > > >    ibcrc32c dm_service_time dm_multipath intel_rapl_msr
> > > > > > > > > > intel_rapl_common intel_uncore_frequency
> > > > > > > > > > intel_uncore_frequency_common isst_if_common skx_edac nfit
> > > > > > > > > > libnvdimm ipmi_ssif x86_pkg_temp
> > > > > > > > > >    _thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
> > > > > > > > > > dcdbas rapl intel_cstate mgag200 i2c_algo_bit drm_shmem_helper
> > > > > > > > > > dell_smbios drm_kms_helper dell_wmi_descriptor wmi_bmof intel_u
> > > > > > > > > >    ncore syscopyarea pcspkr sysfillrect mei_me sysimgblt acpi_ipmi
> > > > > > > > > > mei fb_sys_fops i2c_i801 ipmi_si intel_pch_thermal lpc_ich
> > > > > > > > > > ipmi_devintf i2c_smbus ipmi_msghandler joydev acpi_power_meter
> > > > > > > > > >    nfsd auth_rpcgss nfs_acl drm lockd grace fuse sunrpc ext4
> > > > > > > > > > mbcache jbd2 sd_mod sg lpfc
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  nvmet_fc nvmet nvme_fc nvme_fabrics
> > > > > > > > > > crct10dif_pclmul ahci libahci crc32_pclmul nvme_core crc32c_intel
> > > > > > > > > > ixgbe megaraid_sas libata nvme_common ghash_clmulni_int
> > > > > > > > > >    el t10_pi wdat_wdt scsi_transport_fc mdio wmi dca dm_mirror
> > > > > > > > > > dm_region_hash dm_log dm_mod
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024] CPU: 39 PID: 8844 Comm: nfsd Not
> > > > > > > > > > tainted 5.14.0-419.el9.x86_64 #1
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024] Hardware name: Dell Inc. PowerEdge
> > > > > > > > > > R740/00WGD1, BIOS 2.20.1 09/13/2023
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024] RIP:
> > > > > > > > > > 0010:nfsd_break_deleg_cb+0x170/0x190 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024] Code: a6 95 c5 f3 e9 ff fe ff ff 48
> > > > > > > > > > 89 df be 01 00 00 00 e8 34 b5 13 f4 48 8d bb 98 00 00 00 e8 c8 f9
> > > > > > > > > > 00 00 84 c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be
> > > > > > > > > >    02 00 00 00 48 89 df e8 0c b5 13 f4 e9 01
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024] RSP: 0018:ffff9929e0bb7b80 EFLAGS:
> > > > > > > > > > 00010246
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024] RAX: 0000000000000000 RBX:
> > > > > > > > > > ffff8ada51930900 RCX: 0000000000000024
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024] RDX: ffff8ada519309c8 RSI:
> > > > > > > > > > ffff8ad582933c00 RDI: 0000000000002000
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024] RBP: ffff8ad46bf21574 R08:
> > > > > > > > > > ffff9929e0bb7b48 R09: 0000000000000000
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024] R10: ffff8aec859a2948 R11:
> > > > > > > > > > 0000000000000000 R12: ffff8ad6f497c360
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024] R13: ffff8ad46bf21560 R14:
> > > > > > > > > > ffff8ae5942e0b10 R15: ffff8ad6f497c360
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024] FS:  0000000000000000(0000)
> > > > > > > > > > GS:ffff8b031fcc0000(0000) knlGS:0000000000000000
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024] CS:  0010 DS: 0000 ES: 0000 CR0:
> > > > > > > > > > 0000000080050033
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024] CR2: 00007fafe2060744 CR3:
> > > > > > > > > > 00000018e58de006 CR4: 00000000007706e0
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024] DR0: 0000000000000000 DR1:
> > > > > > > > > > 0000000000000000 DR2: 0000000000000000
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024] DR3: 0000000000000000 DR6:
> > > > > > > > > > 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024] PKRU: 55555554
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024] Call Trace:
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  <TASK>
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? __break_lease+0x16f/0x5f0
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
> > > > > > > > > > [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? __warn+0x81/0x110
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
> > > > > > > > > > [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? report_bug+0x10a/0x140
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? handle_bug+0x3c/0x70
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? exc_invalid_op+0x14/0x70
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? asm_exc_invalid_op+0x16/0x20
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
> > > > > > > > > > [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  __break_lease+0x16f/0x5f0
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  ?
> > > > > > > > > > nfsd_file_lookup_locked+0x117/0x160 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? list_lru_del+0x101/0x150
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  nfsd_file_do_acquire+0x790/0x830
> > > > > > > > > > [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  nfs4_get_vfs_file+0x315/0x3a0 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  nfsd4_process_open2+0x430/0xa30 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? fh_verify+0x297/0x2f0 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  svc_process_common+0x2ec/0x660
> > > > > > > > > > [sunrpc]
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  svc_process+0x12d/0x170 [sunrpc]
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  nfsd+0x84/0xb0 [nfsd]
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  kthread+0xdd/0x100
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  ? __pfx_kthread+0x10/0x10
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  ret_from_fork+0x29/0x50
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024]  </TASK>
> > > > > > > > > >    [Mon Mar 11 14:12:05 2024] ---[ end trace 7a039e17443dc651 ]---
> > > > > > > > > This is probably this WARN in nfsd_break_one_deleg:
> > > > > > > > >
> > > > > > > > > WARN_ON_ONCE(!nfsd4_run_cb(&dp->dl_recall));
> > > > > > > > >
> > > > > > > > > It means that a delegation break callback to the client couldn't be
> > > > > > > > > queued to the workqueue, and so it didn't run.
> > > > > > > > >
> > > > > > > > > > Could this be the same issue as described
> > > > > > > > > > here:https://lore.kernel.org/linux-nfs/[email protected]/
> > > > > > > > > > ?
> > > > > > > > > Yes, most likely the same problem.
> > > > > > > > If I read that thread correctly, this issue was introduced between
> > > > > > > > 6.1.63 and 6.6.3? Is it possible the EL9 5.14.0-362.18.1.el9_3
> > > > > > > > backported these changes, or were we hitting some other bug with that
> > > > > > > > version? It seems the 6.1.x kernel is not affected? If so, that
> > > > > > > > would be
> > > > > > > > the recommended kernel to run?
> > > > > > > Anything is possible. We have to identify the problem first.
> > > > > > > > > > As described in that thread, I've tried to obtain the requested
> > > > > > > > > > information.
> > > > > > > > > >
> > > > > > > > > > Is it possible this is the issue that was fixed by the patches
> > > > > > > > > > described
> > > > > > > > > > here?https://lore.kernel.org/linux-nfs/2024022054-cause-suffering-eae8@gregkh/
> > > > > > > > > >
> > > > > > > > > Doubtful. Those are targeted toward a different set of issues.
> > > > > > > > >
> > > > > > > > > If you're willing, I do have some patches queued up for CentOS here
> > > > > > > > > that
> > > > > > > > > fix some backchannel problems that could be related. I'm mainly
> > > > > > > > > waiting
> > > > > > > > > on Chuck to send these to Linus and then we'll likely merge them into
> > > > > > > > > CentOS soon afterward:
> > > > > > > > >
> > > > > > > > > https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3689
> > > > > > > > >
> > > > > > > > >
> > > > > > > > If you can send me a patch file, I can rebuild the C9S kernel with that
> > > > > > > > patch and run it. It can take a while for the bug to trigger as I
> > > > > > > > believe it is very workload dependent (we were running very
> > > > > > > > stable for months and now hit this bug every other week).
> > > > > > > >
> > > > > > > >
> > > > > > > It's probably simpler to just pull down the build artifacts for that MR.
> > > > > > > You have to drill down through the CI for it, but they are here:
> > > > > > >
> > > > > > > https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/index.html?prefix=trusted-artifacts/1194300175/publish_x86_64/6278921877/artifacts/
> > > > > > >
> > > > > > >
> > > > > > > There's even a repo file you can install on the box to pull them down.
> > > > > > We installed this kernel on the server 3 days ago. Today, a user
> > > > > > informed us that their screen was black after logging in. Similar to
> > > > > > other occurrences of this issue, the mount command on the client was
> > > > > > hung. But in contrast to the other times, there were no messages in
> > > > > > the kernel logs on the server. Even restarting the client does
> > > > > > not resolve the issue.
> > > > Ok, so you rebooted the client and it's still unable to mount? That
> > > > sounds like a server problem if so.
> > > >
> > > > Are both client and server running the same kernel?
> > > No, the server runs 5.14.0-427.3689_1194299994.el9 and the client
> > > 5.14.0-362.18.1.el9_3.
> > Ok.
> >
> > > > > > Something still seems to be wrong on the server though. When I look at
> > > > > > the directories under /proc/fs/nfsd/clients, there's still a directory
> > > > > > for the specific client, even though it's no longer running:
> > > > > >
> > > > > > # cat 155/info
> > > > > > clientid: 0xc8edb7f65f4a9ad
> > > > > > address: "10.87.31.152:819"
> > > > > > status: confirmed
> > > > > > seconds from last renew: 33163
> > > > > > name: "Linux NFSv4.2 bersalis.esat.kuleuven.be"
> > > > > > minor version: 2
> > > > > > Implementation domain: "kernel.org"
> > > > > > Implementation name: "Linux 5.14.0-362.18.1.el9_3.0.1.x86_64 #1 SMP
> > > > > > PREEMPT_DYNAMIC Sun Feb 11 13:49:23 UTC 2024 x86_64"
> > > > > > Implementation time: [0, 0]
> > > > > > callback state: DOWN
> > > > > > callback address: 10.87.31.152:0
> > > > > >
> > > > If you just shut down the client, the server won't immediately purge its
> > > > record. In fact, assuming you're running the same kernel on the server,
> > > > it won't purge the client record until there is a conflicting request
> > > > for its state.
> > > Is there a way to force such a conflicting request (to get the client
> > > record to purge)?
> > From the server or a different client, you can try opening the inodes
> > that the stuck client is holding open. If you open them for write, that
> > may trigger the server to kick out the old client record.
> >
> > The problem is that they are disconnected dentries, so finding them to
> > open via path may be difficult...
>
> I've located the file that matches one of these inodes. When I go to the
> location of the file on another NFS client and touch the file, the touch
> command just hangs.
>
> So I assume the server is trying to recall the delegation from the
> dysfunctional client?
>
> When I run tcpdump on the dysfunctional client, I see the
> CREATE_SESSION / NFS4ERR_DELAY messages, but nothing to indicate the
> server wants to revoke a delegation.
>

Right, according to the client record above, the callback channel is
DOWN, so the server can't communicate with the client. Given that it has
been over 33000 seconds since the last lease renewal, the server should
kick out a client that is blocking other activity, but that doesn't seem
to be happening here.


> The entry for the dysfunctional client is not removed on the server.
>
> Is there anything else I can do to provide more information about this
> situation?
>
>

The main function that handles the CREATE_SESSION call is
nfsd4_create_session. It's somewhat complex and there are a number of
reasons that function could return NFS4ERR_DELAY (aka nfserr_jukebox in
the kernel code).

What I'd do at this point is enable the nfsd tracepoints and see whether
they shed any light on why the server keeps delaying your CREATE_SESSION
calls. There aren't a lot of tracepoints in that code path, however, so
they may not show much.
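
Something along these lines is what I have in mind. This is only a rough
sketch: the tracefs mount point and the exact event names under
events/nfsd/ vary between kernel versions.

  # enable every nfsd tracepoint and capture the buffer while you retry
  echo 1 > /sys/kernel/tracing/events/nfsd/enable
  cat /sys/kernel/tracing/trace_pipe > /tmp/nfsd-trace.txt &
  # ... reproduce the hung mount from the client ...
  echo 0 > /sys/kernel/tracing/events/nfsd/enable
  kill %1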

In the absence of that, you can try to use bpftrace or something similar
to debug what's happening in that function.
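
For example, a one-liner along these lines (untested, and it assumes
bpftrace can attach a kretprobe to the nfsd module's nfsd4_create_session
symbol) would count the status codes that function returns:

  bpftrace -e 'kretprobe:nfsd4_create_session { @status[retval] = count(); }'

Bear in mind that retval is the raw __be32 status there, so NFS4ERR_DELAY
(10008) should show up byte-swapped (0x18270000) on a little-endian box.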

>
>
> > > > > The nfsdclnts command for this client shows the following delegations:
> > > > >
> > > > > # nfsdclnts -f 155/states -t all
> > > > > Inode number | Type   | Access | Deny | ip address            | Filename
> > > > > 169346743    | open   | r-     | --   | 10.87.31.152:819      | disconnected dentry
> > > > > 169346743    | deleg  | r      |      | 10.87.31.152:819      | disconnected dentry
> > > > > 169346746    | open   | r-     | --   | 10.87.31.152:819      | disconnected dentry
> > > > > 169346746    | deleg  | r      |      | 10.87.31.152:819      | disconnected dentry
> > > > >
> > > > > I see a lot of recent patches regarding directory delegations. Could
> > > > > this be related to this?
> > > > >
> > > > > Will a 5.14.0-362.18.1.el9_3.0.1 kernel try to use a directory delegation?
> > > > >
> > > > >
> > > > No. Directory delegations are a new feature that's still under
> > > > development. They use some of the same machinery as file delegations,
> > > > but they wouldn't be a factor here.
> > > >
> > > > > > The system seems to have identified that the client is no longer
> > > > > > reachable, but the client entry does not go away. When a mount was
> > > > > > hanging on the client, there would be two directories in clients for
> > > > > > the same client. Killing the mount command clears up the second entry.
> > > > > >
> > > > > > Even after running conntrack -D on the server to remove the tcp
> > > > > > connection from the conntrack table, the entry doesn't go away and the
> > > > > > client still can not mount anything from the server.
> > > > > >
> > > > > > A tcpdump on the client while a mount was running logged the following
> > > > > > messages over and over again:
> > > > > >
> > > > > > request:
> > > > > >
> > > > > > Frame 1: 378 bytes on wire (3024 bits), 378 bytes captured (3024 bits)
> > > > > > Ethernet II, Src: HP_19:7d:4b (e0:73:e7:19:7d:4b), Dst:
> > > > > > ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00)
> > > > > > Internet Protocol Version 4, Src: 10.87.31.152, Dst: 10.86.18.14
> > > > > > Transmission Control Protocol, Src Port: 932, Dst Port: 2049, Seq: 1,
> > > > > > Ack: 1, Len: 312
> > > > > > Remote Procedure Call, Type:Call XID:0x1d3220c4
> > > > > > Network File System
> > > > > >     [Program Version: 4]
> > > > > >     [V4 Procedure: COMPOUND (1)]
> > > > > >     GSS Data, Ops(1): CREATE_SESSION
> > > > > >         Length: 152
> > > > > >         GSS Sequence Number: 76
> > > > > >         Tag: <EMPTY>
> > > > > >         minorversion: 2
> > > > > >         Operations (count: 1): CREATE_SESSION
> > > > > >         [Main Opcode: CREATE_SESSION (43)]
> > > > > >     GSS Checksum:
> > > > > > 00000028040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb2…
> > > > > >         GSS Token Length: 40
> > > > > >         GSS-API Generic Security Service Application Program Interface
> > > > > >             krb5_blob:
> > > > > > 040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb23fa080b0…
> > > > > >
> > > > > > response:
> > > > > >
> > > > > > Frame 2: 206 bytes on wire (1648 bits), 206 bytes captured (1648 bits)
> > > > > > Ethernet II, Src: ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00), Dst:
> > > > > > HP_19:7d:4b (e0:73:e7:19:7d:4b)
> > > > > > Internet Protocol Version 4, Src: 10.86.18.14, Dst: 10.87.31.152
> > > > > > Transmission Control Protocol, Src Port: 2049, Dst Port: 932, Seq: 1,
> > > > > > Ack: 313, Len: 140
> > > > > > Remote Procedure Call, Type:Reply XID:0x1d3220c4
> > > > > > Network File System
> > > > > >     [Program Version: 4]
> > > > > >     [V4 Procedure: COMPOUND (1)]
> > > > > >     GSS Data, Ops(1): CREATE_SESSION(NFS4ERR_DELAY)
> > > > > >         Length: 24
> > > > > >         GSS Sequence Number: 76
> > > > > >         Status: NFS4ERR_DELAY (10008)
> > > > > >         Tag: <EMPTY>
> > > > > >         Operations (count: 1)
> > > > > >         [Main Opcode: CREATE_SESSION (43)]
> > > > > >     GSS Checksum:
> > > > > > 00000028040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f6274222…
> > > > > >         GSS Token Length: 40
> > > > > >         GSS-API Generic Security Service Application Program Interface
> > > > > >             krb5_blob:
> > > > > > 040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f627422226d74923…
> > > > > >
> > > > > > I was hoping that giving the client a different IP address would
> > > > > > resolve the issue for this client, but it didn't. Even though the
> > > > > > client had a new IP address (hostname was kept the same), it failed to
> > > > > > mount anything from the server.
> > > > > >
> > > > Changing the IP address won't help. The client is probably using the
> > > > same long-form client id as before, so the server still identifies the
> > > > client even with the address change.
> > > How is the client id determined? Will changing the hostname of the
> > > client trigger a change of the client id?
> > In the client record you showed a bit above, there is a "name" field:
> >
> > name: "Linux NFSv4.2 bersalis.esat.kuleuven.be"
> >
> > That's the string the server uses to uniquely identify the client. So
> > yes, changing the hostname should change that string.
> >
> > > > Unfortunately, the cause of an NFS4ERR_DELAY error is tough to guess.
> > > > The client is expected to back off and retry, so if the server keeps
> > > > returning that repeatedly, then a hung mount command is expected.
> > > >
> > > > The question is why the server would keep returning DELAY. A lot of
> > > > different problems ranging from memory allocation issues to protocol
> > > > problems can result in that error. You may want to check the NFS server
> > > > and see if anything was logged there.
> > > There are no messages in the system logs that indicate any sort of
> > > memory issue. We also increased the min_free_kbytes sysctl to 2G on the
> > > server before we restarted it with the newer kernel.
> > Ok, I didn't expect to see anything like that, but it was a possibility.
> >
> > > > This is on a CREATE_SESSION call, so I wonder if the record held by the
> > > > (courteous) server is somehow blocking the attempt to reestablish the
> > > > session?
> > > >
> > > > Do you have a way to reproduce this? Since this is a centos kernel, you
> > > > could follow the page here to open a bug:
> > > Unfortunately we haven't found a reliable way to reproduce it. But we do
> > > seem to trigger it more and more lately.
> > >
> > >
> > Bummer, ok. Let us know if you figure out a way to reproduce it.
> >
> > > > https://wiki.centos.org/ReportBugs.html
> > > >
> > > >
> > > > > > I created another dump of the workqueues and worker pools on the server:
> > > > > >
> > > > > > [Mon Mar 18 14:59:33 2024] Showing busy workqueues and worker pools:
> > > > > > [Mon Mar 18 14:59:33 2024] workqueue events: flags=0x0
> > > > > > [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
> > > > > > active=1/256 refcnt=2
> > > > > > [Mon Mar 18 14:59:33 2024]     pending: drm_fb_helper_damage_work
> > > > > > [drm_kms_helper]
> > > > > > [Mon Mar 18 14:59:33 2024] workqueue events_power_efficient: flags=0x80
> > > > > > [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
> > > > > > active=1/256 refcnt=2
> > > > > > [Mon Mar 18 14:59:33 2024]     pending: fb_flashcursor
> > > > > > [Mon Mar 18 14:59:33 2024] workqueue mm_percpu_wq: flags=0x8
> > > > > > [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
> > > > > > active=1/256 refcnt=3
> > > > > > [Mon Mar 18 14:59:33 2024]     pending: lru_add_drain_per_cpu BAR(362)
> > > > > > [Mon Mar 18 14:59:33 2024] workqueue kblockd: flags=0x18
> > > > > > [Mon Mar 18 14:59:33 2024]   pwq 55: cpus=27 node=1 flags=0x0 nice=-20
> > > > > > active=1/256 refcnt=2
> > > > > > [Mon Mar 18 14:59:33 2024]     pending: blk_mq_timeout_work
> > > > > >
> > > > > >
> > > > > > In contrast to last time, it doesn't show anything regarding nfs this
> > > > > > time.
> > > > > >
> > > > > > I also tried the suggestion from Dai Ngo (echo 3 >
> > > > > > /proc/sys/vm/drop_caches), but that didn't seem to make any difference.
> > > > > >
> > > > > > We haven't restarted the server yet as the impact seems to affect
> > > > > > fewer clients than before. Is there anything we can run on the
> > > > > > server to further debug this?
> > > > > >
> > > > > > In the past, the issue seemed to deteriorate rapidly and resulted in
> > > > > > issues for almost all clients after about 20 minutes. This time the
> > > > > > impact seems to be less, but it's not gone.
> > > > > >
> > > > > > How can we force the NFS server to forget about a specific client? I
> > > > > > haven't tried to restart the nfs service yet as I'm afraid it will
> > > > > > fail to stop as before.
> > > > > >
> > > > Not with that kernel. There are some new administrative interfaces that
> > > > might allow that in the future, but they were just merged upstream and
> > > > aren't in that kernel.
> > > >
> > > > --
> > > > Jeff Layton <[email protected]>
>

--
Jeff Layton <[email protected]>

2024-03-19 17:12:20

by Dai Ngo

[permalink] [raw]
Subject: Re: nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning


On 3/19/24 12:58 AM, Rik Theys wrote:
> Hi,
>
> On 3/18/24 22:54, Jeff Layton wrote:
>> On Mon, 2024-03-18 at 22:15 +0100, Rik Theys wrote:
>>> Hi,
>>>
>>> On 3/18/24 21:21, Rik Theys wrote:
>>>> Hi Jeff,
>>>>
>>>> On 3/12/24 13:47, Jeff Layton wrote:
>>>>> On Tue, 2024-03-12 at 13:24 +0100, Rik Theys wrote:
>>>>>> Hi Jeff,
>>>>>>
>>>>>> On 3/12/24 12:22, Jeff Layton wrote:
>>>>>>> On Mon, 2024-03-11 at 19:43 +0100, Rik Theys wrote:
>>>>>>>> Since a few weeks our Rocky Linux 9 NFS server has periodically
>>>>>>>> logged hung nfsd tasks. The initial effect was that some clients
>>>>>>>> could no longer access the NFS server. This got worse and worse
>>>>>>>> (probably as more nfsd threads got blocked) and we had to restart
>>>>>>>> the server. Restarting the server also failed as the NFS server
>>>>>>>> service could no longer be stopped.
>>>>>>>>
>>>>>>>> The initial kernel we noticed this behavior on was
>>>>>>>> kernel-5.14.0-362.18.1.el9_3.x86_64. Since then we've installed
>>>>>>>> kernel-5.14.0-419.el9.x86_64 from CentOS Stream 9. The same issue
>>>>>>>> happened again on this newer kernel version:
>>>>> 419 is fairly up to date with nfsd changes. There are some known bugs
>>>>> around callbacks, and there is a draft MR in flight to fix it.
>>>>>
>>>>> What kernel were you on prior to 5.14.0-362.18.1.el9_3.x86_64 ? If we
>>>>> can bracket the changes around a particular version, then that might
>>>>> help identify the problem.
>>>>>
>>>>>>>> [Mon Mar 11 14:10:08 2024]       Not tainted
>>>>>>>> 5.14.0-419.el9.x86_64 #1
>>>>>>>>     [Mon Mar 11 14:10:08 2024] "echo 0 >
>>>>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>>>>     [Mon Mar 11 14:10:08 2024]task:nfsd            state:D stack:0
>>>>>>>>      pid:8865  ppid:2      flags:0x00004000
>>>>>>>>     [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  <TASK>
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd4_shutdown_callback+0x49/0x120
>>>>>>>> [nfsd]
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? nfsd4_cld_remove+0x54/0x1d0
>>>>>>>> [nfsd]
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>> nfsd4_return_all_client_layouts+0xc4/0xf0 [nfsd]
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? nfsd4_shutdown_copy+0x68/0xc0
>>>>>>>> [nfsd]
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __destroy_client+0x1f3/0x290
>>>>>>>> [nfsd]
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd4_exchange_id+0x75f/0x770
>>>>>>>> [nfsd]
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? nfsd4_decode_opaque+0x3a/0x90
>>>>>>>> [nfsd]
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700
>>>>>>>> [nfsd]
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
>>>>>>>> [sunrpc]
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10
>>>>>>>> [nfsd]
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  </TASK>
>>>>>>>>     [Mon Mar 11 14:10:08 2024] INFO: task nfsd:8866 blocked for
>>>>>>>> more than 122 seconds.
>>>>>>>>     [Mon Mar 11 14:10:08 2024]       Not tainted
>>>>>>>> 5.14.0-419.el9.x86_64 #1
>>>>>>>>     [Mon Mar 11 14:10:08 2024] "echo 0 >
>>>>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>>>>     [Mon Mar 11 14:10:08 2024]task:nfsd            state:D stack:0
>>>>>>>>      pid:8866  ppid:2      flags:0x00004000
>>>>>>>>     [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  <TASK>
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? tcp_recvmsg+0x196/0x210
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? __pfx_schedule_timeout+0x10/0x10
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd4_destroy_session+0x1a4/0x240
>>>>>>>> [nfsd]
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd4_proc_compound+0x44b/0x700
>>>>>>>> [nfsd]
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
>>>>>>>> [sunrpc]
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd_dispatch+0x10/0x10
>>>>>>>> [nfsd]
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>>>>>>     [Mon Mar 11 14:10:08 2024]  </TASK>
>>>>>>>>
>>>>>>> The above threads are trying to flush the workqueue, so that
>>>>>>> probably
>>>>>>> means that they are stuck waiting on a workqueue job to finish.
>>>>>>>>     The above is repeated a few times, and then this warning is
>>>>>>>> also logged:
>>>>>>>>     [Mon Mar 11 14:12:04 2024] ------------[ cut here
>>>>>>>> ]------------
>>>>>>>>     [Mon Mar 11 14:12:04 2024] WARNING: CPU: 39 PID: 8844 at
>>>>>>>> fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>>>>>     [Mon Mar 11 14:12:05 2024] Modules linked in: nfsv4
>>>>>>>> dns_resolver nfs fscache netfs rpcsec_gss_krb5 rpcrdma rdma_cm
>>>>>>>> iw_cm ib_cm ib_core binfmt_misc bonding tls rfkill nft_counter
>>>>>>>> nft_ct
>>>>>>>>     nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet
>>>>>>>> nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables nfnetlink vfat
>>>>>>>> fat dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio l
>>>>>>>>     ibcrc32c dm_service_time dm_multipath intel_rapl_msr
>>>>>>>> intel_rapl_common intel_uncore_frequency
>>>>>>>> intel_uncore_frequency_common isst_if_common skx_edac nfit
>>>>>>>> libnvdimm ipmi_ssif x86_pkg_temp
>>>>>>>>     _thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
>>>>>>>> dcdbas rapl intel_cstate mgag200 i2c_algo_bit drm_shmem_helper
>>>>>>>> dell_smbios drm_kms_helper dell_wmi_descriptor wmi_bmof intel_u
>>>>>>>>     ncore syscopyarea pcspkr sysfillrect mei_me sysimgblt
>>>>>>>> acpi_ipmi
>>>>>>>> mei fb_sys_fops i2c_i801 ipmi_si intel_pch_thermal lpc_ich
>>>>>>>> ipmi_devintf i2c_smbus ipmi_msghandler joydev acpi_power_meter
>>>>>>>>     nfsd auth_rpcgss nfs_acl drm lockd grace fuse sunrpc ext4
>>>>>>>> mbcache jbd2 sd_mod sg lpfc
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nvmet_fc nvmet nvme_fc
>>>>>>>> nvme_fabrics
>>>>>>>> crct10dif_pclmul ahci libahci crc32_pclmul nvme_core crc32c_intel
>>>>>>>> ixgbe megaraid_sas libata nvme_common ghash_clmulni_int
>>>>>>>>     el t10_pi wdat_wdt scsi_transport_fc mdio wmi dca dm_mirror
>>>>>>>> dm_region_hash dm_log dm_mod
>>>>>>>>     [Mon Mar 11 14:12:05 2024] CPU: 39 PID: 8844 Comm: nfsd Not
>>>>>>>> tainted 5.14.0-419.el9.x86_64 #1
>>>>>>>>     [Mon Mar 11 14:12:05 2024] Hardware name: Dell Inc. PowerEdge
>>>>>>>> R740/00WGD1, BIOS 2.20.1 09/13/2023
>>>>>>>>     [Mon Mar 11 14:12:05 2024] RIP:
>>>>>>>> 0010:nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>>>>>     [Mon Mar 11 14:12:05 2024] Code: a6 95 c5 f3 e9 ff fe ff ff 48
>>>>>>>> 89 df be 01 00 00 00 e8 34 b5 13 f4 48 8d bb 98 00 00 00 e8 c8 f9
>>>>>>>> 00 00 84 c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be
>>>>>>>>     02 00 00 00 48 89 df e8 0c b5 13 f4 e9 01
>>>>>>>>     [Mon Mar 11 14:12:05 2024] RSP: 0018:ffff9929e0bb7b80 EFLAGS:
>>>>>>>> 00010246
>>>>>>>>     [Mon Mar 11 14:12:05 2024] RAX: 0000000000000000 RBX:
>>>>>>>> ffff8ada51930900 RCX: 0000000000000024
>>>>>>>>     [Mon Mar 11 14:12:05 2024] RDX: ffff8ada519309c8 RSI:
>>>>>>>> ffff8ad582933c00 RDI: 0000000000002000
>>>>>>>>     [Mon Mar 11 14:12:05 2024] RBP: ffff8ad46bf21574 R08:
>>>>>>>> ffff9929e0bb7b48 R09: 0000000000000000
>>>>>>>>     [Mon Mar 11 14:12:05 2024] R10: ffff8aec859a2948 R11:
>>>>>>>> 0000000000000000 R12: ffff8ad6f497c360
>>>>>>>>     [Mon Mar 11 14:12:05 2024] R13: ffff8ad46bf21560 R14:
>>>>>>>> ffff8ae5942e0b10 R15: ffff8ad6f497c360
>>>>>>>>     [Mon Mar 11 14:12:05 2024] FS:  0000000000000000(0000)
>>>>>>>> GS:ffff8b031fcc0000(0000) knlGS:0000000000000000
>>>>>>>>     [Mon Mar 11 14:12:05 2024] CS:  0010 DS: 0000 ES: 0000 CR0:
>>>>>>>> 0000000080050033
>>>>>>>>     [Mon Mar 11 14:12:05 2024] CR2: 00007fafe2060744 CR3:
>>>>>>>> 00000018e58de006 CR4: 00000000007706e0
>>>>>>>>     [Mon Mar 11 14:12:05 2024] DR0: 0000000000000000 DR1:
>>>>>>>> 0000000000000000 DR2: 0000000000000000
>>>>>>>>     [Mon Mar 11 14:12:05 2024] DR3: 0000000000000000 DR6:
>>>>>>>> 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>>     [Mon Mar 11 14:12:05 2024] PKRU: 55555554
>>>>>>>>     [Mon Mar 11 14:12:05 2024] Call Trace:
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  <TASK>
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? __break_lease+0x16f/0x5f0
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
>>>>>>>> [nfsd]
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? __warn+0x81/0x110
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
>>>>>>>> [nfsd]
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? report_bug+0x10a/0x140
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? handle_bug+0x3c/0x70
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? exc_invalid_op+0x14/0x70
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? asm_exc_invalid_op+0x16/0x20
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
>>>>>>>> [nfsd]
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  __break_lease+0x16f/0x5f0
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ?
>>>>>>>> nfsd_file_lookup_locked+0x117/0x160 [nfsd]
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? list_lru_del+0x101/0x150
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nfsd_file_do_acquire+0x790/0x830
>>>>>>>> [nfsd]
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nfs4_get_vfs_file+0x315/0x3a0
>>>>>>>> [nfsd]
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nfsd4_process_open2+0x430/0xa30
>>>>>>>> [nfsd]
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? fh_verify+0x297/0x2f0 [nfsd]
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nfsd4_proc_compound+0x44b/0x700
>>>>>>>> [nfsd]
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  svc_process_common+0x2ec/0x660
>>>>>>>> [sunrpc]
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd_dispatch+0x10/0x10
>>>>>>>> [nfsd]
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  kthread+0xdd/0x100
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? __pfx_kthread+0x10/0x10
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ret_from_fork+0x29/0x50
>>>>>>>>     [Mon Mar 11 14:12:05 2024]  </TASK>
>>>>>>>>     [Mon Mar 11 14:12:05 2024] ---[ end trace 7a039e17443dc651
>>>>>>>> ]---
>>>>>>> This is probably this WARN in nfsd_break_one_deleg:
>>>>>>>
>>>>>>> WARN_ON_ONCE(!nfsd4_run_cb(&dp->dl_recall));
>>>>>>>
>>>>>>> It means that a delegation break callback to the client couldn't be
>>>>>>> queued to the workqueue, and so it didn't run.
>>>>>>>
>>>>>>>> Could this be the same issue as described
>>>>>>>> here: https://lore.kernel.org/linux-nfs/[email protected]/
>>>>>>>> ?
>>>>>>> Yes, most likely the same problem.
>>>>>> If I read that thread correctly, this issue was introduced between
>>>>>> 6.1.63 and 6.6.3? Is it possible the EL9 5.14.0-362.18.1.el9_3
>>>>>> backported these changes, or were we hitting some other bug with
>>>>>> that
>>>>>> version? It seems the 6.1.x kernel is not affected? If so, that
>>>>>> would be
>>>>>> the recommended kernel to run?
>>>>> Anything is possible. We have to identify the problem first.
>>>>>>>> As described in that thread, I've tried to obtain the requested
>>>>>>>> information.
>>>>>>>>
>>>>>>>> Is it possible this is the issue that was fixed by the patches
>>>>>>>> described
>>>>>>>> here? https://lore.kernel.org/linux-nfs/2024022054-cause-suffering-eae8@gregkh/
>>>>>>>>
>>>>>>> Doubtful. Those are targeted toward a different set of issues.
>>>>>>>
>>>>>>> If you're willing, I do have some patches queued up for CentOS here
>>>>>>> that
>>>>>>> fix some backchannel problems that could be related. I'm mainly
>>>>>>> waiting
>>>>>>> on Chuck to send these to Linus and then we'll likely merge them
>>>>>>> into
>>>>>>> CentOS soon afterward:
>>>>>>>
>>>>>>> https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3689
>>>>>>>
>>>>>>>
>>>>>> If you can send me a patch file, I can rebuild the C9S kernel
>>>>>> with that
>>>>>> patch and run it. It can take a while for the bug to trigger as I
>>>>>> believe it is very workload dependent (we were running very
>>>>>> stable for months and now hit this bug every other week).
>>>>>>
>>>>>>
>>>>> It's probably simpler to just pull down the build artifacts for
>>>>> that MR.
>>>>> You have to drill down through the CI for it, but they are here:
>>>>>
>>>>> https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/index.html?prefix=trusted-artifacts/1194300175/publish_x86_64/6278921877/artifacts/
>>>>>
>>>>>
>>>>> There's even a repo file you can install on the box to pull them
>>>>> down.
>>>> We installed this kernel on the server 3 days ago. Today, a user
>>>> informed us that their screen was black after logging in. Similar to
>>>> other occurrences of this issue, the mount command on the client was
>>>> hung. But in contrast to the other times, there were no messages in
>>>> the kernel logs on the server. Even restarting the client does
>>>> not resolve the issue.
>>
>> Ok, so you rebooted the client and it's still unable to mount? That
>> sounds like a server problem if so.
>>
>> Are both client and server running the same kernel?
> No, the server runs 5.14.0-427.3689_1194299994.el9 and the client
> 5.14.0-362.18.1.el9_3.
>>
>>>> Something still seems to be wrong on the server though. When I look at
>>>> the directories under /proc/fs/nfsd/clients, there's still a directory
>>>> for the specific client, even though it's no longer running:
>>>>
>>>> # cat 155/info
>>>> clientid: 0xc8edb7f65f4a9ad
>>>> address: "10.87.31.152:819"
>>>> status: confirmed
>>>> seconds from last renew: 33163
>>>> name: "Linux NFSv4.2 bersalis.esat.kuleuven.be"
>>>> minor version: 2
>>>> Implementation domain: "kernel.org"
>>>> Implementation name: "Linux 5.14.0-362.18.1.el9_3.0.1.x86_64 #1 SMP
>>>> PREEMPT_DYNAMIC Sun Feb 11 13:49:23 UTC 2024 x86_64"
>>>> Implementation time: [0, 0]
>>>> callback state: DOWN
>>>> callback address: 10.87.31.152:0
>>>>
>> If you just shut down the client, the server won't immediately purge its
>> record. In fact, assuming you're running the same kernel on the server,
>> it won't purge the client record until there is a conflicting request
>> for its state.
> Is there a way to force such a conflicting request (to get the client
> record to purge)?

Try:

# echo "expire" > /proc/fs/nfsd/clients/155/ctl

-Dai

>>
>>
>>> The nfsdclnts command for this client shows the following delegations:
>>>
>>> # nfsdclnts -f 155/states -t all
>>> Inode number | Type   | Access | Deny | ip address | Filename
>>> 169346743    | open   | r-     | --   | 10.87.31.152:819 | disconnected dentry
>>> 169346743    | deleg  | r      |      | 10.87.31.152:819 | disconnected dentry
>>> 169346746    | open   | r-     | --   | 10.87.31.152:819 | disconnected dentry
>>> 169346746    | deleg  | r      |      | 10.87.31.152:819 | disconnected dentry
>>>
>>> I see a lot of recent patches regarding directory delegations. Could
>>> this be related to this?
>>>
>>> Will a 5.14.0-362.18.1.el9_3.0.1 kernel try to use a directory
>>> delegation?
>>>
>>>
>> No. Directory delegations are a new feature that's still under
>> development. They use some of the same machinery as file delegations,
>> but they wouldn't be a factor here.
>>
>>>> The system seems to have identified that the client is no longer
>>>> reachable, but the client entry does not go away. When a mount was
>>>> hanging on the client, there would be two directories in clients for
>>>> the same client. Killing the mount command clears up the second entry.
>>>>
>>>> Even after running conntrack -D on the server to remove the tcp
>>>> connection from the conntrack table, the entry doesn't go away and the
>>>> client still can not mount anything from the server.
>>>>
>>>> A tcpdump on the client while a mount was running logged the following
>>>> messages over and over again:
>>>>
>>>> request:
>>>>
>>>> Frame 1: 378 bytes on wire (3024 bits), 378 bytes captured (3024 bits)
>>>> Ethernet II, Src: HP_19:7d:4b (e0:73:e7:19:7d:4b), Dst:
>>>> ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00)
>>>> Internet Protocol Version 4, Src: 10.87.31.152, Dst: 10.86.18.14
>>>> Transmission Control Protocol, Src Port: 932, Dst Port: 2049, Seq: 1,
>>>> Ack: 1, Len: 312
>>>> Remote Procedure Call, Type:Call XID:0x1d3220c4
>>>> Network File System
>>>>      [Program Version: 4]
>>>>      [V4 Procedure: COMPOUND (1)]
>>>>      GSS Data, Ops(1): CREATE_SESSION
>>>>          Length: 152
>>>>          GSS Sequence Number: 76
>>>>          Tag: <EMPTY>
>>>>          minorversion: 2
>>>>          Operations (count: 1): CREATE_SESSION
>>>>          [Main Opcode: CREATE_SESSION (43)]
>>>>      GSS Checksum:
>>>> 00000028040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb2…
>>>>
>>>>          GSS Token Length: 40
>>>>          GSS-API Generic Security Service Application Program
>>>> Interface
>>>>              krb5_blob:
>>>> 040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb23fa080b0…
>>>>
>>>>
>>>> response:
>>>>
>>>> Frame 2: 206 bytes on wire (1648 bits), 206 bytes captured (1648 bits)
>>>> Ethernet II, Src: ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00), Dst:
>>>> HP_19:7d:4b (e0:73:e7:19:7d:4b)
>>>> Internet Protocol Version 4, Src: 10.86.18.14, Dst: 10.87.31.152
>>>> Transmission Control Protocol, Src Port: 2049, Dst Port: 932, Seq: 1,
>>>> Ack: 313, Len: 140
>>>> Remote Procedure Call, Type:Reply XID:0x1d3220c4
>>>> Network File System
>>>>      [Program Version: 4]
>>>>      [V4 Procedure: COMPOUND (1)]
>>>>      GSS Data, Ops(1): CREATE_SESSION(NFS4ERR_DELAY)
>>>>          Length: 24
>>>>          GSS Sequence Number: 76
>>>>          Status: NFS4ERR_DELAY (10008)
>>>>          Tag: <EMPTY>
>>>>          Operations (count: 1)
>>>>          [Main Opcode: CREATE_SESSION (43)]
>>>>      GSS Checksum:
>>>> 00000028040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f6274222…
>>>>
>>>>          GSS Token Length: 40
>>>>          GSS-API Generic Security Service Application Program
>>>> Interface
>>>>              krb5_blob:
>>>> 040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f627422226d74923…
>>>>
>>>>
>>>> I was hoping that giving the client a different IP address would
>>>> resolve the issue for this client, but it didn't. Even though the
>>>> client had a new IP address (hostname was kept the same), it failed to
>>>> mount anything from the server.
>>>>
>> Changing the IP address won't help. The client is probably using the
>> same long-form client id as before, so the server still identifies the
>> client even with the address change.
> How is the client id determined? Will changing the hostname of the
> client trigger a change of the client id?
>>
>> Unfortunately, the cause of an NFS4ERR_DELAY error is tough to guess.
>> The client is expected to back off and retry, so if the server keeps
>> returning that repeatedly, then a hung mount command is expected.
>>
>> The question is why the server would keep returning DELAY. A lot of
>> different problems ranging from memory allocation issues to protocol
>> problems can result in that error. You may want to check the NFS server
>> and see if anything was logged there.
> There are no messages in the system logs that indicate any sort of
> memory issue. We also increased the min_free_kbytes sysctl to 2G on
> the server before we restarted it with the newer kernel.
>>
>> This is on a CREATE_SESSION call, so I wonder if the record held by the
>> (courteous) server is somehow blocking the attempt to reestablish the
>> session?
>>
>> Do you have a way to reproduce this? Since this is a centos kernel, you
>> could follow the page here to open a bug:
>
> Unfortunately we haven't found a reliable way to reproduce it. But we
> do seem to trigger it more and more lately.
>
> Regards,
>
> Rik
>
>>
>> https://wiki.centos.org/ReportBugs.html
>>
>>
>>>> I created another dump of the workqueues and worker pools on the
>>>> server:
>>>>
>>>> [Mon Mar 18 14:59:33 2024] Showing busy workqueues and worker pools:
>>>> [Mon Mar 18 14:59:33 2024] workqueue events: flags=0x0
>>>> [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
>>>> active=1/256 refcnt=2
>>>> [Mon Mar 18 14:59:33 2024]     pending: drm_fb_helper_damage_work
>>>> [drm_kms_helper]
>>>> [Mon Mar 18 14:59:33 2024] workqueue events_power_efficient:
>>>> flags=0x80
>>>> [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
>>>> active=1/256 refcnt=2
>>>> [Mon Mar 18 14:59:33 2024]     pending: fb_flashcursor
>>>> [Mon Mar 18 14:59:33 2024] workqueue mm_percpu_wq: flags=0x8
>>>> [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
>>>> active=1/256 refcnt=3
>>>> [Mon Mar 18 14:59:33 2024]     pending: lru_add_drain_per_cpu BAR(362)
>>>> [Mon Mar 18 14:59:33 2024] workqueue kblockd: flags=0x18
>>>> [Mon Mar 18 14:59:33 2024]   pwq 55: cpus=27 node=1 flags=0x0 nice=-20
>>>> active=1/256 refcnt=2
>>>> [Mon Mar 18 14:59:33 2024]     pending: blk_mq_timeout_work
>>>>
>>>>
>>>> In contrast to last time, it doesn't show anything regarding nfs this
>>>> time.
>>>>
>>>> I also tried the suggestion from Dai Ngo (echo 3 >
>>>> /proc/sys/vm/drop_caches), but that didn't seem to make any
>>>> difference.
>>>>
>>>> We haven't restarted the server yet, as the impact seems to affect
>>>> fewer clients than before. Is there anything we can run on the
>>>> server to further debug this?
>>>>
>>>> In the past, the issue seemed to deteriorate rapidly and resulted in
>>>> issues for almost all clients after about 20 minutes. This time the
>>>> impact seems to be less, but it's not gone.
>>>>
>>>> How can we force the NFS server to forget about a specific client? I
>>>> haven't tried to restart the nfs service yet as I'm afraid it will
>>>> fail to stop as before.
>>>>
>> Not with that kernel. There are some new administrative interfaces that
>> might allow that in the future, but they were just merged upstream and
>> aren't in that kernel.
>>
>> --
>> Jeff Layton <[email protected]>
>

2024-03-19 19:41:28

by Rik Theys

[permalink] [raw]
Subject: Re: nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning

Hi,

On 3/19/24 18:09, Dai Ngo wrote:
>
> On 3/19/24 12:58 AM, Rik Theys wrote:
>> Hi,
>>
>> On 3/18/24 22:54, Jeff Layton wrote:
>>> On Mon, 2024-03-18 at 22:15 +0100, Rik Theys wrote:
>>>> Hi,
>>>>
>>>> On 3/18/24 21:21, Rik Theys wrote:
>>>>> Hi Jeff,
>>>>>
>>>>> On 3/12/24 13:47, Jeff Layton wrote:
>>>>>> On Tue, 2024-03-12 at 13:24 +0100, Rik Theys wrote:
>>>>>>> Hi Jeff,
>>>>>>>
>>>>>>> On 3/12/24 12:22, Jeff Layton wrote:
>>>>>>>> On Mon, 2024-03-11 at 19:43 +0100, Rik Theys wrote:
>>>>>>>>> Since a few weeks our Rocky Linux 9 NFS server has periodically
>>>>>>>>> logged hung nfsd tasks. The initial effect was that some clients
>>>>>>>>> could no longer access the NFS server. This got worse and worse
>>>>>>>>> (probably as more nfsd threads got blocked) and we had to restart
>>>>>>>>> the server. Restarting the server also failed as the NFS server
>>>>>>>>> service could no longer be stopped.
>>>>>>>>>
>>>>>>>>> The initial kernel we noticed this behavior on was
>>>>>>>>> kernel-5.14.0-362.18.1.el9_3.x86_64. Since then we've installed
>>>>>>>>> kernel-5.14.0-419.el9.x86_64 from CentOS Stream 9. The same issue
>>>>>>>>> happened again on this newer kernel version:
>>>>>> 419 is fairly up to date with nfsd changes. There are some known
>>>>>> bugs
>>>>>> around callbacks, and there is a draft MR in flight to fix it.
>>>>>>
>>>>>> What kernel were you on prior to 5.14.0-362.18.1.el9_3.x86_64 ?
>>>>>> If we
>>>>>> can bracket the changes around a particular version, then that might
>>>>>> help identify the problem.
>>>>>>
>>>>>>>>> [Mon Mar 11 14:10:08 2024]       Not tainted
>>>>>>>>> 5.14.0-419.el9.x86_64 #1
>>>>>>>>>     [Mon Mar 11 14:10:08 2024] "echo 0 >
>>>>>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]task:nfsd            state:D
>>>>>>>>> stack:0
>>>>>>>>>      pid:8865  ppid:2      flags:0x00004000
>>>>>>>>>     [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  <TASK>
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>> __pfx_schedule_timeout+0x10/0x10
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]
>>>>>>>>>  nfsd4_shutdown_callback+0x49/0x120
>>>>>>>>> [nfsd]
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? nfsd4_cld_remove+0x54/0x1d0
>>>>>>>>> [nfsd]
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>> nfsd4_return_all_client_layouts+0xc4/0xf0 [nfsd]
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>> nfsd4_shutdown_copy+0x68/0xc0 [nfsd]
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __destroy_client+0x1f3/0x290
>>>>>>>>> [nfsd]
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd4_exchange_id+0x75f/0x770
>>>>>>>>> [nfsd]
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>> nfsd4_decode_opaque+0x3a/0x90 [nfsd]
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]
>>>>>>>>>  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
>>>>>>>>> [sunrpc]
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>> __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  </TASK>
>>>>>>>>>     [Mon Mar 11 14:10:08 2024] INFO: task nfsd:8866 blocked for
>>>>>>>>> more than 122 seconds.
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]       Not tainted
>>>>>>>>> 5.14.0-419.el9.x86_64 #1
>>>>>>>>>     [Mon Mar 11 14:10:08 2024] "echo 0 >
>>>>>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]task:nfsd            state:D
>>>>>>>>> stack:0
>>>>>>>>>      pid:8866  ppid:2      flags:0x00004000
>>>>>>>>>     [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  <TASK>
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? tcp_recvmsg+0x196/0x210
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>> __pfx_schedule_timeout+0x10/0x10
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd4_destroy_session+0x1a4/0x240
>>>>>>>>> [nfsd]
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]
>>>>>>>>>  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
>>>>>>>>> [sunrpc]
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>> __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  </TASK>
>>>>>>>>>
>>>>>>>> The above threads are trying to flush the workqueue, so that
>>>>>>>> probably
>>>>>>>> means that they are stuck waiting on a workqueue job to finish.
>>>>>>>>>     The above is repeated a few times, and then this warning is
>>>>>>>>> also logged:
>>>>>>>>>     [Mon Mar 11 14:12:04 2024] ------------[ cut here
>>>>>>>>> ]------------
>>>>>>>>>     [Mon Mar 11 14:12:04 2024] WARNING: CPU: 39 PID: 8844 at
>>>>>>>>> fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>>>>>>     [Mon Mar 11 14:12:05 2024] Modules linked in: nfsv4
>>>>>>>>> dns_resolver nfs fscache netfs rpcsec_gss_krb5 rpcrdma rdma_cm
>>>>>>>>> iw_cm ib_cm ib_core binfmt_misc bonding tls rfkill nft_counter
>>>>>>>>> nft_ct
>>>>>>>>>     nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet
>>>>>>>>> nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables nfnetlink vfat
>>>>>>>>> fat dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio l
>>>>>>>>>     ibcrc32c dm_service_time dm_multipath intel_rapl_msr
>>>>>>>>> intel_rapl_common intel_uncore_frequency
>>>>>>>>> intel_uncore_frequency_common isst_if_common skx_edac nfit
>>>>>>>>> libnvdimm ipmi_ssif x86_pkg_temp
>>>>>>>>>     _thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
>>>>>>>>> dcdbas rapl intel_cstate mgag200 i2c_algo_bit drm_shmem_helper
>>>>>>>>> dell_smbios drm_kms_helper dell_wmi_descriptor wmi_bmof intel_u
>>>>>>>>>     ncore syscopyarea pcspkr sysfillrect mei_me sysimgblt
>>>>>>>>> acpi_ipmi
>>>>>>>>> mei fb_sys_fops i2c_i801 ipmi_si intel_pch_thermal lpc_ich
>>>>>>>>> ipmi_devintf i2c_smbus ipmi_msghandler joydev acpi_power_meter
>>>>>>>>>     nfsd auth_rpcgss nfs_acl drm lockd grace fuse sunrpc ext4
>>>>>>>>> mbcache jbd2 sd_mod sg lpfc
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nvmet_fc nvmet nvme_fc
>>>>>>>>> nvme_fabrics
>>>>>>>>> crct10dif_pclmul ahci libahci crc32_pclmul nvme_core crc32c_intel
>>>>>>>>> ixgbe megaraid_sas libata nvme_common ghash_clmulni_int
>>>>>>>>>     el t10_pi wdat_wdt scsi_transport_fc mdio wmi dca dm_mirror
>>>>>>>>> dm_region_hash dm_log dm_mod
>>>>>>>>>     [Mon Mar 11 14:12:05 2024] CPU: 39 PID: 8844 Comm: nfsd Not
>>>>>>>>> tainted 5.14.0-419.el9.x86_64 #1
>>>>>>>>>     [Mon Mar 11 14:12:05 2024] Hardware name: Dell Inc. PowerEdge
>>>>>>>>> R740/00WGD1, BIOS 2.20.1 09/13/2023
>>>>>>>>>     [Mon Mar 11 14:12:05 2024] RIP:
>>>>>>>>> 0010:nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>>>>>>     [Mon Mar 11 14:12:05 2024] Code: a6 95 c5 f3 e9 ff fe ff
>>>>>>>>> ff 48
>>>>>>>>> 89 df be 01 00 00 00 e8 34 b5 13 f4 48 8d bb 98 00 00 00 e8 c8 f9
>>>>>>>>> 00 00 84 c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be
>>>>>>>>>     02 00 00 00 48 89 df e8 0c b5 13 f4 e9 01
>>>>>>>>>     [Mon Mar 11 14:12:05 2024] RSP: 0018:ffff9929e0bb7b80 EFLAGS:
>>>>>>>>> 00010246
>>>>>>>>>     [Mon Mar 11 14:12:05 2024] RAX: 0000000000000000 RBX:
>>>>>>>>> ffff8ada51930900 RCX: 0000000000000024
>>>>>>>>>     [Mon Mar 11 14:12:05 2024] RDX: ffff8ada519309c8 RSI:
>>>>>>>>> ffff8ad582933c00 RDI: 0000000000002000
>>>>>>>>>     [Mon Mar 11 14:12:05 2024] RBP: ffff8ad46bf21574 R08:
>>>>>>>>> ffff9929e0bb7b48 R09: 0000000000000000
>>>>>>>>>     [Mon Mar 11 14:12:05 2024] R10: ffff8aec859a2948 R11:
>>>>>>>>> 0000000000000000 R12: ffff8ad6f497c360
>>>>>>>>>     [Mon Mar 11 14:12:05 2024] R13: ffff8ad46bf21560 R14:
>>>>>>>>> ffff8ae5942e0b10 R15: ffff8ad6f497c360
>>>>>>>>>     [Mon Mar 11 14:12:05 2024] FS:  0000000000000000(0000)
>>>>>>>>> GS:ffff8b031fcc0000(0000) knlGS:0000000000000000
>>>>>>>>>     [Mon Mar 11 14:12:05 2024] CS:  0010 DS: 0000 ES: 0000 CR0:
>>>>>>>>> 0000000080050033
>>>>>>>>>     [Mon Mar 11 14:12:05 2024] CR2: 00007fafe2060744 CR3:
>>>>>>>>> 00000018e58de006 CR4: 00000000007706e0
>>>>>>>>>     [Mon Mar 11 14:12:05 2024] DR0: 0000000000000000 DR1:
>>>>>>>>> 0000000000000000 DR2: 0000000000000000
>>>>>>>>>     [Mon Mar 11 14:12:05 2024] DR3: 0000000000000000 DR6:
>>>>>>>>> 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>>>     [Mon Mar 11 14:12:05 2024] PKRU: 55555554
>>>>>>>>>     [Mon Mar 11 14:12:05 2024] Call Trace:
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  <TASK>
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? __break_lease+0x16f/0x5f0
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
>>>>>>>>> [nfsd]
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? __warn+0x81/0x110
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
>>>>>>>>> [nfsd]
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? report_bug+0x10a/0x140
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? handle_bug+0x3c/0x70
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? exc_invalid_op+0x14/0x70
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? asm_exc_invalid_op+0x16/0x20
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
>>>>>>>>> [nfsd]
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  __break_lease+0x16f/0x5f0
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ?
>>>>>>>>> nfsd_file_lookup_locked+0x117/0x160 [nfsd]
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? list_lru_del+0x101/0x150
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nfsd_file_do_acquire+0x790/0x830
>>>>>>>>> [nfsd]
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nfs4_get_vfs_file+0x315/0x3a0
>>>>>>>>> [nfsd]
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]
>>>>>>>>>  nfsd4_process_open2+0x430/0xa30 [nfsd]
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? fh_verify+0x297/0x2f0 [nfsd]
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]
>>>>>>>>>  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  svc_process_common+0x2ec/0x660
>>>>>>>>> [sunrpc]
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ?
>>>>>>>>> __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  kthread+0xdd/0x100
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? __pfx_kthread+0x10/0x10
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ret_from_fork+0x29/0x50
>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  </TASK>
>>>>>>>>>     [Mon Mar 11 14:12:05 2024] ---[ end trace 7a039e17443dc651
>>>>>>>>> ]---
>>>>>>>> This is probably this WARN in nfsd_break_one_deleg:
>>>>>>>>
>>>>>>>> WARN_ON_ONCE(!nfsd4_run_cb(&dp->dl_recall));
>>>>>>>>
>>>>>>>> It means that a delegation break callback to the client
>>>>>>>> couldn't be
>>>>>>>> queued to the workqueue, and so it didn't run.
>>>>>>>>
>>>>>>>>> Could this be the same issue as described
>>>>>>>>> here:https://urldefense.com/v3/__https://lore.kernel.org/linux-nfs/[email protected]/__;!!ACWV5N9M2RV99hQ!LV3yWeoSOhNAkRHkxFCH2tlm0iNFVD78mxnSLyP6lrX7yBVeA2TOJ4nv6oZsqLwP4kW56CMpDWhkjjwSkdBV9En7$
>>>>>>>>> ?
>>>>>>>> Yes, most likely the same problem.
>>>>>>> If I read that thread correctly, this issue was introduced between
>>>>>>> 6.1.63 and 6.6.3? Is it possible the EL9 5.14.0-362.18.1.el9_3
>>>>>>> backported these changes, or were we hitting some other bug with
>>>>>>> that
>>>>>>> version? It seems the 6.1.x kernel is not affected? If so, that
>>>>>>> would be
>>>>>>> the recommended kernel to run?
>>>>>> Anything is possible. We have to identify the problem first.
>>>>>>>>> As described in that thread, I've tried to obtain the requested
>>>>>>>>> information.
>>>>>>>>>
>>>>>>>>> Is it possible this is the issue that was fixed by the patches
>>>>>>>>> described
>>>>>>>>> here?https://urldefense.com/v3/__https://lore.kernel.org/linux-nfs/2024022054-cause-suffering-eae8@gregkh/__;!!ACWV5N9M2RV99hQ!LV3yWeoSOhNAkRHkxFCH2tlm0iNFVD78mxnSLyP6lrX7yBVeA2TOJ4nv6oZsqLwP4kW56CMpDWhkjjwSkedtUP09$
>>>>>>>>>
>>>>>>>> Doubtful. Those are targeted toward a different set of issues.
>>>>>>>>
>>>>>>>> If you're willing, I do have some patches queued up for CentOS
>>>>>>>> here
>>>>>>>> that
>>>>>>>> fix some backchannel problems that could be related. I'm mainly
>>>>>>>> waiting
>>>>>>>> on Chuck to send these to Linus and then we'll likely merge
>>>>>>>> them into
>>>>>>>> CentOS soon afterward:
>>>>>>>>
>>>>>>>> https://urldefense.com/v3/__https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3689__;!!ACWV5N9M2RV99hQ!LV3yWeoSOhNAkRHkxFCH2tlm0iNFVD78mxnSLyP6lrX7yBVeA2TOJ4nv6oZsqLwP4kW56CMpDWhkjjwSkdvDn8y7$
>>>>>>>>
>>>>>>>>
>>>>>>> If you can send me a patch file, I can rebuild the C9S kernel
>>>>>>> with that
>>>>>>> patch and run it. It can take a while for the bug to trigger as I
>>>>>>> believe it seems to be very workload dependent (we were running
>>>>>>> very
>>>>>>> stable for months and now hit this bug every other week).
>>>>>>>
>>>>>>>
>>>>>> It's probably simpler to just pull down the build artifacts for
>>>>>> that MR.
>>>>>> You have to drill down through the CI for it, but they are here:
>>>>>>
>>>>>> https://urldefense.com/v3/__https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/index.html?prefix=trusted-artifacts*1194300175*publish_x86_64*6278921877*artifacts*__;Ly8vLy8!!ACWV5N9M2RV99hQ!LV3yWeoSOhNAkRHkxFCH2tlm0iNFVD78mxnSLyP6lrX7yBVeA2TOJ4nv6oZsqLwP4kW56CMpDWhkjjwSkaP5eW8V$
>>>>>>
>>>>>>
>>>>>> There's even a repo file you can install on the box to pull them
>>>>>> down.
>>>>> We installed this kernel on the server 3 days ago. Today, a user
>>>>> informed us that their screen was black after logging in. Similar to
>>>>> other occurrences of this issue, the mount command on the client was
>>>>> hung. But in contrast to the other times, there were no messages in
>>>>> the kernel logs on the server. Even restarting the client does
>>>>> not resolve the issue.
>>>
>>> Ok, so you rebooted the client and it's still unable to mount? That
>>> sounds like a server problem if so.
>>>
>>> Are both client and server running the same kernel?
>> No, the server runs 5.14.0-427.3689_1194299994.el9 and the client
>> 5.14.0-362.18.1.el9_3.
>>>
>>>>> Something still seems to be wrong on the server though. When I
>>>>> look at
>>>>> the directories under /proc/fs/nfsd/clients, there's still a
>>>>> directory
>>>>> for the specific client, even though it's no longer running:
>>>>>
>>>>> # cat 155/info
>>>>> clientid: 0xc8edb7f65f4a9ad
>>>>> address: "10.87.31.152:819"
>>>>> status: confirmed
>>>>> seconds from last renew: 33163
>>>>> name: "Linux NFSv4.2 bersalis.esat.kuleuven.be"
>>>>> minor version: 2
>>>>> Implementation domain: "kernel.org"
>>>>> Implementation name: "Linux 5.14.0-362.18.1.el9_3.0.1.x86_64 #1 SMP
>>>>> PREEMPT_DYNAMIC Sun Feb 11 13:49:23 UTC 2024 x86_64"
>>>>> Implementation time: [0, 0]
>>>>> callback state: DOWN
>>>>> callback address: 10.87.31.152:0
>>>>>
>>> If you just shut down the client, the server won't immediately purge
>>> its
>>> record. In fact, assuming you're running the same kernel on the server,
>>> it won't purge the client record until there is a conflicting request
>>> for its state.
>> Is there a way to force such a conflicting request (to get the client
>> record to purge)?
>
> Try:
>
> # echo "expire" > /proc/fs/nfsd/clients/155/ctl

I've tried that. The command hangs and cannot be interrupted with Ctrl-C.

I've now also noticed in the dmesg output that the kernel issued the
following WARNING a few hours ago. It wasn't directly triggered by the
echo command above; it was probably triggered when another client
started to hit the same problem, as more clients are experiencing
issues now.

[Tue Mar 19 14:53:44 2024] ------------[ cut here ]------------
[Tue Mar 19 14:53:44 2024] WARNING: CPU: 44 PID: 5843 at
fs/nfsd/nfs4state.c:4920 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
[Tue Mar 19 14:53:44 2024] Modules linked in: nf_conntrack_netlink nfsv4
dns_resolver nfs fscache netfs binfmt_misc xsk_diag rpcsec_gss_krb5
rpcrdma rdma_cm iw_cm ib_cm ib_core bonding tls rfkill nft_counter
nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables nfnetlink vfat fat
dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c
dm_service_time dm_multipath intel_rapl_msr intel_rapl_common
intel_uncore_frequency intel_uncore_frequency_common isst_if_common
skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp
kvm_intel kvm dcdbas irqbypass ipmi_ssif rapl intel_cstate mgag200
i2c_algo_bit drm_shmem_helper drm_kms_helper dell_smbios syscopyarea
intel_uncore sysfillrect wmi_bmof dell_wmi_descriptor pcspkr sysimgblt
fb_sys_fops mei_me i2c_i801 mei intel_pch_thermal acpi_ipmi i2c_smbus
lpc_ich ipmi_si ipmi_devintf ipmi_msghandler joydev acpi_power_meter
nfsd nfs_acl lockd auth_rpcgss grace drm fuse sunrpc ext4
[Tue Mar 19 14:53:44 2024]  mbcache jbd2 sd_mod sg lpfc nvmet_fc nvmet
nvme_fc nvme_fabrics crct10dif_pclmul crc32_pclmul nvme_core ixgbe
crc32c_intel ahci libahci nvme_common megaraid_sas t10_pi
ghash_clmulni_intel wdat_wdt libata scsi_transport_fc mdio dca wmi
dm_mirror dm_region_hash dm_log dm_mod
[Tue Mar 19 14:53:44 2024] CPU: 44 PID: 5843 Comm: nfsd Not tainted
5.14.0-427.3689_1194299994.el9.x86_64 #1
[Tue Mar 19 14:53:44 2024] Hardware name: Dell Inc. PowerEdge
R740/00WGD1, BIOS 2.20.1 09/13/2023
[Tue Mar 19 14:53:44 2024] RIP: 0010:nfsd_break_deleg_cb+0x170/0x190 [nfsd]
[Tue Mar 19 14:53:44 2024] Code: 76 76 cd de e9 ff fe ff ff 48 89 df be
01 00 00 00 e8 34 a1 1b df 48 8d bb 98 00 00 00 e8 a8 fe 00 00 84 c0 0f
85 2e ff ff ff <0f> 0b e9 27 ff ff ff be 02 00 00 00 48 89 df e8 0c a1
1b df e9 01
[Tue Mar 19 14:53:44 2024] RSP: 0018:ffffb2878f2cfc38 EFLAGS: 00010246
[Tue Mar 19 14:53:44 2024] RAX: 0000000000000000 RBX: ffff88d5171067b8
RCX: 0000000000000000
[Tue Mar 19 14:53:44 2024] RDX: ffff88d517106880 RSI: ffff88bdceec8600
RDI: 0000000000002000
[Tue Mar 19 14:53:44 2024] RBP: ffff88d68a38a284 R08: ffffb2878f2cfc00
R09: 0000000000000000
[Tue Mar 19 14:53:44 2024] R10: ffff88bf57dd7878 R11: 0000000000000000
R12: ffff88d5b79c4798
[Tue Mar 19 14:53:44 2024] R13: ffff88d68a38a270 R14: ffff88cab06ad0c8
R15: ffff88d5b79c4798
[Tue Mar 19 14:53:44 2024] FS:  0000000000000000(0000)
GS:ffff88d4a1180000(0000) knlGS:0000000000000000
[Tue Mar 19 14:53:44 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Tue Mar 19 14:53:44 2024] CR2: 00007fe46ef90000 CR3: 000000019d010004
CR4: 00000000007706e0
[Tue Mar 19 14:53:44 2024] DR0: 0000000000000000 DR1: 0000000000000000
DR2: 0000000000000000
[Tue Mar 19 14:53:44 2024] DR3: 0000000000000000 DR6: 00000000fffe0ff0
DR7: 0000000000000400
[Tue Mar 19 14:53:44 2024] PKRU: 55555554
[Tue Mar 19 14:53:44 2024] Call Trace:
[Tue Mar 19 14:53:44 2024]  <TASK>
[Tue Mar 19 14:53:44 2024]  ? show_trace_log_lvl+0x1c4/0x2df
[Tue Mar 19 14:53:44 2024]  ? show_trace_log_lvl+0x1c4/0x2df
[Tue Mar 19 14:53:44 2024]  ? __break_lease+0x16f/0x5f0
[Tue Mar 19 14:53:44 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
[Tue Mar 19 14:53:44 2024]  ? __warn+0x81/0x110
[Tue Mar 19 14:53:44 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
[Tue Mar 19 14:53:44 2024]  ? report_bug+0x10a/0x140
[Tue Mar 19 14:53:44 2024]  ? handle_bug+0x3c/0x70
[Tue Mar 19 14:53:44 2024]  ? exc_invalid_op+0x14/0x70
[Tue Mar 19 14:53:44 2024]  ? asm_exc_invalid_op+0x16/0x20
[Tue Mar 19 14:53:44 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
[Tue Mar 19 14:53:44 2024]  ? nfsd_break_deleg_cb+0x96/0x190 [nfsd]
[Tue Mar 19 14:53:44 2024]  __break_lease+0x16f/0x5f0
[Tue Mar 19 14:53:44 2024]  nfs4_get_vfs_file+0x164/0x3a0 [nfsd]
[Tue Mar 19 14:53:44 2024]  nfsd4_process_open2+0x430/0xa30 [nfsd]
[Tue Mar 19 14:53:44 2024]  ? fh_verify+0x297/0x2f0 [nfsd]
[Tue Mar 19 14:53:44 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
[Tue Mar 19 14:53:44 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
[Tue Mar 19 14:53:44 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
[Tue Mar 19 14:53:44 2024]  svc_process_common+0x2ec/0x660 [sunrpc]
[Tue Mar 19 14:53:44 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[Tue Mar 19 14:53:44 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
[Tue Mar 19 14:53:44 2024]  svc_process+0x12d/0x170 [sunrpc]
[Tue Mar 19 14:53:44 2024]  nfsd+0x84/0xb0 [nfsd]
[Tue Mar 19 14:53:44 2024]  kthread+0xdd/0x100
[Tue Mar 19 14:53:44 2024]  ? __pfx_kthread+0x10/0x10
[Tue Mar 19 14:53:44 2024]  ret_from_fork+0x29/0x50
[Tue Mar 19 14:53:44 2024]  </TASK>
[Tue Mar 19 14:53:44 2024] ---[ end trace ed0b2b3f135c637d ]---

It again seems to have been triggered in nfsd_break_deleg_cb?
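
If I read Jeff's earlier explanation correctly, that WARN is the
WARN_ON_ONCE(!nfsd4_run_cb(&dp->dl_recall)) in nfsd_break_one_deleg(),
and nfsd4_run_cb() basically boils down to queue_work(). My rough
paraphrase of the path (hand-written from reading the thread, not the
literal kernel source):

/* queue_work() returns false when the work item is already pending,
 * so if the recall callback for this delegation is still sitting on
 * a stuck callback workqueue, the new recall attempt is dropped and
 * the WARN fires. */
static void nfsd_break_one_deleg(struct nfs4_delegation *dp)
{
        /* hand a CB_RECALL off to the client's callback workqueue */
        WARN_ON_ONCE(!nfsd4_run_cb(&dp->dl_recall));
}

bool nfsd4_run_cb(struct nfsd4_callback *cb)
{
        /* false => cb->cb_work was already queued and never ran */
        return queue_work(callback_wq, &cb->cb_work);
}

Is that a fair reading of what the warning means here?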

I also had the following perf command running in a tmux session on the server:

perf trace -e nfsd:nfsd_cb_recall_any

This has spewed a lot of messages. I'm including a short list here:

...

33464866.721 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
1710533037, cl_id: 210688785, bmval0: 1, addr: 0x7f331bb116c8)
33464866.724 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
1710533037, cl_id: 210688827, bmval0: 1, addr: 0x7f331bb11738)
33464866.729 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
1710533037, cl_id: 210688767, bmval0: 1, addr: 0x7f331bb117a8)
33464866.732 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
1710533037, cl_id: 210718132, bmval0: 1, addr: 0x7f331bb11818)
33464866.737 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
1710533037, cl_id: 210688952, bmval0: 1, addr: 0x7f331bb11888)
33464866.741 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
1710533037, cl_id: 210702355, bmval0: 1, addr: 0x7f331bb118f8)
33868414.001 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
1710533037, cl_id: 210688751, bmval0: 1, addr: 0x7f331be68620)
33868414.014 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
1710533037, cl_id: 210718536, bmval0: 1, addr: 0x7f331be68690)
33868414.018 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
1710533037, cl_id: 210719074, bmval0: 1, addr: 0x7f331be68700)
33868414.022 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
1710533037, cl_id: 210688916, bmval0: 1, addr: 0x7f331be68770)
33868414.026 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
1710533037, cl_id: 210688941, bmval0: 1, addr: 0x7f331be687e0)
...

33868414.924 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
1710533037, cl_id: 210688744, bmval0: 1, addr: 0x7f331be6d7f0)
33868414.929 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
1710533037, cl_id: 210717223, bmval0: 1, addr: 0x7f331be6d860)
33868414.934 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
1710533037, cl_id: 210716137, bmval0: 1, addr: 0x7f331be6d8d0)
34021240.903 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
1710533037, cl_id: 210688941, bmval0: 1, addr: 0x7f331c207de8)
34021240.917 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
1710533037, cl_id: 210718750, bmval0: 1, addr: 0x7f331c207e58)
34021240.922 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
1710533037, cl_id: 210688955, bmval0: 1, addr: 0x7f331c207ec8)
34021240.925 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
1710533037, cl_id: 210688975, bmval0: 1, addr: 0x7f331c207f38)
...

I assume the cl_id is the client id? How can I map this to a client from
/proc/fs/nfsd/clients?
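
Guessing from the numbers in this thread rather than from the code: the
clientid shown in /proc/fs/nfsd/clients/*/info looks like the
tracepoint's cl_id and cl_boot packed into a single 64-bit value -- the
low 32 bits of our client's 0xc8edb7f65f4a9ad are 0x65f4a9ad =
1710533037, which is exactly the cl_boot in every trace line above. If
that assumption holds, something like this (hypothetical helper, values
taken from the excerpt) would turn a trace line back into the value to
grep for in the info files:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        uint32_t cl_boot = 1710533037;  /* cl_boot from the trace above */
        uint32_t cl_id   = 210688785;   /* first cl_id in the excerpt */

        /* assumption: cl_id in the high 32 bits, cl_boot in the low 32,
         * matching the "clientid:" line in .../clients/<N>/info */
        uint64_t clientid = ((uint64_t)cl_id << 32) | (uint64_t)cl_boot;

        printf("clientid: 0x%llx\n", (unsigned long long)clientid);
        return 0;
}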

If I understand it correctly, recall_any should be called when either
the system starts to experience memory pressure, or the server reaches
its delegation limit? I doubt the system is actually running out of
memory here, as there are no other indications. Shouldn't I see "page
allocation failure" messages if it were? How can I check the number of
delegations/leases currently issued, what the current maximum is, and
how to increase it?
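
To partly answer my own question: from a quick look at upstream
fs/nfsd/nfs4state.c (so this is my reading of the code, not something
I found documented), the cap seems to be set once at startup by
set_max_delegations() to roughly 4 delegations per megabyte of low
memory, and I don't see an obvious runtime tunable for it:

/* my paraphrase of set_max_delegations() -- please double-check
 * against the actual tree.  nr_free_buffer_pages() is the amount of
 * lowmem in pages; shifting by (20 - 2 - PAGE_SHIFT) converts pages
 * to megabytes and then allows 2^2 = 4 delegations per megabyte. */
static void
set_max_delegations(void)
{
        max_delegations = nr_free_buffer_pages() >> (20 - 2 - PAGE_SHIFT);
}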

Regarding the recall any call: from what I've read on kernelnewbies,
this feature was introduced in the 6.2 kernel? When I look at the tree
for 6.1.x, it was backported in 6.1.81? Is there a way to disable this
support somehow?

Regards,

Rik


>
> -Dai
>
>>>
>>>
>>>> The nfsdclnts command for this client shows the following delegations:
>>>>
>>>> # nfsdclnts -f 155/states -t all
>>>> Inode number | Type   | Access | Deny | ip address | Filename
>>>> 169346743    | open   | r-     | --   | 10.87.31.152:819 |
>>>> disconnected dentry
>>>> 169346743    | deleg  | r      |      | 10.87.31.152:819 |
>>>> disconnected dentry
>>>> 169346746    | open   | r-     | --   | 10.87.31.152:819 |
>>>> disconnected dentry
>>>> 169346746    | deleg  | r      |      | 10.87.31.152:819 |
>>>> disconnected dentry
>>>>
>>>> I see a lot of recent patches regarding directory delegations. Could
>>>> this be related to this?
>>>>
>>>> Will a 5.14.0-362.18.1.el9_3.0.1 kernel try to use a directory
>>>> delegation?
>>>>
>>>>
>>> No. Directory delegations are a new feature that's still under
>>> development. They use some of the same machinery as file delegations,
>>> but they wouldn't be a factor here.
>>>
>>>>> The system seems to have identified that the client is no longer
>>>>> reachable, but the client entry does not go away. When a mount was
>>>>> hanging on the client, there would be two directories in clients for
>>>>> the same client. Killing the mount command clears up the second
>>>>> entry.
>>>>>
>>>>> Even after running conntrack -D on the server to remove the tcp
>>>>> connection from the conntrack table, the entry doesn't go away and
>>>>> the
>>>>> client still can not mount anything from the server.
>>>>>
>>>>> A tcpdump on the client while a mount was running logged the
>>>>> following
>>>>> messages over and over again:
>>>>>
>>>>> request:
>>>>>
>>>>> Frame 1: 378 bytes on wire (3024 bits), 378 bytes captured (3024
>>>>> bits)
>>>>> Ethernet II, Src: HP_19:7d:4b (e0:73:e7:19:7d:4b), Dst:
>>>>> ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00)
>>>>> Internet Protocol Version 4, Src: 10.87.31.152, Dst: 10.86.18.14
>>>>> Transmission Control Protocol, Src Port: 932, Dst Port: 2049, Seq: 1,
>>>>> Ack: 1, Len: 312
>>>>> Remote Procedure Call, Type:Call XID:0x1d3220c4
>>>>> Network File System
>>>>>      [Program Version: 4]
>>>>>      [V4 Procedure: COMPOUND (1)]
>>>>>      GSS Data, Ops(1): CREATE_SESSION
>>>>>          Length: 152
>>>>>          GSS Sequence Number: 76
>>>>>          Tag: <EMPTY>
>>>>>          minorversion: 2
>>>>>          Operations (count: 1): CREATE_SESSION
>>>>>          [Main Opcode: CREATE_SESSION (43)]
>>>>>      GSS Checksum:
>>>>> 00000028040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb2…
>>>>>
>>>>>          GSS Token Length: 40
>>>>>          GSS-API Generic Security Service Application Program
>>>>> Interface
>>>>>              krb5_blob:
>>>>> 040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb23fa080b0…
>>>>>
>>>>>
>>>>> response:
>>>>>
>>>>> Frame 2: 206 bytes on wire (1648 bits), 206 bytes captured (1648
>>>>> bits)
>>>>> Ethernet II, Src: ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00), Dst:
>>>>> HP_19:7d:4b (e0:73:e7:19:7d:4b)
>>>>> Internet Protocol Version 4, Src: 10.86.18.14, Dst: 10.87.31.152
>>>>> Transmission Control Protocol, Src Port: 2049, Dst Port: 932, Seq: 1,
>>>>> Ack: 313, Len: 140
>>>>> Remote Procedure Call, Type:Reply XID:0x1d3220c4
>>>>> Network File System
>>>>>      [Program Version: 4]
>>>>>      [V4 Procedure: COMPOUND (1)]
>>>>>      GSS Data, Ops(1): CREATE_SESSION(NFS4ERR_DELAY)
>>>>>          Length: 24
>>>>>          GSS Sequence Number: 76
>>>>>          Status: NFS4ERR_DELAY (10008)
>>>>>          Tag: <EMPTY>
>>>>>          Operations (count: 1)
>>>>>          [Main Opcode: CREATE_SESSION (43)]
>>>>>      GSS Checksum:
>>>>> 00000028040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f6274222…
>>>>>
>>>>>          GSS Token Length: 40
>>>>>          GSS-API Generic Security Service Application Program
>>>>> Interface
>>>>>              krb5_blob:
>>>>> 040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f627422226d74923…
>>>>>
>>>>>
>>>>> I was hoping that giving the client a different IP address would
>>>>> resolve the issue for this client, but it didn't. Even though the
>>>>> client had a new IP address (hostname was kept the same), it
>>>>> failed to
>>>>> mount anything from the server.
>>>>>
>>> Changing the IP address won't help. The client is probably using the
>>> same long-form client id as before, so the server still identifies the
>>> client even with the address change.
>> How is the client id determined? Will changing the hostname of the
>> client trigger a change of the client id?
>>>
>>> Unfortunately, the cause of an NFS4ERR_DELAY error is tough to guess.
>>> The client is expected to back off and retry, so if the server keeps
>>> returning that repeatedly, then a hung mount command is expected.
>>>
>>> The question is why the server would keep returning DELAY. A lot of
>>> different problems ranging from memory allocation issues to protocol
>>> problems can result in that error. You may want to check the NFS server
>>> and see if anything was logged there.
>> There are no messages in the system logs that indicate any sort of
>> memory issue. We also increased the min_free_kbytes sysctl to 2G on
>> the server before we restarted it with the newer kernel.
>>>
>>> This is on a CREATE_SESSION call, so I wonder if the record held by the
>>> (courteous) server is somehow blocking the attempt to reestablish the
>>> session?
>>>
>>> Do you have a way to reproduce this? Since this is a centos kernel, you
>>> could follow the page here to open a bug:
>>
>> Unfortunately we haven't found a reliable way to reproduce it. But we
>> do seem to trigger it more and more lately.
>>
>> Regards,
>>
>> Rik
>>
>>>
>>> https://urldefense.com/v3/__https://wiki.centos.org/ReportBugs.html__;!!ACWV5N9M2RV99hQ!LV3yWeoSOhNAkRHkxFCH2tlm0iNFVD78mxnSLyP6lrX7yBVeA2TOJ4nv6oZsqLwP4kW56CMpDWhkjjwSkWIqsboq$
>>>
>>>
>>>>> I created another dump of the workqueues and worker pools on the
>>>>> server:
>>>>>
>>>>> [Mon Mar 18 14:59:33 2024] Showing busy workqueues and worker pools:
>>>>> [Mon Mar 18 14:59:33 2024] workqueue events: flags=0x0
>>>>> [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
>>>>> active=1/256 refcnt=2
>>>>> [Mon Mar 18 14:59:33 2024]     pending: drm_fb_helper_damage_work
>>>>> [drm_kms_helper]
>>>>> [Mon Mar 18 14:59:33 2024] workqueue events_power_efficient:
>>>>> flags=0x80
>>>>> [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
>>>>> active=1/256 refcnt=2
>>>>> [Mon Mar 18 14:59:33 2024]     pending: fb_flashcursor
>>>>> [Mon Mar 18 14:59:33 2024] workqueue mm_percpu_wq: flags=0x8
>>>>> [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
>>>>> active=1/256 refcnt=3
>>>>> [Mon Mar 18 14:59:33 2024]     pending: lru_add_drain_per_cpu
>>>>> BAR(362)
>>>>> [Mon Mar 18 14:59:33 2024] workqueue kblockd: flags=0x18
>>>>> [Mon Mar 18 14:59:33 2024]   pwq 55: cpus=27 node=1 flags=0x0
>>>>> nice=-20
>>>>> active=1/256 refcnt=2
>>>>> [Mon Mar 18 14:59:33 2024]     pending: blk_mq_timeout_work
>>>>>
>>>>>
>>>>> In contrast to last time, it doesn't show anything regarding nfs this
>>>>> time.
>>>>>
>>>>> I also tried the suggestion from Dai Ngo (echo 3 >
>>>>> /proc/sys/vm/drop_caches), but that didn't seem to make any
>>>>> difference.
>>>>>
>>>>> We haven't restarted the server yet, as the impact seems to affect
>>>>> fewer clients than before. Is there anything we can run on the
>>>>> server to further debug this?
>>>>>
>>>>> In the past, the issue seemed to deteriorate rapidly and resulted in
>>>>> issues for almost all clients after about 20 minutes. This time the
>>>>> impact seems to be less, but it's not gone.
>>>>>
>>>>> How can we force the NFS server to forget about a specific client? I
>>>>> haven't tried to restart the nfs service yet as I'm afraid it will
>>>>> fail to stop as before.
>>>>>
>>> Not with that kernel. There are some new administrative interfaces that
>>> might allow that in the future, but they were just merged upstream and
>>> aren't in that kernel.
>>>
>>> --
>>> Jeff Layton <[email protected]>
>>
--
Rik Theys
System Engineer
KU Leuven - Dept. Elektrotechniek (ESAT)
Kasteelpark Arenberg 10 bus 2440 - B-3001 Leuven-Heverlee
+32(0)16/32.11.07
----------------------------------------------------------------
<<Any errors in spelling, tact or fact are transmission errors>>


2024-03-19 20:55:44

by Jeffrey Layton

[permalink] [raw]
Subject: Re: nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning

On Tue, 2024-03-19 at 20:41 +0100, Rik Theys wrote:
> Hi,
>
> On 3/19/24 18:09, Dai Ngo wrote:
> >
> > On 3/19/24 12:58 AM, Rik Theys wrote:
> > > Hi,
> > >
> > > On 3/18/24 22:54, Jeff Layton wrote:
> > > > On Mon, 2024-03-18 at 22:15 +0100, Rik Theys wrote:
> > > > > Hi,
> > > > >
> > > > > On 3/18/24 21:21, Rik Theys wrote:
> > > > > > Hi Jeff,
> > > > > >
> > > > > > On 3/12/24 13:47, Jeff Layton wrote:
> > > > > > > On Tue, 2024-03-12 at 13:24 +0100, Rik Theys wrote:
> > > > > > > > Hi Jeff,
> > > > > > > >
> > > > > > > > On 3/12/24 12:22, Jeff Layton wrote:
> > > > > > > > > On Mon, 2024-03-11 at 19:43 +0100, Rik Theys wrote:
> > > > > > > > > > Since a few weeks our Rocky Linux 9 NFS server has periodically
> > > > > > > > > > logged hung nfsd tasks. The initial effect was that some clients
> > > > > > > > > > could no longer access the NFS server. This got worse and worse
> > > > > > > > > > (probably as more nfsd threads got blocked) and we had to restart
> > > > > > > > > > the server. Restarting the server also failed as the NFS server
> > > > > > > > > > service could no longer be stopped.
> > > > > > > > > >
> > > > > > > > > > The initial kernel we noticed this behavior on was
> > > > > > > > > > kernel-5.14.0-362.18.1.el9_3.x86_64. Since then we've installed
> > > > > > > > > > kernel-5.14.0-419.el9.x86_64 from CentOS Stream 9. The same issue
> > > > > > > > > > happened again on this newer kernel version:
> > > > > > > 419 is fairly up to date with nfsd changes. There are some known
> > > > > > > bugs
> > > > > > > around callbacks, and there is a draft MR in flight to fix it.
> > > > > > >
> > > > > > > What kernel were you on prior to 5.14.0-362.18.1.el9_3.x86_64 ?
> > > > > > > If we
> > > > > > > can bracket the changes around a particular version, then that might
> > > > > > > help identify the problem.
> > > > > > >
> > > > > > > > > > [Mon Mar 11 14:10:08 2024]       Not tainted
> > > > > > > > > > 5.14.0-419.el9.x86_64 #1
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024] "echo 0 >
> > > > > > > > > > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]task:nfsd            state:D
> > > > > > > > > > stack:0
> > > > > > > > > >      pid:8865  ppid:2      flags:0x00004000
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024] Call Trace:
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  <TASK>
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ?
> > > > > > > > > > __pfx_schedule_timeout+0x10/0x10
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]
> > > > > > > > > >  nfsd4_shutdown_callback+0x49/0x120
> > > > > > > > > > [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ? nfsd4_cld_remove+0x54/0x1d0
> > > > > > > > > > [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ?
> > > > > > > > > > nfsd4_return_all_client_layouts+0xc4/0xf0 [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ?
> > > > > > > > > > nfsd4_shutdown_copy+0x68/0xc0 [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  __destroy_client+0x1f3/0x290
> > > > > > > > > > [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  nfsd4_exchange_id+0x75f/0x770
> > > > > > > > > > [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ?
> > > > > > > > > > nfsd4_decode_opaque+0x3a/0x90 [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]
> > > > > > > > > >  nfsd4_proc_compound+0x44b/0x700 [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
> > > > > > > > > > [sunrpc]
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ?
> > > > > > > > > > __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  </TASK>
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024] INFO: task nfsd:8866 blocked for
> > > > > > > > > > more than 122 seconds.
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]       Not tainted
> > > > > > > > > > 5.14.0-419.el9.x86_64 #1
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024] "echo 0 >
> > > > > > > > > > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]task:nfsd            state:D
> > > > > > > > > > stack:0
> > > > > > > > > >      pid:8866  ppid:2      flags:0x00004000
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024] Call Trace:
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  <TASK>
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ? tcp_recvmsg+0x196/0x210
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ?
> > > > > > > > > > __pfx_schedule_timeout+0x10/0x10
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  nfsd4_destroy_session+0x1a4/0x240
> > > > > > > > > > [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]
> > > > > > > > > >  nfsd4_proc_compound+0x44b/0x700 [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
> > > > > > > > > > [sunrpc]
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ?
> > > > > > > > > > __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
> > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  </TASK>
> > > > > > > > > >
> > > > > > > > > The above threads are trying to flush the workqueue, so that
> > > > > > > > > probably
> > > > > > > > > means that they are stuck waiting on a workqueue job to finish.
> > > > > > > > > >     The above is repeated a few times, and then this warning is
> > > > > > > > > > also logged:
> > > > > > > > > >     [Mon Mar 11 14:12:04 2024] ------------[ cut here
> > > > > > > > > > ]------------
> > > > > > > > > >     [Mon Mar 11 14:12:04 2024] WARNING: CPU: 39 PID: 8844 at
> > > > > > > > > > fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024] Modules linked in: nfsv4
> > > > > > > > > > dns_resolver nfs fscache netfs rpcsec_gss_krb5 rpcrdma rdma_cm
> > > > > > > > > > iw_cm ib_cm ib_core binfmt_misc bonding tls rfkill nft_counter
> > > > > > > > > > nft_ct
> > > > > > > > > >     nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet
> > > > > > > > > > nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables nfnetlink vfat
> > > > > > > > > > fat dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio l
> > > > > > > > > >     ibcrc32c dm_service_time dm_multipath intel_rapl_msr
> > > > > > > > > > intel_rapl_common intel_uncore_frequency
> > > > > > > > > > intel_uncore_frequency_common isst_if_common skx_edac nfit
> > > > > > > > > > libnvdimm ipmi_ssif x86_pkg_temp
> > > > > > > > > >     _thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
> > > > > > > > > > dcdbas rapl intel_cstate mgag200 i2c_algo_bit drm_shmem_helper
> > > > > > > > > > dell_smbios drm_kms_helper dell_wmi_descriptor wmi_bmof intel_u
> > > > > > > > > >     ncore syscopyarea pcspkr sysfillrect mei_me sysimgblt
> > > > > > > > > > acpi_ipmi
> > > > > > > > > > mei fb_sys_fops i2c_i801 ipmi_si intel_pch_thermal lpc_ich
> > > > > > > > > > ipmi_devintf i2c_smbus ipmi_msghandler joydev acpi_power_meter
> > > > > > > > > >     nfsd auth_rpcgss nfs_acl drm lockd grace fuse sunrpc ext4
> > > > > > > > > > mbcache jbd2 sd_mod sg lpfc
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  nvmet_fc nvmet nvme_fc
> > > > > > > > > > nvme_fabrics
> > > > > > > > > > crct10dif_pclmul ahci libahci crc32_pclmul nvme_core crc32c_intel
> > > > > > > > > > ixgbe megaraid_sas libata nvme_common ghash_clmulni_int
> > > > > > > > > >     el t10_pi wdat_wdt scsi_transport_fc mdio wmi dca dm_mirror
> > > > > > > > > > dm_region_hash dm_log dm_mod
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024] CPU: 39 PID: 8844 Comm: nfsd Not
> > > > > > > > > > tainted 5.14.0-419.el9.x86_64 #1
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024] Hardware name: Dell Inc. PowerEdge
> > > > > > > > > > R740/00WGD1, BIOS 2.20.1 09/13/2023
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024] RIP:
> > > > > > > > > > 0010:nfsd_break_deleg_cb+0x170/0x190 [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024] Code: a6 95 c5 f3 e9 ff fe ff
> > > > > > > > > > ff 48
> > > > > > > > > > 89 df be 01 00 00 00 e8 34 b5 13 f4 48 8d bb 98 00 00 00 e8 c8 f9
> > > > > > > > > > 00 00 84 c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be
> > > > > > > > > >     02 00 00 00 48 89 df e8 0c b5 13 f4 e9 01
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024] RSP: 0018:ffff9929e0bb7b80 EFLAGS:
> > > > > > > > > > 00010246
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024] RAX: 0000000000000000 RBX:
> > > > > > > > > > ffff8ada51930900 RCX: 0000000000000024
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024] RDX: ffff8ada519309c8 RSI:
> > > > > > > > > > ffff8ad582933c00 RDI: 0000000000002000
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024] RBP: ffff8ad46bf21574 R08:
> > > > > > > > > > ffff9929e0bb7b48 R09: 0000000000000000
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024] R10: ffff8aec859a2948 R11:
> > > > > > > > > > 0000000000000000 R12: ffff8ad6f497c360
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024] R13: ffff8ad46bf21560 R14:
> > > > > > > > > > ffff8ae5942e0b10 R15: ffff8ad6f497c360
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024] FS:  0000000000000000(0000)
> > > > > > > > > > GS:ffff8b031fcc0000(0000) knlGS:0000000000000000
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024] CS:  0010 DS: 0000 ES: 0000 CR0:
> > > > > > > > > > 0000000080050033
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024] CR2: 00007fafe2060744 CR3:
> > > > > > > > > > 00000018e58de006 CR4: 00000000007706e0
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024] DR0: 0000000000000000 DR1:
> > > > > > > > > > 0000000000000000 DR2: 0000000000000000
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024] DR3: 0000000000000000 DR6:
> > > > > > > > > > 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024] PKRU: 55555554
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024] Call Trace:
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  <TASK>
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ? __break_lease+0x16f/0x5f0
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
> > > > > > > > > > [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ? __warn+0x81/0x110
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
> > > > > > > > > > [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ? report_bug+0x10a/0x140
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ? handle_bug+0x3c/0x70
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ? exc_invalid_op+0x14/0x70
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ? asm_exc_invalid_op+0x16/0x20
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ? nfsd_break_deleg_cb+0x170/0x190
> > > > > > > > > > [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  __break_lease+0x16f/0x5f0
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ?
> > > > > > > > > > nfsd_file_lookup_locked+0x117/0x160 [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ? list_lru_del+0x101/0x150
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  nfsd_file_do_acquire+0x790/0x830
> > > > > > > > > > [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  nfs4_get_vfs_file+0x315/0x3a0
> > > > > > > > > > [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]
> > > > > > > > > >  nfsd4_process_open2+0x430/0xa30 [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ? fh_verify+0x297/0x2f0 [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]
> > > > > > > > > >  nfsd4_proc_compound+0x44b/0x700 [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  svc_process_common+0x2ec/0x660
> > > > > > > > > > [sunrpc]
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ?
> > > > > > > > > > __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  svc_process+0x12d/0x170 [sunrpc]
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  nfsd+0x84/0xb0 [nfsd]
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  kthread+0xdd/0x100
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ? __pfx_kthread+0x10/0x10
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ret_from_fork+0x29/0x50
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  </TASK>
> > > > > > > > > >     [Mon Mar 11 14:12:05 2024] ---[ end trace 7a039e17443dc651
> > > > > > > > > > ]---
> > > > > > > > > This is probably this WARN in nfsd_break_one_deleg:
> > > > > > > > >
> > > > > > > > > WARN_ON_ONCE(!nfsd4_run_cb(&dp->dl_recall));
> > > > > > > > >
> > > > > > > > > It means that a delegation break callback to the client
> > > > > > > > > couldn't be
> > > > > > > > > queued to the workqueue, and so it didn't run.
> > > > > > > > >
> > > > > > > > > > Could this be the same issue as described
> > > > > > > > > > here:https://urldefense.com/v3/__https://lore.kernel.org/linux-nfs/[email protected]/__;!!ACWV5N9M2RV99hQ!LV3yWeoSOhNAkRHkxFCH2tlm0iNFVD78mxnSLyP6lrX7yBVeA2TOJ4nv6oZsqLwP4kW56CMpDWhkjjwSkdBV9En7$
> > > > > > > > > > ?
> > > > > > > > > Yes, most likely the same problem.
> > > > > > > > If I read that thread correctly, this issue was introduced between
> > > > > > > > 6.1.63 and 6.6.3? Is it possible the EL9 5.14.0-362.18.1.el9_3
> > > > > > > > backported these changes, or were we hitting some other bug with
> > > > > > > > that
> > > > > > > > version? It seems the 6.1.x kernel is not affected? If so, that
> > > > > > > > would be
> > > > > > > > the recommended kernel to run?
> > > > > > > Anything is possible. We have to identify the problem first.
> > > > > > > > > > As described in that thread, I've tried to obtain the requested
> > > > > > > > > > information.
> > > > > > > > > >
> > > > > > > > > > Is it possible this is the issue that was fixed by the patches
> > > > > > > > > > described
> > > > > > > > > > here?https://urldefense.com/v3/__https://lore.kernel.org/linux-nfs/2024022054-cause-suffering-eae8@gregkh/__;!!ACWV5N9M2RV99hQ!LV3yWeoSOhNAkRHkxFCH2tlm0iNFVD78mxnSLyP6lrX7yBVeA2TOJ4nv6oZsqLwP4kW56CMpDWhkjjwSkedtUP09$
> > > > > > > > > >
> > > > > > > > > Doubtful. Those are targeted toward a different set of issues.
> > > > > > > > >
> > > > > > > > > If you're willing, I do have some patches queued up for CentOS
> > > > > > > > > here
> > > > > > > > > that
> > > > > > > > > fix some backchannel problems that could be related. I'm mainly
> > > > > > > > > waiting
> > > > > > > > > on Chuck to send these to Linus and then we'll likely merge
> > > > > > > > > them into
> > > > > > > > > CentOS soon afterward:
> > > > > > > > >
> > > > > > > > > https://urldefense.com/v3/__https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3689__;!!ACWV5N9M2RV99hQ!LV3yWeoSOhNAkRHkxFCH2tlm0iNFVD78mxnSLyP6lrX7yBVeA2TOJ4nv6oZsqLwP4kW56CMpDWhkjjwSkdvDn8y7$
> > > > > > > > >
> > > > > > > > >
> > > > > > > > If you can send me a patch file, I can rebuild the C9S kernel
> > > > > > > > with that
> > > > > > > > patch and run it. It can take a while for the bug to trigger as I
> > > > > > > > believe it seems to be very workload dependent (we were running
> > > > > > > > very
> > > > > > > > stable for months and now hit this bug every other week).
> > > > > > > >
> > > > > > > >
> > > > > > > It's probably simpler to just pull down the build artifacts for
> > > > > > > that MR.
> > > > > > > You have to drill down through the CI for it, but they are here:
> > > > > > >
> > > > > > > https://urldefense.com/v3/__https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/index.html?prefix=trusted-artifacts*1194300175*publish_x86_64*6278921877*artifacts*__;Ly8vLy8!!ACWV5N9M2RV99hQ!LV3yWeoSOhNAkRHkxFCH2tlm0iNFVD78mxnSLyP6lrX7yBVeA2TOJ4nv6oZsqLwP4kW56CMpDWhkjjwSkaP5eW8V$
> > > > > > >
> > > > > > >
> > > > > > > There's even a repo file you can install on the box to pull them
> > > > > > > down.
> > > > > > We installed this kernel on the server 3 days ago. Today, a user
> > > > > > informed us that their screen was black after logging in. Similar to
> > > > > > other occurrences of this issue, the mount command on the client was
> > > > > > hung. But in contrast to the other times, there were no messages in
> > > > > > the kernel logs on the server. Even restarting the client does
> > > > > > not resolve the issue.
> > > >
> > > > Ok, so you rebooted the client and it's still unable to mount? That
> > > > sounds like a server problem if so.
> > > >
> > > > Are both client and server running the same kernel?
> > > No, the server runs 5.14.0-427.3689_1194299994.el9 and the client
> > > 5.14.0-362.18.1.el9_3.
> > > >
> > > > > > Something still seems to be wrong on the server though. When I
> > > > > > look at
> > > > > > the directories under /proc/fs/nfsd/clients, there's still a
> > > > > > directory
> > > > > > for the specific client, even though it's no longer running:
> > > > > >
> > > > > > # cat 155/info
> > > > > > clientid: 0xc8edb7f65f4a9ad
> > > > > > address: "10.87.31.152:819"
> > > > > > status: confirmed
> > > > > > seconds from last renew: 33163
> > > > > > name: "Linux NFSv4.2 bersalis.esat.kuleuven.be"
> > > > > > minor version: 2
> > > > > > Implementation domain: "kernel.org"
> > > > > > Implementation name: "Linux 5.14.0-362.18.1.el9_3.0.1.x86_64 #1 SMP
> > > > > > PREEMPT_DYNAMIC Sun Feb 11 13:49:23 UTC 2024 x86_64"
> > > > > > Implementation time: [0, 0]
> > > > > > callback state: DOWN
> > > > > > callback address: 10.87.31.152:0
> > > > > >
> > > > If you just shut down the client, the server won't immediately purge
> > > > its
> > > > record. In fact, assuming you're running the same kernel on the server,
> > > > it won't purge the client record until there is a conflicting request
> > > > for its state.
> > > Is there a way to force such a conflicting request (to get the client
> > > record to purge)?
> >
> > Try:
> >
> > # echo "expire" > /proc/fs/nfsd/clients/155/ctl
>
> I've tried that. The command hangs and can not be interrupted with ctrl-c.
>

I'd wager that's the wait_event() in force_expire_client. It seems like
that sleep should be killable.

> I've now also noticed in the dmesg output that the kernel issued the
> following WARNING a few hours ago. It wasn't directly triggered by the
> echo command above, but seems to have been triggered a few hours ago
> (probably when another client started to have the same problem as more
> clients are experiencing issues now).
>
> [Tue Mar 19 14:53:44 2024] ------------[ cut here ]------------
> [Tue Mar 19 14:53:44 2024] WARNING: CPU: 44 PID: 5843 at
> fs/nfsd/nfs4state.c:4920 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
> [Tue Mar 19 14:53:44 2024] Modules linked in: nf_conntrack_netlink nfsv4
> dns_resolver nfs fscache netfs binfmt_misc xsk_diag rpcsec_gss_krb5
> rpcrdma rdma_cm iw_cm ib_cm ib_core bonding tls rfkill nft_counter
> nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet
> nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables nfnetlink vfat fat
> dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c
> dm_service_time dm_multipath intel_rapl_msr intel_rapl_common
> intel_uncore_frequency intel_uncore_frequency_common isst_if_common
> skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp
> kvm_intel kvm dcdbas irqbypass ipmi_ssif rapl intel_cstate mgag200
> i2c_algo_bit drm_shmem_helper drm_kms_helper dell_smbios syscopyarea
> intel_uncore sysfillrect wmi_bmof dell_wmi_descriptor pcspkr sysimgblt
> fb_sys_fops mei_me i2c_i801 mei intel_pch_thermal acpi_ipmi i2c_smbus
> lpc_ich ipmi_si ipmi_devintf ipmi_msghandler joydev acpi_power_meter
> nfsd nfs_acl lockd auth_rpcgss grace drm fuse sunrpc ext4
> [Tue Mar 19 14:53:44 2024]  mbcache jbd2 sd_mod sg lpfc nvmet_fc nvmet
> nvme_fc nvme_fabrics crct10dif_pclmul crc32_pclmul nvme_core ixgbe
> crc32c_intel ahci libahci nvme_common megaraid_sas t10_pi
> ghash_clmulni_intel wdat_wdt libata scsi_transport_fc mdio dca wmi
> dm_mirror dm_region_hash dm_log dm_mod
> [Tue Mar 19 14:53:44 2024] CPU: 44 PID: 5843 Comm: nfsd Not tainted
> 5.14.0-427.3689_1194299994.el9.x86_64 #1
> [Tue Mar 19 14:53:44 2024] Hardware name: Dell Inc. PowerEdge
> R740/00WGD1, BIOS 2.20.1 09/13/2023
> [Tue Mar 19 14:53:44 2024] RIP: 0010:nfsd_break_deleg_cb+0x170/0x190 [nfsd]
> [Tue Mar 19 14:53:44 2024] Code: 76 76 cd de e9 ff fe ff ff 48 89 df be
> 01 00 00 00 e8 34 a1 1b df 48 8d bb 98 00 00 00 e8 a8 fe 00 00 84 c0 0f
> 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be 02 00 00 00 48 89 df e8 0c a1
> 1b df e9 01
> [Tue Mar 19 14:53:44 2024] RSP: 0018:ffffb2878f2cfc38 EFLAGS: 00010246
> [Tue Mar 19 14:53:44 2024] RAX: 0000000000000000 RBX: ffff88d5171067b8
> RCX: 0000000000000000
> [Tue Mar 19 14:53:44 2024] RDX: ffff88d517106880 RSI: ffff88bdceec8600
> RDI: 0000000000002000
> [Tue Mar 19 14:53:44 2024] RBP: ffff88d68a38a284 R08: ffffb2878f2cfc00
> R09: 0000000000000000
> [Tue Mar 19 14:53:44 2024] R10: ffff88bf57dd7878 R11: 0000000000000000
> R12: ffff88d5b79c4798
> [Tue Mar 19 14:53:44 2024] R13: ffff88d68a38a270 R14: ffff88cab06ad0c8
> R15: ffff88d5b79c4798
> [Tue Mar 19 14:53:44 2024] FS:  0000000000000000(0000)
> GS:ffff88d4a1180000(0000) knlGS:0000000000000000
> [Tue Mar 19 14:53:44 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [Tue Mar 19 14:53:44 2024] CR2: 00007fe46ef90000 CR3: 000000019d010004
> CR4: 00000000007706e0
> [Tue Mar 19 14:53:44 2024] DR0: 0000000000000000 DR1: 0000000000000000
> DR2: 0000000000000000
> [Tue Mar 19 14:53:44 2024] DR3: 0000000000000000 DR6: 00000000fffe0ff0
> DR7: 0000000000000400
> [Tue Mar 19 14:53:44 2024] PKRU: 55555554
> [Tue Mar 19 14:53:44 2024] Call Trace:
> [Tue Mar 19 14:53:44 2024]  <TASK>
> [Tue Mar 19 14:53:44 2024]  ? show_trace_log_lvl+0x1c4/0x2df
> [Tue Mar 19 14:53:44 2024]  ? show_trace_log_lvl+0x1c4/0x2df
> [Tue Mar 19 14:53:44 2024]  ? __break_lease+0x16f/0x5f0
> [Tue Mar 19 14:53:44 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
> [Tue Mar 19 14:53:44 2024]  ? __warn+0x81/0x110
> [Tue Mar 19 14:53:44 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
> [Tue Mar 19 14:53:44 2024]  ? report_bug+0x10a/0x140
> [Tue Mar 19 14:53:44 2024]  ? handle_bug+0x3c/0x70
> [Tue Mar 19 14:53:44 2024]  ? exc_invalid_op+0x14/0x70
> [Tue Mar 19 14:53:44 2024]  ? asm_exc_invalid_op+0x16/0x20
> [Tue Mar 19 14:53:44 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
> [Tue Mar 19 14:53:44 2024]  ? nfsd_break_deleg_cb+0x96/0x190 [nfsd]
> [Tue Mar 19 14:53:44 2024]  __break_lease+0x16f/0x5f0
> [Tue Mar 19 14:53:44 2024]  nfs4_get_vfs_file+0x164/0x3a0 [nfsd]
> [Tue Mar 19 14:53:44 2024]  nfsd4_process_open2+0x430/0xa30 [nfsd]
> [Tue Mar 19 14:53:44 2024]  ? fh_verify+0x297/0x2f0 [nfsd]
> [Tue Mar 19 14:53:44 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
> [Tue Mar 19 14:53:44 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
> [Tue Mar 19 14:53:44 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
> [Tue Mar 19 14:53:44 2024]  svc_process_common+0x2ec/0x660 [sunrpc]
> [Tue Mar 19 14:53:44 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> [Tue Mar 19 14:53:44 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
> [Tue Mar 19 14:53:44 2024]  svc_process+0x12d/0x170 [sunrpc]
> [Tue Mar 19 14:53:44 2024]  nfsd+0x84/0xb0 [nfsd]
> [Tue Mar 19 14:53:44 2024]  kthread+0xdd/0x100
> [Tue Mar 19 14:53:44 2024]  ? __pfx_kthread+0x10/0x10
> [Tue Mar 19 14:53:44 2024]  ret_from_fork+0x29/0x50
> [Tue Mar 19 14:53:44 2024]  </TASK>
> [Tue Mar 19 14:53:44 2024] ---[ end trace ed0b2b3f135c637d ]---
>
> It again seems to have been triggered in nfsd_break_deleg_cb?
>

Same problem as before. It tried to submit the workqueue job, but it was
already queued, so the submission failed.

> I also had the following perf command running a tmux on the server:
>
> perf trace -e nfsd:nfsd_cb_recall_any
>
> This has spewed a lot of messages. I'm including a short list here:
>
> ...
>
> 33464866.721 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210688785, bmval0: 1, addr: 0x7f331bb116c8)
> 33464866.724 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210688827, bmval0: 1, addr: 0x7f331bb11738)
> 33464866.729 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210688767, bmval0: 1, addr: 0x7f331bb117a8)
> 33464866.732 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210718132, bmval0: 1, addr: 0x7f331bb11818)
> 33464866.737 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210688952, bmval0: 1, addr: 0x7f331bb11888)
> 33464866.741 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210702355, bmval0: 1, addr: 0x7f331bb118f8)
> 33868414.001 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210688751, bmval0: 1, addr: 0x7f331be68620)
> 33868414.014 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210718536, bmval0: 1, addr: 0x7f331be68690)
> 33868414.018 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210719074, bmval0: 1, addr: 0x7f331be68700)
> 33868414.022 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210688916, bmval0: 1, addr: 0x7f331be68770)
> 33868414.026 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210688941, bmval0: 1, addr: 0x7f331be687e0)
> ...
>
> 33868414.924 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210688744, bmval0: 1, addr: 0x7f331be6d7f0)
> 33868414.929 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210717223, bmval0: 1, addr: 0x7f331be6d860)
> 33868414.934 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210716137, bmval0: 1, addr: 0x7f331be6d8d0)
> 34021240.903 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210688941, bmval0: 1, addr: 0x7f331c207de8)
> 34021240.917 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210718750, bmval0: 1, addr: 0x7f331c207e58)
> 34021240.922 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210688955, bmval0: 1, addr: 0x7f331c207ec8)
> 34021240.925 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210688975, bmval0: 1, addr: 0x7f331c207f38)
> ...
>
> I assume the cl_id is the client id? How can I map this to a client from
> /proc/fs/nfsd/clients?
>
> If I understand it correctly, the recall_any should be called when
> either the system starts to experience memory pressure, or it reaches
> the delegation limits? I doubt the system is actually running out of
> memory here as there are no other indications. Shouldn't I get those
> "page allocation failure" messages if it does? How can I check the
> number of delegations/leases currently issued, what the current maximum
> is and how to increase it?

You probably won't see log messages or anything for that. recall_any is hooked up
to a shrinker AFAIU, so we start sending those when the VM politely asks
us to release memory, not when it's an emergency.

The leases are (usually) shown in /proc/locks for most local
filesystems.
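
For example, on kernels that flag delegations separately from plain leases,
the delegations nfsd has handed out show up there as "DELEG" lines, so a
rough count is something like:

# grep -c DELEG /proc/locks

Treat that as a quick sanity check rather than an exact accounting.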

>
> Regarding the recall any call: from what I've read on kernelnewbies,
> this feature was introduced in the 6.2 kernel? When I look at the tree
> for 6.1.x, it was backported in 6.1.81? Is there a way to disable this
> support somehow?
>

Not currently. You can disable kernel leases altogether, which will
disable delegations. You may want to start there if you're having
stability issues with them enabled.
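
Disabling leases is the fs.leases-enable sysctl, e.g. (non-persistent,
until the next reboot):

# sysctl -w fs.leases-enable=0

That only stops new leases/delegations from being granted; anything already
handed out stays until it is returned or recalled, and local applications
that use F_SETLEASE on the exported filesystems are affected as well.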

That said, with a problem like this, it's easy to get lost in peripheral
details. I'm not clear on the root cause of this problem yet.

You have a client that is unable to reestablish a session with the
server because the server is returning NFS4ERR_DELAY repeatedly. If
you're able, my suggestion would be to start by trying to determine the
cause of that problem, rather than guessing about different patches, or
turning off server functionality.
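
One way to narrow that down, assuming your kernel exposes the
nfsd:nfsd_compound_status tracepoint, is to watch which operation keeps
failing while the client retries, along the lines of:

# perf trace -e 'nfsd:nfsd_compound_status'

and look for the CREATE_SESSION op coming back with a non-zero status
(NFS4ERR_DELAY is 10008). That at least confirms which operation is
failing and how often.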

Note too that the kernel you're running on the server is a build
artifact from a merge request. Once you're ready to reboot, you may want
to update to the latest CentOS Stream 9 kernel, since those patches have
now been merged into it (along with some other NFS fixes).


> > > > > The nfsdclnts command for this client shows the following delegations:
> > > > >
> > > > > # nfsdclnts -f 155/states -t all
> > > > > Inode number | Type   | Access | Deny | ip address | Filename
> > > > > 169346743    | open   | r-     | --   | 10.87.31.152:819 |
> > > > > disconnected dentry
> > > > > 169346743    | deleg  | r      |      | 10.87.31.152:819 |
> > > > > disconnected dentry
> > > > > 169346746    | open   | r-     | --   | 10.87.31.152:819 |
> > > > > disconnected dentry
> > > > > 169346746    | deleg  | r      |      | 10.87.31.152:819 |
> > > > > disconnected dentry
> > > > >
> > > > > I see a lot of recent patches regarding directory delegations. Could
> > > > > this be related to this?
> > > > >
> > > > > Will a 5.14.0-362.18.1.el9_3.0.1 kernel try to use a directory
> > > > > delegation?
> > > > >
> > > > >
> > > > No. Directory delegations are a new feature that's still under
> > > > development. They use some of the same machinery as file delegations,
> > > > but they wouldn't be a factor here.
> > > >
> > > > > > The system seems to have identified that the client is no longer
> > > > > > reachable, but the client entry does not go away. When a mount was
> > > > > > hanging on the client, there would be two directories in clients for
> > > > > > the same client. Killing the mount command clears up the second
> > > > > > entry.
> > > > > >
> > > > > > Even after running conntrack -D on the server to remove the tcp
> > > > > > connection from the conntrack table, the entry doesn't go away and
> > > > > > the
> > > > > > client still can not mount anything from the server.
> > > > > >
> > > > > > A tcpdump on the client while a mount was running logged the
> > > > > > following
> > > > > > messages over and over again:
> > > > > >
> > > > > > request:
> > > > > >
> > > > > > Frame 1: 378 bytes on wire (3024 bits), 378 bytes captured (3024
> > > > > > bits)
> > > > > > Ethernet II, Src: HP_19:7d:4b (e0:73:e7:19:7d:4b), Dst:
> > > > > > ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00)
> > > > > > Internet Protocol Version 4, Src: 10.87.31.152, Dst: 10.86.18.14
> > > > > > Transmission Control Protocol, Src Port: 932, Dst Port: 2049, Seq: 1,
> > > > > > Ack: 1, Len: 312
> > > > > > Remote Procedure Call, Type:Call XID:0x1d3220c4
> > > > > > Network File System
> > > > > >      [Program Version: 4]
> > > > > >      [V4 Procedure: COMPOUND (1)]
> > > > > >      GSS Data, Ops(1): CREATE_SESSION
> > > > > >          Length: 152
> > > > > >          GSS Sequence Number: 76
> > > > > >          Tag: <EMPTY>
> > > > > >          minorversion: 2
> > > > > >          Operations (count: 1): CREATE_SESSION
> > > > > >          [Main Opcode: CREATE_SESSION (43)]
> > > > > >      GSS Checksum:
> > > > > > 00000028040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb2…
> > > > > >
> > > > > >          GSS Token Length: 40
> > > > > >          GSS-API Generic Security Service Application Program
> > > > > > Interface
> > > > > >              krb5_blob:
> > > > > > 040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb23fa080b0…
> > > > > >
> > > > > >
> > > > > > response:
> > > > > >
> > > > > > Frame 2: 206 bytes on wire (1648 bits), 206 bytes captured (1648
> > > > > > bits)
> > > > > > Ethernet II, Src: ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00), Dst:
> > > > > > HP_19:7d:4b (e0:73:e7:19:7d:4b)
> > > > > > Internet Protocol Version 4, Src: 10.86.18.14, Dst: 10.87.31.152
> > > > > > Transmission Control Protocol, Src Port: 2049, Dst Port: 932, Seq: 1,
> > > > > > Ack: 313, Len: 140
> > > > > > Remote Procedure Call, Type:Reply XID:0x1d3220c4
> > > > > > Network File System
> > > > > >      [Program Version: 4]
> > > > > >      [V4 Procedure: COMPOUND (1)]
> > > > > >      GSS Data, Ops(1): CREATE_SESSION(NFS4ERR_DELAY)
> > > > > >          Length: 24
> > > > > >          GSS Sequence Number: 76
> > > > > >          Status: NFS4ERR_DELAY (10008)
> > > > > >          Tag: <EMPTY>
> > > > > >          Operations (count: 1)
> > > > > >          [Main Opcode: CREATE_SESSION (43)]
> > > > > >      GSS Checksum:
> > > > > > 00000028040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f6274222…
> > > > > >
> > > > > >          GSS Token Length: 40
> > > > > >          GSS-API Generic Security Service Application Program
> > > > > > Interface
> > > > > >              krb5_blob:
> > > > > > 040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f627422226d74923…
> > > > > >
> > > > > >
> > > > > > I was hoping that giving the client a different IP address would
> > > > > > resolve the issue for this client, but it didn't. Even though the
> > > > > > client had a new IP address (hostname was kept the same), it
> > > > > > failed to
> > > > > > mount anything from the server.
> > > > > >
> > > > Changing the IP address won't help. The client is probably using the
> > > > same long-form client id as before, so the server still identifies the
> > > > client even with the address change.
> > > How is the client id determined? Will changing the hostname of the
> > > client trigger a change of the client id?
> > > >
> > > > Unfortunately, the cause of an NFS4ERR_DELAY error is tough to guess.
> > > > The client is expected to back off and retry, so if the server keeps
> > > > returning that repeatedly, then a hung mount command is expected.
> > > >
> > > > The question is why the server would keep returning DELAY. A lot of
> > > > different problems ranging from memory allocation issues to protocol
> > > > problems can result in that error. You may want to check the NFS server
> > > > and see if anything was logged there.
> > > There are no messages in the system logs that indicate any sort of
> > > memory issue. We also increased the vm.min_free_kbytes sysctl to 2G on
> > > the server before we restarted it with the newer kernel.
> > > >
> > > > This is on a CREATE_SESSION call, so I wonder if the record held by the
> > > > (courteous) server is somehow blocking the attempt to reestablish the
> > > > session?
> > > >
> > > > Do you have a way to reproduce this? Since this is a centos kernel, you
> > > > could follow the page here to open a bug:
> > >
> > > Unfortunately we haven't found a reliable way to reproduce it. But we
> > > do seem to trigger it more and more lately.
> > >
> > > Regards,
> > >
> > > Rik
> > >
> > > >
> > > > https://wiki.centos.org/ReportBugs.html
> > > >
> > > >
> > > > > > I created another dump of the workqueues and worker pools on the
> > > > > > server:
> > > > > >
> > > > > > [Mon Mar 18 14:59:33 2024] Showing busy workqueues and worker pools:
> > > > > > [Mon Mar 18 14:59:33 2024] workqueue events: flags=0x0
> > > > > > [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
> > > > > > active=1/256 refcnt=2
> > > > > > [Mon Mar 18 14:59:33 2024]     pending: drm_fb_helper_damage_work
> > > > > > [drm_kms_helper]
> > > > > > [Mon Mar 18 14:59:33 2024] workqueue events_power_efficient:
> > > > > > flags=0x80
> > > > > > [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
> > > > > > active=1/256 refcnt=2
> > > > > > [Mon Mar 18 14:59:33 2024]     pending: fb_flashcursor
> > > > > > [Mon Mar 18 14:59:33 2024] workqueue mm_percpu_wq: flags=0x8
> > > > > > [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
> > > > > > active=1/256 refcnt=3
> > > > > > [Mon Mar 18 14:59:33 2024]     pending: lru_add_drain_per_cpu
> > > > > > BAR(362)
> > > > > > [Mon Mar 18 14:59:33 2024] workqueue kblockd: flags=0x18
> > > > > > [Mon Mar 18 14:59:33 2024]   pwq 55: cpus=27 node=1 flags=0x0
> > > > > > nice=-20
> > > > > > active=1/256 refcnt=2
> > > > > > [Mon Mar 18 14:59:33 2024]     pending: blk_mq_timeout_work
> > > > > >
> > > > > >
> > > > > > In contrast to last time, it doesn't show anything regarding nfs this
> > > > > > time.
> > > > > >
> > > > > > I also tried the suggestion from Dai Ngo (echo 3 >
> > > > > > /proc/sys/vm/drop_caches), but that didn't seem to make any
> > > > > > difference.
> > > > > >
> > > > > > We haven't restarted the server yet as the impact seems to
> > > > > > affect fewer clients than before. Is there anything we can run on the
> > > > > > server to further debug this?
> > > > > >
> > > > > > In the past, the issue seemed to deteriorate rapidly and resulted in
> > > > > > issues for almost all clients after about 20 minutes. This time the
> > > > > > impact seems to be less, but it's not gone.
> > > > > >
> > > > > > How can we force the NFS server to forget about a specific client? I
> > > > > > haven't tried to restart the nfs service yet as I'm afraid it will
> > > > > > fail to stop as before.
> > > > > >
> > > > Not with that kernel. There are some new administrative interfaces that
> > > > might allow that in the future, but they were just merged upstream and
> > > > aren't in that kernel.
> > > >
> > > > --
> > > > Jeff Layton <[email protected]>
> > >

--
Jeff Layton <[email protected]>

2024-03-19 21:45:07

by Dai Ngo

[permalink] [raw]
Subject: Re: nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning


On 3/19/24 12:41 PM, Rik Theys wrote:
> Hi,
>
> On 3/19/24 18:09, Dai Ngo wrote:
>>
>> On 3/19/24 12:58 AM, Rik Theys wrote:
>>> Hi,
>>>
>>> On 3/18/24 22:54, Jeff Layton wrote:
>>>> On Mon, 2024-03-18 at 22:15 +0100, Rik Theys wrote:
>>>>> Hi,
>>>>>
>>>>> On 3/18/24 21:21, Rik Theys wrote:
>>>>>> Hi Jeff,
>>>>>>
>>>>>> On 3/12/24 13:47, Jeff Layton wrote:
>>>>>>> On Tue, 2024-03-12 at 13:24 +0100, Rik Theys wrote:
>>>>>>>> Hi Jeff,
>>>>>>>>
>>>>>>>> On 3/12/24 12:22, Jeff Layton wrote:
>>>>>>>>> On Mon, 2024-03-11 at 19:43 +0100, Rik Theys wrote:
>>>>>>>>>> Since a few weeks our Rocky Linux 9 NFS server has periodically
>>>>>>>>>> logged hung nfsd tasks. The initial effect was that some clients
>>>>>>>>>> could no longer access the NFS server. This got worse and worse
>>>>>>>>>> (probably as more nfsd threads got blocked) and we had to
>>>>>>>>>> restart
>>>>>>>>>> the server. Restarting the server also failed as the NFS server
>>>>>>>>>> service could no longer be stopped.
>>>>>>>>>>
>>>>>>>>>> The initial kernel we noticed this behavior on was
>>>>>>>>>> kernel-5.14.0-362.18.1.el9_3.x86_64. Since then we've installed
>>>>>>>>>> kernel-5.14.0-419.el9.x86_64 from CentOS Stream 9. The same
>>>>>>>>>> issue
>>>>>>>>>> happened again on this newer kernel version:
>>>>>>> 419 is fairly up to date with nfsd changes. There are some known
>>>>>>> bugs
>>>>>>> around callbacks, and there is a draft MR in flight to fix it.
>>>>>>>
>>>>>>> What kernel were you on prior to 5.14.0-362.18.1.el9_3.x86_64 ?
>>>>>>> If we
>>>>>>> can bracket the changes around a particular version, then that
>>>>>>> might
>>>>>>> help identify the problem.
>>>>>>>
>>>>>>>>>> [Mon Mar 11 14:10:08 2024]       Not tainted
>>>>>>>>>> 5.14.0-419.el9.x86_64 #1
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024] "echo 0 >
>>>>>>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]task:nfsd            state:D
>>>>>>>>>> stack:0
>>>>>>>>>>      pid:8865  ppid:2      flags:0x00004000
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  <TASK>
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>>> __pfx_schedule_timeout+0x10/0x10
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]
>>>>>>>>>>  nfsd4_shutdown_callback+0x49/0x120
>>>>>>>>>> [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? nfsd4_cld_remove+0x54/0x1d0
>>>>>>>>>> [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>>> nfsd4_return_all_client_layouts+0xc4/0xf0 [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>>> nfsd4_shutdown_copy+0x68/0xc0 [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __destroy_client+0x1f3/0x290
>>>>>>>>>> [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd4_exchange_id+0x75f/0x770
>>>>>>>>>> [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>>> nfsd4_decode_opaque+0x3a/0x90 [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]
>>>>>>>>>>  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
>>>>>>>>>> [sunrpc]
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>>> __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  </TASK>
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024] INFO: task nfsd:8866 blocked for
>>>>>>>>>> more than 122 seconds.
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]       Not tainted
>>>>>>>>>> 5.14.0-419.el9.x86_64 #1
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024] "echo 0 >
>>>>>>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]task:nfsd            state:D
>>>>>>>>>> stack:0
>>>>>>>>>>      pid:8866  ppid:2      flags:0x00004000
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  <TASK>
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? select_idle_sibling+0x28/0x430
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? tcp_recvmsg+0x196/0x210
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>>> __pfx_schedule_timeout+0x10/0x10
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]
>>>>>>>>>>  nfsd4_destroy_session+0x1a4/0x240
>>>>>>>>>> [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]
>>>>>>>>>>  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
>>>>>>>>>> [sunrpc]
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>>> __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  </TASK>
>>>>>>>>>>
>>>>>>>>> The above threads are trying to flush the workqueue, so that
>>>>>>>>> probably
>>>>>>>>> means that they are stuck waiting on a workqueue job to finish.
>>>>>>>>>>     The above is repeated a few times, and then this warning is
>>>>>>>>>> also logged:
>>>>>>>>>>     [Mon Mar 11 14:12:04 2024] ------------[ cut here
>>>>>>>>>> ]------------
>>>>>>>>>>     [Mon Mar 11 14:12:04 2024] WARNING: CPU: 39 PID: 8844 at
>>>>>>>>>> fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] Modules linked in: nfsv4
>>>>>>>>>> dns_resolver nfs fscache netfs rpcsec_gss_krb5 rpcrdma rdma_cm
>>>>>>>>>> iw_cm ib_cm ib_core binfmt_misc bonding tls rfkill
>>>>>>>>>> nft_counter nft_ct
>>>>>>>>>>     nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet
>>>>>>>>>> nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables nfnetlink
>>>>>>>>>> vfat
>>>>>>>>>> fat dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio l
>>>>>>>>>>     ibcrc32c dm_service_time dm_multipath intel_rapl_msr
>>>>>>>>>> intel_rapl_common intel_uncore_frequency
>>>>>>>>>> intel_uncore_frequency_common isst_if_common skx_edac nfit
>>>>>>>>>> libnvdimm ipmi_ssif x86_pkg_temp
>>>>>>>>>>     _thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
>>>>>>>>>> dcdbas rapl intel_cstate mgag200 i2c_algo_bit drm_shmem_helper
>>>>>>>>>> dell_smbios drm_kms_helper dell_wmi_descriptor wmi_bmof intel_u
>>>>>>>>>>     ncore syscopyarea pcspkr sysfillrect mei_me sysimgblt
>>>>>>>>>> acpi_ipmi
>>>>>>>>>> mei fb_sys_fops i2c_i801 ipmi_si intel_pch_thermal lpc_ich
>>>>>>>>>> ipmi_devintf i2c_smbus ipmi_msghandler joydev acpi_power_meter
>>>>>>>>>>     nfsd auth_rpcgss nfs_acl drm lockd grace fuse sunrpc ext4
>>>>>>>>>> mbcache jbd2 sd_mod sg lpfc
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nvmet_fc nvmet nvme_fc
>>>>>>>>>> nvme_fabrics
>>>>>>>>>> crct10dif_pclmul ahci libahci crc32_pclmul nvme_core
>>>>>>>>>> crc32c_intel
>>>>>>>>>> ixgbe megaraid_sas libata nvme_common ghash_clmulni_int
>>>>>>>>>>     el t10_pi wdat_wdt scsi_transport_fc mdio wmi dca dm_mirror
>>>>>>>>>> dm_region_hash dm_log dm_mod
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] CPU: 39 PID: 8844 Comm: nfsd Not
>>>>>>>>>> tainted 5.14.0-419.el9.x86_64 #1
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] Hardware name: Dell Inc.
>>>>>>>>>> PowerEdge
>>>>>>>>>> R740/00WGD1, BIOS 2.20.1 09/13/2023
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] RIP:
>>>>>>>>>> 0010:nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] Code: a6 95 c5 f3 e9 ff fe ff
>>>>>>>>>> ff 48
>>>>>>>>>> 89 df be 01 00 00 00 e8 34 b5 13 f4 48 8d bb 98 00 00 00 e8
>>>>>>>>>> c8 f9
>>>>>>>>>> 00 00 84 c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be
>>>>>>>>>>     02 00 00 00 48 89 df e8 0c b5 13 f4 e9 01
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] RSP: 0018:ffff9929e0bb7b80
>>>>>>>>>> EFLAGS:
>>>>>>>>>> 00010246
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] RAX: 0000000000000000 RBX:
>>>>>>>>>> ffff8ada51930900 RCX: 0000000000000024
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] RDX: ffff8ada519309c8 RSI:
>>>>>>>>>> ffff8ad582933c00 RDI: 0000000000002000
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] RBP: ffff8ad46bf21574 R08:
>>>>>>>>>> ffff9929e0bb7b48 R09: 0000000000000000
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] R10: ffff8aec859a2948 R11:
>>>>>>>>>> 0000000000000000 R12: ffff8ad6f497c360
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] R13: ffff8ad46bf21560 R14:
>>>>>>>>>> ffff8ae5942e0b10 R15: ffff8ad6f497c360
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] FS:  0000000000000000(0000)
>>>>>>>>>> GS:ffff8b031fcc0000(0000) knlGS:0000000000000000
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] CS:  0010 DS: 0000 ES: 0000 CR0:
>>>>>>>>>> 0000000080050033
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] CR2: 00007fafe2060744 CR3:
>>>>>>>>>> 00000018e58de006 CR4: 00000000007706e0
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] DR0: 0000000000000000 DR1:
>>>>>>>>>> 0000000000000000 DR2: 0000000000000000
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] DR3: 0000000000000000 DR6:
>>>>>>>>>> 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] PKRU: 55555554
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] Call Trace:
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  <TASK>
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? __break_lease+0x16f/0x5f0
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ?
>>>>>>>>>> nfsd_break_deleg_cb+0x170/0x190
>>>>>>>>>> [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? __warn+0x81/0x110
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ?
>>>>>>>>>> nfsd_break_deleg_cb+0x170/0x190
>>>>>>>>>> [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? report_bug+0x10a/0x140
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? handle_bug+0x3c/0x70
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? exc_invalid_op+0x14/0x70
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? asm_exc_invalid_op+0x16/0x20
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ?
>>>>>>>>>> nfsd_break_deleg_cb+0x170/0x190
>>>>>>>>>> [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  __break_lease+0x16f/0x5f0
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ?
>>>>>>>>>> nfsd_file_lookup_locked+0x117/0x160 [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? list_lru_del+0x101/0x150
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nfsd_file_do_acquire+0x790/0x830
>>>>>>>>>> [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nfs4_get_vfs_file+0x315/0x3a0
>>>>>>>>>> [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]
>>>>>>>>>>  nfsd4_process_open2+0x430/0xa30 [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? fh_verify+0x297/0x2f0 [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]
>>>>>>>>>>  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  svc_process_common+0x2ec/0x660
>>>>>>>>>> [sunrpc]
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ?
>>>>>>>>>> __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  kthread+0xdd/0x100
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? __pfx_kthread+0x10/0x10
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ret_from_fork+0x29/0x50
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  </TASK>
>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] ---[ end trace
>>>>>>>>>> 7a039e17443dc651 ]---
>>>>>>>>> This is probably this WARN in nfsd_break_one_deleg:
>>>>>>>>>
>>>>>>>>> WARN_ON_ONCE(!nfsd4_run_cb(&dp->dl_recall));
>>>>>>>>>
>>>>>>>>> It means that a delegation break callback to the client
>>>>>>>>> couldn't be
>>>>>>>>> queued to the workqueue, and so it didn't run.
>>>>>>>>>
>>>>>>>>>> Could this be the same issue as described
>>>>>>>>>> here: https://lore.kernel.org/linux-nfs/[email protected]/
>>>>>>>>>> ?
>>>>>>>>> Yes, most likely the same problem.
>>>>>>>> If I read that thread correctly, this issue was introduced between
>>>>>>>> 6.1.63 and 6.6.3? Is it possible the EL9 5.14.0-362.18.1.el9_3
>>>>>>>> backported these changes, or were we hitting some other bug
>>>>>>>> with that
>>>>>>>> version? It seems the 6.1.x kernel is not affected? If so, that
>>>>>>>> would be
>>>>>>>> the recommended kernel to run?
>>>>>>> Anything is possible. We have to identify the problem first.
>>>>>>>>>> As described in that thread, I've tried to obtain the requested
>>>>>>>>>> information.
>>>>>>>>>>
>>>>>>>>>> Is it possible this is the issue that was fixed by the patches
>>>>>>>>>> described
>>>>>>>>>> here? https://lore.kernel.org/linux-nfs/2024022054-cause-suffering-eae8@gregkh/
>>>>>>>>>>
>>>>>>>>> Doubtful. Those are targeted toward a different set of issues.
>>>>>>>>>
>>>>>>>>> If you're willing, I do have some patches queued up for CentOS
>>>>>>>>> here
>>>>>>>>> that
>>>>>>>>> fix some backchannel problems that could be related. I'm mainly
>>>>>>>>> waiting
>>>>>>>>> on Chuck to send these to Linus and then we'll likely merge
>>>>>>>>> them into
>>>>>>>>> CentOS soon afterward:
>>>>>>>>>
>>>>>>>>> https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3689
>>>>>>>>>
>>>>>>>>>
>>>>>>>> If you can send me a patch file, I can rebuild the C9S kernel
>>>>>>>> with that
>>>>>>>> patch and run it. It can take a while for the bug to trigger as I
>>>>>>>> believe it seems to be very workload dependent (we were running
>>>>>>>> very
>>>>>>>> stable for months and now hit this bug every other week).
>>>>>>>>
>>>>>>>>
>>>>>>> It's probably simpler to just pull down the build artifacts for
>>>>>>> that MR.
>>>>>>> You have to drill down through the CI for it, but they are here:
>>>>>>>
>>>>>>> https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/index.html?prefix=trusted-artifacts/1194300175/publish_x86_64/6278921877/artifacts/
>>>>>>>
>>>>>>>
>>>>>>> There's even a repo file you can install on the box to pull them
>>>>>>> down.
>>>>>> We installed this kernel on the server 3 days ago. Today, a user
>>>>>> informed us that their screen was black after logging in. Similar to
>>>>>> other occurrences of this issue, the mount command on the client was
>>>>>> hung. But in contrast to the other times, there were no messages in
>>>>>> the kernel logs on the server. Even restarting the client does
>>>>>> not resolve the issue.
>>>>
>>>> Ok, so you rebooted the client and it's still unable to mount? That
>>>> sounds like a server problem if so.
>>>>
>>>> Are both client and server running the same kernel?
>>> No, the server runs 5.14.0-427.3689_1194299994.el9 and the client
>>> 5.14.0-362.18.1.el9_3.
>>>>
>>>>>> Something still seems to be wrong on the server though. When I
>>>>>> look at
>>>>>> the directories under /proc/fs/nfsd/clients, there's still a
>>>>>> directory
>>>>>> for the specific client, even though it's no longer running:
>>>>>>
>>>>>> # cat 155/info
>>>>>> clientid: 0xc8edb7f65f4a9ad
>>>>>> address: "10.87.31.152:819"
>>>>>> status: confirmed
>>>>>> seconds from last renew: 33163
>>>>>> name: "Linux NFSv4.2 bersalis.esat.kuleuven.be"
>>>>>> minor version: 2
>>>>>> Implementation domain: "kernel.org"
>>>>>> Implementation name: "Linux 5.14.0-362.18.1.el9_3.0.1.x86_64 #1 SMP
>>>>>> PREEMPT_DYNAMIC Sun Feb 11 13:49:23 UTC 2024 x86_64"
>>>>>> Implementation time: [0, 0]
>>>>>> callback state: DOWN
>>>>>> callback address: 10.87.31.152:0
>>>>>>
>>>> If you just shut down the client, the server won't immediately
>>>> purge its
>>>> record. In fact, assuming you're running the same kernel on the
>>>> server,
>>>> it won't purge the client record until there is a conflicting request
>>>> for its state.
>>> Is there a way to force such a conflicting request (to get the
>>> client record to purge)?
>>
>> Try:
>>
>> # echo "expire" > /proc/fs/nfsd/clients/155/ctl
>
> I've tried that. The command hangs and can not be interrupted with
> ctrl-c.
> I've now also noticed in the dmesg output that the kernel issued the
> following WARNING a few hours ago. It wasn't directly triggered by the
> echo command above, but seems to have been triggered a few hours ago
> (probably when another client started to have the same problem as more
> clients are experiencing issues now).

I think this warning message is harmless. However, it indicates a potential
problem with the workqueue, which might be related to a memory shortage.

What does the output of 'cat /proc/meminfo' look like?

Did you try 'echo 3 > /proc/sys/vm/drop_caches'?

>
> [Tue Mar 19 14:53:44 2024] ------------[ cut here ]------------
> [Tue Mar 19 14:53:44 2024] WARNING: CPU: 44 PID: 5843 at
> fs/nfsd/nfs4state.c:4920 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
> [Tue Mar 19 14:53:44 2024] Modules linked in: nf_conntrack_netlink
> nfsv4 dns_resolver nfs fscache netfs binfmt_misc xsk_diag
> rpcsec_gss_krb5 rpcrdma rdma_cm iw_cm ib_cm ib_core bonding tls rfkill
> nft_counter nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
> nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables
> nfnetlink vfat fat dm_thin_pool dm_persistent_data dm_bio_prison
> dm_bufio libcrc32c dm_service_time dm_multipath intel_rapl_msr
> intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common
> isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal
> intel_powerclamp coretemp kvm_intel kvm dcdbas irqbypass ipmi_ssif
> rapl intel_cstate mgag200 i2c_algo_bit drm_shmem_helper drm_kms_helper
> dell_smbios syscopyarea intel_uncore sysfillrect wmi_bmof
> dell_wmi_descriptor pcspkr sysimgblt fb_sys_fops mei_me i2c_i801 mei
> intel_pch_thermal acpi_ipmi i2c_smbus lpc_ich ipmi_si ipmi_devintf
> ipmi_msghandler joydev acpi_power_meter nfsd nfs_acl lockd auth_rpcgss
> grace drm fuse sunrpc ext4
> [Tue Mar 19 14:53:44 2024]  mbcache jbd2 sd_mod sg lpfc nvmet_fc nvmet
> nvme_fc nvme_fabrics crct10dif_pclmul crc32_pclmul nvme_core ixgbe
> crc32c_intel ahci libahci nvme_common megaraid_sas t10_pi
> ghash_clmulni_intel wdat_wdt libata scsi_transport_fc mdio dca wmi
> dm_mirror dm_region_hash dm_log dm_mod
> [Tue Mar 19 14:53:44 2024] CPU: 44 PID: 5843 Comm: nfsd Not tainted
> 5.14.0-427.3689_1194299994.el9.x86_64 #1
> [Tue Mar 19 14:53:44 2024] Hardware name: Dell Inc. PowerEdge
> R740/00WGD1, BIOS 2.20.1 09/13/2023
> [Tue Mar 19 14:53:44 2024] RIP: 0010:nfsd_break_deleg_cb+0x170/0x190
> [nfsd]
> [Tue Mar 19 14:53:44 2024] Code: 76 76 cd de e9 ff fe ff ff 48 89 df
> be 01 00 00 00 e8 34 a1 1b df 48 8d bb 98 00 00 00 e8 a8 fe 00 00 84
> c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be 02 00 00 00 48 89 df e8
> 0c a1 1b df e9 01
> [Tue Mar 19 14:53:44 2024] RSP: 0018:ffffb2878f2cfc38 EFLAGS: 00010246
> [Tue Mar 19 14:53:44 2024] RAX: 0000000000000000 RBX: ffff88d5171067b8
> RCX: 0000000000000000
> [Tue Mar 19 14:53:44 2024] RDX: ffff88d517106880 RSI: ffff88bdceec8600
> RDI: 0000000000002000
> [Tue Mar 19 14:53:44 2024] RBP: ffff88d68a38a284 R08: ffffb2878f2cfc00
> R09: 0000000000000000
> [Tue Mar 19 14:53:44 2024] R10: ffff88bf57dd7878 R11: 0000000000000000
> R12: ffff88d5b79c4798
> [Tue Mar 19 14:53:44 2024] R13: ffff88d68a38a270 R14: ffff88cab06ad0c8
> R15: ffff88d5b79c4798
> [Tue Mar 19 14:53:44 2024] FS:  0000000000000000(0000)
> GS:ffff88d4a1180000(0000) knlGS:0000000000000000
> [Tue Mar 19 14:53:44 2024] CS:  0010 DS: 0000 ES: 0000 CR0:
> 0000000080050033
> [Tue Mar 19 14:53:44 2024] CR2: 00007fe46ef90000 CR3: 000000019d010004
> CR4: 00000000007706e0
> [Tue Mar 19 14:53:44 2024] DR0: 0000000000000000 DR1: 0000000000000000
> DR2: 0000000000000000
> [Tue Mar 19 14:53:44 2024] DR3: 0000000000000000 DR6: 00000000fffe0ff0
> DR7: 0000000000000400
> [Tue Mar 19 14:53:44 2024] PKRU: 55555554
> [Tue Mar 19 14:53:44 2024] Call Trace:
> [Tue Mar 19 14:53:44 2024]  <TASK>
> [Tue Mar 19 14:53:44 2024]  ? show_trace_log_lvl+0x1c4/0x2df
> [Tue Mar 19 14:53:44 2024]  ? show_trace_log_lvl+0x1c4/0x2df
> [Tue Mar 19 14:53:44 2024]  ? __break_lease+0x16f/0x5f0
> [Tue Mar 19 14:53:44 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
> [Tue Mar 19 14:53:44 2024]  ? __warn+0x81/0x110
> [Tue Mar 19 14:53:44 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
> [Tue Mar 19 14:53:44 2024]  ? report_bug+0x10a/0x140
> [Tue Mar 19 14:53:44 2024]  ? handle_bug+0x3c/0x70
> [Tue Mar 19 14:53:44 2024]  ? exc_invalid_op+0x14/0x70
> [Tue Mar 19 14:53:44 2024]  ? asm_exc_invalid_op+0x16/0x20
> [Tue Mar 19 14:53:44 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
> [Tue Mar 19 14:53:44 2024]  ? nfsd_break_deleg_cb+0x96/0x190 [nfsd]
> [Tue Mar 19 14:53:44 2024]  __break_lease+0x16f/0x5f0
> [Tue Mar 19 14:53:44 2024]  nfs4_get_vfs_file+0x164/0x3a0 [nfsd]
> [Tue Mar 19 14:53:44 2024]  nfsd4_process_open2+0x430/0xa30 [nfsd]
> [Tue Mar 19 14:53:44 2024]  ? fh_verify+0x297/0x2f0 [nfsd]
> [Tue Mar 19 14:53:44 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
> [Tue Mar 19 14:53:44 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
> [Tue Mar 19 14:53:44 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
> [Tue Mar 19 14:53:44 2024]  svc_process_common+0x2ec/0x660 [sunrpc]
> [Tue Mar 19 14:53:44 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> [Tue Mar 19 14:53:44 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
> [Tue Mar 19 14:53:44 2024]  svc_process+0x12d/0x170 [sunrpc]
> [Tue Mar 19 14:53:44 2024]  nfsd+0x84/0xb0 [nfsd]
> [Tue Mar 19 14:53:44 2024]  kthread+0xdd/0x100
> [Tue Mar 19 14:53:44 2024]  ? __pfx_kthread+0x10/0x10
> [Tue Mar 19 14:53:44 2024]  ret_from_fork+0x29/0x50
> [Tue Mar 19 14:53:44 2024]  </TASK>
> [Tue Mar 19 14:53:44 2024] ---[ end trace ed0b2b3f135c637d ]---
>
> It again seems to have been triggered in nfsd_break_deleg_cb?
>
> I also had the following perf command running a tmux on the server:
>
> perf trace -e nfsd:nfsd_cb_recall_any
>
> This has spewed a lot of messages. I'm including a short list here:
>
> ...
>
> 33464866.721 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210688785, bmval0: 1, addr: 0x7f331bb116c8)
> 33464866.724 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210688827, bmval0: 1, addr: 0x7f331bb11738)
> 33464866.729 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210688767, bmval0: 1, addr: 0x7f331bb117a8)
> 33464866.732 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210718132, bmval0: 1, addr: 0x7f331bb11818)
> 33464866.737 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210688952, bmval0: 1, addr: 0x7f331bb11888)
> 33464866.741 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210702355, bmval0: 1, addr: 0x7f331bb118f8)
> 33868414.001 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210688751, bmval0: 1, addr: 0x7f331be68620)
> 33868414.014 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210718536, bmval0: 1, addr: 0x7f331be68690)
> 33868414.018 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210719074, bmval0: 1, addr: 0x7f331be68700)
> 33868414.022 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210688916, bmval0: 1, addr: 0x7f331be68770)
> 33868414.026 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210688941, bmval0: 1, addr: 0x7f331be687e0)
> ...
>
> 33868414.924 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210688744, bmval0: 1, addr: 0x7f331be6d7f0)
> 33868414.929 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210717223, bmval0: 1, addr: 0x7f331be6d860)
> 33868414.934 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210716137, bmval0: 1, addr: 0x7f331be6d8d0)
> 34021240.903 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210688941, bmval0: 1, addr: 0x7f331c207de8)
> 34021240.917 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210718750, bmval0: 1, addr: 0x7f331c207e58)
> 34021240.922 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210688955, bmval0: 1, addr: 0x7f331c207ec8)
> 34021240.925 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> 1710533037, cl_id: 210688975, bmval0: 1, addr: 0x7f331c207f38)
> ...
>
> I assume the cl_id is the client id? How can I map this to a client
> from /proc/fs/nfsd/clients?

The hex value of 'clientid' printed from /proc/fs/nfsd/clients/XX/info
is a 64-bit value composed of:

typedef struct {
        u32 cl_boot;
        u32 cl_id;
} clientid_t;

For example:

clientid: 0xc8edb7f65f4a9ad

cl_boot: 65f4a9ad (1710533037)
cl_id: c8edb7f (210688895)

This should match a trace event with:

nfsd:nfsd_cb_recall_any(cl_boot: 1710533037, cl_id: 210688895, bmval0: XX, addr: 0xYYYYY)
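
In other words, cl_boot is the low 32 bits and cl_id the high 32 bits of
the clientid shown in the info file. As a quick sketch, shell arithmetic
can do the split directly:

# clientid=0xc8edb7f65f4a9ad
# echo "cl_boot: $(( clientid & 0xffffffff )), cl_id: $(( clientid >> 32 ))"
cl_boot: 1710533037, cl_id: 210688895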

>
> If I understand it correctly, the recall_any should be called when
> either the system starts to experience memory pressure,

yes.

> or it reaches the delegation limits?

No, this feature was added to nfsd very recently. I don't think your kernel has it.

> I doubt the system is actually running out of memory here as there are
> no other indications.
> Shouldn't I get those "page allocation failure" messages if it does?
> How can I check the number of delegations/leases currently issued,
> what the current maximum is and how to increase it?

Max delegations is 4 per 1MB of available memory. There is no
admin tool to adjust this value.
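
A rough upper bound for your server can be estimated from /proc/meminfo,
e.g.:

# awk '/MemTotal/ { print int($2 / 1024) * 4 }' /proc/meminfo

The kernel derives the real limit from available memory when nfsd starts,
so treat this as a ballpark figure rather than the exact number.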

I do not recommend running a production system with delegation
disabled. But for this specific issue, it might help to temporarily
disable delegation to isolate problem areas.

-Dai

>
> Regarding the recall any call: from what I've read on kernelnewbies,
> this feature was introduced in the 6.2 kernel? When I look at the tree
> for 6.1.x, it was backported in 6.1.81? Is there a way to disable this
> support somehow?
>
> Regards,
>
> Rik
>
>
>>
>> -Dai
>>
>>>>
>>>>
>>>>> The nfsdclnts command for this client shows the following
>>>>> delegations:
>>>>>
>>>>> # nfsdclnts -f 155/states -t all
>>>>> Inode number | Type   | Access | Deny | ip address | Filename
>>>>> 169346743    | open   | r-     | --   | 10.87.31.152:819 |
>>>>> disconnected dentry
>>>>> 169346743    | deleg  | r      |      | 10.87.31.152:819 |
>>>>> disconnected dentry
>>>>> 169346746    | open   | r-     | --   | 10.87.31.152:819 |
>>>>> disconnected dentry
>>>>> 169346746    | deleg  | r      |      | 10.87.31.152:819 |
>>>>> disconnected dentry
>>>>>
>>>>> I see a lot of recent patches regarding directory delegations. Could
>>>>> this be related to this?
>>>>>
>>>>> Will a 5.14.0-362.18.1.el9_3.0.1 kernel try to use a directory
>>>>> delegation?
>>>>>
>>>>>
>>>> No. Directory delegations are a new feature that's still under
>>>> development. They use some of the same machinery as file delegations,
>>>> but they wouldn't be a factor here.
>>>>
>>>>>> The system seems to have identified that the client is no longer
>>>>>> reachable, but the client entry does not go away. When a mount was
>>>>>> hanging on the client, there would be two directories in clients for
>>>>>> the same client. Killing the mount command clears up the second
>>>>>> entry.
>>>>>>
>>>>>> Even after running conntrack -D on the server to remove the tcp
>>>>>> connection from the conntrack table, the entry doesn't go away
>>>>>> and the
>>>>>> client still can not mount anything from the server.
>>>>>>
>>>>>> A tcpdump on the client while a mount was running logged the
>>>>>> following
>>>>>> messages over and over again:
>>>>>>
>>>>>> request:
>>>>>>
>>>>>> Frame 1: 378 bytes on wire (3024 bits), 378 bytes captured (3024
>>>>>> bits)
>>>>>> Ethernet II, Src: HP_19:7d:4b (e0:73:e7:19:7d:4b), Dst:
>>>>>> ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00)
>>>>>> Internet Protocol Version 4, Src: 10.87.31.152, Dst: 10.86.18.14
>>>>>> Transmission Control Protocol, Src Port: 932, Dst Port: 2049,
>>>>>> Seq: 1,
>>>>>> Ack: 1, Len: 312
>>>>>> Remote Procedure Call, Type:Call XID:0x1d3220c4
>>>>>> Network File System
>>>>>>      [Program Version: 4]
>>>>>>      [V4 Procedure: COMPOUND (1)]
>>>>>>      GSS Data, Ops(1): CREATE_SESSION
>>>>>>          Length: 152
>>>>>>          GSS Sequence Number: 76
>>>>>>          Tag: <EMPTY>
>>>>>>          minorversion: 2
>>>>>>          Operations (count: 1): CREATE_SESSION
>>>>>>          [Main Opcode: CREATE_SESSION (43)]
>>>>>>      GSS Checksum:
>>>>>> 00000028040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb2…
>>>>>>
>>>>>>          GSS Token Length: 40
>>>>>>          GSS-API Generic Security Service Application Program
>>>>>> Interface
>>>>>>              krb5_blob:
>>>>>> 040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb23fa080b0…
>>>>>>
>>>>>>
>>>>>> response:
>>>>>>
>>>>>> Frame 2: 206 bytes on wire (1648 bits), 206 bytes captured (1648
>>>>>> bits)
>>>>>> Ethernet II, Src: ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00), Dst:
>>>>>> HP_19:7d:4b (e0:73:e7:19:7d:4b)
>>>>>> Internet Protocol Version 4, Src: 10.86.18.14, Dst: 10.87.31.152
>>>>>> Transmission Control Protocol, Src Port: 2049, Dst Port: 932,
>>>>>> Seq: 1,
>>>>>> Ack: 313, Len: 140
>>>>>> Remote Procedure Call, Type:Reply XID:0x1d3220c4
>>>>>> Network File System
>>>>>>      [Program Version: 4]
>>>>>>      [V4 Procedure: COMPOUND (1)]
>>>>>>      GSS Data, Ops(1): CREATE_SESSION(NFS4ERR_DELAY)
>>>>>>          Length: 24
>>>>>>          GSS Sequence Number: 76
>>>>>>          Status: NFS4ERR_DELAY (10008)
>>>>>>          Tag: <EMPTY>
>>>>>>          Operations (count: 1)
>>>>>>          [Main Opcode: CREATE_SESSION (43)]
>>>>>>      GSS Checksum:
>>>>>> 00000028040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f6274222…
>>>>>>
>>>>>>          GSS Token Length: 40
>>>>>>          GSS-API Generic Security Service Application Program
>>>>>> Interface
>>>>>>              krb5_blob:
>>>>>> 040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f627422226d74923…
>>>>>>
>>>>>>
>>>>>> I was hoping that giving the client a different IP address would
>>>>>> resolve the issue for this client, but it didn't. Even though the
>>>>>> client had a new IP address (hostname was kept the same), it
>>>>>> failed to
>>>>>> mount anything from the server.
>>>>>>
>>>> Changing the IP address won't help. The client is probably using the
>>>> same long-form client id as before, so the server still identifies the
>>>> client even with the address change.
>>> How is the client id determined? Will changing the hostname of the
>>> client trigger a change of the client id?
>>>>
>>>> Unfortunately, the cause of an NFS4ERR_DELAY error is tough to guess.
>>>> The client is expected to back off and retry, so if the server keeps
>>>> returning that repeatedly, then a hung mount command is expected.
>>>>
>>>> The question is why the server would keep returning DELAY. A lot of
>>>> different problems ranging from memory allocation issues to protocol
>>>> problems can result in that error. You may want to check the NFS
>>>> server
>>>> and see if anything was logged there.
>>> There are no messages in the system logs that indicate any sort of
>>> memory issue. We also increased the vm.min_free_kbytes sysctl to 2G on
>>> the server before we restarted it with the newer kernel.
>>>>
>>>> This is on a CREATE_SESSION call, so I wonder if the record held by
>>>> the
>>>> (courteous) server is somehow blocking the attempt to reestablish the
>>>> session?
>>>>
>>>> Do you have a way to reproduce this? Since this is a centos kernel,
>>>> you
>>>> could follow the page here to open a bug:
>>>
>>> Unfortunately we haven't found a reliable way to reproduce it. But
>>> we do seem to trigger it more and more lately.
>>>
>>> Regards,
>>>
>>> Rik
>>>
>>>>
>>>> https://wiki.centos.org/ReportBugs.html
>>>>
>>>>
>>>>>> I created another dump of the workqueues and worker pools on the
>>>>>> server:
>>>>>>
>>>>>> [Mon Mar 18 14:59:33 2024] Showing busy workqueues and worker pools:
>>>>>> [Mon Mar 18 14:59:33 2024] workqueue events: flags=0x0
>>>>>> [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
>>>>>> active=1/256 refcnt=2
>>>>>> [Mon Mar 18 14:59:33 2024]     pending: drm_fb_helper_damage_work
>>>>>> [drm_kms_helper]
>>>>>> [Mon Mar 18 14:59:33 2024] workqueue events_power_efficient:
>>>>>> flags=0x80
>>>>>> [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
>>>>>> active=1/256 refcnt=2
>>>>>> [Mon Mar 18 14:59:33 2024]     pending: fb_flashcursor
>>>>>> [Mon Mar 18 14:59:33 2024] workqueue mm_percpu_wq: flags=0x8
>>>>>> [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0 nice=0
>>>>>> active=1/256 refcnt=3
>>>>>> [Mon Mar 18 14:59:33 2024]     pending: lru_add_drain_per_cpu
>>>>>> BAR(362)
>>>>>> [Mon Mar 18 14:59:33 2024] workqueue kblockd: flags=0x18
>>>>>> [Mon Mar 18 14:59:33 2024]   pwq 55: cpus=27 node=1 flags=0x0
>>>>>> nice=-20
>>>>>> active=1/256 refcnt=2
>>>>>> [Mon Mar 18 14:59:33 2024]     pending: blk_mq_timeout_work
>>>>>>
>>>>>>
>>>>>> In contrast to last time, it doesn't show anything regarding nfs
>>>>>> this
>>>>>> time.
>>>>>>
>>>>>> I also tried the suggestion from Dai Ngo (echo 3 >
>>>>>> /proc/sys/vm/drop_caches), but that didn't seem to make any
>>>>>> difference.
>>>>>>
>>>>>> We haven't restarted the server yet as the impact seems to
>>>>>> affect fewer clients than before. Is there anything we can run on
>>>>>> the
>>>>>> server to further debug this?
>>>>>>
>>>>>> In the past, the issue seemed to deteriorate rapidly and resulted in
>>>>>> issues for almost all clients after about 20 minutes. This time the
>>>>>> impact seems to be less, but it's not gone.
>>>>>>
>>>>>> How can we force the NFS server to forget about a specific client? I
>>>>>> haven't tried to restart the nfs service yet as I'm afraid it will
>>>>>> fail to stop as before.
>>>>>>
>>>> Not with that kernel. There are some new administrative interfaces
>>>> that
>>>> might allow that in the future, but they were just merged upstream and
>>>> aren't in that kernel.
>>>>
>>>> --
>>>> Jeff Layton <[email protected]>
>>>

2024-03-20 13:03:11

by Charles Hedrick

[permalink] [raw]
Subject: Re: nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning



> On Mar 19, 2024, at 7:36 AM, Jeff Layton <[email protected]> wrote:
>
> On Tue, 2024-03-19 at 11:58 +0100, Rik Theys wrote:
>> Hi,
>>
>> On 3/19/24 11:39, Jeff Layton wrote:
>>> On Tue, 2024-03-19 at 08:58 +0100, Rik Theys wrote:
>>>> Hi,
>>>>
>>>> On 3/18/24 22:54, Jeff Layton wrote:
>>>>> On Mon, 2024-03-18 at 22:15 +0100, Rik Theys wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 3/18/24 21:21, Rik Theys wrote:
>>>>>>> Hi Jeff,
>>>>>>>
>>>>>>> On 3/12/24 13:47, Jeff Layton wrote:
>>>>>>>> On Tue, 2024-03-12 at 13:24 +0100, Rik Theys wrote:
>>>>>>>>> Hi Jeff,
>>>>>>>>>
>>>>>>>>> On 3/12/24 12:22, Jeff Layton wrote:
>>>>>>>>>> On Mon, 2024-03-11 at 19:43 +0100, Rik Theys wrote:
>>>>>>>>>>> Since a few weeks our Rocky Linux 9 NFS server has periodically
>>>>>>>>>>> logged hung nfsd tasks. The initial effect was that some clients
>>>>>>>>>>> could no longer access the NFS server. This got worse and worse
>>>>>>>>>>> (probably as more nfsd threads got blocked) and we had to restart
>>>>>>>>>>> the server. Restarting the server also failed as the NFS server
>>>>>>>>>>> service could no longer be stopped.
>>>>>>>>>>>
>>>>>>>>>>> The initial kernel we noticed this behavior on was
>>>>>>>>>>> kernel-5.14.0-362.18.1.el9_3.x86_64. Since then we've installed
>>>>>>>>>>> kernel-5.14.0-419.el9.x86_64 from CentOS Stream 9. The same issue
>>>>>>>>>>> happened again on this newer kernel version:
>>>>>>>> 419 is fairly up to date with nfsd changes. There are some known bugs
>>>>>>>> around callbacks, and there is a draft MR in flight to fix it.
>>>>>>>>
>>>>>>>> What kernel were you on prior to 5.14.0-362.18.1.el9_3.x86_64 ? If we
>>>>>>>> can bracket the changes around a particular version, then that might
>>>>>>>> help identify the problem.
>>>>>>>>
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] Not tainted 5.14.0-419.el9.x86_64 #1
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] "echo 0 >
>>>>>>>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024]task:nfsd state:D stack:0
>>>>>>>>>>> pid:8865 ppid:2 flags:0x00004000
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] <TASK>
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] __schedule+0x21b/0x550
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] schedule+0x2d/0x70
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] schedule_timeout+0x11f/0x160
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] ? select_idle_sibling+0x28/0x430
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] ? wake_affine+0x62/0x1f0
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] __wait_for_common+0x90/0x1d0
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] ? __pfx_schedule_timeout+0x10/0x10
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] __flush_workqueue+0x13a/0x3f0
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] nfsd4_shutdown_callback+0x49/0x120
>>>>>>>>>>> [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] ? nfsd4_cld_remove+0x54/0x1d0 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] ?
>>>>>>>>>>> nfsd4_return_all_client_layouts+0xc4/0xf0 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] ? nfsd4_shutdown_copy+0x68/0xc0 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] __destroy_client+0x1f3/0x290 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] nfsd4_exchange_id+0x75f/0x770 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] ? nfsd4_decode_opaque+0x3a/0x90 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] svc_process_common+0x2ec/0x660
>>>>>>>>>>> [sunrpc]
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] svc_process+0x12d/0x170 [sunrpc]
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] nfsd+0x84/0xb0 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] kthread+0xdd/0x100
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] ? __pfx_kthread+0x10/0x10
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] ret_from_fork+0x29/0x50
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] </TASK>
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] INFO: task nfsd:8866 blocked for
>>>>>>>>>>> more than 122 seconds.
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] Not tainted
>>>>>>>>>>> 5.14.0-419.el9.x86_64 #1
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] "echo 0 >
>>>>>>>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024]task:nfsd state:D stack:0
>>>>>>>>>>> pid:8866 ppid:2 flags:0x00004000
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] <TASK>
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] __schedule+0x21b/0x550
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] schedule+0x2d/0x70
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] schedule_timeout+0x11f/0x160
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] ? select_idle_sibling+0x28/0x430
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] ? tcp_recvmsg+0x196/0x210
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] ? wake_affine+0x62/0x1f0
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] __wait_for_common+0x90/0x1d0
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] ? __pfx_schedule_timeout+0x10/0x10
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] __flush_workqueue+0x13a/0x3f0
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] nfsd4_destroy_session+0x1a4/0x240
>>>>>>>>>>> [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] svc_process_common+0x2ec/0x660
>>>>>>>>>>> [sunrpc]
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] svc_process+0x12d/0x170 [sunrpc]
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] nfsd+0x84/0xb0 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] kthread+0xdd/0x100
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] ? __pfx_kthread+0x10/0x10
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] ret_from_fork+0x29/0x50
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024] </TASK>
>>>>>>>>>>>
>>>>>>>>>> The above threads are trying to flush the workqueue, so that probably
>>>>>>>>>> means that they are stuck waiting on a workqueue job to finish.
>>>>>>>>>>> The above is repeated a few times, and then this warning is
>>>>>>>>>>> also logged:
>>>>>>>>>>> [Mon Mar 11 14:12:04 2024] ------------[ cut here ]------------
>>>>>>>>>>> [Mon Mar 11 14:12:04 2024] WARNING: CPU: 39 PID: 8844 at
>>>>>>>>>>> fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] Modules linked in: nfsv4
>>>>>>>>>>> dns_resolver nfs fscache netfs rpcsec_gss_krb5 rpcrdma rdma_cm
>>>>>>>>>>> iw_cm ib_cm ib_core binfmt_misc bonding tls rfkill nft_counter nft_ct
>>>>>>>>>>> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet
>>>>>>>>>>> nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables nfnetlink vfat
>>>>>>>>>>> fat dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio l
>>>>>>>>>>> ibcrc32c dm_service_time dm_multipath intel_rapl_msr
>>>>>>>>>>> intel_rapl_common intel_uncore_frequency
>>>>>>>>>>> intel_uncore_frequency_common isst_if_common skx_edac nfit
>>>>>>>>>>> libnvdimm ipmi_ssif x86_pkg_temp
>>>>>>>>>>> _thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
>>>>>>>>>>> dcdbas rapl intel_cstate mgag200 i2c_algo_bit drm_shmem_helper
>>>>>>>>>>> dell_smbios drm_kms_helper dell_wmi_descriptor wmi_bmof intel_u
>>>>>>>>>>> ncore syscopyarea pcspkr sysfillrect mei_me sysimgblt acpi_ipmi
>>>>>>>>>>> mei fb_sys_fops i2c_i801 ipmi_si intel_pch_thermal lpc_ich
>>>>>>>>>>> ipmi_devintf i2c_smbus ipmi_msghandler joydev acpi_power_meter
>>>>>>>>>>> nfsd auth_rpcgss nfs_acl drm lockd grace fuse sunrpc ext4
>>>>>>>>>>> mbcache jbd2 sd_mod sg lpfc
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] nvmet_fc nvmet nvme_fc nvme_fabrics
>>>>>>>>>>> crct10dif_pclmul ahci libahci crc32_pclmul nvme_core crc32c_intel
>>>>>>>>>>> ixgbe megaraid_sas libata nvme_common ghash_clmulni_int
>>>>>>>>>>> el t10_pi wdat_wdt scsi_transport_fc mdio wmi dca dm_mirror
>>>>>>>>>>> dm_region_hash dm_log dm_mod
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] CPU: 39 PID: 8844 Comm: nfsd Not
>>>>>>>>>>> tainted 5.14.0-419.el9.x86_64 #1
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] Hardware name: Dell Inc. PowerEdge
>>>>>>>>>>> R740/00WGD1, BIOS 2.20.1 09/13/2023
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] RIP:
>>>>>>>>>>> 0010:nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] Code: a6 95 c5 f3 e9 ff fe ff ff 48
>>>>>>>>>>> 89 df be 01 00 00 00 e8 34 b5 13 f4 48 8d bb 98 00 00 00 e8 c8 f9
>>>>>>>>>>> 00 00 84 c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be
>>>>>>>>>>> 02 00 00 00 48 89 df e8 0c b5 13 f4 e9 01
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] RSP: 0018:ffff9929e0bb7b80 EFLAGS:
>>>>>>>>>>> 00010246
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] RAX: 0000000000000000 RBX:
>>>>>>>>>>> ffff8ada51930900 RCX: 0000000000000024
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] RDX: ffff8ada519309c8 RSI:
>>>>>>>>>>> ffff8ad582933c00 RDI: 0000000000002000
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] RBP: ffff8ad46bf21574 R08:
>>>>>>>>>>> ffff9929e0bb7b48 R09: 0000000000000000
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] R10: ffff8aec859a2948 R11:
>>>>>>>>>>> 0000000000000000 R12: ffff8ad6f497c360
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] R13: ffff8ad46bf21560 R14:
>>>>>>>>>>> ffff8ae5942e0b10 R15: ffff8ad6f497c360
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] FS: 0000000000000000(0000)
>>>>>>>>>>> GS:ffff8b031fcc0000(0000) knlGS:0000000000000000
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] CS: 0010 DS: 0000 ES: 0000 CR0:
>>>>>>>>>>> 0000000080050033
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] CR2: 00007fafe2060744 CR3:
>>>>>>>>>>> 00000018e58de006 CR4: 00000000007706e0
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] DR0: 0000000000000000 DR1:
>>>>>>>>>>> 0000000000000000 DR2: 0000000000000000
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] DR3: 0000000000000000 DR6:
>>>>>>>>>>> 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] PKRU: 55555554
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] Call Trace:
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] <TASK>
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] ? show_trace_log_lvl+0x1c4/0x2df
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] ? show_trace_log_lvl+0x1c4/0x2df
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] ? __break_lease+0x16f/0x5f0
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] ? nfsd_break_deleg_cb+0x170/0x190
>>>>>>>>>>> [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] ? __warn+0x81/0x110
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] ? nfsd_break_deleg_cb+0x170/0x190
>>>>>>>>>>> [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] ? report_bug+0x10a/0x140
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] ? handle_bug+0x3c/0x70
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] ? exc_invalid_op+0x14/0x70
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] ? asm_exc_invalid_op+0x16/0x20
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] ? nfsd_break_deleg_cb+0x170/0x190
>>>>>>>>>>> [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] __break_lease+0x16f/0x5f0
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] ?
>>>>>>>>>>> nfsd_file_lookup_locked+0x117/0x160 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] ? list_lru_del+0x101/0x150
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] nfsd_file_do_acquire+0x790/0x830
>>>>>>>>>>> [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] nfs4_get_vfs_file+0x315/0x3a0 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] nfsd4_process_open2+0x430/0xa30 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] ? fh_verify+0x297/0x2f0 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] nfsd4_open+0x3ce/0x4b0 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] svc_process_common+0x2ec/0x660
>>>>>>>>>>> [sunrpc]
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] svc_process+0x12d/0x170 [sunrpc]
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] nfsd+0x84/0xb0 [nfsd]
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] kthread+0xdd/0x100
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] ? __pfx_kthread+0x10/0x10
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] ret_from_fork+0x29/0x50
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] </TASK>
>>>>>>>>>>> [Mon Mar 11 14:12:05 2024] ---[ end trace 7a039e17443dc651 ]---
>>>>>>>>>> This is probably this WARN in nfsd_break_one_deleg:
>>>>>>>>>>
>>>>>>>>>> WARN_ON_ONCE(!nfsd4_run_cb(&dp->dl_recall));
>>>>>>>>>>
>>>>>>>>>> It means that a delegation break callback to the client couldn't be
>>>>>>>>>> queued to the workqueue, and so it didn't run.
>>>>>>>>>>
>>>>>>>>>>> Could this be the same issue as described
>>>>>>>>>>> here:https://lore.kernel.org/linux-nfs/[email protected]/
>>>>>>>>>>> ?
>>>>>>>>>> Yes, most likely the same problem.
>>>>>>>>> If I read that thread correctly, this issue was introduced between
>>>>>>>>> 6.1.63 and 6.6.3? Is it possible the EL9 5.14.0-362.18.1.el9_3
>>>>>>>>> backported these changes, or were we hitting some other bug with that
>>>>>>>>> version? It seems the 6.1.x kernel is not affected? If so, that
>>>>>>>>> would be
>>>>>>>>> the recommended kernel to run?
>>>>>>>> Anything is possible. We have to identify the problem first.
>>>>>>>>>>> As described in that thread, I've tried to obtain the requested
>>>>>>>>>>> information.
>>>>>>>>>>>
>>>>>>>>>>> Is it possible this is the issue that was fixed by the patches
>>>>>>>>>>> described
>>>>>>>>>>> here?https://lore.kernel.org/linux-nfs/2024022054-cause-suffering-eae8@gregkh/
>>>>>>>>>>>
>>>>>>>>>> Doubtful. Those are targeted toward a different set of issues.
>>>>>>>>>>
>>>>>>>>>> If you're willing, I do have some patches queued up for CentOS here
>>>>>>>>>> that
>>>>>>>>>> fix some backchannel problems that could be related. I'm mainly
>>>>>>>>>> waiting
>>>>>>>>>> on Chuck to send these to Linus and then we'll likely merge them into
>>>>>>>>>> CentOS soon afterward:
>>>>>>>>>>
>>>>>>>>>> https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3689
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> If you can send me a patch file, I can rebuild the C9S kernel with that
>>>>>>>>> patch and run it. It can take a while for the bug to trigger as I
>>>>>>>>> believe it seems to be very workload dependent (we were running very
>>>>>>>>> stable for months and now hit this bug every other week).
>>>>>>>>>
>>>>>>>>>
>>>>>>>> It's probably simpler to just pull down the build artifacts for that MR.
>>>>>>>> You have to drill down through the CI for it, but they are here:
>>>>>>>>
>>>>>>>> https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/index.html?prefix=trusted-artifacts/1194300175/publish_x86_64/6278921877/artifacts/
>>>>>>>>
>>>>>>>>
>>>>>>>> There's even a repo file you can install on the box to pull them down.
>>>>>>> We installed this kernel on the server 3 days ago. Today, a user
>>>>>>> informed us that their screen was black after logging in. Similar to
>>>>>>> other occurrences of this issue, the mount command on the client was
>>>>>>> hung. But in contrast to the other times, there were no messages in
>>>>>>> the kernel logs on the server. Even restarting the client does
>>>>>>> not resolve the issue.
>>>>> Ok, so you rebooted the client and it's still unable to mount? That
>>>>> sounds like a server problem if so.
>>>>>
>>>>> Are both client and server running the same kernel?
>>>> No, the server runs 5.14.0-427.3689_1194299994.el9 and the client
>>>> 5.14.0-362.18.1.el9_3.
>>> Ok.
>>>
>>>>>>> Something still seems to be wrong on the server though. When I look at
>>>>>>> the directories under /proc/fs/nfsd/clients, there's still a directory
>>>>>>> for the specific client, even though it's no longer running:
>>>>>>>
>>>>>>> # cat 155/info
>>>>>>> clientid: 0xc8edb7f65f4a9ad
>>>>>>> address: "10.87.31.152:819"
>>>>>>> status: confirmed
>>>>>>> seconds from last renew: 33163
>>>>>>> name: "Linux NFSv4.2 bersalis.esat.kuleuven.be"
>>>>>>> minor version: 2
>>>>>>> Implementation domain: "kernel.org"
>>>>>>> Implementation name: "Linux 5.14.0-362.18.1.el9_3.0.1.x86_64 #1 SMP
>>>>>>> PREEMPT_DYNAMIC Sun Feb 11 13:49:23 UTC 2024 x86_64"
>>>>>>> Implementation time: [0, 0]
>>>>>>> callback state: DOWN
>>>>>>> callback address: 10.87.31.152:0
>>>>>>>
>>>>> If you just shut down the client, the server won't immediately purge its
>>>>> record. In fact, assuming you're running the same kernel on the server,
>>>>> it won't purge the client record until there is a conflicting request
>>>>> for its state.
>>>> Is there a way to force such a conflicting request (to get the client
>>>> record to purge)?
>>> From the server or a different client, you can try opening the inodes
>>> that the stuck client is holding open. If you open them for write, that
>>> may trigger the server to kick out the old client record.
>>>
>>> The problem is that they are disconnected dentries, so finding them to
>>> open via path may be difficult...
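>>>
>>> One rough way to locate them (just a sketch; adjust the path, here
>>> assumed to be an export root of /export) is to search the exported
>>> filesystem for the inode numbers shown in the states file:
>>>
>>> # find /export -xdev -inum 169346743
>>>
>>> and then open the resulting path for write from another client.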
>>
>> I've located the file that matches one of these inodes. When I go to the
>> location of the file on another NFS client and touch the file, the touch
>> command just hangs.
>>
>> So I assume the server is trying to recall the delegation from the
>> dysfunctional client?
>>
>> When I run tcpdump on the dysfunctional client, I see the
>> CREATE_SESSION / NFS4ERR_DELAY messages, but nothing to indicate the
>> server wants to revoke a delegation.
>>
>
> Right, according to the client record above, the callback channel is
> DOWN, so the server can't communicate with the client. Given that it has
> been ~33000s since the last lease renewal, the server should kick out the
> client that is blocking other activity, but that doesn't seem to be
> happening here.
>
>
>> The entry for the dysfunctional client is not removed on the server.
>>
>> Is there anything else I can do to provide more information about this
>> situation?
>>
>>
>
> The main function that handles the CREATE_SESSION call is
> nfsd4_create_session. It's somewhat complex and there are a number of
> reasons that function could return NFS4ERR_DELAY (aka nfserr_jukebox in
> the kernel code).
>
> What I'd do at this point is turn up tracepoints and see whether they
> shed any light on what's going wrong to make it continually delay your
> CREATE_SESSION calls. There aren't a lot of tracepoints in that code,
> however, so it may not show much.
>
> In the absence of that, you can try to use bpftrace or something similar
> to debug what's happening in that function.
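>
> As a rough, untested sketch (assuming bpftrace and the nfsd tracepoints
> are available on that kernel), something like:
>
> # perf trace -e 'nfsd:nfsd_compound*'
>
> to watch compound/op status codes, or
>
> # bpftrace -e 'kretprobe:nfsd4_create_session { @status[retval] = count(); }'
>
> to count the raw status values returned by nfsd4_create_session (they
> will show up in network byte order), might help narrow down which path
> keeps returning NFS4ERR_DELAY.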
>
>>
>>
>>>>>> The nfsdclnts command for this client shows the following delegations:
>>>>>>
>>>>>> # nfsdclnts -f 155/states -t all
>>>>>> Inode number | Type | Access | Deny | ip address | Filename
>>>>>> 169346743 | open | r- | -- | 10.87.31.152:819 |
>>>>>> disconnected dentry
>>>>>> 169346743 | deleg | r | | 10.87.31.152:819 |
>>>>>> disconnected dentry
>>>>>> 169346746 | open | r- | -- | 10.87.31.152:819 |
>>>>>> disconnected dentry
>>>>>> 169346746 | deleg | r | | 10.87.31.152:819 |
>>>>>> disconnected dentry
>>>>>>
>>>>>> I see a lot of recent patches regarding directory delegations. Could
>>>>>> this be related to this?
>>>>>>
>>>>>> Will a 5.14.0-362.18.1.el9_3.0.1 kernel try to use a directory delegation?
>>>>>>
>>>>>>
>>>>> No. Directory delegations are a new feature that's still under
>>>>> development. They use some of the same machinery as file delegations,
>>>>> but they wouldn't be a factor here.
>>>>>
>>>>>>> The system seems to have identified that the client is no longer
>>>>>>> reachable, but the client entry does not go away. When a mount was
>>>>>>> hanging on the client, there would be two directories in clients for
>>>>>>> the same client. Killing the mount command clears up the second entry.
>>>>>>>
>>>>>>> Even after running conntrack -D on the server to remove the tcp
>>>>>>> connection from the conntrack table, the entry doesn't go away and the
>>>>>>> client still can not mount anything from the server.
>>>>>>>
>>>>>>> A tcpdump on the client while a mount was running logged the following
>>>>>>> messages over and over again:
>>>>>>>
>>>>>>> request:
>>>>>>>
>>>>>>> Frame 1: 378 bytes on wire (3024 bits), 378 bytes captured (3024 bits)
>>>>>>> Ethernet II, Src: HP_19:7d:4b (e0:73:e7:19:7d:4b), Dst:
>>>>>>> ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00)
>>>>>>> Internet Protocol Version 4, Src: 10.87.31.152, Dst: 10.86.18.14
>>>>>>> Transmission Control Protocol, Src Port: 932, Dst Port: 2049, Seq: 1,
>>>>>>> Ack: 1, Len: 312
>>>>>>> Remote Procedure Call, Type:Call XID:0x1d3220c4
>>>>>>> Network File System
>>>>>>> [Program Version: 4]
>>>>>>> [V4 Procedure: COMPOUND (1)]
>>>>>>> GSS Data, Ops(1): CREATE_SESSION
>>>>>>> Length: 152
>>>>>>> GSS Sequence Number: 76
>>>>>>> Tag: <EMPTY>
>>>>>>> minorversion: 2
>>>>>>> Operations (count: 1): CREATE_SESSION
>>>>>>> [Main Opcode: CREATE_SESSION (43)]
>>>>>>> GSS Checksum:
>>>>>>> 00000028040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb2…
>>>>>>> GSS Token Length: 40
>>>>>>> GSS-API Generic Security Service Application Program Interface
>>>>>>> krb5_blob:
>>>>>>> 040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb23fa080b0…
>>>>>>>
>>>>>>> response:
>>>>>>>
>>>>>>> Frame 2: 206 bytes on wire (1648 bits), 206 bytes captured (1648 bits)
>>>>>>> Ethernet II, Src: ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00), Dst:
>>>>>>> HP_19:7d:4b (e0:73:e7:19:7d:4b)
>>>>>>> Internet Protocol Version 4, Src: 10.86.18.14, Dst: 10.87.31.152
>>>>>>> Transmission Control Protocol, Src Port: 2049, Dst Port: 932, Seq: 1,
>>>>>>> Ack: 313, Len: 140
>>>>>>> Remote Procedure Call, Type:Reply XID:0x1d3220c4
>>>>>>> Network File System
>>>>>>> [Program Version: 4]
>>>>>>> [V4 Procedure: COMPOUND (1)]
>>>>>>> GSS Data, Ops(1): CREATE_SESSION(NFS4ERR_DELAY)
>>>>>>> Length: 24
>>>>>>> GSS Sequence Number: 76
>>>>>>> Status: NFS4ERR_DELAY (10008)
>>>>>>> Tag: <EMPTY>
>>>>>>> Operations (count: 1)
>>>>>>> [Main Opcode: CREATE_SESSION (43)]
>>>>>>> GSS Checksum:
>>>>>>> 00000028040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f6274222…
>>>>>>> GSS Token Length: 40
>>>>>>> GSS-API Generic Security Service Application Program Interface
>>>>>>> krb5_blob:
>>>>>>> 040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f627422226d74923…
>>>>>>>
>>>>>>> I was hoping that giving the client a different IP address would
>>>>>>> resolve the issue for this client, but it didn't. Even though the
>>>>>>> client had a new IP address (hostname was kept the same), it failed to
>>>>>>> mount anything from the server.
>>>>>>>
>>>>> Changing the IP address won't help. The client is probably using the
>>>>> same long-form client id as before, so the server still identifies the
>>>>> client even with the address change.
>>>> How is the client id determined? Will changing the hostname of the
>>>> client trigger a change of the client id?
>>> In the client record you showed a bit above, there is a "name" field:
>>>
>>> name: "Linux NFSv4.2 bersalis.esat.kuleuven.be"
>>>
>>> That's the string the server uses to uniquely identify the client. So
>>> yes, changing the hostname should change that string.
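>>>
>>> (For completeness: recent Linux clients can also add a static uniquifier
>>> to that string via the nfs module parameter nfs4_unique_id, e.g. booting
>>> the client with "nfs.nfs4_unique_id=<some-value>". That likewise changes
>>> the identity the server sees, so it's only useful if you deliberately
>>> want the server to treat it as a new client.)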
>>>
>>>>> Unfortunately, the cause of an NFS4ERR_DELAY error is tough to guess.
>>>>> The client is expected to back off and retry, so if the server keeps
>>>>> returning that repeatedly, then a hung mount command is expected.
>>>>>
>>>>> The question is why the server would keep returning DELAY. A lot of
>>>>> different problems ranging from memory allocation issues to protocol
>>>>> problems can result in that error. You may want to check the NFS server
>>>>> and see if anything was logged there.
>>>> There are no messages in the system logs that indicate any sort of
>>>> memory issue. We also increased the min_free_kbytes sysctl to 2G on the
>>>> server before we restarted it with the newer kernel.
>>> Ok, I didn't expect to see anything like that, but it was a possibility.
>>>
>>>>> This is on a CREATE_SESSION call, so I wonder if the record held by the
>>>>> (courteous) server is somehow blocking the attempt to reestablish the
>>>>> session?
>>>>>
>>>>> Do you have a way to reproduce this? Since this is a centos kernel, you
>>>>> could follow the page here to open a bug:
>>>> Unfortunately we haven't found a reliable way to reproduce it. But we do
>>>> seem to trigger it more and more lately.
>>>>
>>>>
>>> Bummer, ok. Let us know if you figure out a way to reproduce it.
>>>
>>>>> https://wiki.centos.org/ReportBugs.html
>>>>>
>>>>>
>>>>>>> I created another dump of the workqueues and worker pools on the server:
>>>>>>>
>>>>>>> [Mon Mar 18 14:59:33 2024] Showing busy workqueues and worker pools:
>>>>>>> [Mon Mar 18 14:59:33 2024] workqueue events: flags=0x0
>>>>>>> [Mon Mar 18 14:59:33 2024] pwq 54: cpus=27 node=1 flags=0x0 nice=0
>>>>>>> active=1/256 refcnt=2
>>>>>>> [Mon Mar 18 14:59:33 2024] pending: drm_fb_helper_damage_work
>>>>>>> [drm_kms_helper]
>>>>>>> [Mon Mar 18 14:59:33 2024] workqueue events_power_efficient: flags=0x80
>>>>>>> [Mon Mar 18 14:59:33 2024] pwq 54: cpus=27 node=1 flags=0x0 nice=0
>>>>>>> active=1/256 refcnt=2
>>>>>>> [Mon Mar 18 14:59:33 2024] pending: fb_flashcursor
>>>>>>> [Mon Mar 18 14:59:33 2024] workqueue mm_percpu_wq: flags=0x8
>>>>>>> [Mon Mar 18 14:59:33 2024] pwq 54: cpus=27 node=1 flags=0x0 nice=0
>>>>>>> active=1/256 refcnt=3
>>>>>>> [Mon Mar 18 14:59:33 2024] pending: lru_add_drain_per_cpu BAR(362)
>>>>>>> [Mon Mar 18 14:59:33 2024] workqueue kblockd: flags=0x18
>>>>>>> [Mon Mar 18 14:59:33 2024] pwq 55: cpus=27 node=1 flags=0x0 nice=-20
>>>>>>> active=1/256 refcnt=2
>>>>>>> [Mon Mar 18 14:59:33 2024] pending: blk_mq_timeout_work
>>>>>>>
>>>>>>>
>>>>>>> In contrast to last time, it doesn't show anything regarding nfs this
>>>>>>> time.
>>>>>>>
>>>>>>> I also tried the suggestion from Dai Ngo (echo 3 >
>>>>>>> /proc/sys/vm/drop_caches), but that didn't seem to make any difference.
>>>>>>>
>>>>>>> We haven't restarted the server yet as the impact seems to affect
>>>>>>> fewer clients than before. Is there anything we can run on the
>>>>>>> server to further debug this?
>>>>>>>
>>>>>>> In the past, the issue seemed to deteriorate rapidly and resulted in
>>>>>>> issues for almost all clients after about 20 minutes. This time the
>>>>>>> impact seems to be less, but it's not gone.
>>>>>>>
>>>>>>> How can we force the NFS server to forget about a specific client? I
>>>>>>> haven't tried to restart the nfs service yet as I'm afraid it will
>>>>>>> fail to stop as before.
>>>>>>>
>>>>> Not with that kernel. There are some new administrative interfaces that
>>>>> might allow that in the future, but they were just merged upstream and
>>>>> aren't in that kernel.
>>>>>
>>>>> --
>>>>> Jeff Layton <[email protected]>
>>
>
> --
> Jeff Layton <[email protected]>

For years we’ve seen hangs with NFS4 on one particular server. The symptoms look precisely like this. Rebooting the client doesn’t help, and echoing "expire" to the client's ctl file hangs. I haven’t been able to get a usable backtrace. What’s characteristic of this server is that delegations are off but there’s a high rate of locking and unlocking. Could that cause the same problem?

I’d be happy to collect data if you can suggest a way. Typical traces aren’t going to be useful unless they are very specific, because this is a very busy server, which we can’t take down for debugging.
>

2024-03-20 19:41:23

by Rik Theys

[permalink] [raw]
Subject: Re: nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning

Hi,

On 3/19/24 22:42, Dai Ngo wrote:
>
> On 3/19/24 12:41 PM, Rik Theys wrote:
>> Hi,
>>
>> On 3/19/24 18:09, Dai Ngo wrote:
>>>
>>> On 3/19/24 12:58 AM, Rik Theys wrote:
>>>> Hi,
>>>>
>>>> On 3/18/24 22:54, Jeff Layton wrote:
>>>>> On Mon, 2024-03-18 at 22:15 +0100, Rik Theys wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 3/18/24 21:21, Rik Theys wrote:
>>>>>>> Hi Jeff,
>>>>>>>
>>>>>>> On 3/12/24 13:47, Jeff Layton wrote:
>>>>>>>> On Tue, 2024-03-12 at 13:24 +0100, Rik Theys wrote:
>>>>>>>>> Hi Jeff,
>>>>>>>>>
>>>>>>>>> On 3/12/24 12:22, Jeff Layton wrote:
>>>>>>>>>> On Mon, 2024-03-11 at 19:43 +0100, Rik Theys wrote:
>>>>>>>>>>> Since a few weeks our Rocky Linux 9 NFS server has periodically
>>>>>>>>>>> logged hung nfsd tasks. The initial effect was that some
>>>>>>>>>>> clients
>>>>>>>>>>> could no longer access the NFS server. This got worse and worse
>>>>>>>>>>> (probably as more nfsd threads got blocked) and we had to
>>>>>>>>>>> restart
>>>>>>>>>>> the server. Restarting the server also failed as the NFS server
>>>>>>>>>>> service could no longer be stopped.
>>>>>>>>>>>
>>>>>>>>>>> The initial kernel we noticed this behavior on was
>>>>>>>>>>> kernel-5.14.0-362.18.1.el9_3.x86_64. Since then we've installed
>>>>>>>>>>> kernel-5.14.0-419.el9.x86_64 from CentOS Stream 9. The same
>>>>>>>>>>> issue
>>>>>>>>>>> happened again on this newer kernel version:
>>>>>>>> 419 is fairly up to date with nfsd changes. There are some
>>>>>>>> known bugs
>>>>>>>> around callbacks, and there is a draft MR in flight to fix it.
>>>>>>>>
>>>>>>>> What kernel were you on prior to 5.14.0-362.18.1.el9_3.x86_64 ?
>>>>>>>> If we
>>>>>>>> can bracket the changes around a particular version, then that
>>>>>>>> might
>>>>>>>> help identify the problem.
>>>>>>>>
>>>>>>>>>>> [Mon Mar 11 14:10:08 2024]       Not tainted
>>>>>>>>>>> 5.14.0-419.el9.x86_64 #1
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024] "echo 0 >
>>>>>>>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]task:nfsd            state:D
>>>>>>>>>>> stack:0
>>>>>>>>>>>      pid:8865  ppid:2      flags:0x00004000
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  <TASK>
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>>>> select_idle_sibling+0x28/0x430
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>>>> __pfx_schedule_timeout+0x10/0x10
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]
>>>>>>>>>>>  nfsd4_shutdown_callback+0x49/0x120
>>>>>>>>>>> [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>>>> nfsd4_cld_remove+0x54/0x1d0 [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>>>> nfsd4_return_all_client_layouts+0xc4/0xf0 [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>>>> nfsd4_shutdown_copy+0x68/0xc0 [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __destroy_client+0x1f3/0x290
>>>>>>>>>>> [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]
>>>>>>>>>>>  nfsd4_exchange_id+0x75f/0x770 [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>>>> nfsd4_decode_opaque+0x3a/0x90 [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]
>>>>>>>>>>>  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
>>>>>>>>>>> [sunrpc]
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>>>> __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170
>>>>>>>>>>> [sunrpc]
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  </TASK>
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024] INFO: task nfsd:8866 blocked for
>>>>>>>>>>> more than 122 seconds.
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]       Not tainted
>>>>>>>>>>> 5.14.0-419.el9.x86_64 #1
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024] "echo 0 >
>>>>>>>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]task:nfsd            state:D
>>>>>>>>>>> stack:0
>>>>>>>>>>>      pid:8866  ppid:2      flags:0x00004000
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  <TASK>
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>>>> select_idle_sibling+0x28/0x430
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? tcp_recvmsg+0x196/0x210
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>>>> __pfx_schedule_timeout+0x10/0x10
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]
>>>>>>>>>>>  nfsd4_destroy_session+0x1a4/0x240
>>>>>>>>>>> [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]
>>>>>>>>>>>  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
>>>>>>>>>>> [sunrpc]
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>>>> __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170
>>>>>>>>>>> [sunrpc]
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  </TASK>
>>>>>>>>>>>
>>>>>>>>>> The above threads are trying to flush the workqueue, so that
>>>>>>>>>> probably
>>>>>>>>>> means that they are stuck waiting on a workqueue job to finish.
>>>>>>>>>>>     The above is repeated a few times, and then this warning is
>>>>>>>>>>> also logged:
>>>>>>>>>>>     [Mon Mar 11 14:12:04 2024] ------------[ cut here
>>>>>>>>>>> ]------------
>>>>>>>>>>>     [Mon Mar 11 14:12:04 2024] WARNING: CPU: 39 PID: 8844 at
>>>>>>>>>>> fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] Modules linked in: nfsv4
>>>>>>>>>>> dns_resolver nfs fscache netfs rpcsec_gss_krb5 rpcrdma rdma_cm
>>>>>>>>>>> iw_cm ib_cm ib_core binfmt_misc bonding tls rfkill
>>>>>>>>>>> nft_counter nft_ct
>>>>>>>>>>>     nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet
>>>>>>>>>>> nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables nfnetlink
>>>>>>>>>>> vfat
>>>>>>>>>>> fat dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio l
>>>>>>>>>>>     ibcrc32c dm_service_time dm_multipath intel_rapl_msr
>>>>>>>>>>> intel_rapl_common intel_uncore_frequency
>>>>>>>>>>> intel_uncore_frequency_common isst_if_common skx_edac nfit
>>>>>>>>>>> libnvdimm ipmi_ssif x86_pkg_temp
>>>>>>>>>>>     _thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
>>>>>>>>>>> dcdbas rapl intel_cstate mgag200 i2c_algo_bit drm_shmem_helper
>>>>>>>>>>> dell_smbios drm_kms_helper dell_wmi_descriptor wmi_bmof intel_u
>>>>>>>>>>>     ncore syscopyarea pcspkr sysfillrect mei_me sysimgblt
>>>>>>>>>>> acpi_ipmi
>>>>>>>>>>> mei fb_sys_fops i2c_i801 ipmi_si intel_pch_thermal lpc_ich
>>>>>>>>>>> ipmi_devintf i2c_smbus ipmi_msghandler joydev acpi_power_meter
>>>>>>>>>>>     nfsd auth_rpcgss nfs_acl drm lockd grace fuse sunrpc ext4
>>>>>>>>>>> mbcache jbd2 sd_mod sg lpfc
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nvmet_fc nvmet nvme_fc
>>>>>>>>>>> nvme_fabrics
>>>>>>>>>>> crct10dif_pclmul ahci libahci crc32_pclmul nvme_core
>>>>>>>>>>> crc32c_intel
>>>>>>>>>>> ixgbe megaraid_sas libata nvme_common ghash_clmulni_int
>>>>>>>>>>>     el t10_pi wdat_wdt scsi_transport_fc mdio wmi dca dm_mirror
>>>>>>>>>>> dm_region_hash dm_log dm_mod
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] CPU: 39 PID: 8844 Comm: nfsd Not
>>>>>>>>>>> tainted 5.14.0-419.el9.x86_64 #1
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] Hardware name: Dell Inc.
>>>>>>>>>>> PowerEdge
>>>>>>>>>>> R740/00WGD1, BIOS 2.20.1 09/13/2023
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] RIP:
>>>>>>>>>>> 0010:nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] Code: a6 95 c5 f3 e9 ff fe ff
>>>>>>>>>>> ff 48
>>>>>>>>>>> 89 df be 01 00 00 00 e8 34 b5 13 f4 48 8d bb 98 00 00 00 e8
>>>>>>>>>>> c8 f9
>>>>>>>>>>> 00 00 84 c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be
>>>>>>>>>>>     02 00 00 00 48 89 df e8 0c b5 13 f4 e9 01
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] RSP: 0018:ffff9929e0bb7b80
>>>>>>>>>>> EFLAGS:
>>>>>>>>>>> 00010246
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] RAX: 0000000000000000 RBX:
>>>>>>>>>>> ffff8ada51930900 RCX: 0000000000000024
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] RDX: ffff8ada519309c8 RSI:
>>>>>>>>>>> ffff8ad582933c00 RDI: 0000000000002000
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] RBP: ffff8ad46bf21574 R08:
>>>>>>>>>>> ffff9929e0bb7b48 R09: 0000000000000000
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] R10: ffff8aec859a2948 R11:
>>>>>>>>>>> 0000000000000000 R12: ffff8ad6f497c360
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] R13: ffff8ad46bf21560 R14:
>>>>>>>>>>> ffff8ae5942e0b10 R15: ffff8ad6f497c360
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] FS:  0000000000000000(0000)
>>>>>>>>>>> GS:ffff8b031fcc0000(0000) knlGS:0000000000000000
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] CS:  0010 DS: 0000 ES: 0000 CR0:
>>>>>>>>>>> 0000000080050033
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] CR2: 00007fafe2060744 CR3:
>>>>>>>>>>> 00000018e58de006 CR4: 00000000007706e0
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] DR0: 0000000000000000 DR1:
>>>>>>>>>>> 0000000000000000 DR2: 0000000000000000
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] DR3: 0000000000000000 DR6:
>>>>>>>>>>> 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] PKRU: 55555554
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] Call Trace:
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  <TASK>
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ?
>>>>>>>>>>> show_trace_log_lvl+0x1c4/0x2df
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ?
>>>>>>>>>>> show_trace_log_lvl+0x1c4/0x2df
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? __break_lease+0x16f/0x5f0
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ?
>>>>>>>>>>> nfsd_break_deleg_cb+0x170/0x190
>>>>>>>>>>> [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? __warn+0x81/0x110
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ?
>>>>>>>>>>> nfsd_break_deleg_cb+0x170/0x190
>>>>>>>>>>> [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? report_bug+0x10a/0x140
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? handle_bug+0x3c/0x70
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? exc_invalid_op+0x14/0x70
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? asm_exc_invalid_op+0x16/0x20
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ?
>>>>>>>>>>> nfsd_break_deleg_cb+0x170/0x190
>>>>>>>>>>> [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  __break_lease+0x16f/0x5f0
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ?
>>>>>>>>>>> nfsd_file_lookup_locked+0x117/0x160 [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? list_lru_del+0x101/0x150
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]
>>>>>>>>>>>  nfsd_file_do_acquire+0x790/0x830
>>>>>>>>>>> [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]
>>>>>>>>>>>  nfs4_get_vfs_file+0x315/0x3a0 [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]
>>>>>>>>>>>  nfsd4_process_open2+0x430/0xa30 [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? fh_verify+0x297/0x2f0 [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]
>>>>>>>>>>>  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  svc_process_common+0x2ec/0x660
>>>>>>>>>>> [sunrpc]
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ?
>>>>>>>>>>> __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  svc_process+0x12d/0x170
>>>>>>>>>>> [sunrpc]
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  kthread+0xdd/0x100
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? __pfx_kthread+0x10/0x10
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ret_from_fork+0x29/0x50
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  </TASK>
>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] ---[ end trace
>>>>>>>>>>> 7a039e17443dc651 ]---
>>>>>>>>>> This is probably this WARN in nfsd_break_one_deleg:
>>>>>>>>>>
>>>>>>>>>> WARN_ON_ONCE(!nfsd4_run_cb(&dp->dl_recall));
>>>>>>>>>>
>>>>>>>>>> It means that a delegation break callback to the client
>>>>>>>>>> couldn't be
>>>>>>>>>> queued to the workqueue, and so it didn't run.
>>>>>>>>>>
>>>>>>>>>>> Could this be the same issue as described
>>>>>>>>>>> here: https://lore.kernel.org/linux-nfs/[email protected]/
>>>>>>>>>>> ?
>>>>>>>>>> Yes, most likely the same problem.
>>>>>>>>> If I read that thread correctly, this issue was introduced
>>>>>>>>> between
>>>>>>>>> 6.1.63 and 6.6.3? Is it possible the EL9 5.14.0-362.18.1.el9_3
>>>>>>>>> backported these changes, or were we hitting some other bug
>>>>>>>>> with that
>>>>>>>>> version? It seems the 6.1.x kernel is not affected? If so, that
>>>>>>>>> would be
>>>>>>>>> the recommended kernel to run?
>>>>>>>> Anything is possible. We have to identify the problem first.
>>>>>>>>>>> As described in that thread, I've tried to obtain the requested
>>>>>>>>>>> information.
>>>>>>>>>>>
>>>>>>>>>>> Is it possible this is the issue that was fixed by the patches
>>>>>>>>>>> described
>>>>>>>>>>> here? https://lore.kernel.org/linux-nfs/2024022054-cause-suffering-eae8@gregkh/
>>>>>>>>>>>
>>>>>>>>>> Doubtful. Those are targeted toward a different set of issues.
>>>>>>>>>>
>>>>>>>>>> If you're willing, I do have some patches queued up for
>>>>>>>>>> CentOS here
>>>>>>>>>> that
>>>>>>>>>> fix some backchannel problems that could be related. I'm mainly
>>>>>>>>>> waiting
>>>>>>>>>> on Chuck to send these to Linus and then we'll likely merge
>>>>>>>>>> them into
>>>>>>>>>> CentOS soon afterward:
>>>>>>>>>>
>>>>>>>>>> https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3689
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> If you can send me a patch file, I can rebuild the C9S kernel
>>>>>>>>> with that
>>>>>>>>> patch and run it. It can take a while for the bug to trigger as I
>>>>>>>>> believe it seems to be very workload dependent (we were
>>>>>>>>> running very
>>>>>>>>> stable for months and now hit this bug every other week).
>>>>>>>>>
>>>>>>>>>
>>>>>>>> It's probably simpler to just pull down the build artifacts for
>>>>>>>> that MR.
>>>>>>>> You have to drill down through the CI for it, but they are here:
>>>>>>>>
>>>>>>>> https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/index.html?prefix=trusted-artifacts/1194300175/publish_x86_64/6278921877/artifacts/
>>>>>>>>
>>>>>>>>
>>>>>>>> There's even a repo file you can install on the box to pull
>>>>>>>> them down.
>>>>>>> We installed this kernel on the server 3 days ago. Today, a user
>>>>>>> informed us that their screen was black after logging in.
>>>>>>> Similar to
>>>>>>> other occurrences of this issue, the mount command on the client
>>>>>>> was
>>>>>>> hung. But in contrast to the other times, there were no messages in
>>>>>>> the kernel logs on the server. Even restarting the client does
>>>>>>> not resolve the issue.
>>>>>
>>>>> Ok, so you rebooted the client and it's still unable to mount? That
>>>>> sounds like a server problem if so.
>>>>>
>>>>> Are both client and server running the same kernel?
>>>> No, the server runs 5.14.0-427.3689_1194299994.el9 and the client
>>>> 5.14.0-362.18.1.el9_3.
>>>>>
>>>>>>> Something still seems to be wrong on the server though. When I
>>>>>>> look at
>>>>>>> the directories under /proc/fs/nfsd/clients, there's still a
>>>>>>> directory
>>>>>>> for the specific client, even though it's no longer running:
>>>>>>>
>>>>>>> # cat 155/info
>>>>>>> clientid: 0xc8edb7f65f4a9ad
>>>>>>> address: "10.87.31.152:819"
>>>>>>> status: confirmed
>>>>>>> seconds from last renew: 33163
>>>>>>> name: "Linux NFSv4.2 bersalis.esat.kuleuven.be"
>>>>>>> minor version: 2
>>>>>>> Implementation domain: "kernel.org"
>>>>>>> Implementation name: "Linux 5.14.0-362.18.1.el9_3.0.1.x86_64 #1 SMP
>>>>>>> PREEMPT_DYNAMIC Sun Feb 11 13:49:23 UTC 2024 x86_64"
>>>>>>> Implementation time: [0, 0]
>>>>>>> callback state: DOWN
>>>>>>> callback address: 10.87.31.152:0
>>>>>>>
>>>>> If you just shut down the client, the server won't immediately
>>>>> purge its
>>>>> record. In fact, assuming you're running the same kernel on the
>>>>> server,
>>>>> it won't purge the client record until there is a conflicting request
>>>>> for its state.
>>>> Is there a way to force such a conflicting request (to get the
>>>> client record to purge)?
>>>
>>> Try:
>>>
>>> # echo "expire" > /proc/fs/nfsd/clients/155/ctl
>>
>> I've tried that. The command hangs and cannot be interrupted with
>> ctrl-c.
>> I've now also noticed in the dmesg output that the kernel issued the
>> following WARNING a few hours ago. It wasn't directly triggered by
>> the echo command above, but probably by another client starting to hit
>> the same problem, as more clients are experiencing issues now.
>
> I think this warning message is harmless. However, it indicates a potential
> problem with the workqueue, which might be related to memory shortage.
>
> What does the output of 'cat /proc/meminfo' look like?

I doubt the current values are useful, but they are:

MemTotal:       196110860 kB
MemFree:        29357112 kB
MemAvailable:   179529420 kB
Buffers:        11996096 kB
Cached:         130589396 kB
SwapCached:           52 kB
Active:          1136988 kB
Inactive:       144192468 kB
Active(anon):     698564 kB
Inactive(anon):  2657256 kB
Active(file):     438424 kB
Inactive(file): 141535212 kB
Unevictable:       72140 kB
Mlocked:           69068 kB
SwapTotal:      67108860 kB
SwapFree:       67106276 kB
Zswap:                 0 kB
Zswapped:              0 kB
Dirty:             80812 kB
Writeback:             0 kB
AnonPages:       2806592 kB
Mapped:           322700 kB
Shmem:            599308 kB
KReclaimable:   16977000 kB
Slab:           18898736 kB
SReclaimable:   16977000 kB
SUnreclaim:      1921736 kB
KernelStack:       18128 kB
PageTables:        31716 kB
SecPageTables:         0 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    165164288 kB
Committed_AS:    5223940 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      300064 kB
VmallocChunk:          0 kB
Percpu:            45888 kB
HardwareCorrupted:     0 kB
AnonHugePages:   2451456 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
Unaccepted:            0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:     1303552 kB
DirectMap2M:    28715008 kB
DirectMap1G:    171966464 kB


>
> Did you try 'echo 3 > /proc/sys/vm/drop_caches'?

Yes, I tried that when the first client hit the issue, but it didn't
result in any unlocking of the client.


>
>>
>> [Tue Mar 19 14:53:44 2024] ------------[ cut here ]------------
>> [Tue Mar 19 14:53:44 2024] WARNING: CPU: 44 PID: 5843 at
>> fs/nfsd/nfs4state.c:4920 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>> [Tue Mar 19 14:53:44 2024] Modules linked in: nf_conntrack_netlink
>> nfsv4 dns_resolver nfs fscache netfs binfmt_misc xsk_diag
>> rpcsec_gss_krb5 rpcrdma rdma_cm iw_cm ib_cm ib_core bonding tls
>> rfkill nft_counter nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
>> nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables
>> nfnetlink vfat fat dm_thin_pool dm_persistent_data dm_bio_prison
>> dm_bufio libcrc32c dm_service_time dm_multipath intel_rapl_msr
>> intel_rapl_common intel_uncore_frequency
>> intel_uncore_frequency_common isst_if_common skx_edac nfit libnvdimm
>> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm dcdbas
>> irqbypass ipmi_ssif rapl intel_cstate mgag200 i2c_algo_bit
>> drm_shmem_helper drm_kms_helper dell_smbios syscopyarea intel_uncore
>> sysfillrect wmi_bmof dell_wmi_descriptor pcspkr sysimgblt fb_sys_fops
>> mei_me i2c_i801 mei intel_pch_thermal acpi_ipmi i2c_smbus lpc_ich
>> ipmi_si ipmi_devintf ipmi_msghandler joydev acpi_power_meter nfsd
>> nfs_acl lockd auth_rpcgss grace drm fuse sunrpc ext4
>> [Tue Mar 19 14:53:44 2024]  mbcache jbd2 sd_mod sg lpfc nvmet_fc
>> nvmet nvme_fc nvme_fabrics crct10dif_pclmul crc32_pclmul nvme_core
>> ixgbe crc32c_intel ahci libahci nvme_common megaraid_sas t10_pi
>> ghash_clmulni_intel wdat_wdt libata scsi_transport_fc mdio dca wmi
>> dm_mirror dm_region_hash dm_log dm_mod
>> [Tue Mar 19 14:53:44 2024] CPU: 44 PID: 5843 Comm: nfsd Not tainted
>> 5.14.0-427.3689_1194299994.el9.x86_64 #1
>> [Tue Mar 19 14:53:44 2024] Hardware name: Dell Inc. PowerEdge
>> R740/00WGD1, BIOS 2.20.1 09/13/2023
>> [Tue Mar 19 14:53:44 2024] RIP: 0010:nfsd_break_deleg_cb+0x170/0x190
>> [nfsd]
>> [Tue Mar 19 14:53:44 2024] Code: 76 76 cd de e9 ff fe ff ff 48 89 df
>> be 01 00 00 00 e8 34 a1 1b df 48 8d bb 98 00 00 00 e8 a8 fe 00 00 84
>> c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be 02 00 00 00 48 89 df
>> e8 0c a1 1b df e9 01
>> [Tue Mar 19 14:53:44 2024] RSP: 0018:ffffb2878f2cfc38 EFLAGS: 00010246
>> [Tue Mar 19 14:53:44 2024] RAX: 0000000000000000 RBX:
>> ffff88d5171067b8 RCX: 0000000000000000
>> [Tue Mar 19 14:53:44 2024] RDX: ffff88d517106880 RSI:
>> ffff88bdceec8600 RDI: 0000000000002000
>> [Tue Mar 19 14:53:44 2024] RBP: ffff88d68a38a284 R08:
>> ffffb2878f2cfc00 R09: 0000000000000000
>> [Tue Mar 19 14:53:44 2024] R10: ffff88bf57dd7878 R11:
>> 0000000000000000 R12: ffff88d5b79c4798
>> [Tue Mar 19 14:53:44 2024] R13: ffff88d68a38a270 R14:
>> ffff88cab06ad0c8 R15: ffff88d5b79c4798
>> [Tue Mar 19 14:53:44 2024] FS:  0000000000000000(0000)
>> GS:ffff88d4a1180000(0000) knlGS:0000000000000000
>> [Tue Mar 19 14:53:44 2024] CS:  0010 DS: 0000 ES: 0000 CR0:
>> 0000000080050033
>> [Tue Mar 19 14:53:44 2024] CR2: 00007fe46ef90000 CR3:
>> 000000019d010004 CR4: 00000000007706e0
>> [Tue Mar 19 14:53:44 2024] DR0: 0000000000000000 DR1:
>> 0000000000000000 DR2: 0000000000000000
>> [Tue Mar 19 14:53:44 2024] DR3: 0000000000000000 DR6:
>> 00000000fffe0ff0 DR7: 0000000000000400
>> [Tue Mar 19 14:53:44 2024] PKRU: 55555554
>> [Tue Mar 19 14:53:44 2024] Call Trace:
>> [Tue Mar 19 14:53:44 2024]  <TASK>
>> [Tue Mar 19 14:53:44 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>> [Tue Mar 19 14:53:44 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>> [Tue Mar 19 14:53:44 2024]  ? __break_lease+0x16f/0x5f0
>> [Tue Mar 19 14:53:44 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>> [Tue Mar 19 14:53:44 2024]  ? __warn+0x81/0x110
>> [Tue Mar 19 14:53:44 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>> [Tue Mar 19 14:53:44 2024]  ? report_bug+0x10a/0x140
>> [Tue Mar 19 14:53:44 2024]  ? handle_bug+0x3c/0x70
>> [Tue Mar 19 14:53:44 2024]  ? exc_invalid_op+0x14/0x70
>> [Tue Mar 19 14:53:44 2024]  ? asm_exc_invalid_op+0x16/0x20
>> [Tue Mar 19 14:53:44 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>> [Tue Mar 19 14:53:44 2024]  ? nfsd_break_deleg_cb+0x96/0x190 [nfsd]
>> [Tue Mar 19 14:53:44 2024]  __break_lease+0x16f/0x5f0
>> [Tue Mar 19 14:53:44 2024]  nfs4_get_vfs_file+0x164/0x3a0 [nfsd]
>> [Tue Mar 19 14:53:44 2024]  nfsd4_process_open2+0x430/0xa30 [nfsd]
>> [Tue Mar 19 14:53:44 2024]  ? fh_verify+0x297/0x2f0 [nfsd]
>> [Tue Mar 19 14:53:44 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
>> [Tue Mar 19 14:53:44 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>> [Tue Mar 19 14:53:44 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>> [Tue Mar 19 14:53:44 2024]  svc_process_common+0x2ec/0x660 [sunrpc]
>> [Tue Mar 19 14:53:44 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>> [Tue Mar 19 14:53:44 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>> [Tue Mar 19 14:53:44 2024]  svc_process+0x12d/0x170 [sunrpc]
>> [Tue Mar 19 14:53:44 2024]  nfsd+0x84/0xb0 [nfsd]
>> [Tue Mar 19 14:53:44 2024]  kthread+0xdd/0x100
>> [Tue Mar 19 14:53:44 2024]  ? __pfx_kthread+0x10/0x10
>> [Tue Mar 19 14:53:44 2024]  ret_from_fork+0x29/0x50
>> [Tue Mar 19 14:53:44 2024]  </TASK>
>> [Tue Mar 19 14:53:44 2024] ---[ end trace ed0b2b3f135c637d ]---
>>
>> It again seems to have been triggered in nfsd_break_deleg_cb?
>>
>> I also had the following perf command running in a tmux on the server:
>>
>> perf trace -e nfsd:nfsd_cb_recall_any
>>
>> This has spewed a lot of messages. I'm including a short list here:
>>
>> ...
>>
>> 33464866.721 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
>> 1710533037, cl_id: 210688785, bmval0: 1, addr: 0x7f331bb116c8)
>> 33464866.724 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
>> 1710533037, cl_id: 210688827, bmval0: 1, addr: 0x7f331bb11738)
>> 33464866.729 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
>> 1710533037, cl_id: 210688767, bmval0: 1, addr: 0x7f331bb117a8)
>> 33464866.732 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
>> 1710533037, cl_id: 210718132, bmval0: 1, addr: 0x7f331bb11818)
>> 33464866.737 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
>> 1710533037, cl_id: 210688952, bmval0: 1, addr: 0x7f331bb11888)
>> 33464866.741 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
>> 1710533037, cl_id: 210702355, bmval0: 1, addr: 0x7f331bb118f8)
>> 33868414.001 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
>> 1710533037, cl_id: 210688751, bmval0: 1, addr: 0x7f331be68620)
>> 33868414.014 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
>> 1710533037, cl_id: 210718536, bmval0: 1, addr: 0x7f331be68690)
>> 33868414.018 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
>> 1710533037, cl_id: 210719074, bmval0: 1, addr: 0x7f331be68700)
>> 33868414.022 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
>> 1710533037, cl_id: 210688916, bmval0: 1, addr: 0x7f331be68770)
>> 33868414.026 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
>> 1710533037, cl_id: 210688941, bmval0: 1, addr: 0x7f331be687e0)
>> ...
>>
>> 33868414.924 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
>> 1710533037, cl_id: 210688744, bmval0: 1, addr: 0x7f331be6d7f0)
>> 33868414.929 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
>> 1710533037, cl_id: 210717223, bmval0: 1, addr: 0x7f331be6d860)
>> 33868414.934 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
>> 1710533037, cl_id: 210716137, bmval0: 1, addr: 0x7f331be6d8d0)
>> 34021240.903 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
>> 1710533037, cl_id: 210688941, bmval0: 1, addr: 0x7f331c207de8)
>> 34021240.917 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
>> 1710533037, cl_id: 210718750, bmval0: 1, addr: 0x7f331c207e58)
>> 34021240.922 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
>> 1710533037, cl_id: 210688955, bmval0: 1, addr: 0x7f331c207ec8)
>> 34021240.925 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
>> 1710533037, cl_id: 210688975, bmval0: 1, addr: 0x7f331c207f38)
>> ...
>>
>> I assume the cl_id is the client id? How can I map this to a client
>> from /proc/fs/nfsd/clients?
>
> The hex value of 'clientid' printed from /proc/fs/nfsd/clients/XX/info
> is a 64-bit value composed of:
>
> typedef struct {
>         u32             cl_boot;
>         u32             cl_id;
> } clientid_t
>
> For example:
>
> clientid: 0xc8edb7f65f4a9ad
>
> cl_boot:  65f4a9ad (1710533037)
> cl_id:      c8edb7f (210688895)
>
> This should match a trace event with:
>
> nfsd:nfsd_cb_recall_any(cl_boot: 1710533037, cl_id: 210688895, bmval0:
> XX, addr: 0xYYYYY)
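>
> As a quick sanity check of that split with plain shell arithmetic
> (illustrative only):
>
> # printf 'cl_boot=%d cl_id=%d\n' $((0x65f4a9ad)) $((0x0c8edb7f))
> cl_boot=1710533037 cl_id=210688895
>
> i.e. the low 32 bits of the printed clientid are cl_boot and the high
> 32 bits are cl_id.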
>
>>
>> If I understand it correctly, the recall_any should be called when
>> either the system starts to experience memory pressure,
>
> yes.
It seems odd that the system gets into a state with such high memory
pressure. It doesn't run much besides NFS and Samba.
>
>> or it reaches the delegation limits?
>
> No, this feature was added to nfsd very recently. I don't think your
> kernel has it.
>
>> I doubt the system is actually running out of memory here as there
>> are no other indications.
>> Shouldn't I get those "page allocation failure" messages if it does?
>> How can I check the number of delegations/leases currently issued,
>> what the current maximum is and how to increase it?
>
> Max delegations is 4 per 1MB of available memory. There is no
> admin tool to adjust this value.
/proc/locks currently has about 130k DELEG lines, so that should be
well below the limit on a server with 192 GB of RAM.
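
As a rough sanity check (a sketch only, assuming the limit really is about
4 delegations per 1 MB of memory as described above; the exact kernel
formula may differ):

  # delegations currently issued
  grep -c DELEG /proc/locks

  # rough upper bound for a 192 GB server: 192 * 1024 MB, times 4 per MB
  echo $((192 * 1024 * 4))    # ~786432

so ~130k delegations would indeed be well under that estimate.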
>
> I do not recommend running a production system with delegation
> disabled. But for this specific issue, it might help to temporarily
> disable delegation to isolate problem areas.


I'm going to reboot the system with the 6.1.82 kernel (kernel-lt from
elrepo). Maybe it has fewer of the recent changes that may have
introduced this.

I've been able to reproduce the situation on an additional client now
that the issue happens on the server:

1. Log in on a client and mount the NFS share.
2. Open a file from the NFS share in vim so the client gets a read
delegation from the server
3. Verify on the server in /proc/fs/nfsd/clients/*/states that the
client has a delegation for the file
4. Forcefully reboot the client by running 'echo b > /proc/sysrq-trigger'
5. Watch the /proc/fs/nfsd/clients/*/info file on the server.
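
For reference, a rough shell sketch of those steps (hostnames and paths
here are just examples):

  # on the client
  mount -t nfs4 nfsserver.example.com:/export /mnt/test
  vim /mnt/test/somefile      # opening the file should get a read delegation

  # on the server: confirm the delegation, then keep watching the client
  grep deleg /proc/fs/nfsd/clients/*/states
  watch -n 5 cat /proc/fs/nfsd/clients/*/info

  # on the client: force an immediate reset without releasing any state
  echo b > /proc/sysrq-trigger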

The "seconds from last renew" will go up and at some point the callback
state changes to "FAULT". Even when the lease delegation time (90s by
default?) is over, the

seconds from last renew keeps increasing. At some point the callback
state changes to "DOWN". When the client is up again and remounts the
share, the mount hangs on the client

and on the server I notice there's a second directory for this client in
the clients directory, even though the clientid is the same. The
callback state for this new client is "UNKNOWN" and the callback address
is "(einval)".

This is on a client running Fedora 39 with the 6.7.9 kernel.


I don't know yet if the same procedure can be used to trigger the
behavior after the server is rebooted. I'm going to try to reproduce
this on another system first.

I would expect the delegations to expire automatically after 90s, but
they remain in the states file of the "DOWN" client.

Regards,

Rik

>
> -Dai
>
>>
>> Regarding the recall_any call: from what I've read on kernelnewbies,
>> this feature was introduced in the 6.2 kernel? When I look at the
>> tree for 6.1.x, it was backported in 6.1.81? Is there a way to
>> disable this support somehow?
>>
>> Regards,
>>
>> Rik
>>
>>
>>>
>>> -Dai
>>>
>>>>>
>>>>>
>>>>>> The nfsdclnts command for this client shows the following
>>>>>> delegations:
>>>>>>
>>>>>> # nfsdclnts -f 155/states -t all
>>>>>> Inode number | Type   | Access | Deny | ip address | Filename
>>>>>> 169346743    | open   | r-     | --   | 10.87.31.152:819 |
>>>>>> disconnected dentry
>>>>>> 169346743    | deleg  | r      |      | 10.87.31.152:819 |
>>>>>> disconnected dentry
>>>>>> 169346746    | open   | r-     | --   | 10.87.31.152:819 |
>>>>>> disconnected dentry
>>>>>> 169346746    | deleg  | r      |      | 10.87.31.152:819 |
>>>>>> disconnected dentry
>>>>>>
>>>>>> I see a lot of recent patches regarding directory delegations. Could
>>>>>> this be related to this?
>>>>>>
>>>>>> Will a 5.14.0-362.18.1.el9_3.0.1 kernel try to use a directory
>>>>>> delegation?
>>>>>>
>>>>>>
>>>>> No. Directory delegations are a new feature that's still under
>>>>> development. They use some of the same machinery as file delegations,
>>>>> but they wouldn't be a factor here.
>>>>>
>>>>>>> The system seems to have identified that the client is no longer
>>>>>>> reachable, but the client entry does not go away. When a mount was
>>>>>>> hanging on the client, there would be two directories in clients
>>>>>>> for
>>>>>>> the same client. Killing the mount command clears up the second
>>>>>>> entry.
>>>>>>>
>>>>>>> Even after running conntrack -D on the server to remove the tcp
>>>>>>> connection from the conntrack table, the entry doesn't go away
>>>>>>> and the
>>>>>>> client still can not mount anything from the server.
>>>>>>>
>>>>>>> A tcpdump on the client while a mount was running logged the
>>>>>>> following
>>>>>>> messages over and over again:
>>>>>>>
>>>>>>> request:
>>>>>>>
>>>>>>> Frame 1: 378 bytes on wire (3024 bits), 378 bytes captured (3024
>>>>>>> bits)
>>>>>>> Ethernet II, Src: HP_19:7d:4b (e0:73:e7:19:7d:4b), Dst:
>>>>>>> ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00)
>>>>>>> Internet Protocol Version 4, Src: 10.87.31.152, Dst: 10.86.18.14
>>>>>>> Transmission Control Protocol, Src Port: 932, Dst Port: 2049,
>>>>>>> Seq: 1,
>>>>>>> Ack: 1, Len: 312
>>>>>>> Remote Procedure Call, Type:Call XID:0x1d3220c4
>>>>>>> Network File System
>>>>>>>      [Program Version: 4]
>>>>>>>      [V4 Procedure: COMPOUND (1)]
>>>>>>>      GSS Data, Ops(1): CREATE_SESSION
>>>>>>>          Length: 152
>>>>>>>          GSS Sequence Number: 76
>>>>>>>          Tag: <EMPTY>
>>>>>>>          minorversion: 2
>>>>>>>          Operations (count: 1): CREATE_SESSION
>>>>>>>          [Main Opcode: CREATE_SESSION (43)]
>>>>>>>      GSS Checksum:
>>>>>>> 00000028040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb2…
>>>>>>>
>>>>>>>          GSS Token Length: 40
>>>>>>>          GSS-API Generic Security Service Application Program
>>>>>>> Interface
>>>>>>>              krb5_blob:
>>>>>>> 040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb23fa080b0…
>>>>>>>
>>>>>>>
>>>>>>> response:
>>>>>>>
>>>>>>> Frame 2: 206 bytes on wire (1648 bits), 206 bytes captured (1648
>>>>>>> bits)
>>>>>>> Ethernet II, Src: ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00), Dst:
>>>>>>> HP_19:7d:4b (e0:73:e7:19:7d:4b)
>>>>>>> Internet Protocol Version 4, Src: 10.86.18.14, Dst: 10.87.31.152
>>>>>>> Transmission Control Protocol, Src Port: 2049, Dst Port: 932,
>>>>>>> Seq: 1,
>>>>>>> Ack: 313, Len: 140
>>>>>>> Remote Procedure Call, Type:Reply XID:0x1d3220c4
>>>>>>> Network File System
>>>>>>>      [Program Version: 4]
>>>>>>>      [V4 Procedure: COMPOUND (1)]
>>>>>>>      GSS Data, Ops(1): CREATE_SESSION(NFS4ERR_DELAY)
>>>>>>>          Length: 24
>>>>>>>          GSS Sequence Number: 76
>>>>>>>          Status: NFS4ERR_DELAY (10008)
>>>>>>>          Tag: <EMPTY>
>>>>>>>          Operations (count: 1)
>>>>>>>          [Main Opcode: CREATE_SESSION (43)]
>>>>>>>      GSS Checksum:
>>>>>>> 00000028040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f6274222…
>>>>>>>
>>>>>>>          GSS Token Length: 40
>>>>>>>          GSS-API Generic Security Service Application Program
>>>>>>> Interface
>>>>>>>              krb5_blob:
>>>>>>> 040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f627422226d74923…
>>>>>>>
>>>>>>>
>>>>>>> I was hoping that giving the client a different IP address would
>>>>>>> resolve the issue for this client, but it didn't. Even though the
>>>>>>> client had a new IP address (hostname was kept the same), it
>>>>>>> failed to
>>>>>>> mount anything from the server.
>>>>>>>
>>>>> Changing the IP address won't help. The client is probably using the
>>>>> same long-form client id as before, so the server still identifies
>>>>> the
>>>>> client even with the address change.
>>>> How is the client id determined? Will changing the hostname of the
>>>> client trigger a change of the client id?
>>>>>
>>>>> Unfortunately, the cause of an NFS4ERR_DELAY error is tough to guess.
>>>>> The client is expected to back off and retry, so if the server keeps
>>>>> returning that repeatedly, then a hung mount command is expected.
>>>>>
>>>>> The question is why the server would keep returning DELAY. A lot of
>>>>> different problems ranging from memory allocation issues to protocol
>>>>> problems can result in that error. You may want to check the NFS
>>>>> server
>>>>> and see if anything was logged there.
>>>> There are no messages in the system logs that indicate any sort of
>>>> memory issue. We also increased the vm.min_free_kbytes sysctl to 2G on
>>>> the server before we restarted it with the newer kernel.
>>>>>
>>>>> This is on a CREATE_SESSION call, so I wonder if the record held
>>>>> by the
>>>>> (courteous) server is somehow blocking the attempt to reestablish the
>>>>> session?
>>>>>
>>>>> Do you have a way to reproduce this? Since this is a centos
>>>>> kernel, you
>>>>> could follow the page here to open a bug:
>>>>
>>>> Unfortunately we haven't found a reliable way to reproduce it. But
>>>> we do seem to trigger it more and more lately.
>>>>
>>>> Regards,
>>>>
>>>> Rik
>>>>
>>>>>
>>>>> https://wiki.centos.org/ReportBugs.html
>>>>>
>>>>>
>>>>>>> I created another dump of the workqueues and worker pools on the
>>>>>>> server:
>>>>>>>
>>>>>>> [Mon Mar 18 14:59:33 2024] Showing busy workqueues and worker
>>>>>>> pools:
>>>>>>> [Mon Mar 18 14:59:33 2024] workqueue events: flags=0x0
>>>>>>> [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0
>>>>>>> nice=0
>>>>>>> active=1/256 refcnt=2
>>>>>>> [Mon Mar 18 14:59:33 2024]     pending: drm_fb_helper_damage_work
>>>>>>> [drm_kms_helper]
>>>>>>> [Mon Mar 18 14:59:33 2024] workqueue events_power_efficient:
>>>>>>> flags=0x80
>>>>>>> [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0
>>>>>>> nice=0
>>>>>>> active=1/256 refcnt=2
>>>>>>> [Mon Mar 18 14:59:33 2024]     pending: fb_flashcursor
>>>>>>> [Mon Mar 18 14:59:33 2024] workqueue mm_percpu_wq: flags=0x8
>>>>>>> [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0
>>>>>>> nice=0
>>>>>>> active=1/256 refcnt=3
>>>>>>> [Mon Mar 18 14:59:33 2024]     pending: lru_add_drain_per_cpu
>>>>>>> BAR(362)
>>>>>>> [Mon Mar 18 14:59:33 2024] workqueue kblockd: flags=0x18
>>>>>>> [Mon Mar 18 14:59:33 2024]   pwq 55: cpus=27 node=1 flags=0x0
>>>>>>> nice=-20
>>>>>>> active=1/256 refcnt=2
>>>>>>> [Mon Mar 18 14:59:33 2024]     pending: blk_mq_timeout_work
>>>>>>>
>>>>>>>
>>>>>>> In contrast to last time, it doesn't show anything regarding nfs
>>>>>>> this
>>>>>>> time.
>>>>>>>
>>>>>>> I also tried the suggestion from Dai Ngo (echo 3 >
>>>>>>> /proc/sys/vm/drop_caches), but that didn't seem to make any
>>>>>>> difference.
>>>>>>>
>>>>>>> We haven't restarted the server yet as the impact seems to
>>>>>>> affect fewer clients than before. Is there anything we can run
>>>>>>> on the
>>>>>>> server to further debug this?
>>>>>>>
>>>>>>> In the past, the issue seemed to deteriorate rapidly and
>>>>>>> resulted in
>>>>>>> issues for almost all clients after about 20 minutes. This time the
>>>>>>> impact seems to be less, but it's not gone.
>>>>>>>
>>>>>>> How can we force the NFS server to forget about a specific
>>>>>>> client? I
>>>>>>> haven't tried to restart the nfs service yet as I'm afraid it will
>>>>>>> fail to stop as before.
>>>>>>>
>>>>> Not with that kernel. There are some new administrative interfaces
>>>>> that
>>>>> might allow that in the future, but they were just merged upstream
>>>>> and
>>>>> aren't in that kernel.
>>>>>
>>>>> --
>>>>> Jeff Layton <[email protected]>
>>>>
--
Rik Theys
System Engineer
KU Leuven - Dept. Elektrotechniek (ESAT)
Kasteelpark Arenberg 10 bus 2440 - B-3001 Leuven-Heverlee
+32(0)16/32.11.07
----------------------------------------------------------------
<<Any errors in spelling, tact or fact are transmission errors>>


2024-03-21 20:49:10

by Jeffrey Layton

[permalink] [raw]
Subject: Re: nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning

On Wed, 2024-03-20 at 20:41 +0100, Rik Theys wrote:
> Hi,
>
> On 3/19/24 22:42, Dai Ngo wrote:
> >
> > On 3/19/24 12:41 PM, Rik Theys wrote:
> > > Hi,
> > >
> > > On 3/19/24 18:09, Dai Ngo wrote:
> > > >
> > > > On 3/19/24 12:58 AM, Rik Theys wrote:
> > > > > Hi,
> > > > >
> > > > > On 3/18/24 22:54, Jeff Layton wrote:
> > > > > > On Mon, 2024-03-18 at 22:15 +0100, Rik Theys wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > On 3/18/24 21:21, Rik Theys wrote:
> > > > > > > > Hi Jeff,
> > > > > > > >
> > > > > > > > On 3/12/24 13:47, Jeff Layton wrote:
> > > > > > > > > On Tue, 2024-03-12 at 13:24 +0100, Rik Theys wrote:
> > > > > > > > > > Hi Jeff,
> > > > > > > > > >
> > > > > > > > > > On 3/12/24 12:22, Jeff Layton wrote:
> > > > > > > > > > > On Mon, 2024-03-11 at 19:43 +0100, Rik Theys wrote:
> > > > > > > > > > > > Since a few weeks our Rocky Linux 9 NFS server has periodically
> > > > > > > > > > > > logged hung nfsd tasks. The initial effect was that some
> > > > > > > > > > > > clients
> > > > > > > > > > > > could no longer access the NFS server. This got worse and worse
> > > > > > > > > > > > (probably as more nfsd threads got blocked) and we had to
> > > > > > > > > > > > restart
> > > > > > > > > > > > the server. Restarting the server also failed as the NFS server
> > > > > > > > > > > > service could no longer be stopped.
> > > > > > > > > > > >
> > > > > > > > > > > > The initial kernel we noticed this behavior on was
> > > > > > > > > > > > kernel-5.14.0-362.18.1.el9_3.x86_64. Since then we've installed
> > > > > > > > > > > > kernel-5.14.0-419.el9.x86_64 from CentOS Stream 9. The same
> > > > > > > > > > > > issue
> > > > > > > > > > > > happened again on this newer kernel version:
> > > > > > > > > 419 is fairly up to date with nfsd changes. There are some
> > > > > > > > > known bugs
> > > > > > > > > around callbacks, and there is a draft MR in flight to fix it.
> > > > > > > > >
> > > > > > > > > What kernel were you on prior to 5.14.0-362.18.1.el9_3.x86_64 ?
> > > > > > > > > If we
> > > > > > > > > can bracket the changes around a particular version, then that
> > > > > > > > > might
> > > > > > > > > help identify the problem.
> > > > > > > > >
> > > > > > > > > > > > [Mon Mar 11 14:10:08 2024]       Not tainted
> > > > > > > > > > > > 5.14.0-419.el9.x86_64 #1
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024] "echo 0 >
> > > > > > > > > > > > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]task:nfsd            state:D
> > > > > > > > > > > > stack:0
> > > > > > > > > > > >      pid:8865  ppid:2      flags:0x00004000
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024] Call Trace:
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  <TASK>
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ?
> > > > > > > > > > > > select_idle_sibling+0x28/0x430
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ?
> > > > > > > > > > > > __pfx_schedule_timeout+0x10/0x10
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]
> > > > > > > > > > > >  nfsd4_shutdown_callback+0x49/0x120
> > > > > > > > > > > > [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ?
> > > > > > > > > > > > nfsd4_cld_remove+0x54/0x1d0 [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ?
> > > > > > > > > > > > nfsd4_return_all_client_layouts+0xc4/0xf0 [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ?
> > > > > > > > > > > > nfsd4_shutdown_copy+0x68/0xc0 [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  __destroy_client+0x1f3/0x290
> > > > > > > > > > > > [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]
> > > > > > > > > > > >  nfsd4_exchange_id+0x75f/0x770 [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ?
> > > > > > > > > > > > nfsd4_decode_opaque+0x3a/0x90 [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]
> > > > > > > > > > > >  nfsd4_proc_compound+0x44b/0x700 [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
> > > > > > > > > > > > [sunrpc]
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ?
> > > > > > > > > > > > __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170
> > > > > > > > > > > > [sunrpc]
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  </TASK>
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024] INFO: task nfsd:8866 blocked for
> > > > > > > > > > > > more than 122 seconds.
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]       Not tainted
> > > > > > > > > > > > 5.14.0-419.el9.x86_64 #1
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024] "echo 0 >
> > > > > > > > > > > > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]task:nfsd            state:D
> > > > > > > > > > > > stack:0
> > > > > > > > > > > >      pid:8866  ppid:2      flags:0x00004000
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024] Call Trace:
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  <TASK>
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ?
> > > > > > > > > > > > select_idle_sibling+0x28/0x430
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ? tcp_recvmsg+0x196/0x210
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ?
> > > > > > > > > > > > __pfx_schedule_timeout+0x10/0x10
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]
> > > > > > > > > > > >  nfsd4_destroy_session+0x1a4/0x240
> > > > > > > > > > > > [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]
> > > > > > > > > > > >  nfsd4_proc_compound+0x44b/0x700 [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
> > > > > > > > > > > > [sunrpc]
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ?
> > > > > > > > > > > > __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170
> > > > > > > > > > > > [sunrpc]
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
> > > > > > > > > > > >     [Mon Mar 11 14:10:08 2024]  </TASK>
> > > > > > > > > > > >
> > > > > > > > > > > The above threads are trying to flush the workqueue, so that
> > > > > > > > > > > probably
> > > > > > > > > > > means that they are stuck waiting on a workqueue job to finish.
> > > > > > > > > > > >     The above is repeated a few times, and then this warning is
> > > > > > > > > > > > also logged:
> > > > > > > > > > > >     [Mon Mar 11 14:12:04 2024] ------------[ cut here
> > > > > > > > > > > > ]------------
> > > > > > > > > > > >     [Mon Mar 11 14:12:04 2024] WARNING: CPU: 39 PID: 8844 at
> > > > > > > > > > > > fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024] Modules linked in: nfsv4
> > > > > > > > > > > > dns_resolver nfs fscache netfs rpcsec_gss_krb5 rpcrdma rdma_cm
> > > > > > > > > > > > iw_cm ib_cm ib_core binfmt_misc bonding tls rfkill
> > > > > > > > > > > > nft_counter nft_ct
> > > > > > > > > > > >     nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet
> > > > > > > > > > > > nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables nfnetlink
> > > > > > > > > > > > vfat
> > > > > > > > > > > > fat dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio l
> > > > > > > > > > > >     ibcrc32c dm_service_time dm_multipath intel_rapl_msr
> > > > > > > > > > > > intel_rapl_common intel_uncore_frequency
> > > > > > > > > > > > intel_uncore_frequency_common isst_if_common skx_edac nfit
> > > > > > > > > > > > libnvdimm ipmi_ssif x86_pkg_temp
> > > > > > > > > > > >     _thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
> > > > > > > > > > > > dcdbas rapl intel_cstate mgag200 i2c_algo_bit drm_shmem_helper
> > > > > > > > > > > > dell_smbios drm_kms_helper dell_wmi_descriptor wmi_bmof intel_u
> > > > > > > > > > > >     ncore syscopyarea pcspkr sysfillrect mei_me sysimgblt
> > > > > > > > > > > > acpi_ipmi
> > > > > > > > > > > > mei fb_sys_fops i2c_i801 ipmi_si intel_pch_thermal lpc_ich
> > > > > > > > > > > > ipmi_devintf i2c_smbus ipmi_msghandler joydev acpi_power_meter
> > > > > > > > > > > >     nfsd auth_rpcgss nfs_acl drm lockd grace fuse sunrpc ext4
> > > > > > > > > > > > mbcache jbd2 sd_mod sg lpfc
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  nvmet_fc nvmet nvme_fc
> > > > > > > > > > > > nvme_fabrics
> > > > > > > > > > > > crct10dif_pclmul ahci libahci crc32_pclmul nvme_core
> > > > > > > > > > > > crc32c_intel
> > > > > > > > > > > > ixgbe megaraid_sas libata nvme_common ghash_clmulni_int
> > > > > > > > > > > >     el t10_pi wdat_wdt scsi_transport_fc mdio wmi dca dm_mirror
> > > > > > > > > > > > dm_region_hash dm_log dm_mod
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024] CPU: 39 PID: 8844 Comm: nfsd Not
> > > > > > > > > > > > tainted 5.14.0-419.el9.x86_64 #1
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024] Hardware name: Dell Inc.
> > > > > > > > > > > > PowerEdge
> > > > > > > > > > > > R740/00WGD1, BIOS 2.20.1 09/13/2023
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024] RIP:
> > > > > > > > > > > > 0010:nfsd_break_deleg_cb+0x170/0x190 [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024] Code: a6 95 c5 f3 e9 ff fe ff
> > > > > > > > > > > > ff 48
> > > > > > > > > > > > 89 df be 01 00 00 00 e8 34 b5 13 f4 48 8d bb 98 00 00 00 e8
> > > > > > > > > > > > c8 f9
> > > > > > > > > > > > 00 00 84 c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be
> > > > > > > > > > > >     02 00 00 00 48 89 df e8 0c b5 13 f4 e9 01
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024] RSP: 0018:ffff9929e0bb7b80
> > > > > > > > > > > > EFLAGS:
> > > > > > > > > > > > 00010246
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024] RAX: 0000000000000000 RBX:
> > > > > > > > > > > > ffff8ada51930900 RCX: 0000000000000024
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024] RDX: ffff8ada519309c8 RSI:
> > > > > > > > > > > > ffff8ad582933c00 RDI: 0000000000002000
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024] RBP: ffff8ad46bf21574 R08:
> > > > > > > > > > > > ffff9929e0bb7b48 R09: 0000000000000000
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024] R10: ffff8aec859a2948 R11:
> > > > > > > > > > > > 0000000000000000 R12: ffff8ad6f497c360
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024] R13: ffff8ad46bf21560 R14:
> > > > > > > > > > > > ffff8ae5942e0b10 R15: ffff8ad6f497c360
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024] FS:  0000000000000000(0000)
> > > > > > > > > > > > GS:ffff8b031fcc0000(0000) knlGS:0000000000000000
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024] CS:  0010 DS: 0000 ES: 0000 CR0:
> > > > > > > > > > > > 0000000080050033
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024] CR2: 00007fafe2060744 CR3:
> > > > > > > > > > > > 00000018e58de006 CR4: 00000000007706e0
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024] DR0: 0000000000000000 DR1:
> > > > > > > > > > > > 0000000000000000 DR2: 0000000000000000
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024] DR3: 0000000000000000 DR6:
> > > > > > > > > > > > 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024] PKRU: 55555554
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024] Call Trace:
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  <TASK>
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ?
> > > > > > > > > > > > show_trace_log_lvl+0x1c4/0x2df
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ?
> > > > > > > > > > > > show_trace_log_lvl+0x1c4/0x2df
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ? __break_lease+0x16f/0x5f0
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ?
> > > > > > > > > > > > nfsd_break_deleg_cb+0x170/0x190
> > > > > > > > > > > > [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ? __warn+0x81/0x110
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ?
> > > > > > > > > > > > nfsd_break_deleg_cb+0x170/0x190
> > > > > > > > > > > > [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ? report_bug+0x10a/0x140
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ? handle_bug+0x3c/0x70
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ? exc_invalid_op+0x14/0x70
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ? asm_exc_invalid_op+0x16/0x20
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ?
> > > > > > > > > > > > nfsd_break_deleg_cb+0x170/0x190
> > > > > > > > > > > > [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  __break_lease+0x16f/0x5f0
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ?
> > > > > > > > > > > > nfsd_file_lookup_locked+0x117/0x160 [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ? list_lru_del+0x101/0x150
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]
> > > > > > > > > > > >  nfsd_file_do_acquire+0x790/0x830
> > > > > > > > > > > > [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]
> > > > > > > > > > > >  nfs4_get_vfs_file+0x315/0x3a0 [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]
> > > > > > > > > > > >  nfsd4_process_open2+0x430/0xa30 [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ? fh_verify+0x297/0x2f0 [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]
> > > > > > > > > > > >  nfsd4_proc_compound+0x44b/0x700 [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  svc_process_common+0x2ec/0x660
> > > > > > > > > > > > [sunrpc]
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ?
> > > > > > > > > > > > __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  svc_process+0x12d/0x170
> > > > > > > > > > > > [sunrpc]
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  nfsd+0x84/0xb0 [nfsd]
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  kthread+0xdd/0x100
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ? __pfx_kthread+0x10/0x10
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  ret_from_fork+0x29/0x50
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024]  </TASK>
> > > > > > > > > > > >     [Mon Mar 11 14:12:05 2024] ---[ end trace
> > > > > > > > > > > > 7a039e17443dc651 ]---
> > > > > > > > > > > This is probably this WARN in nfsd_break_one_deleg:
> > > > > > > > > > >
> > > > > > > > > > > WARN_ON_ONCE(!nfsd4_run_cb(&dp->dl_recall));
> > > > > > > > > > >
> > > > > > > > > > > It means that a delegation break callback to the client
> > > > > > > > > > > couldn't be
> > > > > > > > > > > queued to the workqueue, and so it didn't run.
> > > > > > > > > > >
> > > > > > > > > > > > Could this be the same issue as described
> > > > > > > > > > > > here: https://lore.kernel.org/linux-nfs/[email protected]/ ?
> > > > > > > > > > > Yes, most likely the same problem.
> > > > > > > > > > If I read that thread correctly, this issue was introduced
> > > > > > > > > > between
> > > > > > > > > > 6.1.63 and 6.6.3? Is it possible the EL9 5.14.0-362.18.1.el9_3
> > > > > > > > > > backported these changes, or were we hitting some other bug
> > > > > > > > > > with that
> > > > > > > > > > version? It seems the 6.1.x kernel is not affected? If so, that
> > > > > > > > > > would be
> > > > > > > > > > the recommended kernel to run?
> > > > > > > > > Anything is possible. We have to identify the problem first.
> > > > > > > > > > > > As described in that thread, I've tried to obtain the requested
> > > > > > > > > > > > information.
> > > > > > > > > > > >
> > > > > > > > > > > > Is it possible this is the issue that was fixed by the patches
> > > > > > > > > > > > described
> > > > > > > > > > > > here? https://lore.kernel.org/linux-nfs/2024022054-cause-suffering-eae8@gregkh/
> > > > > > > > > > > >
> > > > > > > > > > > Doubtful. Those are targeted toward a different set of issues.
> > > > > > > > > > >
> > > > > > > > > > > If you're willing, I do have some patches queued up for
> > > > > > > > > > > CentOS here
> > > > > > > > > > > that
> > > > > > > > > > > fix some backchannel problems that could be related. I'm mainly
> > > > > > > > > > > waiting
> > > > > > > > > > > on Chuck to send these to Linus and then we'll likely merge
> > > > > > > > > > > them into
> > > > > > > > > > > CentOS soon afterward:
> > > > > > > > > > >
> > > > > > > > > > > https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3689
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > If you can send me a patch file, I can rebuild the C9S kernel
> > > > > > > > > > with that
> > > > > > > > > > patch and run it. It can take a while for the bug to trigger as I
> > > > > > > > > > believe it seems to be very workload dependent (we were
> > > > > > > > > > running very
> > > > > > > > > > stable for months and now hit this bug every other week).
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > It's probably simpler to just pull down the build artifacts for
> > > > > > > > > that MR.
> > > > > > > > > You have to drill down through the CI for it, but they are here:
> > > > > > > > >
> > > > > > > > > https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/index.html?prefix=trusted-artifacts/1194300175/publish_x86_64/6278921877/artifacts/
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > There's even a repo file you can install on the box to pull
> > > > > > > > > them down.
> > > > > > > > We installed this kernel on the server 3 days ago. Today, a user
> > > > > > > > informed us that their screen was black after logging in.
> > > > > > > > Similar to
> > > > > > > > other occurrences of this issue, the mount command on the client
> > > > > > > > was
> > > > > > > > hung. But in contrast to the other times, there were no messages in
> > > > > > > > the kernel logs on the server. Even restarting the client does
> > > > > > > > not resolve the issue.
> > > > > >
> > > > > > Ok, so you rebooted the client and it's still unable to mount? That
> > > > > > sounds like a server problem if so.
> > > > > >
> > > > > > Are both client and server running the same kernel?
> > > > > No, the server runs 5.14.0-427.3689_1194299994.el9 and the client
> > > > > 5.14.0-362.18.1.el9_3.
> > > > > >
> > > > > > > > Something still seems to be wrong on the server though. When I
> > > > > > > > look at
> > > > > > > > the directories under /proc/fs/nfsd/clients, there's still a
> > > > > > > > directory
> > > > > > > > for the specific client, even though it's no longer running:
> > > > > > > >
> > > > > > > > # cat 155/info
> > > > > > > > clientid: 0xc8edb7f65f4a9ad
> > > > > > > > address: "10.87.31.152:819"
> > > > > > > > status: confirmed
> > > > > > > > seconds from last renew: 33163
> > > > > > > > name: "Linux NFSv4.2 bersalis.esat.kuleuven.be"
> > > > > > > > minor version: 2
> > > > > > > > Implementation domain: "kernel.org"
> > > > > > > > Implementation name: "Linux 5.14.0-362.18.1.el9_3.0.1.x86_64 #1 SMP
> > > > > > > > PREEMPT_DYNAMIC Sun Feb 11 13:49:23 UTC 2024 x86_64"
> > > > > > > > Implementation time: [0, 0]
> > > > > > > > callback state: DOWN
> > > > > > > > callback address: 10.87.31.152:0
> > > > > > > >
> > > > > > If you just shut down the client, the server won't immediately
> > > > > > purge its
> > > > > > record. In fact, assuming you're running the same kernel on the
> > > > > > server,
> > > > > > it won't purge the client record until there is a conflicting request
> > > > > > for its state.
> > > > > Is there a way to force such a conflicting request (to get the
> > > > > client record to purge)?
> > > >
> > > > Try:
> > > >
> > > > # echo "expire" > /proc/fs/nfsd/clients/155/ctl
> > >
> > > I've tried that. The command hangs and cannot be interrupted with
> > > ctrl-c.
> > > I've now also noticed in the dmesg output that the kernel issued the
> > > following WARNING a few hours ago. It wasn't directly triggered by
> > > the echo command above; it probably fired when another client started
> > > to have the same problem, as more clients are experiencing issues now.
> >
> > I think this warning message is harmless. However, it indicates a potential
> > problem with the workqueue, which might be related to memory shortage.
> >
> > What does the output of 'cat /proc/meminfo' look like?
>
> I doubt the current values are useful, but they are:
>
> MemTotal:       196110860 kB
> MemFree:        29357112 kB
> MemAvailable:   179529420 kB
> Buffers:        11996096 kB
> Cached:         130589396 kB
> SwapCached:           52 kB
> Active:          1136988 kB
> Inactive:       144192468 kB
> Active(anon):     698564 kB
> Inactive(anon):  2657256 kB
> Active(file):     438424 kB
> Inactive(file): 141535212 kB
> Unevictable:       72140 kB
> Mlocked:           69068 kB
> SwapTotal:      67108860 kB
> SwapFree:       67106276 kB
> Zswap:                 0 kB
> Zswapped:              0 kB
> Dirty:             80812 kB
> Writeback:             0 kB
> AnonPages:       2806592 kB
> Mapped:           322700 kB
> Shmem:            599308 kB
> KReclaimable:   16977000 kB
> Slab:           18898736 kB
> SReclaimable:   16977000 kB
> SUnreclaim:      1921736 kB
> KernelStack:       18128 kB
> PageTables:        31716 kB
> SecPageTables:         0 kB
> NFS_Unstable:          0 kB
> Bounce:                0 kB
> WritebackTmp:          0 kB
> CommitLimit:    165164288 kB
> Committed_AS:    5223940 kB
> VmallocTotal:   34359738367 kB
> VmallocUsed:      300064 kB
> VmallocChunk:          0 kB
> Percpu:            45888 kB
> HardwareCorrupted:     0 kB
> AnonHugePages:   2451456 kB
> ShmemHugePages:        0 kB
> ShmemPmdMapped:        0 kB
> FileHugePages:         0 kB
> FilePmdMapped:         0 kB
> CmaTotal:              0 kB
> CmaFree:               0 kB
> Unaccepted:            0 kB
> HugePages_Total:       0
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:       2048 kB
> Hugetlb:               0 kB
> DirectMap4k:     1303552 kB
> DirectMap2M:    28715008 kB
> DirectMap1G:    171966464 kB
>
>
> >
> > Did you try 'echo 3 > /proc/sys/vm/drop_caches'?
>
> Yes, I tried that when the first client hit the issue, but it didn't
> result in any unlocking of the client.
>
>
> >
> > >
> > > [Tue Mar 19 14:53:44 2024] ------------[ cut here ]------------
> > > [Tue Mar 19 14:53:44 2024] WARNING: CPU: 44 PID: 5843 at
> > > fs/nfsd/nfs4state.c:4920 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
> > > [Tue Mar 19 14:53:44 2024] Modules linked in: nf_conntrack_netlink
> > > nfsv4 dns_resolver nfs fscache netfs binfmt_misc xsk_diag
> > > rpcsec_gss_krb5 rpcrdma rdma_cm iw_cm ib_cm ib_core bonding tls
> > > rfkill nft_counter nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
> > > nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables
> > > nfnetlink vfat fat dm_thin_pool dm_persistent_data dm_bio_prison
> > > dm_bufio libcrc32c dm_service_time dm_multipath intel_rapl_msr
> > > intel_rapl_common intel_uncore_frequency
> > > intel_uncore_frequency_common isst_if_common skx_edac nfit libnvdimm
> > > x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm dcdbas
> > > irqbypass ipmi_ssif rapl intel_cstate mgag200 i2c_algo_bit
> > > drm_shmem_helper drm_kms_helper dell_smbios syscopyarea intel_uncore
> > > sysfillrect wmi_bmof dell_wmi_descriptor pcspkr sysimgblt fb_sys_fops
> > > mei_me i2c_i801 mei intel_pch_thermal acpi_ipmi i2c_smbus lpc_ich
> > > ipmi_si ipmi_devintf ipmi_msghandler joydev acpi_power_meter nfsd
> > > nfs_acl lockd auth_rpcgss grace drm fuse sunrpc ext4
> > > [Tue Mar 19 14:53:44 2024]  mbcache jbd2 sd_mod sg lpfc nvmet_fc
> > > nvmet nvme_fc nvme_fabrics crct10dif_pclmul crc32_pclmul nvme_core
> > > ixgbe crc32c_intel ahci libahci nvme_common megaraid_sas t10_pi
> > > ghash_clmulni_intel wdat_wdt libata scsi_transport_fc mdio dca wmi
> > > dm_mirror dm_region_hash dm_log dm_mod
> > > [Tue Mar 19 14:53:44 2024] CPU: 44 PID: 5843 Comm: nfsd Not tainted
> > > 5.14.0-427.3689_1194299994.el9.x86_64 #1
> > > [Tue Mar 19 14:53:44 2024] Hardware name: Dell Inc. PowerEdge
> > > R740/00WGD1, BIOS 2.20.1 09/13/2023
> > > [Tue Mar 19 14:53:44 2024] RIP: 0010:nfsd_break_deleg_cb+0x170/0x190
> > > [nfsd]
> > > [Tue Mar 19 14:53:44 2024] Code: 76 76 cd de e9 ff fe ff ff 48 89 df
> > > be 01 00 00 00 e8 34 a1 1b df 48 8d bb 98 00 00 00 e8 a8 fe 00 00 84
> > > c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be 02 00 00 00 48 89 df
> > > e8 0c a1 1b df e9 01
> > > [Tue Mar 19 14:53:44 2024] RSP: 0018:ffffb2878f2cfc38 EFLAGS: 00010246
> > > [Tue Mar 19 14:53:44 2024] RAX: 0000000000000000 RBX:
> > > ffff88d5171067b8 RCX: 0000000000000000
> > > [Tue Mar 19 14:53:44 2024] RDX: ffff88d517106880 RSI:
> > > ffff88bdceec8600 RDI: 0000000000002000
> > > [Tue Mar 19 14:53:44 2024] RBP: ffff88d68a38a284 R08:
> > > ffffb2878f2cfc00 R09: 0000000000000000
> > > [Tue Mar 19 14:53:44 2024] R10: ffff88bf57dd7878 R11:
> > > 0000000000000000 R12: ffff88d5b79c4798
> > > [Tue Mar 19 14:53:44 2024] R13: ffff88d68a38a270 R14:
> > > ffff88cab06ad0c8 R15: ffff88d5b79c4798
> > > [Tue Mar 19 14:53:44 2024] FS:  0000000000000000(0000)
> > > GS:ffff88d4a1180000(0000) knlGS:0000000000000000
> > > [Tue Mar 19 14:53:44 2024] CS:  0010 DS: 0000 ES: 0000 CR0:
> > > 0000000080050033
> > > [Tue Mar 19 14:53:44 2024] CR2: 00007fe46ef90000 CR3:
> > > 000000019d010004 CR4: 00000000007706e0
> > > [Tue Mar 19 14:53:44 2024] DR0: 0000000000000000 DR1:
> > > 0000000000000000 DR2: 0000000000000000
> > > [Tue Mar 19 14:53:44 2024] DR3: 0000000000000000 DR6:
> > > 00000000fffe0ff0 DR7: 0000000000000400
> > > [Tue Mar 19 14:53:44 2024] PKRU: 55555554
> > > [Tue Mar 19 14:53:44 2024] Call Trace:
> > > [Tue Mar 19 14:53:44 2024]  <TASK>
> > > [Tue Mar 19 14:53:44 2024]  ? show_trace_log_lvl+0x1c4/0x2df
> > > [Tue Mar 19 14:53:44 2024]  ? show_trace_log_lvl+0x1c4/0x2df
> > > [Tue Mar 19 14:53:44 2024]  ? __break_lease+0x16f/0x5f0
> > > [Tue Mar 19 14:53:44 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
> > > [Tue Mar 19 14:53:44 2024]  ? __warn+0x81/0x110
> > > [Tue Mar 19 14:53:44 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
> > > [Tue Mar 19 14:53:44 2024]  ? report_bug+0x10a/0x140
> > > [Tue Mar 19 14:53:44 2024]  ? handle_bug+0x3c/0x70
> > > [Tue Mar 19 14:53:44 2024]  ? exc_invalid_op+0x14/0x70
> > > [Tue Mar 19 14:53:44 2024]  ? asm_exc_invalid_op+0x16/0x20
> > > [Tue Mar 19 14:53:44 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
> > > [Tue Mar 19 14:53:44 2024]  ? nfsd_break_deleg_cb+0x96/0x190 [nfsd]
> > > [Tue Mar 19 14:53:44 2024]  __break_lease+0x16f/0x5f0
> > > [Tue Mar 19 14:53:44 2024]  nfs4_get_vfs_file+0x164/0x3a0 [nfsd]
> > > [Tue Mar 19 14:53:44 2024]  nfsd4_process_open2+0x430/0xa30 [nfsd]
> > > [Tue Mar 19 14:53:44 2024]  ? fh_verify+0x297/0x2f0 [nfsd]
> > > [Tue Mar 19 14:53:44 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
> > > [Tue Mar 19 14:53:44 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
> > > [Tue Mar 19 14:53:44 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
> > > [Tue Mar 19 14:53:44 2024]  svc_process_common+0x2ec/0x660 [sunrpc]
> > > [Tue Mar 19 14:53:44 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> > > [Tue Mar 19 14:53:44 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
> > > [Tue Mar 19 14:53:44 2024]  svc_process+0x12d/0x170 [sunrpc]
> > > [Tue Mar 19 14:53:44 2024]  nfsd+0x84/0xb0 [nfsd]
> > > [Tue Mar 19 14:53:44 2024]  kthread+0xdd/0x100
> > > [Tue Mar 19 14:53:44 2024]  ? __pfx_kthread+0x10/0x10
> > > [Tue Mar 19 14:53:44 2024]  ret_from_fork+0x29/0x50
> > > [Tue Mar 19 14:53:44 2024]  </TASK>
> > > [Tue Mar 19 14:53:44 2024] ---[ end trace ed0b2b3f135c637d ]---
> > >
> > > It again seems to have been triggered in nfsd_break_deleg_cb?
> > >
> > > I also had the following perf command running in a tmux on the server:
> > >
> > > perf trace -e nfsd:nfsd_cb_recall_any
> > >
> > > This has spewed a lot of messages. I'm including a short list here:
> > >
> > > ...
> > >
> > > 33464866.721 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> > > 1710533037, cl_id: 210688785, bmval0: 1, addr: 0x7f331bb116c8)
> > > 33464866.724 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> > > 1710533037, cl_id: 210688827, bmval0: 1, addr: 0x7f331bb11738)
> > > 33464866.729 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> > > 1710533037, cl_id: 210688767, bmval0: 1, addr: 0x7f331bb117a8)
> > > 33464866.732 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> > > 1710533037, cl_id: 210718132, bmval0: 1, addr: 0x7f331bb11818)
> > > 33464866.737 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> > > 1710533037, cl_id: 210688952, bmval0: 1, addr: 0x7f331bb11888)
> > > 33464866.741 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> > > 1710533037, cl_id: 210702355, bmval0: 1, addr: 0x7f331bb118f8)
> > > 33868414.001 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
> > > 1710533037, cl_id: 210688751, bmval0: 1, addr: 0x7f331be68620)
> > > 33868414.014 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
> > > 1710533037, cl_id: 210718536, bmval0: 1, addr: 0x7f331be68690)
> > > 33868414.018 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
> > > 1710533037, cl_id: 210719074, bmval0: 1, addr: 0x7f331be68700)
> > > 33868414.022 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
> > > 1710533037, cl_id: 210688916, bmval0: 1, addr: 0x7f331be68770)
> > > 33868414.026 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
> > > 1710533037, cl_id: 210688941, bmval0: 1, addr: 0x7f331be687e0)
> > > ...
> > >
> > > 33868414.924 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
> > > 1710533037, cl_id: 210688744, bmval0: 1, addr: 0x7f331be6d7f0)
> > > 33868414.929 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
> > > 1710533037, cl_id: 210717223, bmval0: 1, addr: 0x7f331be6d860)
> > > 33868414.934 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
> > > 1710533037, cl_id: 210716137, bmval0: 1, addr: 0x7f331be6d8d0)
> > > 34021240.903 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> > > 1710533037, cl_id: 210688941, bmval0: 1, addr: 0x7f331c207de8)
> > > 34021240.917 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> > > 1710533037, cl_id: 210718750, bmval0: 1, addr: 0x7f331c207e58)
> > > 34021240.922 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> > > 1710533037, cl_id: 210688955, bmval0: 1, addr: 0x7f331c207ec8)
> > > 34021240.925 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
> > > 1710533037, cl_id: 210688975, bmval0: 1, addr: 0x7f331c207f38)
> > > ...
> > >
> > > I assume the cl_id is the client id? How can I map this to a client
> > > from /proc/fs/nfsd/clients?
> >
> > The hex value of 'clientid' printed from /proc/fs/nfsd/clients/XX/info
> > is a 64-bit value composed of:
> >
> > typedef struct {
> >         u32             cl_boot;
> >         u32             cl_id;
> > } clientid_t;
> >
> > For example:
> >
> > clientid: 0xc8edb7f65f4a9ad
> >
> > cl_boot:  65f4a9ad (1710533037)
> > cl_id:     c8edb7f (210688895)
> >
> > This should match a trace event with:
> >
> > nfsd:nfsd_cb_recall_any(cl_boot: 1710533037, cl_id: 210688895, bmval0:
> > XX, addr: 0xYYYYY)
> >
> > >
> > > If I understand it correctly, the recall_any should be called when
> > > either the system starts to experience memory pressure,
> >
> > yes.
> It seems odd that the system gets into a state with such high memory
> pressure. It doesn't run much other than NFS and Samba.
> >
> > > or it reaches the delegation limits?
> >
> > No, this feature was added to nfsd very recently. I don't think your
> > kernel has it.
> >
> > > I doubt the system is actually running out of memory here as there
> > > are no other indications.
> > > Shouldn't I get those "page allocation failure" messages if it does?
> > > How can I check the number of delegations/leases currently issued,
> > > what the current maximum is and how to increase it?
> >
> > Max delegations is 4 per 1MB of available memory. There is no
> > admin tool to adjust this value.
> /proc/locks currently has about 130k DELEG lines, so that should be
> well below the limit on a server with 192 GB of RAM.
>
>
> >
> > I do not recommend running a production system with delegation
> > disabled. But for this specific issue, it might help to temporarily
> > disable delegation to isolate problem areas.
>
>
> I'm going to reboot the system with the 6.1.82 kernel (kernel-lt from
> elrepo). Maybe it has fewer of the recent changes that may have
> introduced this.
>

If the v6.1-ish kernel turns out not to help, then you may want to give a
v6.7 or v6.8 kernel a try. It would help to know whether this problem is
reproducible on more up-to-date kernels.

> I've been able to reproduce the situation on an additional client now
> that the issue happens on the server:
>
> 1. Log in on a client and mount the NFS share.
> 2. Open a file from the NFS share in vim so the client gets a read
> delegation from the server
> 3. Verify on the server in /proc/fs/nfsd/clients/*/states that the
> client has a delegation for the file
> 4. Forcefully reboot the client by running 'echo b > /proc/sysrq-trigger'
> 5. Watch the /proc/fs/nfsd/clients/*/info file on the server.
>
> The "seconds from last renew" will go up and at some point the callback
> state changes to "FAULT". Even when the lease delegation time (90s by
> default?) is over, the
>
> seconds from last renew keeps increasing. At some point the callback
> state changes to "DOWN". When the client is up again and remounts the
> share, the mount hangs on the client
>
> and on the server I notice there's a second directory for this client in
> the clients directory, even though the clientid is the same. The
> callback state for this new client is "UNKNOWN" and the callback address
> is "(einval)".
>
> This is on a client running Fedora 39 with the 6.7.9 kernel.
>

I'm a little unclear...do the above steps work correctly when the server
isn't in this state? I assume the above steps are not sufficient to
cause a problem when the server is behaving normally?

>
> I don't know yet if the same procedure can be used to trigger the
> behavior after the server is rebooted. I'm going to try to reproduce
> this on another system first.
>
> I would expect the delegations to expire automatically after 90s, but
> they remain in the states file of the "DOWN" client.
>

That would have been true a year or so ago, but there were some recent
changes to make the server more "courteous" toward clients that lose
contact for a while. If there are no conflicting requests for the state
they hold then the server will hold onto the lease (basically)
indefinitely, until there is such a conflict.

The client _should_ be able to log in and cancel the old client
record though. It sounds like that's not working properly for some
reason and it's interfering with the ability to do a CREATE_SESSION.

>
> >
> > -Dai
> >
> > >
> > > Regarding the recall_any call: from what I've read on kernelnewbies,
> > > this feature was introduced in the 6.2 kernel? When I look at the
> > > tree for 6.1.x, it was backported in 6.1.81? Is there a way to
> > > disable this support somehow?
> > >
> > > Regards,
> > >
> > > Rik
> > >
> > >
> > > >
> > > > -Dai
> > > >
> > > > > >
> > > > > >
> > > > > > > The nfsdclnts command for this client shows the following
> > > > > > > delegations:
> > > > > > >
> > > > > > > # nfsdclnts -f 155/states -t all
> > > > > > > Inode number | Type   | Access | Deny | ip address | Filename
> > > > > > > 169346743    | open   | r-     | --   | 10.87.31.152:819 |
> > > > > > > disconnected dentry
> > > > > > > 169346743    | deleg  | r      |      | 10.87.31.152:819 |
> > > > > > > disconnected dentry
> > > > > > > 169346746    | open   | r-     | --   | 10.87.31.152:819 |
> > > > > > > disconnected dentry
> > > > > > > 169346746    | deleg  | r      |      | 10.87.31.152:819 |
> > > > > > > disconnected dentry
> > > > > > >
> > > > > > > I see a lot of recent patches regarding directory delegations. Could
> > > > > > > this be related to this?
> > > > > > >
> > > > > > > Will a 5.14.0-362.18.1.el9_3.0.1 kernel try to use a directory
> > > > > > > delegation?
> > > > > > >
> > > > > > >
> > > > > > No. Directory delegations are a new feature that's still under
> > > > > > development. They use some of the same machinery as file delegations,
> > > > > > but they wouldn't be a factor here.
> > > > > >
> > > > > > > > The system seems to have identified that the client is no longer
> > > > > > > > reachable, but the client entry does not go away. When a mount was
> > > > > > > > hanging on the client, there would be two directories in clients
> > > > > > > > for
> > > > > > > > the same client. Killing the mount command clears up the second
> > > > > > > > entry.
> > > > > > > >
> > > > > > > > Even after running conntrack -D on the server to remove the tcp
> > > > > > > > connection from the conntrack table, the entry doesn't go away
> > > > > > > > and the
> > > > > > > > client still can not mount anything from the server.
> > > > > > > >
> > > > > > > > A tcpdump on the client while a mount was running logged the
> > > > > > > > following
> > > > > > > > messages over and over again:
> > > > > > > >
> > > > > > > > request:
> > > > > > > >
> > > > > > > > Frame 1: 378 bytes on wire (3024 bits), 378 bytes captured (3024
> > > > > > > > bits)
> > > > > > > > Ethernet II, Src: HP_19:7d:4b (e0:73:e7:19:7d:4b), Dst:
> > > > > > > > ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00)
> > > > > > > > Internet Protocol Version 4, Src: 10.87.31.152, Dst: 10.86.18.14
> > > > > > > > Transmission Control Protocol, Src Port: 932, Dst Port: 2049,
> > > > > > > > Seq: 1,
> > > > > > > > Ack: 1, Len: 312
> > > > > > > > Remote Procedure Call, Type:Call XID:0x1d3220c4
> > > > > > > > Network File System
> > > > > > > >      [Program Version: 4]
> > > > > > > >      [V4 Procedure: COMPOUND (1)]
> > > > > > > >      GSS Data, Ops(1): CREATE_SESSION
> > > > > > > >          Length: 152
> > > > > > > >          GSS Sequence Number: 76
> > > > > > > >          Tag: <EMPTY>
> > > > > > > >          minorversion: 2
> > > > > > > >          Operations (count: 1): CREATE_SESSION
> > > > > > > >          [Main Opcode: CREATE_SESSION (43)]
> > > > > > > >      GSS Checksum:
> > > > > > > > 00000028040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb2…
> > > > > > > >
> > > > > > > >          GSS Token Length: 40
> > > > > > > >          GSS-API Generic Security Service Application Program
> > > > > > > > Interface
> > > > > > > >              krb5_blob:
> > > > > > > > 040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb23fa080b0…
> > > > > > > >
> > > > > > > >
> > > > > > > > response:
> > > > > > > >
> > > > > > > > Frame 2: 206 bytes on wire (1648 bits), 206 bytes captured (1648
> > > > > > > > bits)
> > > > > > > > Ethernet II, Src: ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00), Dst:
> > > > > > > > HP_19:7d:4b (e0:73:e7:19:7d:4b)
> > > > > > > > Internet Protocol Version 4, Src: 10.86.18.14, Dst: 10.87.31.152
> > > > > > > > Transmission Control Protocol, Src Port: 2049, Dst Port: 932,
> > > > > > > > Seq: 1,
> > > > > > > > Ack: 313, Len: 140
> > > > > > > > Remote Procedure Call, Type:Reply XID:0x1d3220c4
> > > > > > > > Network File System
> > > > > > > >      [Program Version: 4]
> > > > > > > >      [V4 Procedure: COMPOUND (1)]
> > > > > > > >      GSS Data, Ops(1): CREATE_SESSION(NFS4ERR_DELAY)
> > > > > > > >          Length: 24
> > > > > > > >          GSS Sequence Number: 76
> > > > > > > >          Status: NFS4ERR_DELAY (10008)
> > > > > > > >          Tag: <EMPTY>
> > > > > > > >          Operations (count: 1)
> > > > > > > >          [Main Opcode: CREATE_SESSION (43)]
> > > > > > > >      GSS Checksum:
> > > > > > > > 00000028040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f6274222…
> > > > > > > >
> > > > > > > >          GSS Token Length: 40
> > > > > > > >          GSS-API Generic Security Service Application Program
> > > > > > > > Interface
> > > > > > > >              krb5_blob:
> > > > > > > > 040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f627422226d74923…
> > > > > > > >
> > > > > > > >
> > > > > > > > I was hoping that giving the client a different IP address would
> > > > > > > > resolve the issue for this client, but it didn't. Even though the
> > > > > > > > client had a new IP address (hostname was kept the same), it
> > > > > > > > failed to
> > > > > > > > mount anything from the server.
> > > > > > > >
> > > > > > Changing the IP address won't help. The client is probably using the
> > > > > > same long-form client id as before, so the server still identifies
> > > > > > the
> > > > > > client even with the address change.
> > > > > How is the client id determined? Will changing the hostname of the
> > > > > client trigger a change of the client id?
> > > > > >
> > > > > > Unfortunately, the cause of an NFS4ERR_DELAY error is tough to guess.
> > > > > > The client is expected to back off and retry, so if the server keeps
> > > > > > returning that repeatedly, then a hung mount command is expected.
> > > > > >
> > > > > > The question is why the server would keep returning DELAY. A lot of
> > > > > > different problems ranging from memory allocation issues to protocol
> > > > > > problems can result in that error. You may want to check the NFS
> > > > > > server
> > > > > > and see if anything was logged there.
> > > > > There are no messages in the system logs that indicate any sort of
> > > > > memory issue. We also increased the vm.min_free_kbytes sysctl to 2G on
> > > > > the server before we restarted it with the newer kernel.
> > > > > >
> > > > > > This is on a CREATE_SESSION call, so I wonder if the record held
> > > > > > by the
> > > > > > (courteous) server is somehow blocking the attempt to reestablish the
> > > > > > session?
> > > > > >
> > > > > > Do you have a way to reproduce this? Since this is a centos
> > > > > > kernel, you
> > > > > > could follow the page here to open a bug:
> > > > >
> > > > > Unfortunately we haven't found a reliable way to reproduce it. But
> > > > > we do seem to trigger it more and more lately.
> > > > >
> > > > > Regards,
> > > > >
> > > > > Rik
> > > > >
> > > > > >
> > > > > > https://wiki.centos.org/ReportBugs.html
> > > > > >
> > > > > >
> > > > > > > > I created another dump of the workqueues and worker pools on the
> > > > > > > > server:
> > > > > > > >
> > > > > > > > [Mon Mar 18 14:59:33 2024] Showing busy workqueues and worker
> > > > > > > > pools:
> > > > > > > > [Mon Mar 18 14:59:33 2024] workqueue events: flags=0x0
> > > > > > > > [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0
> > > > > > > > nice=0
> > > > > > > > active=1/256 refcnt=2
> > > > > > > > [Mon Mar 18 14:59:33 2024]     pending: drm_fb_helper_damage_work
> > > > > > > > [drm_kms_helper]
> > > > > > > > [Mon Mar 18 14:59:33 2024] workqueue events_power_efficient:
> > > > > > > > flags=0x80
> > > > > > > > [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0
> > > > > > > > nice=0
> > > > > > > > active=1/256 refcnt=2
> > > > > > > > [Mon Mar 18 14:59:33 2024]     pending: fb_flashcursor
> > > > > > > > [Mon Mar 18 14:59:33 2024] workqueue mm_percpu_wq: flags=0x8
> > > > > > > > [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0
> > > > > > > > nice=0
> > > > > > > > active=1/256 refcnt=3
> > > > > > > > [Mon Mar 18 14:59:33 2024]     pending: lru_add_drain_per_cpu
> > > > > > > > BAR(362)
> > > > > > > > [Mon Mar 18 14:59:33 2024] workqueue kblockd: flags=0x18
> > > > > > > > [Mon Mar 18 14:59:33 2024]   pwq 55: cpus=27 node=1 flags=0x0
> > > > > > > > nice=-20
> > > > > > > > active=1/256 refcnt=2
> > > > > > > > [Mon Mar 18 14:59:33 2024]     pending: blk_mq_timeout_work
> > > > > > > >
> > > > > > > >
> > > > > > > > In contrast to last time, it doesn't show anything regarding nfs
> > > > > > > > this
> > > > > > > > time.
> > > > > > > >
> > > > > > > > I also tried the suggestion from Dai Ngo (echo 3 >
> > > > > > > > /proc/sys/vm/drop_caches), but that didn't seem to make any
> > > > > > > > difference.
> > > > > > > >
> > > > > > > > We haven't restarted the server yet as the impact seems to
> > > > > > > > affect fewer clients than before. Is there anything we can run
> > > > > > > > on the
> > > > > > > > server to further debug this?
> > > > > > > >
> > > > > > > > In the past, the issue seemed to deteriorate rapidly and
> > > > > > > > resulted in
> > > > > > > > issues for almost all clients after about 20 minutes. This time the
> > > > > > > > impact seems to be less, but it's not gone.
> > > > > > > >
> > > > > > > > How can we force the NFS server to forget about a specific
> > > > > > > > client? I
> > > > > > > > haven't tried to restart the nfs service yet as I'm afraid it will
> > > > > > > > fail to stop as before.
> > > > > > > >
> > > > > > Not with that kernel. There are some new administrative interfaces
> > > > > > that
> > > > > > might allow that in the future, but they were just merged upstream
> > > > > > and
> > > > > > aren't in that kernel.
> > > > > >
> > > > > > --
> > > > > > Jeff Layton <[email protected]>
> > > > >

--
Jeff Layton <[email protected]>

2024-03-21 21:14:45

by Rik Theys

[permalink] [raw]
Subject: Re: nfsd hangs and nfsd_break_deleg_cb+0x170/0x190 warning

Hi,

On 3/21/24 21:48, Jeff Layton wrote:
> On Wed, 2024-03-20 at 20:41 +0100, Rik Theys wrote:
>> Hi,
>>
>> On 3/19/24 22:42, Dai Ngo wrote:
>>> On 3/19/24 12:41 PM, Rik Theys wrote:
>>>> Hi,
>>>>
>>>> On 3/19/24 18:09, Dai Ngo wrote:
>>>>> On 3/19/24 12:58 AM, Rik Theys wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 3/18/24 22:54, Jeff Layton wrote:
>>>>>>> On Mon, 2024-03-18 at 22:15 +0100, Rik Theys wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 3/18/24 21:21, Rik Theys wrote:
>>>>>>>>> Hi Jeff,
>>>>>>>>>
>>>>>>>>> On 3/12/24 13:47, Jeff Layton wrote:
>>>>>>>>>> On Tue, 2024-03-12 at 13:24 +0100, Rik Theys wrote:
>>>>>>>>>>> Hi Jeff,
>>>>>>>>>>>
>>>>>>>>>>> On 3/12/24 12:22, Jeff Layton wrote:
>>>>>>>>>>>> On Mon, 2024-03-11 at 19:43 +0100, Rik Theys wrote:
>>>>>>>>>>>>> Since a few weeks our Rocky Linux 9 NFS server has periodically
>>>>>>>>>>>>> logged hung nfsd tasks. The initial effect was that some
>>>>>>>>>>>>> clients
>>>>>>>>>>>>> could no longer access the NFS server. This got worse and worse
>>>>>>>>>>>>> (probably as more nfsd threads got blocked) and we had to
>>>>>>>>>>>>> restart
>>>>>>>>>>>>> the server. Restarting the server also failed as the NFS server
>>>>>>>>>>>>> service could no longer be stopped.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The initial kernel we noticed this behavior on was
>>>>>>>>>>>>> kernel-5.14.0-362.18.1.el9_3.x86_64. Since then we've installed
>>>>>>>>>>>>> kernel-5.14.0-419.el9.x86_64 from CentOS Stream 9. The same
>>>>>>>>>>>>> issue
>>>>>>>>>>>>> happened again on this newer kernel version:
>>>>>>>>>> 419 is fairly up to date with nfsd changes. There are some
>>>>>>>>>> known bugs
>>>>>>>>>> around callbacks, and there is a draft MR in flight to fix it.
>>>>>>>>>>
>>>>>>>>>> What kernel were you on prior to 5.14.0-362.18.1.el9_3.x86_64 ?
>>>>>>>>>> If we
>>>>>>>>>> can bracket the changes around a particular version, then that
>>>>>>>>>> might
>>>>>>>>>> help identify the problem.
>>>>>>>>>>
>>>>>>>>>>>>> [Mon Mar 11 14:10:08 2024]       Not tainted
>>>>>>>>>>>>> 5.14.0-419.el9.x86_64 #1
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024] "echo 0 >
>>>>>>>>>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]task:nfsd            state:D
>>>>>>>>>>>>> stack:0
>>>>>>>>>>>>>      pid:8865  ppid:2      flags:0x00004000
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  <TASK>
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>>>>>> select_idle_sibling+0x28/0x430
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>>>>>> __pfx_schedule_timeout+0x10/0x10
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]
>>>>>>>>>>>>>  nfsd4_shutdown_callback+0x49/0x120
>>>>>>>>>>>>> [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>>>>>> nfsd4_cld_remove+0x54/0x1d0 [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>>>>>> nfsd4_return_all_client_layouts+0xc4/0xf0 [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>>>>>> nfsd4_shutdown_copy+0x68/0xc0 [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __destroy_client+0x1f3/0x290
>>>>>>>>>>>>> [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]
>>>>>>>>>>>>>  nfsd4_exchange_id+0x75f/0x770 [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>>>>>> nfsd4_decode_opaque+0x3a/0x90 [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]
>>>>>>>>>>>>>  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
>>>>>>>>>>>>> [sunrpc]
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>>>>>> __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170
>>>>>>>>>>>>> [sunrpc]
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  </TASK>
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024] INFO: task nfsd:8866 blocked for
>>>>>>>>>>>>> more than 122 seconds.
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]       Not tainted
>>>>>>>>>>>>> 5.14.0-419.el9.x86_64 #1
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024] "echo 0 >
>>>>>>>>>>>>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]task:nfsd            state:D
>>>>>>>>>>>>> stack:0
>>>>>>>>>>>>>      pid:8866  ppid:2      flags:0x00004000
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024] Call Trace:
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  <TASK>
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __schedule+0x21b/0x550
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  schedule+0x2d/0x70
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  schedule_timeout+0x11f/0x160
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>>>>>> select_idle_sibling+0x28/0x430
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? tcp_recvmsg+0x196/0x210
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? wake_affine+0x62/0x1f0
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __wait_for_common+0x90/0x1d0
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>>>>>> __pfx_schedule_timeout+0x10/0x10
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  __flush_workqueue+0x13a/0x3f0
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]
>>>>>>>>>>>>>  nfsd4_destroy_session+0x1a4/0x240
>>>>>>>>>>>>> [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]
>>>>>>>>>>>>>  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  svc_process_common+0x2ec/0x660
>>>>>>>>>>>>> [sunrpc]
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ?
>>>>>>>>>>>>> __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  svc_process+0x12d/0x170
>>>>>>>>>>>>> [sunrpc]
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  kthread+0xdd/0x100
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ? __pfx_kthread+0x10/0x10
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  ret_from_fork+0x29/0x50
>>>>>>>>>>>>>     [Mon Mar 11 14:10:08 2024]  </TASK>
>>>>>>>>>>>>>
>>>>>>>>>>>> The above threads are trying to flush the workqueue, so that
>>>>>>>>>>>> probably
>>>>>>>>>>>> means that they are stuck waiting on a workqueue job to finish.
>>>>>>>>>>>>>     The above is repeated a few times, and then this warning is
>>>>>>>>>>>>> also logged:
>>>>>>>>>>>>>     [Mon Mar 11 14:12:04 2024] ------------[ cut here
>>>>>>>>>>>>> ]------------
>>>>>>>>>>>>>     [Mon Mar 11 14:12:04 2024] WARNING: CPU: 39 PID: 8844 at
>>>>>>>>>>>>> fs/nfsd/nfs4state.c:4919 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] Modules linked in: nfsv4
>>>>>>>>>>>>> dns_resolver nfs fscache netfs rpcsec_gss_krb5 rpcrdma rdma_cm
>>>>>>>>>>>>> iw_cm ib_cm ib_core binfmt_misc bonding tls rfkill
>>>>>>>>>>>>> nft_counter nft_ct
>>>>>>>>>>>>>     nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_reject_inet
>>>>>>>>>>>>> nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables nfnetlink
>>>>>>>>>>>>> vfat
>>>>>>>>>>>>> fat dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio l
>>>>>>>>>>>>>     ibcrc32c dm_service_time dm_multipath intel_rapl_msr
>>>>>>>>>>>>> intel_rapl_common intel_uncore_frequency
>>>>>>>>>>>>> intel_uncore_frequency_common isst_if_common skx_edac nfit
>>>>>>>>>>>>> libnvdimm ipmi_ssif x86_pkg_temp
>>>>>>>>>>>>>     _thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
>>>>>>>>>>>>> dcdbas rapl intel_cstate mgag200 i2c_algo_bit drm_shmem_helper
>>>>>>>>>>>>> dell_smbios drm_kms_helper dell_wmi_descriptor wmi_bmof intel_u
>>>>>>>>>>>>>     ncore syscopyarea pcspkr sysfillrect mei_me sysimgblt
>>>>>>>>>>>>> acpi_ipmi
>>>>>>>>>>>>> mei fb_sys_fops i2c_i801 ipmi_si intel_pch_thermal lpc_ich
>>>>>>>>>>>>> ipmi_devintf i2c_smbus ipmi_msghandler joydev acpi_power_meter
>>>>>>>>>>>>>     nfsd auth_rpcgss nfs_acl drm lockd grace fuse sunrpc ext4
>>>>>>>>>>>>> mbcache jbd2 sd_mod sg lpfc
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nvmet_fc nvmet nvme_fc
>>>>>>>>>>>>> nvme_fabrics
>>>>>>>>>>>>> crct10dif_pclmul ahci libahci crc32_pclmul nvme_core
>>>>>>>>>>>>> crc32c_intel
>>>>>>>>>>>>> ixgbe megaraid_sas libata nvme_common ghash_clmulni_int
>>>>>>>>>>>>>     el t10_pi wdat_wdt scsi_transport_fc mdio wmi dca dm_mirror
>>>>>>>>>>>>> dm_region_hash dm_log dm_mod
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] CPU: 39 PID: 8844 Comm: nfsd Not
>>>>>>>>>>>>> tainted 5.14.0-419.el9.x86_64 #1
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] Hardware name: Dell Inc.
>>>>>>>>>>>>> PowerEdge
>>>>>>>>>>>>> R740/00WGD1, BIOS 2.20.1 09/13/2023
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] RIP:
>>>>>>>>>>>>> 0010:nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] Code: a6 95 c5 f3 e9 ff fe ff
>>>>>>>>>>>>> ff 48
>>>>>>>>>>>>> 89 df be 01 00 00 00 e8 34 b5 13 f4 48 8d bb 98 00 00 00 e8
>>>>>>>>>>>>> c8 f9
>>>>>>>>>>>>> 00 00 84 c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be
>>>>>>>>>>>>>     02 00 00 00 48 89 df e8 0c b5 13 f4 e9 01
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] RSP: 0018:ffff9929e0bb7b80
>>>>>>>>>>>>> EFLAGS:
>>>>>>>>>>>>> 00010246
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] RAX: 0000000000000000 RBX:
>>>>>>>>>>>>> ffff8ada51930900 RCX: 0000000000000024
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] RDX: ffff8ada519309c8 RSI:
>>>>>>>>>>>>> ffff8ad582933c00 RDI: 0000000000002000
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] RBP: ffff8ad46bf21574 R08:
>>>>>>>>>>>>> ffff9929e0bb7b48 R09: 0000000000000000
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] R10: ffff8aec859a2948 R11:
>>>>>>>>>>>>> 0000000000000000 R12: ffff8ad6f497c360
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] R13: ffff8ad46bf21560 R14:
>>>>>>>>>>>>> ffff8ae5942e0b10 R15: ffff8ad6f497c360
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] FS:  0000000000000000(0000)
>>>>>>>>>>>>> GS:ffff8b031fcc0000(0000) knlGS:0000000000000000
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] CS:  0010 DS: 0000 ES: 0000 CR0:
>>>>>>>>>>>>> 0000000080050033
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] CR2: 00007fafe2060744 CR3:
>>>>>>>>>>>>> 00000018e58de006 CR4: 00000000007706e0
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] DR0: 0000000000000000 DR1:
>>>>>>>>>>>>> 0000000000000000 DR2: 0000000000000000
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] DR3: 0000000000000000 DR6:
>>>>>>>>>>>>> 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] PKRU: 55555554
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] Call Trace:
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  <TASK>
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ?
>>>>>>>>>>>>> show_trace_log_lvl+0x1c4/0x2df
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ?
>>>>>>>>>>>>> show_trace_log_lvl+0x1c4/0x2df
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? __break_lease+0x16f/0x5f0
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ?
>>>>>>>>>>>>> nfsd_break_deleg_cb+0x170/0x190
>>>>>>>>>>>>> [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? __warn+0x81/0x110
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ?
>>>>>>>>>>>>> nfsd_break_deleg_cb+0x170/0x190
>>>>>>>>>>>>> [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? report_bug+0x10a/0x140
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? handle_bug+0x3c/0x70
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? exc_invalid_op+0x14/0x70
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? asm_exc_invalid_op+0x16/0x20
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ?
>>>>>>>>>>>>> nfsd_break_deleg_cb+0x170/0x190
>>>>>>>>>>>>> [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  __break_lease+0x16f/0x5f0
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ?
>>>>>>>>>>>>> nfsd_file_lookup_locked+0x117/0x160 [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? list_lru_del+0x101/0x150
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]
>>>>>>>>>>>>>  nfsd_file_do_acquire+0x790/0x830
>>>>>>>>>>>>> [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]
>>>>>>>>>>>>>  nfs4_get_vfs_file+0x315/0x3a0 [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]
>>>>>>>>>>>>>  nfsd4_process_open2+0x430/0xa30 [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? fh_verify+0x297/0x2f0 [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]
>>>>>>>>>>>>>  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  svc_process_common+0x2ec/0x660
>>>>>>>>>>>>> [sunrpc]
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ?
>>>>>>>>>>>>> __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  svc_process+0x12d/0x170
>>>>>>>>>>>>> [sunrpc]
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  nfsd+0x84/0xb0 [nfsd]
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  kthread+0xdd/0x100
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ? __pfx_kthread+0x10/0x10
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  ret_from_fork+0x29/0x50
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024]  </TASK>
>>>>>>>>>>>>>     [Mon Mar 11 14:12:05 2024] ---[ end trace
>>>>>>>>>>>>> 7a039e17443dc651 ]---
>>>>>>>>>>>> This is probably this WARN in nfsd_break_one_deleg:
>>>>>>>>>>>>
>>>>>>>>>>>> WARN_ON_ONCE(!nfsd4_run_cb(&dp->dl_recall));
>>>>>>>>>>>>
>>>>>>>>>>>> It means that a delegation break callback to the client
>>>>>>>>>>>> couldn't be
>>>>>>>>>>>> queued to the workqueue, and so it didn't run.
>>>>>>>>>>>>
>>>>>>>>>>>>> Could this be the same issue as described
>>>>>>>>>>>>> here:https://urldefense.com/v3/__https://lore.kernel.org/linux-nfs/[email protected]/__;!!ACWV5N9M2RV99hQ!LV3yWeoSOhNAkRHkxFCH2tlm0iNFVD78mxnSLyP6lrX7yBVeA2TOJ4nv6oZsqLwP4kW56CMpDWhkjjwSkdBV9En7$
>>>>>>>>>>>>> ?
>>>>>>>>>>>> Yes, most likely the same problem.
>>>>>>>>>>> If I read that thread correctly, this issue was introduced
>>>>>>>>>>> between
>>>>>>>>>>> 6.1.63 and 6.6.3? Is it possible the EL9 5.14.0-362.18.1.el9_3
>>>>>>>>>>> backported these changes, or were we hitting some other bug
>>>>>>>>>>> with that
>>>>>>>>>>> version? It seems the 6.1.x kernel is not affected? If so, that
>>>>>>>>>>> would be
>>>>>>>>>>> the recommended kernel to run?
>>>>>>>>>> Anything is possible. We have to identify the problem first.
>>>>>>>>>>>>> As described in that thread, I've tried to obtain the requested
>>>>>>>>>>>>> information.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is it possible this is the issue that was fixed by the patches
>>>>>>>>>>>>> described
>>>>>>>>>>>>> here?https://urldefense.com/v3/__https://lore.kernel.org/linux-nfs/2024022054-cause-suffering-eae8@gregkh/__;!!ACWV5N9M2RV99hQ!LV3yWeoSOhNAkRHkxFCH2tlm0iNFVD78mxnSLyP6lrX7yBVeA2TOJ4nv6oZsqLwP4kW56CMpDWhkjjwSkedtUP09$
>>>>>>>>>>>>>
>>>>>>>>>>>> Doubtful. Those are targeted toward a different set of issues.
>>>>>>>>>>>>
>>>>>>>>>>>> If you're willing, I do have some patches queued up for
>>>>>>>>>>>> CentOS here
>>>>>>>>>>>> that
>>>>>>>>>>>> fix some backchannel problems that could be related. I'm mainly
>>>>>>>>>>>> waiting
>>>>>>>>>>>> on Chuck to send these to Linus and then we'll likely merge
>>>>>>>>>>>> them into
>>>>>>>>>>>> CentOS soon afterward:
>>>>>>>>>>>>
>>>>>>>>>>>> https://urldefense.com/v3/__https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3689__;!!ACWV5N9M2RV99hQ!LV3yWeoSOhNAkRHkxFCH2tlm0iNFVD78mxnSLyP6lrX7yBVeA2TOJ4nv6oZsqLwP4kW56CMpDWhkjjwSkdvDn8y7$
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> If you can send me a patch file, I can rebuild the C9S kernel
>>>>>>>>>>> with that
>>>>>>>>>>> patch and run it. It can take a while for the bug to trigger as I
>>>>>>>>>>> believe it seems to be very workload dependent (we were
>>>>>>>>>>> running very
>>>>>>>>>>> stable for months and now hit this bug every other week).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> It's probably simpler to just pull down the build artifacts for
>>>>>>>>>> that MR.
>>>>>>>>>> You have to drill down through the CI for it, but they are here:
>>>>>>>>>>
>>>>>>>>>> https://urldefense.com/v3/__https://s3.amazonaws.com/arr-cki-prod-trusted-artifacts/index.html?prefix=trusted-artifacts*1194300175*publish_x86_64*6278921877*artifacts*__;Ly8vLy8!!ACWV5N9M2RV99hQ!LV3yWeoSOhNAkRHkxFCH2tlm0iNFVD78mxnSLyP6lrX7yBVeA2TOJ4nv6oZsqLwP4kW56CMpDWhkjjwSkaP5eW8V$
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> There's even a repo file you can install on the box to pull
>>>>>>>>>> them down.
>>>>>>>>> We installed this kernel on the server 3 days ago. Today, a user
>>>>>>>>> informed us that their screen was black after logging in.
>>>>>>>>> Similar to
>>>>>>>>> other occurrences of this issue, the mount command on the client
>>>>>>>>> was
>>>>>>>>> hung. But in contrast to the other times, there were no messages in
>>>>>>>>> the kernel logs on the server. Even restarting the client does
>>>>>>>>> not resolve the issue.
>>>>>>> Ok, so you rebooted the client and it's still unable to mount? That
>>>>>>> sounds like a server problem if so.
>>>>>>>
>>>>>>> Are both client and server running the same kernel?
>>>>>> No, the server runs 5.14.0-427.3689_1194299994.el9 and the client
>>>>>> 5.14.0-362.18.1.el9_3.
>>>>>>>>> Something still seems to be wrong on the server though. When I
>>>>>>>>> look at
>>>>>>>>> the directories under /proc/fs/nfsd/clients, there's still a
>>>>>>>>> directory
>>>>>>>>> for the specific client, even though it's no longer running:
>>>>>>>>>
>>>>>>>>> # cat 155/info
>>>>>>>>> clientid: 0xc8edb7f65f4a9ad
>>>>>>>>> address: "10.87.31.152:819"
>>>>>>>>> status: confirmed
>>>>>>>>> seconds from last renew: 33163
>>>>>>>>> name: "Linux NFSv4.2 bersalis.esat.kuleuven.be"
>>>>>>>>> minor version: 2
>>>>>>>>> Implementation domain: "kernel.org"
>>>>>>>>> Implementation name: "Linux 5.14.0-362.18.1.el9_3.0.1.x86_64 #1 SMP
>>>>>>>>> PREEMPT_DYNAMIC Sun Feb 11 13:49:23 UTC 2024 x86_64"
>>>>>>>>> Implementation time: [0, 0]
>>>>>>>>> callback state: DOWN
>>>>>>>>> callback address: 10.87.31.152:0
>>>>>>>>>
>>>>>>> If you just shut down the client, the server won't immediately
>>>>>>> purge its
>>>>>>> record. In fact, assuming you're running the same kernel on the
>>>>>>> server,
>>>>>>> it won't purge the client record until there is a conflicting request
>>>>>>> for its state.
>>>>>> Is there a way to force such a conflicting request (to get the
>>>>>> client record to purge)?
>>>>> Try:
>>>>>
>>>>> # echo "expire" > /proc/fs/nfsd/clients/155/ctl
>>>> I've tried that. The command hangs and cannot be interrupted with
>>>> ctrl-c.
>>>> I've now also noticed in the dmesg output that the kernel issued the
>>>> following WARNING a few hours ago. It wasn't directly triggered by
>>>> the echo command above, but was probably triggered when another
>>>> client started to have the same problem, as more clients are
>>>> experiencing issues now.
>>> I think this warning message is harmless. However, it indicates a potential
>>> problem with the workqueue, which might be related to memory shortage.
>>>
>>> What does the output of 'cat /proc/meminfo' look like?
>> I doubt the current values are useful, but they are:
>>
>> MemTotal:       196110860 kB
>> MemFree:        29357112 kB
>> MemAvailable:   179529420 kB
>> Buffers:        11996096 kB
>> Cached:         130589396 kB
>> SwapCached:           52 kB
>> Active:          1136988 kB
>> Inactive:       144192468 kB
>> Active(anon):     698564 kB
>> Inactive(anon):  2657256 kB
>> Active(file):     438424 kB
>> Inactive(file): 141535212 kB
>> Unevictable:       72140 kB
>> Mlocked:           69068 kB
>> SwapTotal:      67108860 kB
>> SwapFree:       67106276 kB
>> Zswap:                 0 kB
>> Zswapped:              0 kB
>> Dirty:             80812 kB
>> Writeback:             0 kB
>> AnonPages:       2806592 kB
>> Mapped:           322700 kB
>> Shmem:            599308 kB
>> KReclaimable:   16977000 kB
>> Slab:           18898736 kB
>> SReclaimable:   16977000 kB
>> SUnreclaim:      1921736 kB
>> KernelStack:       18128 kB
>> PageTables:        31716 kB
>> SecPageTables:         0 kB
>> NFS_Unstable:          0 kB
>> Bounce:                0 kB
>> WritebackTmp:          0 kB
>> CommitLimit:    165164288 kB
>> Committed_AS:    5223940 kB
>> VmallocTotal:   34359738367 kB
>> VmallocUsed:      300064 kB
>> VmallocChunk:          0 kB
>> Percpu:            45888 kB
>> HardwareCorrupted:     0 kB
>> AnonHugePages:   2451456 kB
>> ShmemHugePages:        0 kB
>> ShmemPmdMapped:        0 kB
>> FileHugePages:         0 kB
>> FilePmdMapped:         0 kB
>> CmaTotal:              0 kB
>> CmaFree:               0 kB
>> Unaccepted:            0 kB
>> HugePages_Total:       0
>> HugePages_Free:        0
>> HugePages_Rsvd:        0
>> HugePages_Surp:        0
>> Hugepagesize:       2048 kB
>> Hugetlb:               0 kB
>> DirectMap4k:     1303552 kB
>> DirectMap2M:    28715008 kB
>> DirectMap1G:    171966464 kB
>>
>>
>>> Did you try 'echo 3 > /proc/sys/vm/drop_caches'?
>> Yes, I tried that when the first client hit the issue, but it didn't
>> result in any unlocking of the client.
>>
>>
>>>> [Tue Mar 19 14:53:44 2024] ------------[ cut here ]------------
>>>> [Tue Mar 19 14:53:44 2024] WARNING: CPU: 44 PID: 5843 at
>>>> fs/nfsd/nfs4state.c:4920 nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>> [Tue Mar 19 14:53:44 2024] Modules linked in: nf_conntrack_netlink
>>>> nfsv4 dns_resolver nfs fscache netfs binfmt_misc xsk_diag
>>>> rpcsec_gss_krb5 rpcrdma rdma_cm iw_cm ib_cm ib_core bonding tls
>>>> rfkill nft_counter nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
>>>> nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_tables
>>>> nfnetlink vfat fat dm_thin_pool dm_persistent_data dm_bio_prison
>>>> dm_bufio libcrc32c dm_service_time dm_multipath intel_rapl_msr
>>>> intel_rapl_common intel_uncore_frequency
>>>> intel_uncore_frequency_common isst_if_common skx_edac nfit libnvdimm
>>>> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm dcdbas
>>>> irqbypass ipmi_ssif rapl intel_cstate mgag200 i2c_algo_bit
>>>> drm_shmem_helper drm_kms_helper dell_smbios syscopyarea intel_uncore
>>>> sysfillrect wmi_bmof dell_wmi_descriptor pcspkr sysimgblt fb_sys_fops
>>>> mei_me i2c_i801 mei intel_pch_thermal acpi_ipmi i2c_smbus lpc_ich
>>>> ipmi_si ipmi_devintf ipmi_msghandler joydev acpi_power_meter nfsd
>>>> nfs_acl lockd auth_rpcgss grace drm fuse sunrpc ext4
>>>> [Tue Mar 19 14:53:44 2024]  mbcache jbd2 sd_mod sg lpfc nvmet_fc
>>>> nvmet nvme_fc nvme_fabrics crct10dif_pclmul crc32_pclmul nvme_core
>>>> ixgbe crc32c_intel ahci libahci nvme_common megaraid_sas t10_pi
>>>> ghash_clmulni_intel wdat_wdt libata scsi_transport_fc mdio dca wmi
>>>> dm_mirror dm_region_hash dm_log dm_mod
>>>> [Tue Mar 19 14:53:44 2024] CPU: 44 PID: 5843 Comm: nfsd Not tainted
>>>> 5.14.0-427.3689_1194299994.el9.x86_64 #1
>>>> [Tue Mar 19 14:53:44 2024] Hardware name: Dell Inc. PowerEdge
>>>> R740/00WGD1, BIOS 2.20.1 09/13/2023
>>>> [Tue Mar 19 14:53:44 2024] RIP: 0010:nfsd_break_deleg_cb+0x170/0x190
>>>> [nfsd]
>>>> [Tue Mar 19 14:53:44 2024] Code: 76 76 cd de e9 ff fe ff ff 48 89 df
>>>> be 01 00 00 00 e8 34 a1 1b df 48 8d bb 98 00 00 00 e8 a8 fe 00 00 84
>>>> c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff be 02 00 00 00 48 89 df
>>>> e8 0c a1 1b df e9 01
>>>> [Tue Mar 19 14:53:44 2024] RSP: 0018:ffffb2878f2cfc38 EFLAGS: 00010246
>>>> [Tue Mar 19 14:53:44 2024] RAX: 0000000000000000 RBX:
>>>> ffff88d5171067b8 RCX: 0000000000000000
>>>> [Tue Mar 19 14:53:44 2024] RDX: ffff88d517106880 RSI:
>>>> ffff88bdceec8600 RDI: 0000000000002000
>>>> [Tue Mar 19 14:53:44 2024] RBP: ffff88d68a38a284 R08:
>>>> ffffb2878f2cfc00 R09: 0000000000000000
>>>> [Tue Mar 19 14:53:44 2024] R10: ffff88bf57dd7878 R11:
>>>> 0000000000000000 R12: ffff88d5b79c4798
>>>> [Tue Mar 19 14:53:44 2024] R13: ffff88d68a38a270 R14:
>>>> ffff88cab06ad0c8 R15: ffff88d5b79c4798
>>>> [Tue Mar 19 14:53:44 2024] FS:  0000000000000000(0000)
>>>> GS:ffff88d4a1180000(0000) knlGS:0000000000000000
>>>> [Tue Mar 19 14:53:44 2024] CS:  0010 DS: 0000 ES: 0000 CR0:
>>>> 0000000080050033
>>>> [Tue Mar 19 14:53:44 2024] CR2: 00007fe46ef90000 CR3:
>>>> 000000019d010004 CR4: 00000000007706e0
>>>> [Tue Mar 19 14:53:44 2024] DR0: 0000000000000000 DR1:
>>>> 0000000000000000 DR2: 0000000000000000
>>>> [Tue Mar 19 14:53:44 2024] DR3: 0000000000000000 DR6:
>>>> 00000000fffe0ff0 DR7: 0000000000000400
>>>> [Tue Mar 19 14:53:44 2024] PKRU: 55555554
>>>> [Tue Mar 19 14:53:44 2024] Call Trace:
>>>> [Tue Mar 19 14:53:44 2024]  <TASK>
>>>> [Tue Mar 19 14:53:44 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>>> [Tue Mar 19 14:53:44 2024]  ? show_trace_log_lvl+0x1c4/0x2df
>>>> [Tue Mar 19 14:53:44 2024]  ? __break_lease+0x16f/0x5f0
>>>> [Tue Mar 19 14:53:44 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>> [Tue Mar 19 14:53:44 2024]  ? __warn+0x81/0x110
>>>> [Tue Mar 19 14:53:44 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>> [Tue Mar 19 14:53:44 2024]  ? report_bug+0x10a/0x140
>>>> [Tue Mar 19 14:53:44 2024]  ? handle_bug+0x3c/0x70
>>>> [Tue Mar 19 14:53:44 2024]  ? exc_invalid_op+0x14/0x70
>>>> [Tue Mar 19 14:53:44 2024]  ? asm_exc_invalid_op+0x16/0x20
>>>> [Tue Mar 19 14:53:44 2024]  ? nfsd_break_deleg_cb+0x170/0x190 [nfsd]
>>>> [Tue Mar 19 14:53:44 2024]  ? nfsd_break_deleg_cb+0x96/0x190 [nfsd]
>>>> [Tue Mar 19 14:53:44 2024]  __break_lease+0x16f/0x5f0
>>>> [Tue Mar 19 14:53:44 2024]  nfs4_get_vfs_file+0x164/0x3a0 [nfsd]
>>>> [Tue Mar 19 14:53:44 2024]  nfsd4_process_open2+0x430/0xa30 [nfsd]
>>>> [Tue Mar 19 14:53:44 2024]  ? fh_verify+0x297/0x2f0 [nfsd]
>>>> [Tue Mar 19 14:53:44 2024]  nfsd4_open+0x3ce/0x4b0 [nfsd]
>>>> [Tue Mar 19 14:53:44 2024]  nfsd4_proc_compound+0x44b/0x700 [nfsd]
>>>> [Tue Mar 19 14:53:44 2024]  nfsd_dispatch+0x94/0x1c0 [nfsd]
>>>> [Tue Mar 19 14:53:44 2024]  svc_process_common+0x2ec/0x660 [sunrpc]
>>>> [Tue Mar 19 14:53:44 2024]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>> [Tue Mar 19 14:53:44 2024]  ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>> [Tue Mar 19 14:53:44 2024]  svc_process+0x12d/0x170 [sunrpc]
>>>> [Tue Mar 19 14:53:44 2024]  nfsd+0x84/0xb0 [nfsd]
>>>> [Tue Mar 19 14:53:44 2024]  kthread+0xdd/0x100
>>>> [Tue Mar 19 14:53:44 2024]  ? __pfx_kthread+0x10/0x10
>>>> [Tue Mar 19 14:53:44 2024]  ret_from_fork+0x29/0x50
>>>> [Tue Mar 19 14:53:44 2024]  </TASK>
>>>> [Tue Mar 19 14:53:44 2024] ---[ end trace ed0b2b3f135c637d ]---
>>>>
>>>> It again seems to have been triggered in nfsd_break_deleg_cb?
>>>>
>>>> I also had the following perf command running a tmux on the server:
>>>>
>>>> perf trace -e nfsd:nfsd_cb_recall_any
>>>>
>>>> This has spewed a lot of messages. I'm including a short list here:
>>>>
>>>> ...
>>>>
>>>> 33464866.721 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
>>>> 1710533037, cl_id: 210688785, bmval0: 1, addr: 0x7f331bb116c8)
>>>> 33464866.724 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
>>>> 1710533037, cl_id: 210688827, bmval0: 1, addr: 0x7f331bb11738)
>>>> 33464866.729 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
>>>> 1710533037, cl_id: 210688767, bmval0: 1, addr: 0x7f331bb117a8)
>>>> 33464866.732 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
>>>> 1710533037, cl_id: 210718132, bmval0: 1, addr: 0x7f331bb11818)
>>>> 33464866.737 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
>>>> 1710533037, cl_id: 210688952, bmval0: 1, addr: 0x7f331bb11888)
>>>> 33464866.741 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
>>>> 1710533037, cl_id: 210702355, bmval0: 1, addr: 0x7f331bb118f8)
>>>> 33868414.001 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
>>>> 1710533037, cl_id: 210688751, bmval0: 1, addr: 0x7f331be68620)
>>>> 33868414.014 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
>>>> 1710533037, cl_id: 210718536, bmval0: 1, addr: 0x7f331be68690)
>>>> 33868414.018 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
>>>> 1710533037, cl_id: 210719074, bmval0: 1, addr: 0x7f331be68700)
>>>> 33868414.022 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
>>>> 1710533037, cl_id: 210688916, bmval0: 1, addr: 0x7f331be68770)
>>>> 33868414.026 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
>>>> 1710533037, cl_id: 210688941, bmval0: 1, addr: 0x7f331be687e0)
>>>> ...
>>>>
>>>> 33868414.924 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
>>>> 1710533037, cl_id: 210688744, bmval0: 1, addr: 0x7f331be6d7f0)
>>>> 33868414.929 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
>>>> 1710533037, cl_id: 210717223, bmval0: 1, addr: 0x7f331be6d860)
>>>> 33868414.934 kthreadd/1597068 nfsd:nfsd_cb_recall_any(cl_boot:
>>>> 1710533037, cl_id: 210716137, bmval0: 1, addr: 0x7f331be6d8d0)
>>>> 34021240.903 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
>>>> 1710533037, cl_id: 210688941, bmval0: 1, addr: 0x7f331c207de8)
>>>> 34021240.917 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
>>>> 1710533037, cl_id: 210718750, bmval0: 1, addr: 0x7f331c207e58)
>>>> 34021240.922 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
>>>> 1710533037, cl_id: 210688955, bmval0: 1, addr: 0x7f331c207ec8)
>>>> 34021240.925 kworker/u98:5/1591466 nfsd:nfsd_cb_recall_any(cl_boot:
>>>> 1710533037, cl_id: 210688975, bmval0: 1, addr: 0x7f331c207f38)
>>>> ...
>>>>
>>>> I assume the cl_id is the client id? How can I map this to a client
>>>> from /proc/fs/nfsd/clients?
>>> The hex value of 'clientid' printed from /proc/fs/nfsd/clients/XX/info
>>> is a 64-bit value composed of:
>>>
>>> typedef struct {
>>>         u32             cl_boot;
>>>         u32             cl_id;
>>> } clientid_t
>>>
>>> For example:
>>>
>>> clientid: 0xc8edb7f65f4a9ad
>>>
>>> cl_boot:  65f4a9ad (1710533037)
>>> cl_id:    0c8edb7f (210688895)
>>>
>>> This should match a trace event with:
>>>
>>> nfsd:nfsd_cb_recall_any(cl_boot: 1710533037, cl_id: 210688895, bmval0:
>>> XX, addr: 0xYYYYY)
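For reference, the same split can be done directly with shell arithmetic
(a quick sketch, assuming the layout described above: cl_id in the upper
32 bits of the printed value, cl_boot in the lower 32):

# clientid=0x0c8edb7f65f4a9ad
# echo "cl_boot=$(( clientid & 0xffffffff )) cl_id=$(( clientid >> 32 ))"
cl_boot=1710533037 cl_id=210688895

That pair is then what to look for in the nfsd:nfsd_cb_recall_any trace
output.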
>>>
>>>> If I understand it correctly, the recall_any should be called when
>>>> either the system starts to experience memory pressure,
>>> yes.
>> It seems odd that the system gets into a state with such high memory
>> pressure. It doesn't run much else besides NFS and Samba.
>>>> or it reaches the delegation limits?
>>> No, this feature was added to nfsd very recently. I don't think your
>>> kernel has it.
>>>
>>>> I doubt the system is actually running out of memory here as there
>>>> are no other indications.
>>>> Shouldn't I get those "page allocation failure" messages if it does?
>>>> How can I check the number of delegations/leases currently issued,
>>>> what the current maximum is and how to increase it?
>>> Max delegations is 4 per 1MB of available memory. There is no
>>> admin tool to adjust this value.
>> /proc/locks currently has about 130k DELEG lines, so that should be a
>> lot lower than the limit on a 192G ram server.
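As a rough sanity check of that, counting the current delegations and the
approximate ceiling (a sketch that simply applies the 4-per-MB rule to
MemTotal, which is only an approximation of how the kernel sizes the
limit):

# grep -c DELEG /proc/locks
# awk '/MemTotal/ { printf "~%d delegations max\n", $2 / 1024 * 4 }' /proc/meminfo

On this 192G machine that works out to roughly 766k, so the ~130k
delegations in /proc/locks should indeed be well below the limit.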
>>
>>
>>> I do not recommend running a production system with delegation
>>> disabled. But for this specific issue, it might help to temporarily
>>> disable delegation to isolate problem areas.
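If we do need to try that, the only knob I'm aware of on this kernel is
the global leases sysctl, which stops new lease (and therefore
delegation) grants system-wide, not just for nfsd:

# echo 0 > /proc/sys/fs/leases-enable

(if I understand it correctly, existing leases are unaffected and only
new requests are refused).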
>>
>> I'm going to reboot the system with the 6.1.82 kernel (kernel-lt from
>> elrepo). Maybe it has fewer of the recent changes that may have
>> introduced this.
>>
> If the v6.1-ish kernel turns out to not help, then you may want to give a
> v6.7 or v6.8 kernel a try. It helps if we know whether this problem is
> reproducible on more up-to-date kernels.

Unfortunately, the 6.1.82 kernel resulted in an issue with Kerberos NFS
clients, so I had to reboot the system again (see my other mail on the
linux-nfs list). It's now running the latest CentOS Stream 9 kernel (430).

I don't know how up to date it is on NFS patches. You mentioned there
were still additional nfs fixes between the 427 merge request version
you provided earlier and this one, but I failed to find any in the
changelog (which unfortunately seems to be truncated now).

I'm aware that there's a potential data corruption bug in the 430 version?

>
>> I've been able to reproduce the situation on an additional client now
>> that the issue happens on the server:
>>
>> 1. Log in on a client and mount the NFS share.
>> 2. Open a file from the NFS share in vim so the client gets a read
>> delegation from the server
>> 3. Verify on the server in /proc/fs/nfsd/clients/*/states that the
>> client has a delegation for the file
>> 4. Forcefully reboot the client by running 'echo b > /proc/sysrq-trigger'
>> 5. Watch the /proc/fs/nfsd/clients/*/info file on the server.
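Roughly, steps 2 through 5 as commands (the export path and filename are
placeholders here, and mount options such as sec= are omitted):

client# mount -t nfs4 server:/export /mnt && vim /mnt/testfile
server# grep deleg /proc/fs/nfsd/clients/*/states
client# echo b > /proc/sysrq-trigger
server# watch cat /proc/fs/nfsd/clients/*/info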
>>
>> The "seconds from last renew" will go up and at some point the callback
>> state changes to "FAULT". Even when the lease delegation time (90s by
>> default?) is over, the
>>
>> seconds from last renew keeps increasing. At some point the callback
>> state changes to "DOWN". When the client is up again and remounts the
>> share, the mount hangs on the client
>>
>> and on the server I notice there's a second directory for this client in
>> the clients directory, even though the clientid is the same. The
>> callback state for this new client is "UNKNOWN" and the callback address
>> is "(einval)".
>>
>> This is on a client running Fedora 39 with the 6.7.9 kernel.
>>
> I'm a little unclear...do the above steps work correctly when the server
> isn't in this state? I assume the above steps are not sufficient to
> cause a problem when the server is behaving normally?
These steps indeed don't trigger the issue when the server is behaving
normally. I'm trying to reproduce the issue on a test system, but I'm
unable to trigger it there so far.
>
>> I don't know yet if the same procedure can be used to trigger the
>> behavior after the server is rebooted. I'm going to try to reproduce
>> this on another system first.
>>
>> I would expect the delegations to expire automatically after 90s, but
>> they remain in the states file of the "DOWN" client.
>>
> That would have been true a year or so ago, but there were some recent
> changes to make the server more "courteous" toward clients that lose
> contact for a while. If there are no conflicting requests for the state
> they hold then the server will hold onto the lease (basically)
> indefinitely, until there is such a conflict.
>
> The client _should_ be able to log in and cancel the old client
> record though. It sounds like that's not working properly for some
> reason and it's interfering with the ability to do a CREATE_SESSION.

What happens if the server can't reach the original client at that point?

I've also noticed that the callback information seems to show a port
number for the callback channel. If I'm not mistaken, NFSv4.2 also does
this over the regular 2049 port now?
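
For what it's worth, a quick way to dump what the server currently has
recorded for each client's callback channel (same info files as shown
earlier):

# grep -H callback /proc/fs/nfsd/clients/*/info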

Regards,

Rik

>
>>> -Dai
>>>
>>>> Regarding the recall_any call: from what I've read on kernelnewbies,
>>>> this feature was introduced in the 6.2 kernel? When I look at the
>>>> tree for 6.1.x, it was backported in 6.1.81? Is there a way to
>>>> disable this support somehow?
>>>>
>>>> Regards,
>>>>
>>>> Rik
>>>>
>>>>
>>>>> -Dai
>>>>>
>>>>>>>
>>>>>>>> The nfsdclnts command for this client shows the following
>>>>>>>> delegations:
>>>>>>>>
>>>>>>>> # nfsdclnts -f 155/states -t all
>>>>>>>> Inode number | Type   | Access | Deny | ip address | Filename
>>>>>>>> 169346743    | open   | r-     | --   | 10.87.31.152:819 |
>>>>>>>> disconnected dentry
>>>>>>>> 169346743    | deleg  | r      |      | 10.87.31.152:819 |
>>>>>>>> disconnected dentry
>>>>>>>> 169346746    | open   | r-     | --   | 10.87.31.152:819 |
>>>>>>>> disconnected dentry
>>>>>>>> 169346746    | deleg  | r      |      | 10.87.31.152:819 |
>>>>>>>> disconnected dentry
>>>>>>>>
>>>>>>>> I see a lot of recent patches regarding directory delegations. Could
>>>>>>>> this be related to this?
>>>>>>>>
>>>>>>>> Will a 5.14.0-362.18.1.el9_3.0.1 kernel try to use a directory
>>>>>>>> delegation?
>>>>>>>>
>>>>>>>>
>>>>>>> No. Directory delegations are a new feature that's still under
>>>>>>> development. They use some of the same machinery as file delegations,
>>>>>>> but they wouldn't be a factor here.
>>>>>>>
>>>>>>>>> The system seems to have identified that the client is no longer
>>>>>>>>> reachable, but the client entry does not go away. When a mount was
>>>>>>>>> hanging on the client, there would be two directories in clients
>>>>>>>>> for
>>>>>>>>> the same client. Killing the mount command clears up the second
>>>>>>>>> entry.
>>>>>>>>>
>>>>>>>>> Even after running conntrack -D on the server to remove the tcp
>>>>>>>>> connection from the conntrack table, the entry doesn't go away
>>>>>>>>> and the
>>>>>>>>> client still can not mount anything from the server.
>>>>>>>>>
>>>>>>>>> A tcpdump on the client while a mount was running logged the
>>>>>>>>> following
>>>>>>>>> messages over and over again:
>>>>>>>>>
>>>>>>>>> request:
>>>>>>>>>
>>>>>>>>> Frame 1: 378 bytes on wire (3024 bits), 378 bytes captured (3024
>>>>>>>>> bits)
>>>>>>>>> Ethernet II, Src: HP_19:7d:4b (e0:73:e7:19:7d:4b), Dst:
>>>>>>>>> ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00)
>>>>>>>>> Internet Protocol Version 4, Src: 10.87.31.152, Dst: 10.86.18.14
>>>>>>>>> Transmission Control Protocol, Src Port: 932, Dst Port: 2049,
>>>>>>>>> Seq: 1,
>>>>>>>>> Ack: 1, Len: 312
>>>>>>>>> Remote Procedure Call, Type:Call XID:0x1d3220c4
>>>>>>>>> Network File System
>>>>>>>>>      [Program Version: 4]
>>>>>>>>>      [V4 Procedure: COMPOUND (1)]
>>>>>>>>>      GSS Data, Ops(1): CREATE_SESSION
>>>>>>>>>          Length: 152
>>>>>>>>>          GSS Sequence Number: 76
>>>>>>>>>          Tag: <EMPTY>
>>>>>>>>>          minorversion: 2
>>>>>>>>>          Operations (count: 1): CREATE_SESSION
>>>>>>>>>          [Main Opcode: CREATE_SESSION (43)]
>>>>>>>>>      GSS Checksum:
>>>>>>>>> 00000028040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb2…
>>>>>>>>>
>>>>>>>>>          GSS Token Length: 40
>>>>>>>>>          GSS-API Generic Security Service Application Program
>>>>>>>>> Interface
>>>>>>>>>              krb5_blob:
>>>>>>>>> 040404ffffffffff000000002c19055f1f8d442d594c13849628affc2797cbb23fa080b0…
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> response:
>>>>>>>>>
>>>>>>>>> Frame 2: 206 bytes on wire (1648 bits), 206 bytes captured (1648
>>>>>>>>> bits)
>>>>>>>>> Ethernet II, Src: ArubaaHe_f9:8e:00 (88:3a:30:f9:8e:00), Dst:
>>>>>>>>> HP_19:7d:4b (e0:73:e7:19:7d:4b)
>>>>>>>>> Internet Protocol Version 4, Src: 10.86.18.14, Dst: 10.87.31.152
>>>>>>>>> Transmission Control Protocol, Src Port: 2049, Dst Port: 932,
>>>>>>>>> Seq: 1,
>>>>>>>>> Ack: 313, Len: 140
>>>>>>>>> Remote Procedure Call, Type:Reply XID:0x1d3220c4
>>>>>>>>> Network File System
>>>>>>>>>      [Program Version: 4]
>>>>>>>>>      [V4 Procedure: COMPOUND (1)]
>>>>>>>>>      GSS Data, Ops(1): CREATE_SESSION(NFS4ERR_DELAY)
>>>>>>>>>          Length: 24
>>>>>>>>>          GSS Sequence Number: 76
>>>>>>>>>          Status: NFS4ERR_DELAY (10008)
>>>>>>>>>          Tag: <EMPTY>
>>>>>>>>>          Operations (count: 1)
>>>>>>>>>          [Main Opcode: CREATE_SESSION (43)]
>>>>>>>>>      GSS Checksum:
>>>>>>>>> 00000028040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f6274222…
>>>>>>>>>
>>>>>>>>>          GSS Token Length: 40
>>>>>>>>>          GSS-API Generic Security Service Application Program
>>>>>>>>> Interface
>>>>>>>>>              krb5_blob:
>>>>>>>>> 040405ffffffffff000000000aa742d0798deaad1a8aa2d7c3a91bf4f627422226d74923…
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I was hoping that giving the client a different IP address would
>>>>>>>>> resolve the issue for this client, but it didn't. Even though the
>>>>>>>>> client had a new IP address (hostname was kept the same), it
>>>>>>>>> failed to
>>>>>>>>> mount anything from the server.
>>>>>>>>>
>>>>>>> Changing the IP address won't help. The client is probably using the
>>>>>>> same long-form client id as before, so the server still identifies
>>>>>>> the
>>>>>>> client even with the address change.
>>>>>> How is the client id determined? Will changing the hostname of the
>>>>>> client trigger a change of the client id?
>>>>>>> Unfortunately, the cause of an NFS4ERR_DELAY error is tough to guess.
>>>>>>> The client is expected to back off and retry, so if the server keeps
>>>>>>> returning that repeatedly, then a hung mount command is expected.
>>>>>>>
>>>>>>> The question is why the server would keep returning DELAY. A lot of
>>>>>>> different problems ranging from memory allocation issues to protocol
>>>>>>> problems can result in that error. You may want to check the NFS
>>>>>>> server
>>>>>>> and see if anything was logged there.
>>>>>> There are no messages in the system logs that indicate any sort of
>>>>>> memory issue. We also increased the min_free_kbytes sysctl to 2G on
>>>>>> the server before we restarted it with the newer kernel.
>>>>>>> This is on a CREATE_SESSION call, so I wonder if the record held
>>>>>>> by the
>>>>>>> (courteous) server is somehow blocking the attempt to reestablish the
>>>>>>> session?
>>>>>>>
>>>>>>> Do you have a way to reproduce this? Since this is a centos
>>>>>>> kernel, you
>>>>>>> could follow the page here to open a bug:
>>>>>> Unfortunately we haven't found a reliable way to reproduce it. But
>>>>>> we do seem to trigger it more and more lately.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Rik
>>>>>>
>>>>>>> https://urldefense.com/v3/__https://wiki.centos.org/ReportBugs.html__;!!ACWV5N9M2RV99hQ!LV3yWeoSOhNAkRHkxFCH2tlm0iNFVD78mxnSLyP6lrX7yBVeA2TOJ4nv6oZsqLwP4kW56CMpDWhkjjwSkWIqsboq$
>>>>>>>
>>>>>>>
>>>>>>>>> I created another dump of the workqueues and worker pools on the
>>>>>>>>> server:
>>>>>>>>>
>>>>>>>>> [Mon Mar 18 14:59:33 2024] Showing busy workqueues and worker
>>>>>>>>> pools:
>>>>>>>>> [Mon Mar 18 14:59:33 2024] workqueue events: flags=0x0
>>>>>>>>> [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0
>>>>>>>>> nice=0
>>>>>>>>> active=1/256 refcnt=2
>>>>>>>>> [Mon Mar 18 14:59:33 2024]     pending: drm_fb_helper_damage_work
>>>>>>>>> [drm_kms_helper]
>>>>>>>>> [Mon Mar 18 14:59:33 2024] workqueue events_power_efficient:
>>>>>>>>> flags=0x80
>>>>>>>>> [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0
>>>>>>>>> nice=0
>>>>>>>>> active=1/256 refcnt=2
>>>>>>>>> [Mon Mar 18 14:59:33 2024]     pending: fb_flashcursor
>>>>>>>>> [Mon Mar 18 14:59:33 2024] workqueue mm_percpu_wq: flags=0x8
>>>>>>>>> [Mon Mar 18 14:59:33 2024]   pwq 54: cpus=27 node=1 flags=0x0
>>>>>>>>> nice=0
>>>>>>>>> active=1/256 refcnt=3
>>>>>>>>> [Mon Mar 18 14:59:33 2024]     pending: lru_add_drain_per_cpu
>>>>>>>>> BAR(362)
>>>>>>>>> [Mon Mar 18 14:59:33 2024] workqueue kblockd: flags=0x18
>>>>>>>>> [Mon Mar 18 14:59:33 2024]   pwq 55: cpus=27 node=1 flags=0x0
>>>>>>>>> nice=-20
>>>>>>>>> active=1/256 refcnt=2
>>>>>>>>> [Mon Mar 18 14:59:33 2024]     pending: blk_mq_timeout_work
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In contrast to last time, it doesn't show anything regarding nfs
>>>>>>>>> this
>>>>>>>>> time.
>>>>>>>>>
>>>>>>>>> I also tried the suggestion from Dai Ngo (echo 3 >
>>>>>>>>> /proc/sys/vm/drop_caches), but that didn't seem to make any
>>>>>>>>> difference.
>>>>>>>>>
>>>>>>>>> We haven't restarted the server yet as the impact seems to
>>>>>>>>> affect fewer clients than before. Is there anything we can run
>>>>>>>>> on the
>>>>>>>>> server to further debug this?
>>>>>>>>>
>>>>>>>>> In the past, the issue seemed to deteriorate rapidly and
>>>>>>>>> resulted in
>>>>>>>>> issues for almost all clients after about 20 minutes. This time the
>>>>>>>>> impact seems to be less, but it's not gone.
>>>>>>>>>
>>>>>>>>> How can we force the NFS server to forget about a specific
>>>>>>>>> client? I
>>>>>>>>> haven't tried to restart the nfs service yet as I'm afraid it will
>>>>>>>>> fail to stop as before.
>>>>>>>>>
>>>>>>> Not with that kernel. There are some new administrative interfaces
>>>>>>> that
>>>>>>> might allow that in the future, but they were just merged upstream
>>>>>>> and
>>>>>>> aren't in that kernel.
>>>>>>>
>>>>>>> --
>>>>>>> Jeff Layton <[email protected]>

--
Rik Theys
System Engineer
KU Leuven - Dept. Elektrotechniek (ESAT)
Kasteelpark Arenberg 10 bus 2440 - B-3001 Leuven-Heverlee
+32(0)16/32.11.07
----------------------------------------------------------------
<<Any errors in spelling, tact or fact are transmission errors>>