2021-10-28 15:12:21

by Charles Hedrick

Subject: odd problem, with backtrace

I’m sending this just in case anyone finds it useful.

We had an episode where processes on two different systems hung. One was in Chrome, trying to do an NFS write. For the other, I couldn’t tell what it was doing.

We rebooted both machines, but afterwards neither could do an NFS mount from the server involved. Other systems were using that server just fine.

I did “systemctl restart nfs-server”, and got the following:

Oct 28 10:56:47 communis.lcsr.rutgers.edu systemd[1]: Stopped ZFS file system shares.
Oct 28 10:56:47 communis.lcsr.rutgers.edu systemd[1]: Stopping ZFS file system shares...
Oct 28 10:56:47 communis.lcsr.rutgers.edu systemd[1]: Stopping NFS server and services...
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050410] ------------[ cut here ]------------
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050437] WARNING: CPU: 55 PID: 1201295 at fs/nfsd/nfs4state.c:1966 free_client+0xd3/0xe0 [nfsd]
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050437] Modules linked in: nfsv3 nfsv4 nfs fscache cpuid binfmt_misc ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs algif_hash af_alg rpcsec_gss_krb5 nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp zfs(PO) kvm_intel zunicode(PO) kvm zlua(PO) zavl(PO) icp(PO) rapl zcommon(PO) znvpair(PO) ipmi_ssif spl(O) intel_cstate mei_me joydev input_leds mei ioatdma ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter mac_hid sch_fq_codel nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear nvme nvme_core i40e hid_generic usbhid hid raid1 ast drm_vram_helper i2c_algo_bit crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ttm drm_kms_helper aesni_intel crypto_simd syscopyarea sysfillrect cryptd sysimgblt ixgbe
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050477] glue_helper fb_sys_fops xfrm_algo dca drm vmd ahci i2c_i801 mdio lpc_ich libahci wmi
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050485] CPU: 55 PID: 1201295 Comm: nfsd Kdump: loaded Tainted: P O 5.4.0-74-generic #83-Ubuntu
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050486] Hardware name: Supermicro SYS-2029U-TN24R4T/X11DPU, BIOS 3.3a 07/21/2020
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050497] RIP: 0010:free_client+0xd3/0xe0 [nfsd]
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050499] Code: c0 e8 21 6f ab d8 48 8d bb f8 03 00 00 f0 ff 8b f8 03 00 00 0f 88 e5 73 01 00 75 05 e8 c6 f8 ff ff 5b 41 5c 41 5d 41 5e 5d c3 <0f> 0b eb 8a 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 87 f0
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050500] RSP: 0018:ffffaf920d2ebd60 EFLAGS: 00010202
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050502] RAX: 0000000000000001 RBX: ffff95f2dffe8510 RCX: ffff95f2dffe8878
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050502] RDX: ffff95f2dffe8878 RSI: 000000000000000d RDI: ffff95e531513400
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050503] RBP: ffffaf920d2ebd80 R08: 0000000000000000 R09: ffff95f43f9e9a80
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050504] R10: ffff95f43f9d7848 R11: 0000000000000000 R12: ffff95f2dffe8878
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050505] R13: dead000000000122 R14: dead000000000100 R15: ffff95f2dffe8510
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050506] FS: 0000000000000000(0000) GS:ffff95f43f9c0000(0000) knlGS:0000000000000000
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050507] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050508] CR2: 00007f7e0305a014 CR3: 000000036d410002 CR4: 00000000007606e0
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050509] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050509] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050510] PKRU: 55555554
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050510] Call Trace:
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050523] __destroy_client+0x1a6/0x1f0 [nfsd]
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050533] nfs4_state_shutdown_net+0x130/0x210 [nfsd]
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050540] nfsd_shutdown_net+0x2d/0x60 [nfsd]
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050546] nfsd_last_thread+0x106/0x120 [nfsd]
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050571] ? svc_close_net+0x50/0x160 [sunrpc]
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050584] svc_shutdown_net+0x33/0x40 [sunrpc]
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050590] nfsd_destroy+0x38/0x60 [nfsd]
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050597] nfsd+0x127/0x150 [nfsd]
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050603] kthread+0x104/0x140
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050609] ? nfsd_destroy+0x60/0x60 [nfsd]
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050611] ? kthread_park+0x90/0x90
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050614] ret_from_fork+0x1f/0x40
Oct 28 10:56:47 communis.lcsr.rutgers.edu kernel: [6955369.050616] ---[ end trace d13cddb172dc312c ]---
Oct 28 10:56:48 communis.lcsr.rutgers.edu kernel: [6955369.331866] nfsd: last server has exited, flushing export cache
Oct 28 10:56:48 communis.lcsr.rutgers.edu systemd[1]: nfs-server.service: Succeeded.

This is Ubuntu 20.04, kernel 5.4.0-74-generic. All mounts are NFS 4.2.

We’ve disabled Chrome for the moment pending investigation. I’m trying to avoid going back to NFS 4.0, but may eventually be forced to.