2020-10-11 18:00:58

by Salvatore Bonaccorso

[permalink] [raw]
Subject: Kernel panic / list_add corruption when in nfsd4_run_cb_work

Hi

On a system running 4.19.146-1 in Debian buster an issue got hit,
while the server was under some slight load, but it does not seem
easily reproducible, so asking if some more information can be
provided to track/narrow this down. On the console the following was
caught:

[1002011.364398] list_add corruption. prev->next should be next (ffff9b9e875e6ea8), but was ffff9b9ec0a14140. (prev=ffff9b9ec0a14140).
[1002011.376317] ------------[ cut here ]------------
[1002011.381131] kernel BUG at lib/list_debug.c:28!
[1002011.385781] invalid opcode: 0000 [#1] SMP PTI
[1002011.390325] CPU: 13 PID: 25775 Comm: kworker/u40:0 Not tainted 4.19.0-11-amd64 #1 Debian 4.19.146-1
[1002011.399562] Hardware name: DALCO AG S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[1002011.409427] Workqueue: nfsd4_callbacks nfsd4_run_cb_work [nfsd]
[1002011.415539] RIP: 0010:__list_add_valid.cold.0+0x26/0x28
[1002011.420949] Code: 00 00 00 c3 48 89 d1 48 c7 c7 d0 a3 69 a3 48 89 c2 e8 50 6b d0 ff 0f 0b 48 89 c1 4c 89 c6 48 c7 c7 28 a4 69 a3 e8 3c 6b d0 ff <0f> 0b 48 89 fe 48 89 c2 48 c7 c7 b8 a4 69 a3 e8
28 6b d0 ff 0f 0b
[1002011.439914] RSP: 0018:ffffbf14c6993e28 EFLAGS: 00010246
[1002011.445329] RAX: 0000000000000075 RBX: ffff9b9ec0a14140 RCX: 0000000000000000
[1002011.452656] RDX: 0000000000000000 RSI: ffff9baa8f7966b8 RDI: ffff9baa8f7966b8
[1002011.459979] RBP: ffffbf14c6993e70 R08: 0000000000000559 R09: 0000000000aaaaaa
[1002011.467301] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9b9e875e6e00
[1002011.474627] R13: ffff9ba0341872a8 R14: ffff9ba034187278 R15: ffff9b9e875e6ea8
[1002011.481948] FS: 0000000000000000(0000) GS:ffff9baa8f780000(0000) knlGS:0000000000000000
[1002011.490228] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1002011.496160] CR2: 00007fda6b5330a0 CR3: 000000079d60a002 CR4: 00000000007606e0
[1002011.503487] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1002011.510806] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[1002011.518132] PKRU: 55555554
[1002011.521022] Call Trace:
[1002011.523672] nfsd4_cb_recall_prepare+0x2b8/0x310 [nfsd]
[1002011.529089] nfsd4_run_cb_work+0x7d/0xf0 [nfsd]
[1002011.533818] process_one_work+0x1a7/0x3a0
[1002011.538015] worker_thread+0x30/0x390
[1002011.541861] ? create_worker+0x1a0/0x1a0
[1002011.545968] kthread+0x112/0x130
[1002011.549381] ? kthread_bind+0x30/0x30
[1002011.553232] ret_from_fork+0x35/0x40
[1002011.556992] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag binfmt_misc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache quota_v2 quota_tree bonding ipmi_ssif intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ast ttm ghash_clmulni_intel drm_kms_helper intel_cstate mei_me intel_uncore drm iTCO_wdt ioatdma intel_rapl_perf pcspkr evdev joydev pcc_cpufreq i2c_algo_bit sg iTCO_vendor_support mei ipmi_si dca ipmi_devintf wmi ipmi_msghandler acpi_pad acpi_power_meter button nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1
[1002011.627690] raid0 multipath linear md_mod hid_generic usbhid hid sd_mod crc32c_intel xhci_pci ahci xhci_hcd aesni_intel libahci aes_x86_64 crypto_simd libata arcmsr cryptd usbcore i40e scsi_mod glue_helper lpc_ich i2c_i801 mfd_core usb_common
[1002011.651077] ---[ end trace 465dc56412b98978 ]---
[1002011.765289] RIP: 0010:__list_add_valid.cold.0+0x26/0x28
[1002011.771502] Code: 00 00 00 c3 48 89 d1 48 c7 c7 d0 a3 69 a3 48 89 c2 e8 50 6b d0 ff 0f 0b 48 89 c1 4c 89 c6 48 c7 c7 28 a4 69 a3 e8 3c 6b d0 ff <0f> 0b 48 89 fe 48 89 c2 48 c7 c7 b8 a4 69 a3 e8 28 6b d0 ff 0f 0b
[1002011.792067] RSP: 0018:ffffbf14c6993e28 EFLAGS: 00010246
[1002011.798311] RAX: 0000000000000075 RBX: ffff9b9ec0a14140 RCX: 0000000000000000
[1002011.806453] RDX: 0000000000000000 RSI: ffff9baa8f7966b8 RDI: ffff9baa8f7966b8
[1002011.814572] RBP: ffffbf14c6993e70 R08: 0000000000000559 R09: 0000000000aaaaaa
[1002011.822680] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9b9e875e6e00
[1002011.830789] R13: ffff9ba0341872a8 R14: ffff9ba034187278 R15: ffff9b9e875e6ea8
[1002011.838895] FS: 0000000000000000(0000) GS:ffff9baa8f780000(0000) knlGS:0000000000000000
[1002011.847972] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1002011.854694] CR2: 00007fda6b5330a0 CR3: 000000079d60a002 CR4: 00000000007606e0
[1002011.862806] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1002011.870904] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[1002011.878989] PKRU: 55555554
[1002011.882636] Kernel panic - not syncing: Fatal exception
[1002011.888846] Kernel Offset: 0x21800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Kernel version is 4.19.146-1 as in the current Debian buster.

The exports are as:

/srv/exports/data-01
node1(rw,wdelay,no_root_squash,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash)
/srv/exports/data-02
node1(rw,wdelay,no_root_squash,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash)
/srv/exports/data-01
@netgroup(rw,wdelay,root_squash,no_subtree_check,sec=sys,rw,secure,root_squash,no_all_squash)
/srv/exports/data-02
@netgroup(rw,wdelay,root_squash,no_subtree_check,sec=sys,rw,secure,root_squash,no_all_squash)

Clients mounting those are mixed systems running Debian buster on 4.19.146-1
and Debian stretch systems running 4.9.228-1.

Versions of listed packages (maybe relevant) on the server:

acl: 2.2.53-4
libgssapi-krb5-2: 1.17-3
libevent-2.1-6: 2.1.8-stable-4
nfs-utils: 1:1.3.4-2.5+deb10u1
util-linux: 2.33.1-0.1

(note those are all the respective versions in Debian buster).

Is there anything one can try to provivde to possibly track this down
or are we here simply out of luck?

Regards,
Salvatore


2020-10-12 14:27:05

by J. Bruce Fields

[permalink] [raw]
Subject: Re: Kernel panic / list_add corruption when in nfsd4_run_cb_work

On Sun, Oct 11, 2020 at 09:59:13AM +0200, Salvatore Bonaccorso wrote:
> Hi
>
> On a system running 4.19.146-1 in Debian buster an issue got hit,
> while the server was under some slight load, but it does not seem
> easily reproducible, so asking if some more information can be
> provided to track/narrow this down. On the console the following was
> caught:

Worth checking git logs of fs/nfsd/nfs4state.c and
fs/nfsd/nfs4callback.c. It might be
2bbfed98a4d82ac4e7abfcd4eba40bddfc670b1d "nfsd: Fix races between
nfsd4_cb_release() and nfsd4_shutdown_callback()" ?

--b.

>
> [1002011.364398] list_add corruption. prev->next should be next (ffff9b9e875e6ea8), but was ffff9b9ec0a14140. (prev=ffff9b9ec0a14140).
> [1002011.376317] ------------[ cut here ]------------
> [1002011.381131] kernel BUG at lib/list_debug.c:28!
> [1002011.385781] invalid opcode: 0000 [#1] SMP PTI
> [1002011.390325] CPU: 13 PID: 25775 Comm: kworker/u40:0 Not tainted 4.19.0-11-amd64 #1 Debian 4.19.146-1
> [1002011.399562] Hardware name: DALCO AG S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
> [1002011.409427] Workqueue: nfsd4_callbacks nfsd4_run_cb_work [nfsd]
> [1002011.415539] RIP: 0010:__list_add_valid.cold.0+0x26/0x28
> [1002011.420949] Code: 00 00 00 c3 48 89 d1 48 c7 c7 d0 a3 69 a3 48 89 c2 e8 50 6b d0 ff 0f 0b 48 89 c1 4c 89 c6 48 c7 c7 28 a4 69 a3 e8 3c 6b d0 ff <0f> 0b 48 89 fe 48 89 c2 48 c7 c7 b8 a4 69 a3 e8
> 28 6b d0 ff 0f 0b
> [1002011.439914] RSP: 0018:ffffbf14c6993e28 EFLAGS: 00010246
> [1002011.445329] RAX: 0000000000000075 RBX: ffff9b9ec0a14140 RCX: 0000000000000000
> [1002011.452656] RDX: 0000000000000000 RSI: ffff9baa8f7966b8 RDI: ffff9baa8f7966b8
> [1002011.459979] RBP: ffffbf14c6993e70 R08: 0000000000000559 R09: 0000000000aaaaaa
> [1002011.467301] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9b9e875e6e00
> [1002011.474627] R13: ffff9ba0341872a8 R14: ffff9ba034187278 R15: ffff9b9e875e6ea8
> [1002011.481948] FS: 0000000000000000(0000) GS:ffff9baa8f780000(0000) knlGS:0000000000000000
> [1002011.490228] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [1002011.496160] CR2: 00007fda6b5330a0 CR3: 000000079d60a002 CR4: 00000000007606e0
> [1002011.503487] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [1002011.510806] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [1002011.518132] PKRU: 55555554
> [1002011.521022] Call Trace:
> [1002011.523672] nfsd4_cb_recall_prepare+0x2b8/0x310 [nfsd]
> [1002011.529089] nfsd4_run_cb_work+0x7d/0xf0 [nfsd]
> [1002011.533818] process_one_work+0x1a7/0x3a0
> [1002011.538015] worker_thread+0x30/0x390
> [1002011.541861] ? create_worker+0x1a0/0x1a0
> [1002011.545968] kthread+0x112/0x130
> [1002011.549381] ? kthread_bind+0x30/0x30
> [1002011.553232] ret_from_fork+0x35/0x40
> [1002011.556992] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag binfmt_misc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache quota_v2 quota_tree bonding ipmi_ssif intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ast ttm ghash_clmulni_intel drm_kms_helper intel_cstate mei_me intel_uncore drm iTCO_wdt ioatdma intel_rapl_perf pcspkr evdev joydev pcc_cpufreq i2c_algo_bit sg iTCO_vendor_support mei ipmi_si dca ipmi_devintf wmi ipmi_msghandler acpi_pad acpi_power_meter button nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1
> [1002011.627690] raid0 multipath linear md_mod hid_generic usbhid hid sd_mod crc32c_intel xhci_pci ahci xhci_hcd aesni_intel libahci aes_x86_64 crypto_simd libata arcmsr cryptd usbcore i40e scsi_mod glue_helper lpc_ich i2c_i801 mfd_core usb_common
> [1002011.651077] ---[ end trace 465dc56412b98978 ]---
> [1002011.765289] RIP: 0010:__list_add_valid.cold.0+0x26/0x28
> [1002011.771502] Code: 00 00 00 c3 48 89 d1 48 c7 c7 d0 a3 69 a3 48 89 c2 e8 50 6b d0 ff 0f 0b 48 89 c1 4c 89 c6 48 c7 c7 28 a4 69 a3 e8 3c 6b d0 ff <0f> 0b 48 89 fe 48 89 c2 48 c7 c7 b8 a4 69 a3 e8 28 6b d0 ff 0f 0b
> [1002011.792067] RSP: 0018:ffffbf14c6993e28 EFLAGS: 00010246
> [1002011.798311] RAX: 0000000000000075 RBX: ffff9b9ec0a14140 RCX: 0000000000000000
> [1002011.806453] RDX: 0000000000000000 RSI: ffff9baa8f7966b8 RDI: ffff9baa8f7966b8
> [1002011.814572] RBP: ffffbf14c6993e70 R08: 0000000000000559 R09: 0000000000aaaaaa
> [1002011.822680] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9b9e875e6e00
> [1002011.830789] R13: ffff9ba0341872a8 R14: ffff9ba034187278 R15: ffff9b9e875e6ea8
> [1002011.838895] FS: 0000000000000000(0000) GS:ffff9baa8f780000(0000) knlGS:0000000000000000
> [1002011.847972] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [1002011.854694] CR2: 00007fda6b5330a0 CR3: 000000079d60a002 CR4: 00000000007606e0
> [1002011.862806] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [1002011.870904] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [1002011.878989] PKRU: 55555554
> [1002011.882636] Kernel panic - not syncing: Fatal exception
> [1002011.888846] Kernel Offset: 0x21800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>
> Kernel version is 4.19.146-1 as in the current Debian buster.
>
> The exports are as:
>
> /srv/exports/data-01
> node1(rw,wdelay,no_root_squash,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash)
> /srv/exports/data-02
> node1(rw,wdelay,no_root_squash,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash)
> /srv/exports/data-01
> @netgroup(rw,wdelay,root_squash,no_subtree_check,sec=sys,rw,secure,root_squash,no_all_squash)
> /srv/exports/data-02
> @netgroup(rw,wdelay,root_squash,no_subtree_check,sec=sys,rw,secure,root_squash,no_all_squash)
>
> Clients mounting those are mixed systems running Debian buster on 4.19.146-1
> and Debian stretch systems running 4.9.228-1.
>
> Versions of listed packages (maybe relevant) on the server:
>
> acl: 2.2.53-4
> libgssapi-krb5-2: 1.17-3
> libevent-2.1-6: 2.1.8-stable-4
> nfs-utils: 1:1.3.4-2.5+deb10u1
> util-linux: 2.33.1-0.1
>
> (note those are all the respective versions in Debian buster).
>
> Is there anything one can try to provivde to possibly track this down
> or are we here simply out of luck?
>
> Regards,
> Salvatore

2020-10-12 15:43:39

by Salvatore Bonaccorso

[permalink] [raw]
Subject: Re: Kernel panic / list_add corruption when in nfsd4_run_cb_work

Hi Bruce,

Thanks a lot for your reply, much appreciated.

On Mon, Oct 12, 2020 at 10:26:02AM -0400, J. Bruce Fields wrote:
> On Sun, Oct 11, 2020 at 09:59:13AM +0200, Salvatore Bonaccorso wrote:
> > Hi
> >
> > On a system running 4.19.146-1 in Debian buster an issue got hit,
> > while the server was under some slight load, but it does not seem
> > easily reproducible, so asking if some more information can be
> > provided to track/narrow this down. On the console the following was
> > caught:
>
> Worth checking git logs of fs/nfsd/nfs4state.c and
> fs/nfsd/nfs4callback.c. It might be
> 2bbfed98a4d82ac4e7abfcd4eba40bddfc670b1d "nfsd: Fix races between
> nfsd4_cb_release() and nfsd4_shutdown_callback()" ?

That might be possible. As it was not possible to simply trigger the
issue, do you know if it is possible to simply reproduce the issue
fixed in the above?

2bbfed98a4d8 ("nfsd: Fix races between nfsd4_cb_release() and
nfsd4_shutdown_callback()") would be missing in the v4.19.y stable
series (as it was in 5.5-rc1 but not backported to other stable
versions).

Regards,
Salvatore

2020-10-12 16:34:51

by J. Bruce Fields

[permalink] [raw]
Subject: Re: Kernel panic / list_add corruption when in nfsd4_run_cb_work

On Mon, Oct 12, 2020 at 05:41:59PM +0200, Salvatore Bonaccorso wrote:
> Hi Bruce,
>
> Thanks a lot for your reply, much appreciated.
>
> On Mon, Oct 12, 2020 at 10:26:02AM -0400, J. Bruce Fields wrote:
> > On Sun, Oct 11, 2020 at 09:59:13AM +0200, Salvatore Bonaccorso wrote:
> > > Hi
> > >
> > > On a system running 4.19.146-1 in Debian buster an issue got hit,
> > > while the server was under some slight load, but it does not seem
> > > easily reproducible, so asking if some more information can be
> > > provided to track/narrow this down. On the console the following was
> > > caught:
> >
> > Worth checking git logs of fs/nfsd/nfs4state.c and
> > fs/nfsd/nfs4callback.c. It might be
> > 2bbfed98a4d82ac4e7abfcd4eba40bddfc670b1d "nfsd: Fix races between
> > nfsd4_cb_release() and nfsd4_shutdown_callback()" ?
>
> That might be possible. As it was not possible to simply trigger the
> issue, do you know if it is possible to simply reproduce the issue
> fixed in the above?

I don't have a reproducer.

--b.

>
> 2bbfed98a4d8 ("nfsd: Fix races between nfsd4_cb_release() and
> nfsd4_shutdown_callback()") would be missing in the v4.19.y stable
> series (as it was in 5.5-rc1 but not backported to other stable
> versions).
>
> Regards,
> Salvatore

2020-10-18 09:56:15

by Salvatore Bonaccorso

[permalink] [raw]
Subject: Re: Kernel panic / list_add corruption when in nfsd4_run_cb_work

Hi,

On Mon, Oct 12, 2020 at 12:33:55PM -0400, J. Bruce Fields wrote:
> On Mon, Oct 12, 2020 at 05:41:59PM +0200, Salvatore Bonaccorso wrote:
> > Hi Bruce,
> >
> > Thanks a lot for your reply, much appreciated.
> >
> > On Mon, Oct 12, 2020 at 10:26:02AM -0400, J. Bruce Fields wrote:
> > > On Sun, Oct 11, 2020 at 09:59:13AM +0200, Salvatore Bonaccorso wrote:
> > > > Hi
> > > >
> > > > On a system running 4.19.146-1 in Debian buster an issue got hit,
> > > > while the server was under some slight load, but it does not seem
> > > > easily reproducible, so asking if some more information can be
> > > > provided to track/narrow this down. On the console the following was
> > > > caught:
> > >
> > > Worth checking git logs of fs/nfsd/nfs4state.c and
> > > fs/nfsd/nfs4callback.c. It might be
> > > 2bbfed98a4d82ac4e7abfcd4eba40bddfc670b1d "nfsd: Fix races between
> > > nfsd4_cb_release() and nfsd4_shutdown_callback()" ?
> >
> > That might be possible. As it was not possible to simply trigger the
> > issue, do you know if it is possible to simply reproduce the issue
> > fixed in the above?
>
> I don't have a reproducer.

I stil could not find a way to controlled trigger the issue, but on
the same server the following was caused as well, which might be
releated as well (altough different backtrace, but maybe gives
additional hints what can be looked for):

[ 4390.059004] ------------[ cut here ]------------
[ 4390.063780] WARNING: CPU: 14 PID: 24184 at fs/nfsd/nfs4state.c:4778 laundromat_main.cold.125+0x31/0x7a [nfsd]
[ 4390.073763] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag binfmt_misc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache quota_v2 quota_tree bonding intel_rapl skx_edac nfit libnv
dimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass ipmi_ssif crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate ast ttm drm_kms_helper intel_uncore mei_me drm jo
ydev iTCO_wdt evdev pcc_cpufreq pcspkr sg ioatdma intel_rapl_perf mei iTCO_vendor_support i2c_algo_bit dca ipmi_si wmi ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad button nfsd auth_rpcgss
nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc
32c_generic raid1
[ 4390.144313] raid0 multipath linear md_mod hid_generic usbhid hid sd_mod crc32c_intel aesni_intel ahci xhci_pci libahci aes_x86_64 xhci_hcd arcmsr crypto_simd libata cryptd i40e usbcore scsi_mod
glue_helper lpc_ich i2c_i801 mfd_core usb_common
[ 4390.165906] CPU: 14 PID: 24184 Comm: kworker/u42:2 Not tainted 4.19.0-11-amd64 #1 Debian 4.19.146-1
[ 4390.174969] Hardware name: DALCO AG S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[ 4390.184654] Workqueue: nfsd4 laundromat_main [nfsd]
[ 4390.189550] RIP: 0010:laundromat_main.cold.125+0x31/0x7a [nfsd]
[ 4390.195484] Code: f6 6b c0 e8 24 ae 62 d7 e9 6f 21 ff ff 48 c7 c7 38 f6 6b c0 e8 13 ae 62 d7 e9 1f 21 ff ff 48 c7 c7 40 f3 6b c0 e8 02 ae 62 d7 <0f> 0b e9 34 23 ff ff 48 c7 c7 40 f3 6b c0 e8 ef a
d 62 d7 0f 0b e9
[ 4390.214280] RSP: 0018:ffffb0568a6d7e20 EFLAGS: 00010246
[ 4390.219523] RAX: 0000000000000024 RBX: ffffb0568a6d7e50 RCX: 0000000000000000
[ 4390.226678] RDX: 0000000000000000 RSI: ffff9def8f8166b8 RDI: ffff9def8f8166b8
[ 4390.233826] RBP: ffffb0568a6d7e50 R08: 0000000000000556 R09: 0000000000aaaaaa
[ 4390.240976] R10: 0000000000000000 R11: 0000000000000001 R12: 000000005f8b6732
[ 4390.248126] R13: ffff9de31d44c798 R14: 000000000000001c R15: ffffb0568a6d7e50
[ 4390.255277] FS: 0000000000000000(0000) GS:ffff9def8f800000(0000) knlGS:0000000000000000
[ 4390.263382] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4390.269147] CR2: 00007f50de60b7f8 CR3: 000000046740a006 CR4: 00000000007606e0
[ 4390.276302] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4390.283448] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4390.290598] PKRU: 55555554
[ 4390.293316] Call Trace:
[ 4390.295790] process_one_work+0x1a7/0x3a0
[ 4390.299815] worker_thread+0x30/0x390
[ 4390.303487] ? create_worker+0x1a0/0x1a0
[ 4390.307422] kthread+0x112/0x130
[ 4390.310661] ? kthread_bind+0x30/0x30
[ 4390.314340] ret_from_fork+0x35/0x40
[ 4390.317936] ---[ end trace d0209d068f8583fd ]---
[ 4390.322577] list_add double add: new=ffff9de31d44c798, prev=ffffb0568a6d7e50, next=ffff9de31d44c798.
[ 4390.331736] ------------[ cut here ]------------
[ 4390.336368] kernel BUG at lib/list_debug.c:31!
[ 4390.340829] invalid opcode: 0000 [#1] SMP PTI
[ 4390.345200] CPU: 14 PID: 24184 Comm: kworker/u42:2 Tainted: G W 4.19.0-11-amd64 #1 Debian 4.19.146-1
[ 4390.355648] Hardware name: DALCO AG S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[ 4390.365327] Workqueue: nfsd4 laundromat_main [nfsd]
[ 4390.371091] RIP: 0010:__list_add_valid+0x41/0x50
[ 4390.376551] Code: 85 94 00 00 00 48 39 c7 74 0b 48 39 d7 74 06 b8 01 00 00 00 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 80 a4 a9 98 e8 bd 6b d0 ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8b 07 48 8b 57 08
[ 4390.397015] RSP: 0018:ffffb0568a6d7e18 EFLAGS: 00010246
[ 4390.403077] RAX: 0000000000000058 RBX: ffff9de31d44c798 RCX: 0000000000000000
[ 4390.411045] RDX: 0000000000000000 RSI: ffff9def8f8166b8 RDI: ffff9def8f8166b8
[ 4390.418996] RBP: ffffb0568a6d7e50 R08: 0000000000000573 R09: 0000000000aaaaaa
[ 4390.426937] R10: 0000000000000000 R11: 0000000000000001 R12: 000000005f8b6732
[ 4390.434859] R13: ffff9de31d44c798 R14: 000000000000001c R15: ffffb0568a6d7e50
[ 4390.442761] FS: 0000000000000000(0000) GS:ffff9def8f800000(0000) knlGS:0000000000000000
[ 4390.451610] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4390.458100] CR2: 00007f50de60b7f8 CR3: 000000046740a006 CR4: 00000000007606e0
[ 4390.465970] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4390.473820] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4390.481650] PKRU: 55555554
[ 4390.485032] Call Trace:
[ 4390.488145] laundromat_main+0x27a/0x610 [nfsd]
[ 4390.493333] process_one_work+0x1a7/0x3a0
[ 4390.497987] worker_thread+0x30/0x390
[ 4390.502276] ? create_worker+0x1a0/0x1a0
[ 4390.506808] kthread+0x112/0x130
[ 4390.510636] ? kthread_bind+0x30/0x30
[ 4390.514892] ret_from_fork+0x35/0x40
[ 4390.519056] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag binfmt_misc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache quota_v2 quota_tree bonding intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass ipmi_ssif crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate ast ttm drm_kms_helper intel_uncore mei_me drm joydev iTCO_wdt evdev pcc_cpufreq pcspkr sg ioatdma intel_rapl_perf mei iTCO_vendor_support i2c_algo_bit dca ipmi_si wmi ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad button nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1
[ 4390.593983] raid0 multipath linear md_mod hid_generic usbhid hid sd_mod crc32c_intel aesni_intel ahci xhci_pci libahci aes_x86_64 xhci_hcd arcmsr crypto_simd libata cryptd i40e usbcore scsi_mod glue_helper lpc_ich i2c_i801 mfd_core usb_common
[ 4390.616982] ---[ end trace d0209d068f8583fe ]---
[ 4390.704972] RIP: 0010:__list_add_valid+0x41/0x50
[ 4390.710342] Code: 85 94 00 00 00 48 39 c7 74 0b 48 39 d7 74 06 b8 01 00 00 00 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 80 a4 a9 98 e8 bd 6b d0 ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8b 07 48 8b 57 08
[ 4390.730656] RSP: 0018:ffffb0568a6d7e18 EFLAGS: 00010246
[ 4390.736676] RAX: 0000000000000058 RBX: ffff9de31d44c798 RCX: 0000000000000000
[ 4390.744609] RDX: 0000000000000000 RSI: ffff9def8f8166b8 RDI: ffff9def8f8166b8
[ 4390.752555] RBP: ffffb0568a6d7e50 R08: 0000000000000573 R09: 0000000000aaaaaa
[ 4390.760497] R10: 0000000000000000 R11: 0000000000000001 R12: 000000005f8b6732
[ 4390.768433] R13: ffff9de31d44c798 R14: 000000000000001c R15: ffffb0568a6d7e50
[ 4390.776361] FS: 0000000000000000(0000) GS:ffff9def8f800000(0000) knlGS:0000000000000000
[ 4390.785244] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4390.791789] CR2: 00007f50de60b7f8 CR3: 000000046740a006 CR4: 00000000007606e0
[ 4390.799725] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4390.807642] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4390.815544] PKRU: 55555554
[ 4390.819023] Kernel panic - not syncing: Fatal exception
[ 4390.825022] Kernel Offset: 0x16c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Thanks for your time in any case!

Regards,
Salvatore

2021-10-06 18:47:06

by Salvatore Bonaccorso

[permalink] [raw]
Subject: Re: Kernel panic / list_add corruption when in nfsd4_run_cb_work

Hi,

On Sun, Oct 18, 2020 at 11:39:03AM +0200, Salvatore Bonaccorso wrote:
> Hi,
>
> On Mon, Oct 12, 2020 at 12:33:55PM -0400, J. Bruce Fields wrote:
> > On Mon, Oct 12, 2020 at 05:41:59PM +0200, Salvatore Bonaccorso wrote:
> > > Hi Bruce,
> > >
> > > Thanks a lot for your reply, much appreciated.
> > >
> > > On Mon, Oct 12, 2020 at 10:26:02AM -0400, J. Bruce Fields wrote:
> > > > On Sun, Oct 11, 2020 at 09:59:13AM +0200, Salvatore Bonaccorso wrote:
> > > > > Hi
> > > > >
> > > > > On a system running 4.19.146-1 in Debian buster an issue got hit,
> > > > > while the server was under some slight load, but it does not seem
> > > > > easily reproducible, so asking if some more information can be
> > > > > provided to track/narrow this down. On the console the following was
> > > > > caught:
> > > >
> > > > Worth checking git logs of fs/nfsd/nfs4state.c and
> > > > fs/nfsd/nfs4callback.c. It might be
> > > > 2bbfed98a4d82ac4e7abfcd4eba40bddfc670b1d "nfsd: Fix races between
> > > > nfsd4_cb_release() and nfsd4_shutdown_callback()" ?
> > >
> > > That might be possible. As it was not possible to simply trigger the
> > > issue, do you know if it is possible to simply reproduce the issue
> > > fixed in the above?
> >
> > I don't have a reproducer.
>
> I stil could not find a way to controlled trigger the issue, but on
> the same server the following was caused as well, which might be
> releated as well (altough different backtrace, but maybe gives
> additional hints what can be looked for):
>
> [ 4390.059004] ------------[ cut here ]------------
> [ 4390.063780] WARNING: CPU: 14 PID: 24184 at fs/nfsd/nfs4state.c:4778 laundromat_main.cold.125+0x31/0x7a [nfsd]
> [ 4390.073763] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag binfmt_misc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache quota_v2 quota_tree bonding intel_rapl skx_edac nfit libnv
> dimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass ipmi_ssif crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate ast ttm drm_kms_helper intel_uncore mei_me drm jo
> ydev iTCO_wdt evdev pcc_cpufreq pcspkr sg ioatdma intel_rapl_perf mei iTCO_vendor_support i2c_algo_bit dca ipmi_si wmi ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad button nfsd auth_rpcgss
> nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc
> 32c_generic raid1
> [ 4390.144313] raid0 multipath linear md_mod hid_generic usbhid hid sd_mod crc32c_intel aesni_intel ahci xhci_pci libahci aes_x86_64 xhci_hcd arcmsr crypto_simd libata cryptd i40e usbcore scsi_mod
> glue_helper lpc_ich i2c_i801 mfd_core usb_common
> [ 4390.165906] CPU: 14 PID: 24184 Comm: kworker/u42:2 Not tainted 4.19.0-11-amd64 #1 Debian 4.19.146-1
> [ 4390.174969] Hardware name: DALCO AG S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
> [ 4390.184654] Workqueue: nfsd4 laundromat_main [nfsd]
> [ 4390.189550] RIP: 0010:laundromat_main.cold.125+0x31/0x7a [nfsd]
> [ 4390.195484] Code: f6 6b c0 e8 24 ae 62 d7 e9 6f 21 ff ff 48 c7 c7 38 f6 6b c0 e8 13 ae 62 d7 e9 1f 21 ff ff 48 c7 c7 40 f3 6b c0 e8 02 ae 62 d7 <0f> 0b e9 34 23 ff ff 48 c7 c7 40 f3 6b c0 e8 ef a
> d 62 d7 0f 0b e9
> [ 4390.214280] RSP: 0018:ffffb0568a6d7e20 EFLAGS: 00010246
> [ 4390.219523] RAX: 0000000000000024 RBX: ffffb0568a6d7e50 RCX: 0000000000000000
> [ 4390.226678] RDX: 0000000000000000 RSI: ffff9def8f8166b8 RDI: ffff9def8f8166b8
> [ 4390.233826] RBP: ffffb0568a6d7e50 R08: 0000000000000556 R09: 0000000000aaaaaa
> [ 4390.240976] R10: 0000000000000000 R11: 0000000000000001 R12: 000000005f8b6732
> [ 4390.248126] R13: ffff9de31d44c798 R14: 000000000000001c R15: ffffb0568a6d7e50
> [ 4390.255277] FS: 0000000000000000(0000) GS:ffff9def8f800000(0000) knlGS:0000000000000000
> [ 4390.263382] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4390.269147] CR2: 00007f50de60b7f8 CR3: 000000046740a006 CR4: 00000000007606e0
> [ 4390.276302] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 4390.283448] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 4390.290598] PKRU: 55555554
> [ 4390.293316] Call Trace:
> [ 4390.295790] process_one_work+0x1a7/0x3a0
> [ 4390.299815] worker_thread+0x30/0x390
> [ 4390.303487] ? create_worker+0x1a0/0x1a0
> [ 4390.307422] kthread+0x112/0x130
> [ 4390.310661] ? kthread_bind+0x30/0x30
> [ 4390.314340] ret_from_fork+0x35/0x40
> [ 4390.317936] ---[ end trace d0209d068f8583fd ]---
> [ 4390.322577] list_add double add: new=ffff9de31d44c798, prev=ffffb0568a6d7e50, next=ffff9de31d44c798.
> [ 4390.331736] ------------[ cut here ]------------
> [ 4390.336368] kernel BUG at lib/list_debug.c:31!
> [ 4390.340829] invalid opcode: 0000 [#1] SMP PTI
> [ 4390.345200] CPU: 14 PID: 24184 Comm: kworker/u42:2 Tainted: G W 4.19.0-11-amd64 #1 Debian 4.19.146-1
> [ 4390.355648] Hardware name: DALCO AG S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
> [ 4390.365327] Workqueue: nfsd4 laundromat_main [nfsd]
> [ 4390.371091] RIP: 0010:__list_add_valid+0x41/0x50
> [ 4390.376551] Code: 85 94 00 00 00 48 39 c7 74 0b 48 39 d7 74 06 b8 01 00 00 00 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 80 a4 a9 98 e8 bd 6b d0 ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8b 07 48 8b 57 08
> [ 4390.397015] RSP: 0018:ffffb0568a6d7e18 EFLAGS: 00010246
> [ 4390.403077] RAX: 0000000000000058 RBX: ffff9de31d44c798 RCX: 0000000000000000
> [ 4390.411045] RDX: 0000000000000000 RSI: ffff9def8f8166b8 RDI: ffff9def8f8166b8
> [ 4390.418996] RBP: ffffb0568a6d7e50 R08: 0000000000000573 R09: 0000000000aaaaaa
> [ 4390.426937] R10: 0000000000000000 R11: 0000000000000001 R12: 000000005f8b6732
> [ 4390.434859] R13: ffff9de31d44c798 R14: 000000000000001c R15: ffffb0568a6d7e50
> [ 4390.442761] FS: 0000000000000000(0000) GS:ffff9def8f800000(0000) knlGS:0000000000000000
> [ 4390.451610] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4390.458100] CR2: 00007f50de60b7f8 CR3: 000000046740a006 CR4: 00000000007606e0
> [ 4390.465970] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 4390.473820] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 4390.481650] PKRU: 55555554
> [ 4390.485032] Call Trace:
> [ 4390.488145] laundromat_main+0x27a/0x610 [nfsd]
> [ 4390.493333] process_one_work+0x1a7/0x3a0
> [ 4390.497987] worker_thread+0x30/0x390
> [ 4390.502276] ? create_worker+0x1a0/0x1a0
> [ 4390.506808] kthread+0x112/0x130
> [ 4390.510636] ? kthread_bind+0x30/0x30
> [ 4390.514892] ret_from_fork+0x35/0x40
> [ 4390.519056] Modules linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag binfmt_misc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache quota_v2 quota_tree bonding intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass ipmi_ssif crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate ast ttm drm_kms_helper intel_uncore mei_me drm joydev iTCO_wdt evdev pcc_cpufreq pcspkr sg ioatdma intel_rapl_perf mei iTCO_vendor_support i2c_algo_bit dca ipmi_si wmi ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad button nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1
> [ 4390.593983] raid0 multipath linear md_mod hid_generic usbhid hid sd_mod crc32c_intel aesni_intel ahci xhci_pci libahci aes_x86_64 xhci_hcd arcmsr crypto_simd libata cryptd i40e usbcore scsi_mod glue_helper lpc_ich i2c_i801 mfd_core usb_common
> [ 4390.616982] ---[ end trace d0209d068f8583fe ]---
> [ 4390.704972] RIP: 0010:__list_add_valid+0x41/0x50
> [ 4390.710342] Code: 85 94 00 00 00 48 39 c7 74 0b 48 39 d7 74 06 b8 01 00 00 00 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 80 a4 a9 98 e8 bd 6b d0 ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8b 07 48 8b 57 08
> [ 4390.730656] RSP: 0018:ffffb0568a6d7e18 EFLAGS: 00010246
> [ 4390.736676] RAX: 0000000000000058 RBX: ffff9de31d44c798 RCX: 0000000000000000
> [ 4390.744609] RDX: 0000000000000000 RSI: ffff9def8f8166b8 RDI: ffff9def8f8166b8
> [ 4390.752555] RBP: ffffb0568a6d7e50 R08: 0000000000000573 R09: 0000000000aaaaaa
> [ 4390.760497] R10: 0000000000000000 R11: 0000000000000001 R12: 000000005f8b6732
> [ 4390.768433] R13: ffff9de31d44c798 R14: 000000000000001c R15: ffffb0568a6d7e50
> [ 4390.776361] FS: 0000000000000000(0000) GS:ffff9def8f800000(0000) knlGS:0000000000000000
> [ 4390.785244] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4390.791789] CR2: 00007f50de60b7f8 CR3: 000000046740a006 CR4: 00000000007606e0
> [ 4390.799725] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 4390.807642] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 4390.815544] PKRU: 55555554
> [ 4390.819023] Kernel panic - not syncing: Fatal exception
> [ 4390.825022] Kernel Offset: 0x16c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>
> Thanks for your time in any case!

Waving back after almost one year. We started to see again those
crashes under load after during the year situation improved (when load
went down). I'm still quite lost in trying to either reproduce the
issue or pinpoint a possible cause.

Any hints from you upstream, or what you want me to provide? The host
is currently running 4.19.194 based kernel in Debian buster, but can
soonish update it to 4.19.208 as well to match almost the current
version in the 4.19.y series.

Regards,
Salvatore

2021-11-22 09:25:07

by Olivier Monaco

[permalink] [raw]
Subject: RE: Kernel panic / list_add corruption when in nfsd4_run_cb_work

Hi,

I think my problem is related to this thread.

We are running a VMware vCenter platform running 9 groups of virtual machines. Each group includes a VM with NFS for file sharing and 3 VMs with NFS clients, so we are running 9 independent file servers.

Starting from April 21, we had sometimes a kernel bug with message like :
2021-05-04T02:28:21.101162+02:00 storage-t20 kernel: [1736623.971161] list_add corruption. prev->next should be next (ffff9c2d0875ecb8), but was ffff9c2c913a0fe8. (prev=ffff9c2c913a0fe8).
2021-05-04T02:28:21.101176+02:00 storage-t20 kernel: [1736623.971315] ------------[ cut here ]------------
2021-05-04T02:28:21.101177+02:00 storage-t20 kernel: [1736623.971317] kernel BUG at lib/list_debug.c:28!
2021-05-04T02:28:21.101178+02:00 storage-t20 kernel: [1736623.971362] invalid opcode: 0000 [#1] SMP NOPTI
2021-05-04T02:28:21.101178+02:00 storage-t20 kernel: [1736623.971402] CPU: 1 PID: 2435711 Comm: kworker/u256:5 Tainted: G W 5.10.0-0.bpo.4-amd64 #1 Debian 5.10.19-1~bpo10+1
2021-05-04T02:28:21.101179+02:00 storage-t20 kernel: [1736623.971456] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
2021-05-04T02:28:21.101180+02:00 storage-t20 kernel: [1736623.971499] Workqueue: nfsd4_callbacks nfsd4_run_cb_work [nfsd]
2021-05-04T02:28:21.101180+02:00 storage-t20 kernel: [1736623.971515] RIP: 0010:__list_add_valid.cold.0+0x26/0x28
2021-05-04T02:28:21.101181+02:00 storage-t20 kernel: [1736623.971527] Code: 7b 1c bf ff 48 89 d1 48 c7 c7 18 71 92 86 48 89 c2 e8 02 2a ff ff 0f 0b 48 89 c1 4c 89 c6 48 c7 c7 70 71 92 86 e8 ee 29 ff ff <0f> 0b 48 89 fe 48 89 c2 48 c7 c7 00 72 92 86 e8 da 29 ff ff 0f 0b
2021-05-04T02:28:21.101181+02:00 storage-t20 kernel: [1736623.971564] RSP: 0018:ffffb93f4075fe48 EFLAGS: 00010246
2021-05-04T02:28:21.101182+02:00 storage-t20 kernel: [1736623.971579] RAX: 0000000000000075 RBX: ffff9c2c913a0fe8 RCX: 0000000000000000
2021-05-04T02:28:21.101182+02:00 storage-t20 kernel: [1736623.971594] RDX: 0000000000000000 RSI: ffff9c2d39e58a00 RDI: ffff9c2d39e58a00
2021-05-04T02:28:21.101183+02:00 storage-t20 kernel: [1736623.971608] RBP: ffff9c2c913a1018 R08: 0000000000000000 R09: c0000000ffff7fff
2021-05-04T02:28:21.101183+02:00 storage-t20 kernel: [1736623.971623] R10: 0000000000000001 R11: ffffb93f4075fc58 R12: ffff9c2d0875ec00
2021-05-04T02:28:21.101183+02:00 storage-t20 kernel: [1736623.971637] R13: ffff9c2c913a0fe8 R14: ffff9c2d0875ecb8 R15: ffff9c2c913a1050
2021-05-04T02:28:21.101184+02:00 storage-t20 kernel: [1736623.971653] FS: 0000000000000000(0000) GS:ffff9c2d39e40000(0000) knlGS:0000000000000000
2021-05-04T02:28:21.101184+02:00 storage-t20 kernel: [1736623.971684] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2021-05-04T02:28:21.101184+02:00 storage-t20 kernel: [1736623.971735] CR2: 00007f8d98002698 CR3: 0000000103904005 CR4: 00000000007706e0
2021-05-04T02:28:21.101185+02:00 storage-t20 kernel: [1736623.971774] PKRU: 55555554
2021-05-04T02:28:21.101185+02:00 storage-t20 kernel: [1736623.971781] Call Trace:
2021-05-04T02:28:21.101186+02:00 storage-t20 kernel: [1736623.971805] nfsd4_cb_recall_prepare+0x2aa/0x2f0 [nfsd]
2021-05-04T02:28:21.101186+02:00 storage-t20 kernel: [1736623.971829] nfsd4_run_cb_work+0xe9/0x150 [nfsd]
2021-05-04T02:28:21.101186+02:00 storage-t20 kernel: [1736623.971843] process_one_work+0x1aa/0x340
2021-05-04T02:28:21.101187+02:00 storage-t20 kernel: [1736623.971855] ? create_worker+0x1a0/0x1a0
2021-05-04T02:28:21.101187+02:00 storage-t20 kernel: [1736623.971865] worker_thread+0x30/0x390
2021-05-04T02:28:21.101188+02:00 storage-t20 kernel: [1736623.971875] ? create_worker+0x1a0/0x1a0
2021-05-04T02:28:21.101188+02:00 storage-t20 kernel: [1736623.972279] kthread+0x116/0x130
2021-05-04T02:28:21.101188+02:00 storage-t20 kernel: [1736623.972663] ? kthread_park+0x80/0x80
2021-05-04T02:28:21.101189+02:00 storage-t20 kernel: [1736623.973043] ret_from_fork+0x1f/0x30
2021-05-04T02:28:21.101189+02:00 storage-t20 kernel: [1736623.973411] Modules linked in: binfmt_misc vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock intel_rapl_msr intel_rapl_common nfit libnvdimm crc32_pclmul ghash_clmulni_intel aesni_intel libaes crypto_simd cryptd glue_helper rapl vmw_balloon vmwgfx joydev evdev serio_raw pcspkr ttm sg drm_kms_helper vmw_vmci cec ac button nfsd auth_rpcgss nfs_acl lockd grace drm sunrpc fuse configfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c crc32c_generic dm_mod sd_mod t10_pi crc_t10dif crct10dif_generic ata_generic crct10dif_pclmul crct10dif_common crc32c_intel psmouse vmxnet3 ata_piix libata vmw_pvscsi scsi_mod i2c_piix4
2021-05-04T02:28:21.101190+02:00 storage-t20 kernel: [1736623.976175] ---[ end trace f6e153631af275dd ]---

This problem occurs when server are under load but not during a load pick. There's no way to detect incoming failure. We tried to reproduce this issue without success.

Before the end of October, we were running Debian Buster with 5.10 kernel. This issue occurred at least 4 times before summer, none during the summer and started again at mid-October (3 times) when load on our infrastructure was increasing again.

We updated all our NFS servers at end of October to Debian Bullseye with kernel 5.14. This issues occurred for the first time then today on one of our servers:
2021-11-22T09:12:16.957103+01:00 storage-s10 kernel: [1930092.493512] list_add corruption. prev->next should be next (ffff9244023034b0), but was ffff92439f684c08. (prev=ffff92439f684c08).
2021-11-22T09:12:16.957118+01:00 storage-s10 kernel: [1930092.493693] ------------[ cut here ]------------
2021-11-22T09:12:16.957119+01:00 storage-s10 kernel: [1930092.493694] kernel BUG at lib/list_debug.c:26!
2021-11-22T09:12:16.957119+01:00 storage-s10 kernel: [1930092.493764] invalid opcode: 0000 [#1] SMP NOPTI
2021-11-22T09:12:16.957120+01:00 storage-s10 kernel: [1930092.493812] CPU: 0 PID: 2506486 Comm: kworker/u256:4 Tainted: G W 5.14.0-0.bpo.2-amd64 #1 Debian 5.14.9-2~bpo11+1
2021-11-22T09:12:16.957121+01:00 storage-s10 kernel: [1930092.493861] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
2021-11-22T09:12:16.957122+01:00 storage-s10 kernel: [1930092.493982] Workqueue: nfsd4_callbacks nfsd4_run_cb_work [nfsd]
2021-11-22T09:12:16.957126+01:00 storage-s10 kernel: [1930092.494135] RIP: 0010:__list_add_valid.cold+0x3d/0x3f
2021-11-22T09:12:16.957127+01:00 storage-s10 kernel: [1930092.494203] Code: f2 4c 89 c1 48 89 fe 48 c7 c7 08 53 15 a8 e8 f8 fe fe ff 0f 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 b0 52 15 a8 e8 e1 fe fe ff <0f> 0b 48 89 fe 48 c7 c7 40 53 15 a8 e8 d0 fe fe ff 0f 0b 48 c7 c7
2021-11-22T09:12:16.957128+01:00 storage-s10 kernel: [1930092.494288] RSP: 0018:ffffbd09c29b7e38 EFLAGS: 00010246
2021-11-22T09:12:16.957128+01:00 storage-s10 kernel: [1930092.494323] RAX: 0000000000000075 RBX: ffff92439f684c08 RCX: 0000000000000000
2021-11-22T09:12:16.957135+01:00 storage-s10 kernel: [1930092.494350] RDX: 0000000000000000 RSI: ffff924439e18880 RDI: ffff924439e18880
2021-11-22T09:12:16.957136+01:00 storage-s10 kernel: [1930092.494371] RBP: ffff9243087a2950 R08: 0000000000000000 R09: ffffbd09c29b7c68
2021-11-22T09:12:16.957136+01:00 storage-s10 kernel: [1930092.494391] R10: ffffbd09c29b7c60 R11: ffff92443fec48a8 R12: ffff924402303400
2021-11-22T09:12:16.957137+01:00 storage-s10 kernel: [1930092.494426] R13: ffff9243087a2920 R14: ffff9244023034b0 R15: 0000000000000006
2021-11-22T09:12:16.957137+01:00 storage-s10 kernel: [1930092.494447] FS: 0000000000000000(0000) GS:ffff924439e00000(0000) knlGS:0000000000000000
2021-11-22T09:12:16.957137+01:00 storage-s10 kernel: [1930092.494470] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2021-11-22T09:12:16.957139+01:00 storage-s10 kernel: [1930092.494488] CR2: 00007fb154911328 CR3: 000000010aabe005 CR4: 00000000007706f0
2021-11-22T09:12:16.957139+01:00 storage-s10 kernel: [1930092.494536] PKRU: 55555554
2021-11-22T09:12:16.957140+01:00 storage-s10 kernel: [1930092.494546] Call Trace:
2021-11-22T09:12:16.957140+01:00 storage-s10 kernel: [1930092.494578] nfsd4_cb_recall_prepare+0x144/0x160 [nfsd]
2021-11-22T09:12:16.957140+01:00 storage-s10 kernel: [1930092.494626] nfsd4_run_cb_work+0x8b/0x140 [nfsd]
2021-11-22T09:12:16.957141+01:00 storage-s10 kernel: [1930092.494659] process_one_work+0x1e9/0x390
2021-11-22T09:12:16.957141+01:00 storage-s10 kernel: [1930092.494675] worker_thread+0x53/0x3e0
2021-11-22T09:12:16.957143+01:00 storage-s10 kernel: [1930092.495301] ? process_one_work+0x390/0x390
2021-11-22T09:12:16.957143+01:00 storage-s10 kernel: [1930092.495851] kthread+0x124/0x150
2021-11-22T09:12:16.957143+01:00 storage-s10 kernel: [1930092.496396] ? set_kthread_struct+0x40/0x40
2021-11-22T09:12:16.957144+01:00 storage-s10 kernel: [1930092.496926] ret_from_fork+0x1f/0x30
2021-11-22T09:12:16.957144+01:00 storage-s10 kernel: [1930092.497450] Modules linked in: binfmt_misc vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock intel_rapl_msr intel_rapl_common nfit libnvdimm ghash_clmulni_intel aesni_intel crypto_simd cryptd rapl vmwgfx vmw_balloon ttm evdev joydev serio_raw pcspkr drm_kms_helper vmw_vmci cec sg rc_core ac button nfsd auth_rpcgss nfs_acl lockd grace drm fuse sunrpc configfs ip_tables x_tables autofs4 btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c crc32c_generic dm_mod sd_mod t10_pi crc_t10dif crct10dif_generic ata_generic crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ata_piix vmw_pvscsi libata psmouse scsi_mod vmxnet3 i2c_piix4
2021-11-22T09:12:16.957146+01:00 storage-s10 kernel: [1930092.501328] ---[ end trace 1f8f8a420695731b ]---

This trace is followed by:
2021-11-22T09:12:16.957146+01:00 storage-s10 kernel: [1930092.501874] RIP: 0010:__list_add_valid.cold+0x3d/0x3f
2021-11-22T09:12:16.958925+01:00 storage-s10 kernel: [1930092.502423] Code: f2 4c 89 c1 48 89 fe 48 c7 c7 08 53 15 a8 e8 f8 fe fe ff 0f 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 b0 52 15 a8 e8 e1 fe fe ff <0f> 0b 48 89 fe 48 c7 c7 40 53 15 a8 e8 d0 fe fe ff 0f 0b 48 c7 c7
2021-11-22T09:12:16.958933+01:00 storage-s10 kernel: [1930092.504070] RSP: 0018:ffffbd09c29b7e38 EFLAGS: 00010246
2021-11-22T09:12:16.962117+01:00 storage-s10 kernel: [1930092.504608] RAX: 0000000000000075 RBX: ffff92439f684c08 RCX: 0000000000000000
2021-11-22T09:12:16.962124+01:00 storage-s10 kernel: [1930092.505150] RDX: 0000000000000000 RSI: ffff924439e18880 RDI: ffff924439e18880
2021-11-22T09:12:16.962124+01:00 storage-s10 kernel: [1930092.505685] RBP: ffff9243087a2950 R08: 0000000000000000 R09: ffffbd09c29b7c68
2021-11-22T09:12:16.962125+01:00 storage-s10 kernel: [1930092.506213] R10: ffffbd09c29b7c60 R11: ffff92443fec48a8 R12: ffff924402303400
2021-11-22T09:12:16.962125+01:00 storage-s10 kernel: [1930092.506729] R13: ffff9243087a2920 R14: ffff9244023034b0 R15: 0000000000000006
2021-11-22T09:12:16.962126+01:00 storage-s10 kernel: [1930092.507260] FS: 0000000000000000(0000) GS:ffff924439e00000(0000) knlGS:0000000000000000
2021-11-22T09:12:16.965880+01:00 storage-s10 kernel: [1930092.507783] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2021-11-22T09:12:16.965886+01:00 storage-s10 kernel: [1930092.508302] CR2: 00007fb154911328 CR3: 000000010aabe005 CR4: 00000000007706f0
2021-11-22T09:12:16.965887+01:00 storage-s10 kernel: [1930092.508854] PKRU: 55555554
2021-11-22T09:12:38.013111+01:00 storage-s10 kernel: [1930113.531463] rcu: INFO: rcu_sched self-detected stall on CPU
2021-11-22T09:12:38.013126+01:00 storage-s10 kernel: [1930113.532037] rcu: #0111-....: (1 GPs behind) idle=726/1/0x4000000000000000 softirq=433107463/433107464 fqs=2625
2021-11-22T09:12:38.013128+01:00 storage-s10 kernel: [1930113.533150] #011(t=5251 jiffies g=411028473 q=326)
2021-11-22T09:12:38.013129+01:00 storage-s10 kernel: [1930113.533689] NMI backtrace for cpu 1
2021-11-22T09:12:38.013130+01:00 storage-s10 kernel: [1930113.534251] CPU: 1 PID: 702 Comm: nfsd Tainted: G D W 5.14.0-0.bpo.2-amd64 #1 Debian 5.14.9-2~bpo11+1
2021-11-22T09:12:38.013130+01:00 storage-s10 kernel: [1930113.535361] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
2021-11-22T09:12:38.013131+01:00 storage-s10 kernel: [1930113.536527] Call Trace:
2021-11-22T09:12:38.013131+01:00 storage-s10 kernel: [1930113.537119] <IRQ>
2021-11-22T09:12:38.013132+01:00 storage-s10 kernel: [1930113.537790] dump_stack_lvl+0x46/0x5a
2021-11-22T09:12:38.013132+01:00 storage-s10 kernel: [1930113.538470] nmi_cpu_backtrace.cold+0x32/0x69
2021-11-22T09:12:38.013132+01:00 storage-s10 kernel: [1930113.539058] ? lapic_can_unplug_cpu+0x80/0x80
2021-11-22T09:12:38.013133+01:00 storage-s10 kernel: [1930113.539660] nmi_trigger_cpumask_backtrace+0xd7/0xe0
2021-11-22T09:12:38.013133+01:00 storage-s10 kernel: [1930113.540243] rcu_dump_cpu_stacks+0xc1/0xef
2021-11-22T09:12:38.013134+01:00 storage-s10 kernel: [1930113.540810] rcu_sched_clock_irq.cold+0xc7/0x1e9
2021-11-22T09:12:38.013134+01:00 storage-s10 kernel: [1930113.541342] ? trigger_load_balance+0x60/0x2e0
2021-11-22T09:12:38.013134+01:00 storage-s10 kernel: [1930113.541886] ? scheduler_tick+0xb8/0x220
2021-11-22T09:12:38.013135+01:00 storage-s10 kernel: [1930113.542398] update_process_times+0x8c/0xc0
2021-11-22T09:12:38.013135+01:00 storage-s10 kernel: [1930113.542909] tick_sched_handle+0x22/0x60
2021-11-22T09:12:38.013136+01:00 storage-s10 kernel: [1930113.543398] tick_sched_timer+0x7a/0xd0
2021-11-22T09:12:38.013137+01:00 storage-s10 kernel: [1930113.543882] ? tick_do_update_jiffies64.part.0+0xa0/0xa0
2021-11-22T09:12:38.013137+01:00 storage-s10 kernel: [1930113.544355] __hrtimer_run_queues+0x127/0x270
2021-11-22T09:12:38.013138+01:00 storage-s10 kernel: [1930113.544817] hrtimer_interrupt+0x110/0x2c0
2021-11-22T09:12:38.013138+01:00 storage-s10 kernel: [1930113.545288] __sysvec_apic_timer_interrupt+0x59/0xd0
2021-11-22T09:12:38.013139+01:00 storage-s10 kernel: [1930113.545724] sysvec_apic_timer_interrupt+0x6d/0x90
2021-11-22T09:12:38.013140+01:00 storage-s10 kernel: [1930113.546187] </IRQ>
2021-11-22T09:12:38.013140+01:00 storage-s10 kernel: [1930113.546611] asm_sysvec_apic_timer_interrupt+0x12/0x20
2021-11-22T09:12:38.013141+01:00 storage-s10 kernel: [1930113.547042] RIP: 0010:native_queued_spin_lock_slowpath+0xe/0x1d0
2021-11-22T09:12:38.013141+01:00 storage-s10 kernel: [1930113.547499] Code: 0f b1 38 48 85 c0 75 c5 4d 89 50 08 4c 89 c0 c3 0f 0b 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 66 90 ba 01 00 00 00 8b 07 <85> c0 75 09 f0 0f b1 17 85 c0 75 f2 c3 f3 90 eb ed 81 fe 00 01 00
2021-11-22T09:12:38.013143+01:00 storage-s10 kernel: [1930113.548864] RSP: 0018:ffffbd09c2f2bce0 EFLAGS: 00000202
2021-11-22T09:12:38.013143+01:00 storage-s10 kernel: [1930113.549320] RAX: 0000000000000001 RBX: ffff924405db8900 RCX: 0000000000000038
2021-11-22T09:12:38.013144+01:00 storage-s10 kernel: [1930113.549791] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffffc076fd00
2021-11-22T09:12:38.013144+01:00 storage-s10 kernel: [1930113.550268] RBP: ffffbd09c2f2bdc8 R08: ffff92437f8b9604 R09: ffffffffa8653b40
2021-11-22T09:12:38.013144+01:00 storage-s10 kernel: [1930113.550714] R10: ffff92430d001d08 R11: 000945fa00000000 R12: 0000000000000000
2021-11-22T09:12:38.013145+01:00 storage-s10 kernel: [1930113.551187] R13: 00000000000000e0 R14: ffff924405610000 R15: ffff9243ba930980
2021-11-22T09:12:38.013155+01:00 storage-s10 kernel: [1930113.551636] _raw_spin_lock+0x1a/0x20
2021-11-22T09:12:38.013155+01:00 storage-s10 kernel: [1930113.552087] nfsd4_process_open2+0x6d8/0x1690 [nfsd]
2021-11-22T09:12:38.013155+01:00 storage-s10 kernel: [1930113.552568] ? nfsd_permission+0x6a/0x100 [nfsd]
2021-11-22T09:12:38.013156+01:00 storage-s10 kernel: [1930113.553012] ? fh_verify+0x1b1/0x6c0 [nfsd]
2021-11-22T09:12:38.013156+01:00 storage-s10 kernel: [1930113.553473] ? nfsd_lookup+0xab/0x150 [nfsd]
2021-11-22T09:12:38.013156+01:00 storage-s10 kernel: [1930113.553893] nfsd4_open+0x614/0x6f0 [nfsd]
2021-11-22T09:12:38.013157+01:00 storage-s10 kernel: [1930113.554305] nfsd4_proc_compound+0x393/0x630 [nfsd]
2021-11-22T09:12:38.013158+01:00 storage-s10 kernel: [1930113.554737] nfsd_dispatch+0x147/0x230 [nfsd]
2021-11-22T09:12:38.013159+01:00 storage-s10 kernel: [1930113.555143] svc_process_common+0x3d1/0x6d0 [sunrpc]
2021-11-22T09:12:38.013159+01:00 storage-s10 kernel: [1930113.555593] ? nfsd_svc+0x350/0x350 [nfsd]
2021-11-22T09:12:38.013159+01:00 storage-s10 kernel: [1930113.556004] svc_process+0xb7/0xf0 [sunrpc]
2021-11-22T09:12:38.013160+01:00 storage-s10 kernel: [1930113.556440] nfsd+0xe8/0x140 [nfsd]
2021-11-22T09:12:38.013160+01:00 storage-s10 kernel: [1930113.556866] ? nfsd_shutdown_threads+0x80/0x80 [nfsd]
2021-11-22T09:12:38.013162+01:00 storage-s10 kernel: [1930113.557282] kthread+0x124/0x150
2021-11-22T09:12:38.013162+01:00 storage-s10 kernel: [1930113.557689] ? set_kthread_struct+0x40/0x40
2021-11-22T09:12:38.013162+01:00 storage-s10 kernel: [1930113.558105] ret_from_fork+0x1f/0x30
...
2021-11-22T09:13:05.616901+01:00 storage-s10 kernel: [1930141.140062] watchdog: BUG: soft lockup - CPU#1 stuck for 49s! [nfsd:702]
2021-11-22T09:13:05.616916+01:00 storage-s10 kernel: [1930141.140654] Modules linked in: binfmt_misc vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock intel_rapl_msr intel_rapl_common nfit libnvdimm ghash_clmulni_intel aesni_intel crypto_simd cryptd rapl vmwgfx vmw_balloon ttm evdev joydev serio_raw pcspkr drm_kms_helper vmw_vmci cec sg rc_core ac button nfsd auth_rpcgss nfs_acl lockd grace drm fuse sunrpc configfs ip_tables x_tables autofs4 btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c crc32c_generic dm_mod sd_mod t10_pi crc_t10dif crct10dif_generic ata_generic crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ata_piix vmw_pvscsi libata psmouse scsi_mod vmxnet3 i2c_piix4
2021-11-22T09:13:05.616917+01:00 storage-s10 kernel: [1930141.143948] CPU: 1 PID: 702 Comm: nfsd Tainted: G D W 5.14.0-0.bpo.2-amd64 #1 Debian 5.14.9-2~bpo11+1
2021-11-22T09:13:05.616918+01:00 storage-s10 kernel: [1930141.144966] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018

If it can be relevant, we try to set /proc/sys/kernel/panic to 20 for enforce a reboot on kernel bug but it did not work.

Our groups of servers are used to host websites, some with only one website, some with many. For now, this issue only occurred on NFS servers part of a group with many websites. But one of the most load NFS server is also used only for one website and has never had this issue. May be this issue occurs only when the NFS server is serving many different files.

Regards,

Olivier

-----Message d'origine-----
De?: Salvatore Bonaccorso <[email protected]> De la part de Salvatore Bonaccorso
Envoy??: mercredi 6 octobre 2021 20:47
??: J. Bruce Fields <[email protected]>; [email protected]
Cc?: Chuck Lever <[email protected]>
Objet?: Re: Kernel panic / list_add corruption when in nfsd4_run_cb_work

Hi,

On Sun, Oct 18, 2020 at 11:39:03AM +0200, Salvatore Bonaccorso wrote:
> Hi,
>
> On Mon, Oct 12, 2020 at 12:33:55PM -0400, J. Bruce Fields wrote:
> > On Mon, Oct 12, 2020 at 05:41:59PM +0200, Salvatore Bonaccorso wrote:
> > > Hi Bruce,
> > >
> > > Thanks a lot for your reply, much appreciated.
> > >
> > > On Mon, Oct 12, 2020 at 10:26:02AM -0400, J. Bruce Fields wrote:
> > > > On Sun, Oct 11, 2020 at 09:59:13AM +0200, Salvatore Bonaccorso wrote:
> > > > > Hi
> > > > >
> > > > > On a system running 4.19.146-1 in Debian buster an issue got
> > > > > hit, while the server was under some slight load, but it does
> > > > > not seem easily reproducible, so asking if some more
> > > > > information can be provided to track/narrow this down. On the
> > > > > console the following was
> > > > > caught:
> > > >
> > > > Worth checking git logs of fs/nfsd/nfs4state.c and
> > > > fs/nfsd/nfs4callback.c. It might be
> > > > 2bbfed98a4d82ac4e7abfcd4eba40bddfc670b1d "nfsd: Fix races
> > > > between
> > > > nfsd4_cb_release() and nfsd4_shutdown_callback()" ?
> > >
> > > That might be possible. As it was not possible to simply trigger
> > > the issue, do you know if it is possible to simply reproduce the
> > > issue fixed in the above?
> >
> > I don't have a reproducer.
>
> I stil could not find a way to controlled trigger the issue, but on
> the same server the following was caused as well, which might be
> releated as well (altough different backtrace, but maybe gives
> additional hints what can be looked for):
>
> [ 4390.059004] ------------[ cut here ]------------ [ 4390.063780]
> WARNING: CPU: 14 PID: 24184 at fs/nfsd/nfs4state.c:4778
> laundromat_main.cold.125+0x31/0x7a [nfsd] [ 4390.073763] Modules
> linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag binfmt_misc
> rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache quota_v2 quota_tree
> bonding intel_rapl skx_edac nfit libnv dimm x86_pkg_temp_thermal
> intel_powerclamp coretemp kvm_intel kvm irqbypass ipmi_ssif
> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate ast ttm
> drm_kms_helper intel_uncore mei_me drm jo ydev iTCO_wdt evdev
> pcc_cpufreq pcspkr sg ioatdma intel_rapl_perf mei iTCO_vendor_support
> i2c_algo_bit dca ipmi_si wmi ipmi_devintf ipmi_msghandler
> acpi_power_meter acpi_pad button nfsd auth_rpcgss nfs_acl lockd grace
> sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb
> dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq
> async_xor async_tx xor raid6_pq libcrc32c crc 32c_generic raid1 [
> 4390.144313] raid0 multipath linear md_mod hid_generic usbhid hid
> sd_mod crc32c_intel aesni_intel ahci xhci_pci libahci aes_x86_64
> xhci_hcd arcmsr crypto_simd libata cryptd i40e usbcore scsi_mod
> glue_helper lpc_ich i2c_i801 mfd_core usb_common [ 4390.165906] CPU:
> 14 PID: 24184 Comm: kworker/u42:2 Not tainted 4.19.0-11-amd64 #1
> Debian 4.19.146-1 [ 4390.174969] Hardware name: DALCO AG
> S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
> [ 4390.184654] Workqueue: nfsd4 laundromat_main [nfsd] [ 4390.189550]
> RIP: 0010:laundromat_main.cold.125+0x31/0x7a [nfsd] [ 4390.195484]
> Code: f6 6b c0 e8 24 ae 62 d7 e9 6f 21 ff ff 48 c7 c7 38 f6 6b c0 e8
> 13 ae 62 d7 e9 1f 21 ff ff 48 c7 c7 40 f3 6b c0 e8 02 ae 62 d7 <0f> 0b
> e9 34 23 ff ff 48 c7 c7 40 f3 6b c0 e8 ef a d 62 d7 0f 0b e9 [
> 4390.214280] RSP: 0018:ffffb0568a6d7e20 EFLAGS: 00010246 [ 4390.219523] RAX: 0000000000000024 RBX: ffffb0568a6d7e50 RCX: 0000000000000000 [ 4390.226678] RDX: 0000000000000000 RSI: ffff9def8f8166b8 RDI: ffff9def8f8166b8 [ 4390.233826] RBP: ffffb0568a6d7e50 R08: 0000000000000556 R09: 0000000000aaaaaa [ 4390.240976] R10: 0000000000000000 R11: 0000000000000001 R12: 000000005f8b6732 [ 4390.248126] R13: ffff9de31d44c798 R14: 000000000000001c R15: ffffb0568a6d7e50 [ 4390.255277] FS: 0000000000000000(0000) GS:ffff9def8f800000(0000) knlGS:0000000000000000 [ 4390.263382] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4390.269147] CR2: 00007f50de60b7f8 CR3: 000000046740a006 CR4: 00000000007606e0 [ 4390.276302] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 4390.283448] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 4390.290598] PKRU: 55555554 [ 4390.293316] Call Trace:
> [ 4390.295790] process_one_work+0x1a7/0x3a0 [ 4390.299815]
> worker_thread+0x30/0x390 [ 4390.303487] ? create_worker+0x1a0/0x1a0 [
> 4390.307422] kthread+0x112/0x130 [ 4390.310661] ?
> kthread_bind+0x30/0x30 [ 4390.314340] ret_from_fork+0x35/0x40 [
> 4390.317936] ---[ end trace d0209d068f8583fd ]--- [ 4390.322577]
> list_add double add: new=ffff9de31d44c798, prev=ffffb0568a6d7e50, next=ffff9de31d44c798.
> [ 4390.331736] ------------[ cut here ]------------ [ 4390.336368]
> kernel BUG at lib/list_debug.c:31!
> [ 4390.340829] invalid opcode: 0000 [#1] SMP PTI
> [ 4390.345200] CPU: 14 PID: 24184 Comm: kworker/u42:2 Tainted: G W 4.19.0-11-amd64 #1 Debian 4.19.146-1
> [ 4390.355648] Hardware name: DALCO AG S2600WFT/S2600WFT, BIOS
> SE5C620.86B.02.01.0008.031920191559 03/19/2019 [ 4390.365327]
> Workqueue: nfsd4 laundromat_main [nfsd] [ 4390.371091] RIP:
> 0010:__list_add_valid+0x41/0x50 [ 4390.376551] Code: 85 94 00 00 00 48
> 39 c7 74 0b 48 39 d7 74 06 b8 01 00 00 00 c3 48 89 f2 4c 89 c1 48 89
> fe 48 c7 c7 80 a4 a9 98 e8 bd 6b d0 ff <0f> 0b 66 66 2e 0f 1f 84 00 00
> 00 00 00 66 90 48 8b 07 48 8b 57 08 [ 4390.397015] RSP:
> 0018:ffffb0568a6d7e18 EFLAGS: 00010246 [ 4390.403077] RAX:
> 0000000000000058 RBX: ffff9de31d44c798 RCX: 0000000000000000 [
> 4390.411045] RDX: 0000000000000000 RSI: ffff9def8f8166b8 RDI:
> ffff9def8f8166b8 [ 4390.418996] RBP: ffffb0568a6d7e50 R08:
> 0000000000000573 R09: 0000000000aaaaaa [ 4390.426937] R10:
> 0000000000000000 R11: 0000000000000001 R12: 000000005f8b6732 [
> 4390.434859] R13: ffff9de31d44c798 R14: 000000000000001c R15:
> ffffb0568a6d7e50 [ 4390.442761] FS: 0000000000000000(0000)
> GS:ffff9def8f800000(0000) knlGS:0000000000000000 [ 4390.451610] CS:
> 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4390.458100] CR2: 00007f50de60b7f8 CR3: 000000046740a006 CR4: 00000000007606e0 [ 4390.465970] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 4390.473820] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 4390.481650] PKRU: 55555554 [ 4390.485032] Call Trace:
> [ 4390.488145] laundromat_main+0x27a/0x610 [nfsd] [ 4390.493333]
> process_one_work+0x1a7/0x3a0 [ 4390.497987] worker_thread+0x30/0x390
> [ 4390.502276] ? create_worker+0x1a0/0x1a0 [ 4390.506808]
> kthread+0x112/0x130 [ 4390.510636] ? kthread_bind+0x30/0x30 [
> 4390.514892] ret_from_fork+0x35/0x40 [ 4390.519056] Modules linked
> in: tcp_diag udp_diag raw_diag inet_diag unix_diag binfmt_misc
> rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache quota_v2 quota_tree
> bonding intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal
> intel_powerclamp coretemp kvm_intel kvm irqbypass ipmi_ssif
> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate ast ttm
> drm_kms_helper intel_uncore mei_me drm joydev iTCO_wdt evdev
> pcc_cpufreq pcspkr sg ioatdma intel_rapl_perf mei iTCO_vendor_support
> i2c_algo_bit dca ipmi_si wmi ipmi_devintf ipmi_msghandler
> acpi_power_meter acpi_pad button nfsd auth_rpcgss nfs_acl lockd grace
> sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb
> dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq
> async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 [
> 4390.593983] raid0 multipath linear md_mod hid_generic usbhid hid
> sd_mod crc32c_intel aesni_intel ahci xhci_pci libahci aes_x86_64
> xhci_hcd arcmsr crypto_simd libata cryptd i40e usbcore scsi_mod
> glue_helper lpc_ich i2c_i801 mfd_core usb_common [ 4390.616982] ---[
> end trace d0209d068f8583fe ]--- [ 4390.704972] RIP:
> 0010:__list_add_valid+0x41/0x50 [ 4390.710342] Code: 85 94 00 00 00 48
> 39 c7 74 0b 48 39 d7 74 06 b8 01 00 00 00 c3 48 89 f2 4c 89 c1 48 89
> fe 48 c7 c7 80 a4 a9 98 e8 bd 6b d0 ff <0f> 0b 66 66 2e 0f 1f 84 00 00
> 00 00 00 66 90 48 8b 07 48 8b 57 08 [ 4390.730656] RSP:
> 0018:ffffb0568a6d7e18 EFLAGS: 00010246 [ 4390.736676] RAX:
> 0000000000000058 RBX: ffff9de31d44c798 RCX: 0000000000000000 [
> 4390.744609] RDX: 0000000000000000 RSI: ffff9def8f8166b8 RDI:
> ffff9def8f8166b8 [ 4390.752555] RBP: ffffb0568a6d7e50 R08:
> 0000000000000573 R09: 0000000000aaaaaa [ 4390.760497] R10:
> 0000000000000000 R11: 0000000000000001 R12: 000000005f8b6732 [
> 4390.768433] R13: ffff9de31d44c798 R14: 000000000000001c R15:
> ffffb0568a6d7e50 [ 4390.776361] FS: 0000000000000000(0000)
> GS:ffff9def8f800000(0000) knlGS:0000000000000000 [ 4390.785244] CS:
> 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4390.791789] CR2:
> 00007f50de60b7f8 CR3: 000000046740a006 CR4: 00000000007606e0 [
> 4390.799725] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000 [ 4390.807642] DR3: 0000000000000000 DR6:
> 00000000fffe0ff0 DR7: 0000000000000400 [ 4390.815544] PKRU: 55555554 [
> 4390.819023] Kernel panic - not syncing: Fatal exception [
> 4390.825022] Kernel Offset: 0x16c00000 from 0xffffffff81000000
> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>
> Thanks for your time in any case!

Waving back after almost one year. We started to see again those crashes under load after during the year situation improved (when load went down). I'm still quite lost in trying to either reproduce the issue or pinpoint a possible cause.

Any hints from you upstream, or what you want me to provide? The host is currently running 4.19.194 based kernel in Debian buster, but can soonish update it to 4.19.208 as well to match almost the current version in the 4.19.y series.

Regards,
Salvatore


2021-11-22 15:17:54

by Chuck Lever III

[permalink] [raw]
Subject: Re: Kernel panic / list_add corruption when in nfsd4_run_cb_work



> On Nov 22, 2021, at 4:15 AM, Olivier Monaco <[email protected]> wrote:
>
> Hi,
>
> I think my problem is related to this thread.
>
> We are running a VMware vCenter platform running 9 groups of virtual machines. Each group includes a VM with NFS for file sharing and 3 VMs with NFS clients, so we are running 9 independent file servers.

I've opened https://bugzilla.linux-nfs.org/show_bug.cgi?id=371

Just a random thought: would enabling KASAN shed some light?


> Starting from April 21, we had sometimes a kernel bug with message like :
> 2021-05-04T02:28:21.101162+02:00 storage-t20 kernel: [1736623.971161] list_add corruption. prev->next should be next (ffff9c2d0875ecb8), but was ffff9c2c913a0fe8. (prev=ffff9c2c913a0fe8).
> 2021-05-04T02:28:21.101176+02:00 storage-t20 kernel: [1736623.971315] ------------[ cut here ]------------
> 2021-05-04T02:28:21.101177+02:00 storage-t20 kernel: [1736623.971317] kernel BUG at lib/list_debug.c:28!
> 2021-05-04T02:28:21.101178+02:00 storage-t20 kernel: [1736623.971362] invalid opcode: 0000 [#1] SMP NOPTI
> 2021-05-04T02:28:21.101178+02:00 storage-t20 kernel: [1736623.971402] CPU: 1 PID: 2435711 Comm: kworker/u256:5 Tainted: G W 5.10.0-0.bpo.4-amd64 #1 Debian 5.10.19-1~bpo10+1
> 2021-05-04T02:28:21.101179+02:00 storage-t20 kernel: [1736623.971456] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
> 2021-05-04T02:28:21.101180+02:00 storage-t20 kernel: [1736623.971499] Workqueue: nfsd4_callbacks nfsd4_run_cb_work [nfsd]
> 2021-05-04T02:28:21.101180+02:00 storage-t20 kernel: [1736623.971515] RIP: 0010:__list_add_valid.cold.0+0x26/0x28
> 2021-05-04T02:28:21.101181+02:00 storage-t20 kernel: [1736623.971527] Code: 7b 1c bf ff 48 89 d1 48 c7 c7 18 71 92 86 48 89 c2 e8 02 2a ff ff 0f 0b 48 89 c1 4c 89 c6 48 c7 c7 70 71 92 86 e8 ee 29 ff ff <0f> 0b 48 89 fe 48 89 c2 48 c7 c7 00 72 92 86 e8 da 29 ff ff 0f 0b
> 2021-05-04T02:28:21.101181+02:00 storage-t20 kernel: [1736623.971564] RSP: 0018:ffffb93f4075fe48 EFLAGS: 00010246
> 2021-05-04T02:28:21.101182+02:00 storage-t20 kernel: [1736623.971579] RAX: 0000000000000075 RBX: ffff9c2c913a0fe8 RCX: 0000000000000000
> 2021-05-04T02:28:21.101182+02:00 storage-t20 kernel: [1736623.971594] RDX: 0000000000000000 RSI: ffff9c2d39e58a00 RDI: ffff9c2d39e58a00
> 2021-05-04T02:28:21.101183+02:00 storage-t20 kernel: [1736623.971608] RBP: ffff9c2c913a1018 R08: 0000000000000000 R09: c0000000ffff7fff
> 2021-05-04T02:28:21.101183+02:00 storage-t20 kernel: [1736623.971623] R10: 0000000000000001 R11: ffffb93f4075fc58 R12: ffff9c2d0875ec00
> 2021-05-04T02:28:21.101183+02:00 storage-t20 kernel: [1736623.971637] R13: ffff9c2c913a0fe8 R14: ffff9c2d0875ecb8 R15: ffff9c2c913a1050
> 2021-05-04T02:28:21.101184+02:00 storage-t20 kernel: [1736623.971653] FS: 0000000000000000(0000) GS:ffff9c2d39e40000(0000) knlGS:0000000000000000
> 2021-05-04T02:28:21.101184+02:00 storage-t20 kernel: [1736623.971684] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 2021-05-04T02:28:21.101184+02:00 storage-t20 kernel: [1736623.971735] CR2: 00007f8d98002698 CR3: 0000000103904005 CR4: 00000000007706e0
> 2021-05-04T02:28:21.101185+02:00 storage-t20 kernel: [1736623.971774] PKRU: 55555554
> 2021-05-04T02:28:21.101185+02:00 storage-t20 kernel: [1736623.971781] Call Trace:
> 2021-05-04T02:28:21.101186+02:00 storage-t20 kernel: [1736623.971805] nfsd4_cb_recall_prepare+0x2aa/0x2f0 [nfsd]
> 2021-05-04T02:28:21.101186+02:00 storage-t20 kernel: [1736623.971829] nfsd4_run_cb_work+0xe9/0x150 [nfsd]
> 2021-05-04T02:28:21.101186+02:00 storage-t20 kernel: [1736623.971843] process_one_work+0x1aa/0x340
> 2021-05-04T02:28:21.101187+02:00 storage-t20 kernel: [1736623.971855] ? create_worker+0x1a0/0x1a0
> 2021-05-04T02:28:21.101187+02:00 storage-t20 kernel: [1736623.971865] worker_thread+0x30/0x390
> 2021-05-04T02:28:21.101188+02:00 storage-t20 kernel: [1736623.971875] ? create_worker+0x1a0/0x1a0
> 2021-05-04T02:28:21.101188+02:00 storage-t20 kernel: [1736623.972279] kthread+0x116/0x130
> 2021-05-04T02:28:21.101188+02:00 storage-t20 kernel: [1736623.972663] ? kthread_park+0x80/0x80
> 2021-05-04T02:28:21.101189+02:00 storage-t20 kernel: [1736623.973043] ret_from_fork+0x1f/0x30
> 2021-05-04T02:28:21.101189+02:00 storage-t20 kernel: [1736623.973411] Modules linked in: binfmt_misc vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock intel_rapl_msr intel_rapl_common nfit libnvdimm crc32_pclmul ghash_clmulni_intel aesni_intel libaes crypto_simd cryptd glue_helper rapl vmw_balloon vmwgfx joydev evdev serio_raw pcspkr ttm sg drm_kms_helper vmw_vmci cec ac button nfsd auth_rpcgss nfs_acl lockd grace drm sunrpc fuse configfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c crc32c_generic dm_mod sd_mod t10_pi crc_t10dif crct10dif_generic ata_generic crct10dif_pclmul crct10dif_common crc32c_intel psmouse vmxnet3 ata_piix libata vmw_pvscsi scsi_mod i2c_piix4
> 2021-05-04T02:28:21.101190+02:00 storage-t20 kernel: [1736623.976175] ---[ end trace f6e153631af275dd ]---
>
> This problem occurs when server are under load but not during a load pick. There's no way to detect incoming failure. We tried to reproduce this issue without success.
>
> Before the end of October, we were running Debian Buster with 5.10 kernel. This issue occurred at least 4 times before summer, none during the summer and started again at mid-October (3 times) when load on our infrastructure was increasing again.
>
> We updated all our NFS servers at end of October to Debian Bullseye with kernel 5.14. This issues occurred for the first time then today on one of our servers:
> 2021-11-22T09:12:16.957103+01:00 storage-s10 kernel: [1930092.493512] list_add corruption. prev->next should be next (ffff9244023034b0), but was ffff92439f684c08. (prev=ffff92439f684c08).
> 2021-11-22T09:12:16.957118+01:00 storage-s10 kernel: [1930092.493693] ------------[ cut here ]------------
> 2021-11-22T09:12:16.957119+01:00 storage-s10 kernel: [1930092.493694] kernel BUG at lib/list_debug.c:26!
> 2021-11-22T09:12:16.957119+01:00 storage-s10 kernel: [1930092.493764] invalid opcode: 0000 [#1] SMP NOPTI
> 2021-11-22T09:12:16.957120+01:00 storage-s10 kernel: [1930092.493812] CPU: 0 PID: 2506486 Comm: kworker/u256:4 Tainted: G W 5.14.0-0.bpo.2-amd64 #1 Debian 5.14.9-2~bpo11+1
> 2021-11-22T09:12:16.957121+01:00 storage-s10 kernel: [1930092.493861] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
> 2021-11-22T09:12:16.957122+01:00 storage-s10 kernel: [1930092.493982] Workqueue: nfsd4_callbacks nfsd4_run_cb_work [nfsd]
> 2021-11-22T09:12:16.957126+01:00 storage-s10 kernel: [1930092.494135] RIP: 0010:__list_add_valid.cold+0x3d/0x3f
> 2021-11-22T09:12:16.957127+01:00 storage-s10 kernel: [1930092.494203] Code: f2 4c 89 c1 48 89 fe 48 c7 c7 08 53 15 a8 e8 f8 fe fe ff 0f 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 b0 52 15 a8 e8 e1 fe fe ff <0f> 0b 48 89 fe 48 c7 c7 40 53 15 a8 e8 d0 fe fe ff 0f 0b 48 c7 c7
> 2021-11-22T09:12:16.957128+01:00 storage-s10 kernel: [1930092.494288] RSP: 0018:ffffbd09c29b7e38 EFLAGS: 00010246
> 2021-11-22T09:12:16.957128+01:00 storage-s10 kernel: [1930092.494323] RAX: 0000000000000075 RBX: ffff92439f684c08 RCX: 0000000000000000
> 2021-11-22T09:12:16.957135+01:00 storage-s10 kernel: [1930092.494350] RDX: 0000000000000000 RSI: ffff924439e18880 RDI: ffff924439e18880
> 2021-11-22T09:12:16.957136+01:00 storage-s10 kernel: [1930092.494371] RBP: ffff9243087a2950 R08: 0000000000000000 R09: ffffbd09c29b7c68
> 2021-11-22T09:12:16.957136+01:00 storage-s10 kernel: [1930092.494391] R10: ffffbd09c29b7c60 R11: ffff92443fec48a8 R12: ffff924402303400
> 2021-11-22T09:12:16.957137+01:00 storage-s10 kernel: [1930092.494426] R13: ffff9243087a2920 R14: ffff9244023034b0 R15: 0000000000000006
> 2021-11-22T09:12:16.957137+01:00 storage-s10 kernel: [1930092.494447] FS: 0000000000000000(0000) GS:ffff924439e00000(0000) knlGS:0000000000000000
> 2021-11-22T09:12:16.957137+01:00 storage-s10 kernel: [1930092.494470] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 2021-11-22T09:12:16.957139+01:00 storage-s10 kernel: [1930092.494488] CR2: 00007fb154911328 CR3: 000000010aabe005 CR4: 00000000007706f0
> 2021-11-22T09:12:16.957139+01:00 storage-s10 kernel: [1930092.494536] PKRU: 55555554
> 2021-11-22T09:12:16.957140+01:00 storage-s10 kernel: [1930092.494546] Call Trace:
> 2021-11-22T09:12:16.957140+01:00 storage-s10 kernel: [1930092.494578] nfsd4_cb_recall_prepare+0x144/0x160 [nfsd]
> 2021-11-22T09:12:16.957140+01:00 storage-s10 kernel: [1930092.494626] nfsd4_run_cb_work+0x8b/0x140 [nfsd]
> 2021-11-22T09:12:16.957141+01:00 storage-s10 kernel: [1930092.494659] process_one_work+0x1e9/0x390
> 2021-11-22T09:12:16.957141+01:00 storage-s10 kernel: [1930092.494675] worker_thread+0x53/0x3e0
> 2021-11-22T09:12:16.957143+01:00 storage-s10 kernel: [1930092.495301] ? process_one_work+0x390/0x390
> 2021-11-22T09:12:16.957143+01:00 storage-s10 kernel: [1930092.495851] kthread+0x124/0x150
> 2021-11-22T09:12:16.957143+01:00 storage-s10 kernel: [1930092.496396] ? set_kthread_struct+0x40/0x40
> 2021-11-22T09:12:16.957144+01:00 storage-s10 kernel: [1930092.496926] ret_from_fork+0x1f/0x30
> 2021-11-22T09:12:16.957144+01:00 storage-s10 kernel: [1930092.497450] Modules linked in: binfmt_misc vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock intel_rapl_msr intel_rapl_common nfit libnvdimm ghash_clmulni_intel aesni_intel crypto_simd cryptd rapl vmwgfx vmw_balloon ttm evdev joydev serio_raw pcspkr drm_kms_helper vmw_vmci cec sg rc_core ac button nfsd auth_rpcgss nfs_acl lockd grace drm fuse sunrpc configfs ip_tables x_tables autofs4 btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c crc32c_generic dm_mod sd_mod t10_pi crc_t10dif crct10dif_generic ata_generic crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ata_piix vmw_pvscsi libata psmouse scsi_mod vmxnet3 i2c_piix4
> 2021-11-22T09:12:16.957146+01:00 storage-s10 kernel: [1930092.501328] ---[ end trace 1f8f8a420695731b ]---
>
> This trace is followed by:
> 2021-11-22T09:12:16.957146+01:00 storage-s10 kernel: [1930092.501874] RIP: 0010:__list_add_valid.cold+0x3d/0x3f
> 2021-11-22T09:12:16.958925+01:00 storage-s10 kernel: [1930092.502423] Code: f2 4c 89 c1 48 89 fe 48 c7 c7 08 53 15 a8 e8 f8 fe fe ff 0f 0b 48 89 d1 4c 89 c6 4c 89 ca 48 c7 c7 b0 52 15 a8 e8 e1 fe fe ff <0f> 0b 48 89 fe 48 c7 c7 40 53 15 a8 e8 d0 fe fe ff 0f 0b 48 c7 c7
> 2021-11-22T09:12:16.958933+01:00 storage-s10 kernel: [1930092.504070] RSP: 0018:ffffbd09c29b7e38 EFLAGS: 00010246
> 2021-11-22T09:12:16.962117+01:00 storage-s10 kernel: [1930092.504608] RAX: 0000000000000075 RBX: ffff92439f684c08 RCX: 0000000000000000
> 2021-11-22T09:12:16.962124+01:00 storage-s10 kernel: [1930092.505150] RDX: 0000000000000000 RSI: ffff924439e18880 RDI: ffff924439e18880
> 2021-11-22T09:12:16.962124+01:00 storage-s10 kernel: [1930092.505685] RBP: ffff9243087a2950 R08: 0000000000000000 R09: ffffbd09c29b7c68
> 2021-11-22T09:12:16.962125+01:00 storage-s10 kernel: [1930092.506213] R10: ffffbd09c29b7c60 R11: ffff92443fec48a8 R12: ffff924402303400
> 2021-11-22T09:12:16.962125+01:00 storage-s10 kernel: [1930092.506729] R13: ffff9243087a2920 R14: ffff9244023034b0 R15: 0000000000000006
> 2021-11-22T09:12:16.962126+01:00 storage-s10 kernel: [1930092.507260] FS: 0000000000000000(0000) GS:ffff924439e00000(0000) knlGS:0000000000000000
> 2021-11-22T09:12:16.965880+01:00 storage-s10 kernel: [1930092.507783] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 2021-11-22T09:12:16.965886+01:00 storage-s10 kernel: [1930092.508302] CR2: 00007fb154911328 CR3: 000000010aabe005 CR4: 00000000007706f0
> 2021-11-22T09:12:16.965887+01:00 storage-s10 kernel: [1930092.508854] PKRU: 55555554
> 2021-11-22T09:12:38.013111+01:00 storage-s10 kernel: [1930113.531463] rcu: INFO: rcu_sched self-detected stall on CPU
> 2021-11-22T09:12:38.013126+01:00 storage-s10 kernel: [1930113.532037] rcu: #0111-....: (1 GPs behind) idle=726/1/0x4000000000000000 softirq=433107463/433107464 fqs=2625
> 2021-11-22T09:12:38.013128+01:00 storage-s10 kernel: [1930113.533150] #011(t=5251 jiffies g=411028473 q=326)
> 2021-11-22T09:12:38.013129+01:00 storage-s10 kernel: [1930113.533689] NMI backtrace for cpu 1
> 2021-11-22T09:12:38.013130+01:00 storage-s10 kernel: [1930113.534251] CPU: 1 PID: 702 Comm: nfsd Tainted: G D W 5.14.0-0.bpo.2-amd64 #1 Debian 5.14.9-2~bpo11+1
> 2021-11-22T09:12:38.013130+01:00 storage-s10 kernel: [1930113.535361] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
> 2021-11-22T09:12:38.013131+01:00 storage-s10 kernel: [1930113.536527] Call Trace:
> 2021-11-22T09:12:38.013131+01:00 storage-s10 kernel: [1930113.537119] <IRQ>
> 2021-11-22T09:12:38.013132+01:00 storage-s10 kernel: [1930113.537790] dump_stack_lvl+0x46/0x5a
> 2021-11-22T09:12:38.013132+01:00 storage-s10 kernel: [1930113.538470] nmi_cpu_backtrace.cold+0x32/0x69
> 2021-11-22T09:12:38.013132+01:00 storage-s10 kernel: [1930113.539058] ? lapic_can_unplug_cpu+0x80/0x80
> 2021-11-22T09:12:38.013133+01:00 storage-s10 kernel: [1930113.539660] nmi_trigger_cpumask_backtrace+0xd7/0xe0
> 2021-11-22T09:12:38.013133+01:00 storage-s10 kernel: [1930113.540243] rcu_dump_cpu_stacks+0xc1/0xef
> 2021-11-22T09:12:38.013134+01:00 storage-s10 kernel: [1930113.540810] rcu_sched_clock_irq.cold+0xc7/0x1e9
> 2021-11-22T09:12:38.013134+01:00 storage-s10 kernel: [1930113.541342] ? trigger_load_balance+0x60/0x2e0
> 2021-11-22T09:12:38.013134+01:00 storage-s10 kernel: [1930113.541886] ? scheduler_tick+0xb8/0x220
> 2021-11-22T09:12:38.013135+01:00 storage-s10 kernel: [1930113.542398] update_process_times+0x8c/0xc0
> 2021-11-22T09:12:38.013135+01:00 storage-s10 kernel: [1930113.542909] tick_sched_handle+0x22/0x60
> 2021-11-22T09:12:38.013136+01:00 storage-s10 kernel: [1930113.543398] tick_sched_timer+0x7a/0xd0
> 2021-11-22T09:12:38.013137+01:00 storage-s10 kernel: [1930113.543882] ? tick_do_update_jiffies64.part.0+0xa0/0xa0
> 2021-11-22T09:12:38.013137+01:00 storage-s10 kernel: [1930113.544355] __hrtimer_run_queues+0x127/0x270
> 2021-11-22T09:12:38.013138+01:00 storage-s10 kernel: [1930113.544817] hrtimer_interrupt+0x110/0x2c0
> 2021-11-22T09:12:38.013138+01:00 storage-s10 kernel: [1930113.545288] __sysvec_apic_timer_interrupt+0x59/0xd0
> 2021-11-22T09:12:38.013139+01:00 storage-s10 kernel: [1930113.545724] sysvec_apic_timer_interrupt+0x6d/0x90
> 2021-11-22T09:12:38.013140+01:00 storage-s10 kernel: [1930113.546187] </IRQ>
> 2021-11-22T09:12:38.013140+01:00 storage-s10 kernel: [1930113.546611] asm_sysvec_apic_timer_interrupt+0x12/0x20
> 2021-11-22T09:12:38.013141+01:00 storage-s10 kernel: [1930113.547042] RIP: 0010:native_queued_spin_lock_slowpath+0xe/0x1d0
> 2021-11-22T09:12:38.013141+01:00 storage-s10 kernel: [1930113.547499] Code: 0f b1 38 48 85 c0 75 c5 4d 89 50 08 4c 89 c0 c3 0f 0b 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 66 90 ba 01 00 00 00 8b 07 <85> c0 75 09 f0 0f b1 17 85 c0 75 f2 c3 f3 90 eb ed 81 fe 00 01 00
> 2021-11-22T09:12:38.013143+01:00 storage-s10 kernel: [1930113.548864] RSP: 0018:ffffbd09c2f2bce0 EFLAGS: 00000202
> 2021-11-22T09:12:38.013143+01:00 storage-s10 kernel: [1930113.549320] RAX: 0000000000000001 RBX: ffff924405db8900 RCX: 0000000000000038
> 2021-11-22T09:12:38.013144+01:00 storage-s10 kernel: [1930113.549791] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffffc076fd00
> 2021-11-22T09:12:38.013144+01:00 storage-s10 kernel: [1930113.550268] RBP: ffffbd09c2f2bdc8 R08: ffff92437f8b9604 R09: ffffffffa8653b40
> 2021-11-22T09:12:38.013144+01:00 storage-s10 kernel: [1930113.550714] R10: ffff92430d001d08 R11: 000945fa00000000 R12: 0000000000000000
> 2021-11-22T09:12:38.013145+01:00 storage-s10 kernel: [1930113.551187] R13: 00000000000000e0 R14: ffff924405610000 R15: ffff9243ba930980
> 2021-11-22T09:12:38.013155+01:00 storage-s10 kernel: [1930113.551636] _raw_spin_lock+0x1a/0x20
> 2021-11-22T09:12:38.013155+01:00 storage-s10 kernel: [1930113.552087] nfsd4_process_open2+0x6d8/0x1690 [nfsd]
> 2021-11-22T09:12:38.013155+01:00 storage-s10 kernel: [1930113.552568] ? nfsd_permission+0x6a/0x100 [nfsd]
> 2021-11-22T09:12:38.013156+01:00 storage-s10 kernel: [1930113.553012] ? fh_verify+0x1b1/0x6c0 [nfsd]
> 2021-11-22T09:12:38.013156+01:00 storage-s10 kernel: [1930113.553473] ? nfsd_lookup+0xab/0x150 [nfsd]
> 2021-11-22T09:12:38.013156+01:00 storage-s10 kernel: [1930113.553893] nfsd4_open+0x614/0x6f0 [nfsd]
> 2021-11-22T09:12:38.013157+01:00 storage-s10 kernel: [1930113.554305] nfsd4_proc_compound+0x393/0x630 [nfsd]
> 2021-11-22T09:12:38.013158+01:00 storage-s10 kernel: [1930113.554737] nfsd_dispatch+0x147/0x230 [nfsd]
> 2021-11-22T09:12:38.013159+01:00 storage-s10 kernel: [1930113.555143] svc_process_common+0x3d1/0x6d0 [sunrpc]
> 2021-11-22T09:12:38.013159+01:00 storage-s10 kernel: [1930113.555593] ? nfsd_svc+0x350/0x350 [nfsd]
> 2021-11-22T09:12:38.013159+01:00 storage-s10 kernel: [1930113.556004] svc_process+0xb7/0xf0 [sunrpc]
> 2021-11-22T09:12:38.013160+01:00 storage-s10 kernel: [1930113.556440] nfsd+0xe8/0x140 [nfsd]
> 2021-11-22T09:12:38.013160+01:00 storage-s10 kernel: [1930113.556866] ? nfsd_shutdown_threads+0x80/0x80 [nfsd]
> 2021-11-22T09:12:38.013162+01:00 storage-s10 kernel: [1930113.557282] kthread+0x124/0x150
> 2021-11-22T09:12:38.013162+01:00 storage-s10 kernel: [1930113.557689] ? set_kthread_struct+0x40/0x40
> 2021-11-22T09:12:38.013162+01:00 storage-s10 kernel: [1930113.558105] ret_from_fork+0x1f/0x30
> ...
> 2021-11-22T09:13:05.616901+01:00 storage-s10 kernel: [1930141.140062] watchdog: BUG: soft lockup - CPU#1 stuck for 49s! [nfsd:702]
> 2021-11-22T09:13:05.616916+01:00 storage-s10 kernel: [1930141.140654] Modules linked in: binfmt_misc vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock intel_rapl_msr intel_rapl_common nfit libnvdimm ghash_clmulni_intel aesni_intel crypto_simd cryptd rapl vmwgfx vmw_balloon ttm evdev joydev serio_raw pcspkr drm_kms_helper vmw_vmci cec sg rc_core ac button nfsd auth_rpcgss nfs_acl lockd grace drm fuse sunrpc configfs ip_tables x_tables autofs4 btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c crc32c_generic dm_mod sd_mod t10_pi crc_t10dif crct10dif_generic ata_generic crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ata_piix vmw_pvscsi libata psmouse scsi_mod vmxnet3 i2c_piix4
> 2021-11-22T09:13:05.616917+01:00 storage-s10 kernel: [1930141.143948] CPU: 1 PID: 702 Comm: nfsd Tainted: G D W 5.14.0-0.bpo.2-amd64 #1 Debian 5.14.9-2~bpo11+1
> 2021-11-22T09:13:05.616918+01:00 storage-s10 kernel: [1930141.144966] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
>
> If it can be relevant, we try to set /proc/sys/kernel/panic to 20 for enforce a reboot on kernel bug but it did not work.
>
> Our groups of servers are used to host websites, some with only one website, some with many. For now, this issue only occurred on NFS servers part of a group with many websites. But one of the most load NFS server is also used only for one website and has never had this issue. May be this issue occurs only when the NFS server is serving many different files.
>
> Regards,
>
> Olivier
>
> -----Message d'origine-----
> De : Salvatore Bonaccorso <[email protected]> De la part de Salvatore Bonaccorso
> Envoyé : mercredi 6 octobre 2021 20:47
> À : J. Bruce Fields <[email protected]>; [email protected]
> Cc : Chuck Lever <[email protected]>
> Objet : Re: Kernel panic / list_add corruption when in nfsd4_run_cb_work
>
> Hi,
>
> On Sun, Oct 18, 2020 at 11:39:03AM +0200, Salvatore Bonaccorso wrote:
>> Hi,
>>
>> On Mon, Oct 12, 2020 at 12:33:55PM -0400, J. Bruce Fields wrote:
>>> On Mon, Oct 12, 2020 at 05:41:59PM +0200, Salvatore Bonaccorso wrote:
>>>> Hi Bruce,
>>>>
>>>> Thanks a lot for your reply, much appreciated.
>>>>
>>>> On Mon, Oct 12, 2020 at 10:26:02AM -0400, J. Bruce Fields wrote:
>>>>> On Sun, Oct 11, 2020 at 09:59:13AM +0200, Salvatore Bonaccorso wrote:
>>>>>> Hi
>>>>>>
>>>>>> On a system running 4.19.146-1 in Debian buster an issue got
>>>>>> hit, while the server was under some slight load, but it does
>>>>>> not seem easily reproducible, so asking if some more
>>>>>> information can be provided to track/narrow this down. On the
>>>>>> console the following was
>>>>>> caught:
>>>>>
>>>>> Worth checking git logs of fs/nfsd/nfs4state.c and
>>>>> fs/nfsd/nfs4callback.c. It might be
>>>>> 2bbfed98a4d82ac4e7abfcd4eba40bddfc670b1d "nfsd: Fix races
>>>>> between
>>>>> nfsd4_cb_release() and nfsd4_shutdown_callback()" ?
>>>>
>>>> That might be possible. As it was not possible to simply trigger
>>>> the issue, do you know if it is possible to simply reproduce the
>>>> issue fixed in the above?
>>>
>>> I don't have a reproducer.
>>
>> I stil could not find a way to controlled trigger the issue, but on
>> the same server the following was caused as well, which might be
>> releated as well (altough different backtrace, but maybe gives
>> additional hints what can be looked for):
>>
>> [ 4390.059004] ------------[ cut here ]------------ [ 4390.063780]
>> WARNING: CPU: 14 PID: 24184 at fs/nfsd/nfs4state.c:4778
>> laundromat_main.cold.125+0x31/0x7a [nfsd] [ 4390.073763] Modules
>> linked in: tcp_diag udp_diag raw_diag inet_diag unix_diag binfmt_misc
>> rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache quota_v2 quota_tree
>> bonding intel_rapl skx_edac nfit libnv dimm x86_pkg_temp_thermal
>> intel_powerclamp coretemp kvm_intel kvm irqbypass ipmi_ssif
>> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate ast ttm
>> drm_kms_helper intel_uncore mei_me drm jo ydev iTCO_wdt evdev
>> pcc_cpufreq pcspkr sg ioatdma intel_rapl_perf mei iTCO_vendor_support
>> i2c_algo_bit dca ipmi_si wmi ipmi_devintf ipmi_msghandler
>> acpi_power_meter acpi_pad button nfsd auth_rpcgss nfs_acl lockd grace
>> sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb
>> dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq
>> async_xor async_tx xor raid6_pq libcrc32c crc 32c_generic raid1 [
>> 4390.144313] raid0 multipath linear md_mod hid_generic usbhid hid
>> sd_mod crc32c_intel aesni_intel ahci xhci_pci libahci aes_x86_64
>> xhci_hcd arcmsr crypto_simd libata cryptd i40e usbcore scsi_mod
>> glue_helper lpc_ich i2c_i801 mfd_core usb_common [ 4390.165906] CPU:
>> 14 PID: 24184 Comm: kworker/u42:2 Not tainted 4.19.0-11-amd64 #1
>> Debian 4.19.146-1 [ 4390.174969] Hardware name: DALCO AG
>> S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
>> [ 4390.184654] Workqueue: nfsd4 laundromat_main [nfsd] [ 4390.189550]
>> RIP: 0010:laundromat_main.cold.125+0x31/0x7a [nfsd] [ 4390.195484]
>> Code: f6 6b c0 e8 24 ae 62 d7 e9 6f 21 ff ff 48 c7 c7 38 f6 6b c0 e8
>> 13 ae 62 d7 e9 1f 21 ff ff 48 c7 c7 40 f3 6b c0 e8 02 ae 62 d7 <0f> 0b
>> e9 34 23 ff ff 48 c7 c7 40 f3 6b c0 e8 ef a d 62 d7 0f 0b e9 [
>> 4390.214280] RSP: 0018:ffffb0568a6d7e20 EFLAGS: 00010246 [ 4390.219523] RAX: 0000000000000024 RBX: ffffb0568a6d7e50 RCX: 0000000000000000 [ 4390.226678] RDX: 0000000000000000 RSI: ffff9def8f8166b8 RDI: ffff9def8f8166b8 [ 4390.233826] RBP: ffffb0568a6d7e50 R08: 0000000000000556 R09: 0000000000aaaaaa [ 4390.240976] R10: 0000000000000000 R11: 0000000000000001 R12: 000000005f8b6732 [ 4390.248126] R13: ffff9de31d44c798 R14: 000000000000001c R15: ffffb0568a6d7e50 [ 4390.255277] FS: 0000000000000000(0000) GS:ffff9def8f800000(0000) knlGS:0000000000000000 [ 4390.263382] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4390.269147] CR2: 00007f50de60b7f8 CR3: 000000046740a006 CR4: 00000000007606e0 [ 4390.276302] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 4390.283448] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 4390.290598] PKRU: 55555554 [ 4390.293316] Call Trace:
>> [ 4390.295790] process_one_work+0x1a7/0x3a0 [ 4390.299815]
>> worker_thread+0x30/0x390 [ 4390.303487] ? create_worker+0x1a0/0x1a0 [
>> 4390.307422] kthread+0x112/0x130 [ 4390.310661] ?
>> kthread_bind+0x30/0x30 [ 4390.314340] ret_from_fork+0x35/0x40 [
>> 4390.317936] ---[ end trace d0209d068f8583fd ]--- [ 4390.322577]
>> list_add double add: new=ffff9de31d44c798, prev=ffffb0568a6d7e50, next=ffff9de31d44c798.
>> [ 4390.331736] ------------[ cut here ]------------ [ 4390.336368]
>> kernel BUG at lib/list_debug.c:31!
>> [ 4390.340829] invalid opcode: 0000 [#1] SMP PTI
>> [ 4390.345200] CPU: 14 PID: 24184 Comm: kworker/u42:2 Tainted: G W 4.19.0-11-amd64 #1 Debian 4.19.146-1
>> [ 4390.355648] Hardware name: DALCO AG S2600WFT/S2600WFT, BIOS
>> SE5C620.86B.02.01.0008.031920191559 03/19/2019 [ 4390.365327]
>> Workqueue: nfsd4 laundromat_main [nfsd] [ 4390.371091] RIP:
>> 0010:__list_add_valid+0x41/0x50 [ 4390.376551] Code: 85 94 00 00 00 48
>> 39 c7 74 0b 48 39 d7 74 06 b8 01 00 00 00 c3 48 89 f2 4c 89 c1 48 89
>> fe 48 c7 c7 80 a4 a9 98 e8 bd 6b d0 ff <0f> 0b 66 66 2e 0f 1f 84 00 00
>> 00 00 00 66 90 48 8b 07 48 8b 57 08 [ 4390.397015] RSP:
>> 0018:ffffb0568a6d7e18 EFLAGS: 00010246 [ 4390.403077] RAX:
>> 0000000000000058 RBX: ffff9de31d44c798 RCX: 0000000000000000 [
>> 4390.411045] RDX: 0000000000000000 RSI: ffff9def8f8166b8 RDI:
>> ffff9def8f8166b8 [ 4390.418996] RBP: ffffb0568a6d7e50 R08:
>> 0000000000000573 R09: 0000000000aaaaaa [ 4390.426937] R10:
>> 0000000000000000 R11: 0000000000000001 R12: 000000005f8b6732 [
>> 4390.434859] R13: ffff9de31d44c798 R14: 000000000000001c R15:
>> ffffb0568a6d7e50 [ 4390.442761] FS: 0000000000000000(0000)
>> GS:ffff9def8f800000(0000) knlGS:0000000000000000 [ 4390.451610] CS:
>> 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4390.458100] CR2: 00007f50de60b7f8 CR3: 000000046740a006 CR4: 00000000007606e0 [ 4390.465970] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 4390.473820] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 4390.481650] PKRU: 55555554 [ 4390.485032] Call Trace:
>> [ 4390.488145] laundromat_main+0x27a/0x610 [nfsd] [ 4390.493333]
>> process_one_work+0x1a7/0x3a0 [ 4390.497987] worker_thread+0x30/0x390
>> [ 4390.502276] ? create_worker+0x1a0/0x1a0 [ 4390.506808]
>> kthread+0x112/0x130 [ 4390.510636] ? kthread_bind+0x30/0x30 [
>> 4390.514892] ret_from_fork+0x35/0x40 [ 4390.519056] Modules linked
>> in: tcp_diag udp_diag raw_diag inet_diag unix_diag binfmt_misc
>> rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache quota_v2 quota_tree
>> bonding intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal
>> intel_powerclamp coretemp kvm_intel kvm irqbypass ipmi_ssif
>> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate ast ttm
>> drm_kms_helper intel_uncore mei_me drm joydev iTCO_wdt evdev
>> pcc_cpufreq pcspkr sg ioatdma intel_rapl_perf mei iTCO_vendor_support
>> i2c_algo_bit dca ipmi_si wmi ipmi_devintf ipmi_msghandler
>> acpi_power_meter acpi_pad button nfsd auth_rpcgss nfs_acl lockd grace
>> sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb
>> dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq
>> async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 [
>> 4390.593983] raid0 multipath linear md_mod hid_generic usbhid hid
>> sd_mod crc32c_intel aesni_intel ahci xhci_pci libahci aes_x86_64
>> xhci_hcd arcmsr crypto_simd libata cryptd i40e usbcore scsi_mod
>> glue_helper lpc_ich i2c_i801 mfd_core usb_common [ 4390.616982] ---[
>> end trace d0209d068f8583fe ]--- [ 4390.704972] RIP:
>> 0010:__list_add_valid+0x41/0x50 [ 4390.710342] Code: 85 94 00 00 00 48
>> 39 c7 74 0b 48 39 d7 74 06 b8 01 00 00 00 c3 48 89 f2 4c 89 c1 48 89
>> fe 48 c7 c7 80 a4 a9 98 e8 bd 6b d0 ff <0f> 0b 66 66 2e 0f 1f 84 00 00
>> 00 00 00 66 90 48 8b 07 48 8b 57 08 [ 4390.730656] RSP:
>> 0018:ffffb0568a6d7e18 EFLAGS: 00010246 [ 4390.736676] RAX:
>> 0000000000000058 RBX: ffff9de31d44c798 RCX: 0000000000000000 [
>> 4390.744609] RDX: 0000000000000000 RSI: ffff9def8f8166b8 RDI:
>> ffff9def8f8166b8 [ 4390.752555] RBP: ffffb0568a6d7e50 R08:
>> 0000000000000573 R09: 0000000000aaaaaa [ 4390.760497] R10:
>> 0000000000000000 R11: 0000000000000001 R12: 000000005f8b6732 [
>> 4390.768433] R13: ffff9de31d44c798 R14: 000000000000001c R15:
>> ffffb0568a6d7e50 [ 4390.776361] FS: 0000000000000000(0000)
>> GS:ffff9def8f800000(0000) knlGS:0000000000000000 [ 4390.785244] CS:
>> 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4390.791789] CR2:
>> 00007f50de60b7f8 CR3: 000000046740a006 CR4: 00000000007606e0 [
>> 4390.799725] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>> 0000000000000000 [ 4390.807642] DR3: 0000000000000000 DR6:
>> 00000000fffe0ff0 DR7: 0000000000000400 [ 4390.815544] PKRU: 55555554 [
>> 4390.819023] Kernel panic - not syncing: Fatal exception [
>> 4390.825022] Kernel Offset: 0x16c00000 from 0xffffffff81000000
>> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>
>> Thanks for your time in any case!
>
> Waving back after almost one year. We started to see again those crashes under load after during the year situation improved (when load went down). I'm still quite lost in trying to either reproduce the issue or pinpoint a possible cause.
>
> Any hints from you upstream, or what you want me to provide? The host is currently running 4.19.194 based kernel in Debian buster, but can soonish update it to 4.19.208 as well to match almost the current version in the 4.19.y series.
>
> Regards,
> Salvatore
>

--
Chuck Lever



2021-11-24 15:29:49

by J. Bruce Fields

[permalink] [raw]
Subject: Re: Kernel panic / list_add corruption when in nfsd4_run_cb_work

On Mon, Nov 22, 2021 at 03:17:28PM +0000, Chuck Lever III wrote:
>
>
> > On Nov 22, 2021, at 4:15 AM, Olivier Monaco <[email protected]> wrote:
> >
> > Hi,
> >
> > I think my problem is related to this thread.
> >
> > We are running a VMware vCenter platform running 9 groups of virtual machines. Each group includes a VM with NFS for file sharing and 3 VMs with NFS clients, so we are running 9 independent file servers.
>
> I've opened https://bugzilla.linux-nfs.org/show_bug.cgi?id=371
>
> Just a random thought: would enabling KASAN shed some light?

In fact, we've gotten reports from Redhat QE of a KASAN use-after-free
warning in the laundromat thread, which I think might be the same bug.
We've been getting occasional reports of problems here for a long time,
but they've been very hard to reproduce.

After fooling with their reproducer, I think I finally have it.
Embarrasingly, it's nothing that complicated. You can make it much
easier to reproduce by adding an msleep call after the vfs_setlease in
nfs4_set_delegation.

If it's possible to run a patched kernel in production, you could try
the following, and I'd definitely be interested in any results.

Otherwise, you can probably work around the problem by disabling
delegations. Something like

sudo echo "fs.leases-enable = 0" >/etc/sysctl.d/nfsd-workaround.conf

should do it.

Not sure if this fix is best or if there's something simpler.

--b.

commit 6de51237589e
Author: J. Bruce Fields <[email protected]>
Date: Tue Nov 23 22:31:04 2021 -0500

nfsd: fix use-after-free due to delegation race

A delegation break could arrive as soon as we've called vfs_setlease. A
delegation break runs a callback which immediately (in
nfsd4_cb_recall_prepare) adds the delegation to del_recall_lru. If we
then exit nfs4_set_delegation without hashing the delegation, it will be
freed as soon as the callback is done with it, without ever being
removed from del_recall_lru.

Symptoms show up later as use-after-free or list corruption warnings,
usually in the laundromat thread.

Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index bfad94c70b84..8e063f49240b 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -5159,15 +5159,16 @@ nfs4_set_delegation(struct nfs4_client *clp, struct svc_fh *fh,
locks_free_lock(fl);
if (status)
goto out_clnt_odstate;
+
status = nfsd4_check_conflicting_opens(clp, fp);
- if (status)
- goto out_unlock;

spin_lock(&state_lock);
spin_lock(&fp->fi_lock);
- if (fp->fi_had_conflict)
+ if (status || fp->fi_had_conflict) {
+ list_del_init(&dp->dl_recall_lru);
+ dp->dl_time++;
status = -EAGAIN;
- else
+ } else
status = hash_delegation_locked(dp, fp);
spin_unlock(&fp->fi_lock);
spin_unlock(&state_lock);

2021-11-24 15:59:55

by Trond Myklebust

[permalink] [raw]
Subject: Re: Kernel panic / list_add corruption when in nfsd4_run_cb_work

On Wed, 2021-11-24 at 10:29 -0500, Bruce Fields wrote:
> On Mon, Nov 22, 2021 at 03:17:28PM +0000, Chuck Lever III wrote:
> >
> >
> > > On Nov 22, 2021, at 4:15 AM, Olivier Monaco
> > > <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > I think my problem is related to this thread.
> > >
> > > We are running a VMware vCenter platform running 9 groups of
> > > virtual machines. Each group includes a VM with NFS for file
> > > sharing and 3 VMs with NFS clients, so we are running 9
> > > independent file servers.
> >
> > I've opened https://bugzilla.linux-nfs.org/show_bug.cgi?id=371
> >
> > Just a random thought: would enabling KASAN shed some light?
>
> In fact, we've gotten reports from Redhat QE of a KASAN use-after-
> free
> warning in the laundromat thread, which I think might be the same
> bug.
> We've been getting occasional reports of problems here for a long
> time,
> but they've been very hard to reproduce.
>
> After fooling with their reproducer, I think I finally have it.
> Embarrasingly, it's nothing that complicated.  You can make it much
> easier to reproduce by adding an msleep call after the vfs_setlease
> in
> nfs4_set_delegation.
>
> If it's possible to run a patched kernel in production, you could try
> the following, and I'd definitely be interested in any results.
>
> Otherwise, you can probably work around the problem by disabling
> delegations.  Something like
>
>         sudo echo "fs.leases-enable = 0" >/etc/sysctl.d/nfsd-
> workaround.conf
>
> should do it.
>
> Not sure if this fix is best or if there's something simpler.
>
> --b.
>
> commit 6de51237589e
> Author: J. Bruce Fields <[email protected]>
> Date:   Tue Nov 23 22:31:04 2021 -0500
>
>     nfsd: fix use-after-free due to delegation race
>    
>     A delegation break could arrive as soon as we've called
> vfs_setlease.  A
>     delegation break runs a callback which immediately (in
>     nfsd4_cb_recall_prepare) adds the delegation to del_recall_lru. 
> If we
>     then exit nfs4_set_delegation without hashing the delegation, it
> will be
>     freed as soon as the callback is done with it, without ever being
>     removed from del_recall_lru.
>    
>     Symptoms show up later as use-after-free or list corruption
> warnings,
>     usually in the laundromat thread.
>    
>     Signed-off-by: J. Bruce Fields <[email protected]>
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index bfad94c70b84..8e063f49240b 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -5159,15 +5159,16 @@ nfs4_set_delegation(struct nfs4_client *clp,
> struct svc_fh *fh,
>                 locks_free_lock(fl);
>         if (status)
>                 goto out_clnt_odstate;
> +
>         status = nfsd4_check_conflicting_opens(clp, fp);
> -       if (status)
> -               goto out_unlock;
>  
>         spin_lock(&state_lock);
>         spin_lock(&fp->fi_lock);
> -       if (fp->fi_had_conflict)
> +       if (status || fp->fi_had_conflict) {
> +               list_del_init(&dp->dl_recall_lru);
> +               dp->dl_time++;
>                 status = -EAGAIN;
> -       else
> +       } else
>                 status = hash_delegation_locked(dp, fp);
>         spin_unlock(&fp->fi_lock);
>         spin_unlock(&state_lock);

Why won't this leak a reference to the stid? Afaics
nfsd_break_one_deleg() does take a reference before launching the
callback daemon, and that reference is released when the delegation is
later processed by the laundromat.

Hmm... Isn't the real bug here rather that the laundromat is corrupting
both the nn->client_lru and nn->del_recall_lru lists because it is
using list_add() instead of list_move() when adding these objects to
the reaplist?

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2021-11-24 16:10:23

by Trond Myklebust

[permalink] [raw]
Subject: Re: Kernel panic / list_add corruption when in nfsd4_run_cb_work

On Wed, 2021-11-24 at 10:59 -0500, Trond Myklebust wrote:
> On Wed, 2021-11-24 at 10:29 -0500, Bruce Fields wrote:
> > On Mon, Nov 22, 2021 at 03:17:28PM +0000, Chuck Lever III wrote:
> > >
> > >
> > > > On Nov 22, 2021, at 4:15 AM, Olivier Monaco
> > > > <[email protected]> wrote:
> > > >
> > > > Hi,
> > > >
> > > > I think my problem is related to this thread.
> > > >
> > > > We are running a VMware vCenter platform running 9 groups of
> > > > virtual machines. Each group includes a VM with NFS for file
> > > > sharing and 3 VMs with NFS clients, so we are running 9
> > > > independent file servers.
> > >
> > > I've opened https://bugzilla.linux-nfs.org/show_bug.cgi?id=371
> > >
> > > Just a random thought: would enabling KASAN shed some light?
> >
> > In fact, we've gotten reports from Redhat QE of a KASAN use-after-
> > free
> > warning in the laundromat thread, which I think might be the same
> > bug.
> > We've been getting occasional reports of problems here for a long
> > time,
> > but they've been very hard to reproduce.
> >
> > After fooling with their reproducer, I think I finally have it.
> > Embarrasingly, it's nothing that complicated.  You can make it much
> > easier to reproduce by adding an msleep call after the vfs_setlease
> > in
> > nfs4_set_delegation.
> >
> > If it's possible to run a patched kernel in production, you could
> > try
> > the following, and I'd definitely be interested in any results.
> >
> > Otherwise, you can probably work around the problem by disabling
> > delegations.  Something like
> >
> >         sudo echo "fs.leases-enable = 0" >/etc/sysctl.d/nfsd-
> > workaround.conf
> >
> > should do it.
> >
> > Not sure if this fix is best or if there's something simpler.
> >
> > --b.
> >
> > commit 6de51237589e
> > Author: J. Bruce Fields <[email protected]>
> > Date:   Tue Nov 23 22:31:04 2021 -0500
> >
> >     nfsd: fix use-after-free due to delegation race
> >    
> >     A delegation break could arrive as soon as we've called
> > vfs_setlease.  A
> >     delegation break runs a callback which immediately (in
> >     nfsd4_cb_recall_prepare) adds the delegation to
> > del_recall_lru. 
> > If we
> >     then exit nfs4_set_delegation without hashing the delegation,
> > it
> > will be
> >     freed as soon as the callback is done with it, without ever
> > being
> >     removed from del_recall_lru.
> >    
> >     Symptoms show up later as use-after-free or list corruption
> > warnings,
> >     usually in the laundromat thread.
> >    
> >     Signed-off-by: J. Bruce Fields <[email protected]>
> >
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index bfad94c70b84..8e063f49240b 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -5159,15 +5159,16 @@ nfs4_set_delegation(struct nfs4_client
> > *clp,
> > struct svc_fh *fh,
> >                 locks_free_lock(fl);
> >         if (status)
> >                 goto out_clnt_odstate;
> > +
> >         status = nfsd4_check_conflicting_opens(clp, fp);
> > -       if (status)
> > -               goto out_unlock;
> >  
> >         spin_lock(&state_lock);
> >         spin_lock(&fp->fi_lock);
> > -       if (fp->fi_had_conflict)
> > +       if (status || fp->fi_had_conflict) {
> > +               list_del_init(&dp->dl_recall_lru);
> > +               dp->dl_time++;
> >                 status = -EAGAIN;
> > -       else
> > +       } else
> >                 status = hash_delegation_locked(dp, fp);
> >         spin_unlock(&fp->fi_lock);
> >         spin_unlock(&state_lock);
>
> Why won't this leak a reference to the stid? Afaics
> nfsd_break_one_deleg() does take a reference before launching the
> callback daemon, and that reference is released when the delegation
> is
> later processed by the laundromat.
>
> Hmm... Isn't the real bug here rather that the laundromat is
> corrupting
> both the nn->client_lru and nn->del_recall_lru lists because it is
> using list_add() instead of list_move() when adding these objects to
> the reaplist?
>

OK. Looks like mark_client_expired_locked() does call list_del() on
clp->cl_lru on success, so I'm wrong about that.

However I don't see where anything is removing dp->dl_recall_lru, so I
still think the laundromat is corrupting the nn->del_recall_lru list.

--
Trond Myklebust
CTO, Hammerspace Inc
4984 El Camino Real, Suite 208
Los Altos, CA 94022

http://www.hammer.space

2021-11-24 16:10:40

by J. Bruce Fields

[permalink] [raw]
Subject: Re: Kernel panic / list_add corruption when in nfsd4_run_cb_work

On Wed, Nov 24, 2021 at 03:59:47PM +0000, Trond Myklebust wrote:
> On Wed, 2021-11-24 at 10:29 -0500, Bruce Fields wrote:
> > On Mon, Nov 22, 2021 at 03:17:28PM +0000, Chuck Lever III wrote:
> > >
> > >
> > > > On Nov 22, 2021, at 4:15 AM, Olivier Monaco
> > > > <[email protected]> wrote:
> > > >
> > > > Hi,
> > > >
> > > > I think my problem is related to this thread.
> > > >
> > > > We are running a VMware vCenter platform running 9 groups of
> > > > virtual machines. Each group includes a VM with NFS for file
> > > > sharing and 3 VMs with NFS clients, so we are running 9
> > > > independent file servers.
> > >
> > > I've opened https://bugzilla.linux-nfs.org/show_bug.cgi?id=371
> > >
> > > Just a random thought: would enabling KASAN shed some light?
> >
> > In fact, we've gotten reports from Redhat QE of a KASAN use-after-
> > free
> > warning in the laundromat thread, which I think might be the same
> > bug.
> > We've been getting occasional reports of problems here for a long
> > time,
> > but they've been very hard to reproduce.
> >
> > After fooling with their reproducer, I think I finally have it.
> > Embarrasingly, it's nothing that complicated.  You can make it much
> > easier to reproduce by adding an msleep call after the vfs_setlease
> > in
> > nfs4_set_delegation.
> >
> > If it's possible to run a patched kernel in production, you could try
> > the following, and I'd definitely be interested in any results.
> >
> > Otherwise, you can probably work around the problem by disabling
> > delegations.  Something like
> >
> >         sudo echo "fs.leases-enable = 0" >/etc/sysctl.d/nfsd-
> > workaround.conf
> >
> > should do it.
> >
> > Not sure if this fix is best or if there's something simpler.
> >
> > --b.
> >
> > commit 6de51237589e
> > Author: J. Bruce Fields <[email protected]>
> > Date:   Tue Nov 23 22:31:04 2021 -0500
> >
> >     nfsd: fix use-after-free due to delegation race
> >    
> >     A delegation break could arrive as soon as we've called
> > vfs_setlease.  A
> >     delegation break runs a callback which immediately (in
> >     nfsd4_cb_recall_prepare) adds the delegation to del_recall_lru. 
> > If we
> >     then exit nfs4_set_delegation without hashing the delegation, it
> > will be
> >     freed as soon as the callback is done with it, without ever being
> >     removed from del_recall_lru.
> >    
> >     Symptoms show up later as use-after-free or list corruption
> > warnings,
> >     usually in the laundromat thread.
> >    
> >     Signed-off-by: J. Bruce Fields <bfi[email protected]>
> >
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index bfad94c70b84..8e063f49240b 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -5159,15 +5159,16 @@ nfs4_set_delegation(struct nfs4_client *clp,
> > struct svc_fh *fh,
> >                 locks_free_lock(fl);
> >         if (status)
> >                 goto out_clnt_odstate;
> > +
> >         status = nfsd4_check_conflicting_opens(clp, fp);
> > -       if (status)
> > -               goto out_unlock;
> >  
> >         spin_lock(&state_lock);
> >         spin_lock(&fp->fi_lock);
> > -       if (fp->fi_had_conflict)
> > +       if (status || fp->fi_had_conflict) {
> > +               list_del_init(&dp->dl_recall_lru);
> > +               dp->dl_time++;
> >                 status = -EAGAIN;
> > -       else
> > +       } else
> >                 status = hash_delegation_locked(dp, fp);
> >         spin_unlock(&fp->fi_lock);
> >         spin_unlock(&state_lock);
>
> Why won't this leak a reference to the stid?

I'm not seeing it.

> Afaics nfsd_break_one_deleg() does take a reference before launching
> the callback daemon, and that reference is released when the
> delegation is later processed by the laundromat.

Right. Basically, there are two "long-lived" references, one taken as
long as the callback's using it, one taken as long as it's "hashed" (see
hash_delegation_locked/unhash_delegation_locked).

In the -EAGAIN case above, we're holding a temporary reference which
will be dropped on exit from this function; if a callback's in progress,
it will then drop the final reference.

> Hmm... Isn't the real bug here rather that the laundromat is corrupting
> both the nn->client_lru and nn->del_recall_lru lists because it is
> using list_add() instead of list_move() when adding these objects to
> the reaplist?

If that were the problem, I think we'd be hitting it all the time.
Looking.... No, unhash_delegation_locked did a list_del_init().

(Is the WARN_ON there too ugly?)

--b.

2021-11-24 17:14:58

by Trond Myklebust

[permalink] [raw]
Subject: Re: Kernel panic / list_add corruption when in nfsd4_run_cb_work

On Wed, 2021-11-24 at 11:10 -0500, [email protected] wrote:
> On Wed, Nov 24, 2021 at 03:59:47PM +0000, Trond Myklebust wrote:
> > On Wed, 2021-11-24 at 10:29 -0500, Bruce Fields wrote:
> > > On Mon, Nov 22, 2021 at 03:17:28PM +0000, Chuck Lever III wrote:
> > > >
> > > >
> > > > > On Nov 22, 2021, at 4:15 AM, Olivier Monaco
> > > > > <[email protected]> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > I think my problem is related to this thread.
> > > > >
> > > > > We are running a VMware vCenter platform running 9 groups of
> > > > > virtual machines. Each group includes a VM with NFS for file
> > > > > sharing and 3 VMs with NFS clients, so we are running 9
> > > > > independent file servers.
> > > >
> > > > I've opened https://bugzilla.linux-nfs.org/show_bug.cgi?id=371
> > > >
> > > > Just a random thought: would enabling KASAN shed some light?
> > >
> > > In fact, we've gotten reports from Redhat QE of a KASAN use-
> > > after-
> > > free
> > > warning in the laundromat thread, which I think might be the same
> > > bug.
> > > We've been getting occasional reports of problems here for a long
> > > time,
> > > but they've been very hard to reproduce.
> > >
> > > After fooling with their reproducer, I think I finally have it.
> > > Embarrasingly, it's nothing that complicated.  You can make it
> > > much
> > > easier to reproduce by adding an msleep call after the
> > > vfs_setlease
> > > in
> > > nfs4_set_delegation.
> > >
> > > If it's possible to run a patched kernel in production, you could
> > > try
> > > the following, and I'd definitely be interested in any results.
> > >
> > > Otherwise, you can probably work around the problem by disabling
> > > delegations.  Something like
> > >
> > >         sudo echo "fs.leases-enable = 0" >/etc/sysctl.d/nfsd-
> > > workaround.conf
> > >
> > > should do it.
> > >
> > > Not sure if this fix is best or if there's something simpler.
> > >
> > > --b.
> > >
> > > commit 6de51237589e
> > > Author: J. Bruce Fields <[email protected]>
> > > Date:   Tue Nov 23 22:31:04 2021 -0500
> > >
> > >     nfsd: fix use-after-free due to delegation race
> > >    
> > >     A delegation break could arrive as soon as we've called
> > > vfs_setlease.  A
> > >     delegation break runs a callback which immediately (in
> > >     nfsd4_cb_recall_prepare) adds the delegation to
> > > del_recall_lru. 
> > > If we
> > >     then exit nfs4_set_delegation without hashing the delegation,
> > > it
> > > will be
> > >     freed as soon as the callback is done with it, without ever
> > > being
> > >     removed from del_recall_lru.
> > >    
> > >     Symptoms show up later as use-after-free or list corruption
> > > warnings,
> > >     usually in the laundromat thread.
> > >    
> > >     Signed-off-by: J. Bruce Fields <[email protected]>
> > >
> > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > index bfad94c70b84..8e063f49240b 100644
> > > --- a/fs/nfsd/nfs4state.c
> > > +++ b/fs/nfsd/nfs4state.c
> > > @@ -5159,15 +5159,16 @@ nfs4_set_delegation(struct nfs4_client
> > > *clp,
> > > struct svc_fh *fh,
> > >                 locks_free_lock(fl);
> > >         if (status)
> > >                 goto out_clnt_odstate;
> > > +
> > >         status = nfsd4_check_conflicting_opens(clp, fp);
> > > -       if (status)
> > > -               goto out_unlock;
> > >  
> > >         spin_lock(&state_lock);
> > >         spin_lock(&fp->fi_lock);
> > > -       if (fp->fi_had_conflict)
> > > +       if (status || fp->fi_had_conflict) {
> > > +               list_del_init(&dp->dl_recall_lru);
> > > +               dp->dl_time++;
> > >                 status = -EAGAIN;
> > > -       else
> > > +       } else
> > >                 status = hash_delegation_locked(dp, fp);
> > >         spin_unlock(&fp->fi_lock);
> > >         spin_unlock(&state_lock);
> >
> > Why won't this leak a reference to the stid?
>
> I'm not seeing it.

Hmm... I thought we were leaking the reference taken by
nfsd_break_one_deleg(), but that does get cleaned up by
nfsd4_cb_recall_release()

>
> > Afaics nfsd_break_one_deleg() does take a reference before
> > launching
> > the callback daemon, and that reference is released when the
> > delegation is later processed by the laundromat.
>
> Right.  Basically, there are two "long-lived" references, one taken
> as
> long as the callback's using it, one taken as long as it's "hashed"
> (see
> hash_delegation_locked/unhash_delegation_locked).
>
> In the -EAGAIN case above, we're holding a temporary reference which
> will be dropped on exit from this function; if a callback's in
> progress,
> it will then drop the final reference.
>
> > Hmm... Isn't the real bug here rather that the laundromat is
> > corrupting
> > both the nn->client_lru and nn->del_recall_lru lists because it is
> > using list_add() instead of list_move() when adding these objects
> > to
> > the reaplist?
>
> If that were the problem, I think we'd be hitting it all the time.
> Looking....  No, unhash_delegation_locked did a list_del_init().
>
> (Is the WARN_ON there too ugly?)
>

It is a little nasty that we hide the list_del() calls in several
levels of function call, so they probably do deserve a comment.

That said, if, as in the case here, the delegation was unhashed, we
still end up not calling list_del_init() in unhash_delegation_locked(),
and since the list_add() is not conditional on it being successful, the
global list is again corrupted.

Yes, it is an unlikely race, but it is possible despite your change.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2021-11-24 22:06:49

by J. Bruce Fields

[permalink] [raw]
Subject: Re: Kernel panic / list_add corruption when in nfsd4_run_cb_work

On Wed, Nov 24, 2021 at 05:14:53PM +0000, Trond Myklebust wrote:
> It is a little nasty that we hide the list_del() calls in several
> levels of function call, so they probably do deserve a comment.
>
> That said, if, as in the case here, the delegation was unhashed, we
> still end up not calling list_del_init() in unhash_delegation_locked(),
> and since the list_add() is not conditional on it being successful, the
> global list is again corrupted.
>
> Yes, it is an unlikely race, but it is possible despite your change.

Thanks, good point.

Probably not something anyone's actually hitting, but another sign this
logic need rethinking.

--b.

2021-11-24 22:17:59

by Trond Myklebust

[permalink] [raw]
Subject: Re: Kernel panic / list_add corruption when in nfsd4_run_cb_work

On Wed, 2021-11-24 at 17:06 -0500, [email protected] wrote:
> On Wed, Nov 24, 2021 at 05:14:53PM +0000, Trond Myklebust wrote:
> > It is a little nasty that we hide the list_del() calls in several
> > levels of function call, so they probably do deserve a comment.
> >
> > That said, if, as in the case here, the delegation was unhashed, we
> > still end up not calling list_del_init() in
> > unhash_delegation_locked(),
> > and since the list_add() is not conditional on it being successful,
> > the
> > global list is again corrupted.
> >
> > Yes, it is an unlikely race, but it is possible despite your
> > change.
>
> Thanks, good point.
>
> Probably not something anyone's actually hitting, but another sign
> this
> logic need rethinking.
>

I think it should be sufficient to let the laundromat skip that entry
and leave it on the list if the unhash_delegation_locked() fails, since
your fix should then be able to pick the delegation up and destroy it
safely.

We can keep the code in __destroy_client() and
nfs4_state_shutdown_net() unchanged, since those are presumably not
affected by this race.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2021-12-01 22:33:26

by J. Bruce Fields

[permalink] [raw]
Subject: Re: Kernel panic / list_add corruption when in nfsd4_run_cb_work

On Wed, Nov 24, 2021 at 10:17:51PM +0000, Trond Myklebust wrote:
> On Wed, 2021-11-24 at 17:06 -0500, [email protected] wrote:
> > On Wed, Nov 24, 2021 at 05:14:53PM +0000, Trond Myklebust wrote:
> > > It is a little nasty that we hide the list_del() calls in several
> > > levels of function call, so they probably do deserve a comment.
> > >
> > > That said, if, as in the case here, the delegation was unhashed, we
> > > still end up not calling list_del_init() in
> > > unhash_delegation_locked(),
> > > and since the list_add() is not conditional on it being successful,
> > > the
> > > global list is again corrupted.
> > >
> > > Yes, it is an unlikely race, but it is possible despite your
> > > change.
> >
> > Thanks, good point.
> >
> > Probably not something anyone's actually hitting, but another sign
> > this
> > logic need rethinking.
> >
>
> I think it should be sufficient to let the laundromat skip that entry
> and leave it on the list if the unhash_delegation_locked() fails, since
> your fix should then be able to pick the delegation up and destroy it
> safely.
>
> We can keep the code in __destroy_client() and
> nfs4_state_shutdown_net() unchanged, since those are presumably not
> affected by this race.

I think simplest is just not to put the thing on the lru at all if it's
not hashed:

--b.

commit 5011f84ef05e
Author: J. Bruce Fields <[email protected]>
Date: Mon Nov 29 15:08:00 2021 -0500

nfsd: fix use-after-free due to delegation race

A delegation break could arrive as soon as we've called vfs_setlease. A
delegation break runs a callback which immediately (in
nfsd4_cb_recall_prepare) adds the delegation to del_recall_lru. If we
then exit nfs4_set_delegation without hashing the delegation, it will be
freed as soon as the callback is done with it, without ever being
removed from del_recall_lru.

Symptoms show up later as use-after-free or list corruption warnings,
usually in the laundromat thread.

Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index bfad94c70b84..1956d377d1a6 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1207,6 +1207,11 @@ hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp)
return 0;
}

+static bool delegation_hashed(struct nfs4_delegation *dp)
+{
+ return !(list_empty(&dp->dl_perfile));
+}
+
static bool
unhash_delegation_locked(struct nfs4_delegation *dp)
{
@@ -1214,7 +1219,7 @@ unhash_delegation_locked(struct nfs4_delegation *dp)

lockdep_assert_held(&state_lock);

- if (list_empty(&dp->dl_perfile))
+ if (!delegation_hashed(dp))
return false;

dp->dl_stid.sc_type = NFS4_CLOSED_DELEG_STID;
@@ -4598,7 +4603,7 @@ static void nfsd4_cb_recall_prepare(struct nfsd4_callback *cb)
* queued for a lease break. Don't queue it again.
*/
spin_lock(&state_lock);
- if (dp->dl_time == 0) {
+ if (delegation_hashed(dp) && dp->dl_time == 0) {
dp->dl_time = ktime_get_boottime_seconds();
list_add_tail(&dp->dl_recall_lru, &nn->del_recall_lru);
}