2014-05-20 08:46:04

by Veli-Matti Lintu

Subject: Soft lockups on kerberised NFSv4.0 clients

Hello,

We are seeing soft lockup call traces on kerberised NFSv4.0 clients acting as terminal servers that serve multiple thin clients running graphical desktop sessions on NFS home directories. We do not have a simple way to reproduce the problem, but creating artificial load by running multiple user sessions with applications like Firefox, LibreOffice, GIMP, etc. usually works. On production systems, though, high load has not been a requirement for soft lockups to happen. This happens both on KVM virtual machines and on real hardware.

On the NFS server side, kernels 3.10 - 3.15-rc5 have been tested; on the NFS client side, kernels 3.12 - 3.15-rc5 have been tested.

The NFS clients mount only from a single NFS server, and there are two mounts: the first one is done with auth=sys without an existing krb5.keytab, which results in SETCLIENTID being called with auth=sys. The user home directories are mounted with auth=krb5, and machine credentials are available when the krb5 mount is done. The dumps show that the callbacks use auth=sys.

If the NFS mount is replaced with a CIFS mount, no soft lockups happen.

Turning off leases on the NFS server has so far kept the NFS clients stable and no soft lockups have happened. The leases were disabled with "echo 0 > /proc/sys/fs/leases-enable" before starting the NFS server.

Because reproducing the problem takes some time, the dumpcap captures are usually several gigabytes in size. In the dumps there is one consistent sequence that may tell something.

Shortly (< 1 min) before the soft lockups appear, the NFS server responds to a RENEW request with NFS4ERR_CB_PATH_DOWN, e.g.:

1171171 163.248014 10.249.0.2 -> 10.249.15.254 NFS 174 V4 Call RENEW CID: 0x34d1
1171172 163.248112 10.249.15.254 -> 10.249.0.2 NFS 114 V4 Reply (Call In 1171171) RENEW
1182967 223.407973 10.249.0.2 -> 10.249.15.254 NFS 174 V4 Call RENEW CID: 0x34d1
1182968 223.408059 10.249.15.254 -> 10.249.0.2 NFS 114 V4 Reply (Call In 1182967) RENEW
1223198 287.407968 10.249.0.2 -> 10.249.15.254 NFS 174 V4 Call RENEW CID: 0x34d1
1223199 287.408024 10.249.15.254 -> 10.249.0.2 NFS 114 V4 Reply (Call In 1223198) RENEW
1841475 347.568113 10.249.0.2 -> 10.249.15.254 NFS 174 V4 Call RENEW CID: 0x34d1
1841477 347.568139 10.249.15.254 -> 10.249.0.2 NFS 114 V4 Reply (Call In 1841475) RENEW Status: NFS4ERR_CB_PATH_DOWN
1841494 347.568913 10.249.0.2 -> 10.249.15.254 NFS 174 V4 Call RENEW CID: 0x34d1
1841495 347.568937 10.249.15.254 -> 10.249.0.2 NFS 114 V4 Reply (Call In 1841494) RENEW Status: NFS4ERR_CB_PATH_DOWN

After this the NFS client returns all its delegations, which can mean hundreds of DELEGRETURN calls at once. A new SETCLIENTID call is also made.

Before the NFS4ERR_CB_PATH_DOWN there seems to be a CB_RECALL to which the client responds with NFS4ERR_BADHANDLE. I have also seen cases where the NFS server sends a CB_RECALL for a delegation that was already returned a few seconds before.

A quick test with Trond's nfsd-devel branch caused a lot of bad sequence id errors, so I could not run the same tests with that branch.

NFS debugging is not my area of expertise, so any advice on how to debug this further would be welcome. I'm more than willing to provide more information and do further testing on this.


An example call trace:

[ 916.100013] BUG: soft lockup - CPU#3 stuck for 23s! [mozStorage #5:15492]
[ 916.100013] Modules linked in: cts nfsv4 cuse autofs4 deflate ctr twofish_generic twofish_i586 twofish_common camellia_generic serpent_sse2_i586 xts serpent_generic lrw gf128mul glue_helper ablk_helper cryptd blowfish_generic blowfish_common cast5_generic cast_common des_generic cmac xcbc rmd160 sha512_generic crypto_null af_key xfrm_algo i2c_piix4 microcode virtio_balloon serio_raw mac_hid rpcsec_gss_krb5 nfsd auth_rpcgss oid_registry nfs_acl nfs lockd parport_pc sunrpc ppdev fscache lp parport binfmt_misc overlayfs btrfs xor raid6_pq nbd psmouse e1000 floppy
[ 916.100013] CPU: 3 PID: 15492 Comm: mozStorage #5 Not tainted 3.15.0-rc5 #6
[ 916.100013] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 916.100013] task: e1ad28b0 ti: df6ba000 task.ti: df6ba000
[ 916.100013] EIP: 0060:[<c109423c>] EFLAGS: 00000282 CPU: 3
[ 916.100013] EIP is at prepare_to_wait+0x4c/0x80
[ 916.100013] EAX: 00000282 EBX: e1ad28b0 ECX: 00000082 EDX: 00000282
[ 916.100013] ESI: e7b4a658 EDI: df6bbe80 EBP: df6bbe64 ESP: df6bbe50
[ 916.100013] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 916.100013] CR0: 8005003b CR2: 08d30c30 CR3: 1f505000 CR4: 000006f0
[ 916.100013] Stack:
[ 916.100013] 00000082 00000082 df5d6a38 fffffe00 e7b4a658 df6bbe94 f854f167 df5d6a38
[ 916.100013] 00000000 00000000 e1ad28b0 c1094080 e7b4a65c e7b4a65c e0026540 df5d6a20
[ 916.100013] e0026540 df6bbeb4 f8545168 0000000d dfc56728 f74beb60 00000000 f74beb60
[ 916.100013] Call Trace:
[ 916.100013] [<f854f167>] nfs_iocounter_wait+0x87/0xb0 nfs
[ 916.100013] [<c1094080>] ? wake_atomic_t_function+0x50/0x50
[ 916.100013] [<f8545168>] do_unlk+0x48/0xb0 nfs
[ 916.100013] [<f85454b5>] nfs_lock+0x125/0x1a0 nfs
[ 916.100013] [<c10b5253>] ? ktime_get+0x53/0x120
[ 916.100013] [<f8545390>] ? nfs_flock+0xd0/0xd0 nfs
[ 916.100013] [<c11bcd9f>] vfs_lock_file+0x1f/0x50
[ 916.100013] [<c11bceb0>] do_lock_file_wait.part.19+0x30/0xb0
[ 916.100013] [<c164c05f>] ? __do_page_fault+0x21f/0x500
[ 916.100013] [<c11bdfd7>] fcntl_setlk64+0x107/0x210
[ 916.100013] [<c11870f2>] SyS_fcntl64+0xd2/0x100
[ 916.100013] [<c1648b8a>] syscall_call+0x7/0xb
[ 916.100013] [<c1640000>] ? add_new_disk+0x222/0x44b
[ 916.100013] Code: e8 4a 44 5b 00 8b 4d ec 3b 7b 0c 74 32 89 4d f0 8b 4d f0 64 8b 1d b4 6f a5 c1 87 0b 89 4d f0 8b 55 f0 89 c2 89 f0 e8 84 45 5b 00 <8b> 5d f4 8b 75 f8 8b 7d fc 89 ec 5d c3 8d b4 26 00 00 00 00 8b
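
For reference, the path in the trace (SyS_fcntl64 -> fcntl_setlk64 -> do_unlk -> nfs_iocounter_wait) is what an ordinary POSIX byte-range unlock on a file on the NFS mount ends up in. A minimal sketch of that kind of operation looks like this (the file path is made up; this only illustrates the syscall path, it is not the actual workload):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Minimal illustration of the syscall path seen in the trace: take a
 * POSIX byte-range lock and then release it (F_UNLCK) on a file that
 * lives on the NFS mount. The path name below is made up. */
int main(void)
{
        struct flock fl = {
                .l_type = F_WRLCK,
                .l_whence = SEEK_SET,
                .l_start = 0,
                .l_len = 0,     /* whole file */
        };
        int fd = open("/home/user/lockfile.dat", O_RDWR | O_CREAT, 0644);

        if (fd == -1) {
                perror("open");
                return 1;
        }
        if (fcntl(fd, F_SETLKW, &fl) == -1) {
                perror("lock");
                return 1;
        }
        fl.l_type = F_UNLCK;
        if (fcntl(fd, F_SETLK, &fl) == -1) {    /* this is the do_unlk() path */
                perror("unlock");
                return 1;
        }
        return close(fd);
}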


Veli-Matti


2014-05-20 14:21:22

by Jeff Layton

Subject: Re: Soft lockups on kerberised NFSv4.0 clients

On Tue, 20 May 2014 08:40:48 +0000 (UTC)
Veli-Matti Lintu <[email protected]> wrote:

> Hello,
>
> We are seeing soft lockup call traces on kerberised NFSv4.0 clients acting as terminal servers serving multiple thin clients running graphical desktop sessions on NFS home directories. We do not have a simple way to reproduce the problem, but creating artificial load running multiple users running applications like Firefox, LibreOffice, GIMP, etc. usually works. On production systems high load has not been a requirement for soft lockups to happen, though. This happens both on KVM virtual machines and on real hardware.
>
> NFS server kernels 3.10 - 3.15-rc5 have been tested and on NFS client kernels 3.12 - 3.15-rc5 have been tested.
>
> The NFS clients do mounts only from a single NFS server and there are two mounts where the first one is done with auth=sys without an existing krb5.keytab. This results in SETCLIENTID being called with auth=sys. The user home directories are mounted with auth=krb5. Machine credentials are available when the krb5 mount is done. The dumps show that callbacks use auth=sys.
>
> If the NFS mount is replaced with a CIFS mount, no soft lockups happen.
>
> Turning off leases on NFS server has so far made the NFS clients stable and no soft lockups have happened. The leases were disabled with "echo 0 >/proc/sys/fs/leases-enable" before starting the NFS server.
>
> Because reproducing the problem takes some time, dumpcap dumps are usually several gigabytes in size. In the dumps there is one consistent sequence that may tell something.
>
> Shortly (< 1min) before the soft lockups appear, the NFS server responds to RENEW request with NFS4ERR_CB_PATH_DOWN, e.g.:
>
> 1171171 163.248014 10.249.0.2 -> 10.249.15.254 NFS 174 V4 Call RENEW CID: 0x34d1
> 1171172 163.248112 10.249.15.254 -> 10.249.0.2 NFS 114 V4 Reply (Call In 1171171) RENEW
> 1182967 223.407973 10.249.0.2 -> 10.249.15.254 NFS 174 V4 Call RENEW CID: 0x34d1
> 1182968 223.408059 10.249.15.254 -> 10.249.0.2 NFS 114 V4 Reply (Call In 1182967) RENEW
> 1223198 287.407968 10.249.0.2 -> 10.249.15.254 NFS 174 V4 Call RENEW CID: 0x34d1
> 1223199 287.408024 10.249.15.254 -> 10.249.0.2 NFS 114 V4 Reply (Call In 1223198) RENEW
> 1841475 347.568113 10.249.0.2 -> 10.249.15.254 NFS 174 V4 Call RENEW CID: 0x34d1
> 1841477 347.568139 10.249.15.254 -> 10.249.0.2 NFS 114 V4 Reply (Call In 1841475) RENEW Status: NFS4ERR_CB_PATH_DOWN
> 1841494 347.568913 10.249.0.2 -> 10.249.15.254 NFS 174 V4 Call RENEW CID: 0x34d1
> 1841495 347.568937 10.249.15.254 -> 10.249.0.2 NFS 114 V4 Reply (Call In 1841494) RENEW Status: NFS4ERR_CB_PATH_DOWN
>
> After this the NFS client returns all the delegations which can mean hundreds of DELEGRETURNS at once. Also a new SETCLIENTID call is done.
>
> Before the NFS4ERR_CB_PATH_DOWN there seems to be a CB_RECALL where the client responds with NFS4ERR_BADHANDLE. I have seen also cases where the NFS server sends a CB_RECALL for a delegation that was already returned a few seconds before.
>
> Quick test with Trond's nfsd-devel branch caused a lot of bad sequence id errors, so I could not run the same tests with that branch.
>
> NFS debugging is not my expertise, so any advice on how to debug this further would be welcome. I'm more than willing to provide more information and do testing on this.
>
>
> An example call trace:
>
> [ 916.100013] BUG: soft lockup - CPU#3 stuck for 23s! [mozStorage #5:15492]
> [ 916.100013] Modules linked in: cts nfsv4 cuse autofs4 deflate ctr twofish_generic twofish_i586 twofish_common camellia_generic serpent_sse2_i586 xts serpent_generic lrw gf128mul glue_helper ablk_helper cryptd blowfish_generic blowfish_common cast5_generic cast_common des_generic cmac xcbc rmd160 sha512_generic crypto_null af_key xfrm_algo i2c_piix4 microcode virtio_balloon serio_raw mac_hid rpcsec_gss_krb5 nfsd auth_rpcgss oid_registry nfs_acl nfs lockd parport_pc sunrpc ppdev fscache lp parport binfmt_misc overlayfs btrfs xor raid6_pq nbd psmouse e1000 floppy
> [ 916.100013] CPU: 3 PID: 15492 Comm: mozStorage #5 Not tainted 3.15.0-rc5 #6
> [ 916.100013] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> [ 916.100013] task: e1ad28b0 ti: df6ba000 task.ti: df6ba000
> [ 916.100013] EIP: 0060:[<c109423c>] EFLAGS: 00000282 CPU: 3
> [ 916.100013] EIP is at prepare_to_wait+0x4c/0x80
> [ 916.100013] EAX: 00000282 EBX: e1ad28b0 ECX: 00000082 EDX: 00000282
> [ 916.100013] ESI: e7b4a658 EDI: df6bbe80 EBP: df6bbe64 ESP: df6bbe50
> [ 916.100013] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> [ 916.100013] CR0: 8005003b CR2: 08d30c30 CR3: 1f505000 CR4: 000006f0
> [ 916.100013] Stack:
> [ 916.100013] 00000082 00000082 df5d6a38 fffffe00 e7b4a658 df6bbe94 f854f167 df5d6a38
> [ 916.100013] 00000000 00000000 e1ad28b0 c1094080 e7b4a65c e7b4a65c e0026540 df5d6a20
> [ 916.100013] e0026540 df6bbeb4 f8545168 0000000d dfc56728 f74beb60 00000000 f74beb60
> [ 916.100013] Call Trace:
> [ 916.100013] [<f854f167>] nfs_iocounter_wait+0x87/0xb0 nfs
> [ 916.100013] [<c1094080>] ? wake_atomic_t_function+0x50/0x50
> [ 916.100013] [<f8545168>] do_unlk+0x48/0xb0 nfs
> [ 916.100013] [<f85454b5>] nfs_lock+0x125/0x1a0 nfs
> [ 916.100013] [<c10b5253>] ? ktime_get+0x53/0x120
> [ 916.100013] [<f8545390>] ? nfs_flock+0xd0/0xd0 nfs
> [ 916.100013] [<c11bcd9f>] vfs_lock_file+0x1f/0x50
> [ 916.100013] [<c11bceb0>] do_lock_file_wait.part.19+0x30/0xb0
> [ 916.100013] [<c164c05f>] ? __do_page_fault+0x21f/0x500
> [ 916.100013] [<c11bdfd7>] fcntl_setlk64+0x107/0x210
> [ 916.100013] [<c11870f2>] SyS_fcntl64+0xd2/0x100
> [ 916.100013] [<c1648b8a>] syscall_call+0x7/0xb
> [ 916.100013] [<c1640000>] ? add_new_disk+0x222/0x44b
> [ 916.100013] Code: e8 4a 44 5b 00 8b 4d ec 3b 7b 0c 74 32 89 4d f0 8b 4d f0 64 8b 1d b4 6f a5 c1 87 0b 89 4d f0 8b 55 f0 89 c2 89 f0 e8 84 45 5b 00 <8b> 5d f4 8b 75 f8 8b 7d fc 89 ec 5d c3 8d b4 26 00 00 00 00 8b
>


I hit some problems a while back with kerberized NFSv4.0 callbacks. You
may want to try these patches, which I posted in early April, on the
client:

[PATCH 0/3] nfs: fix v4.0 callback channel auth failures

AFAIK, Trond hasn't merged those yet, but hopefully they'll make v3.16.

There's also a companion nfs-utils patchset that has been merged into
upstream nfs-utils:

[PATCH v2 0/6] gssd: add the GSSAPI acceptor name to the info passed in downcall

Cheers,
--
Jeff Layton <[email protected]>

2014-05-21 14:55:10

by Veli-Matti Lintu

Subject: Re: Soft lockups on kerberised NFSv4.0 clients


----- Original Message -----
> From: "Jeff Layton" <[email protected]>
> To: "Veli-Matti Lintu" <[email protected]>
> Cc: [email protected], "Tuomas Räsänen" <[email protected]>
> Sent: Tuesday, May 20, 2014 5:21:17 PM
> Subject: Re: Soft lockups on kerberised NFSv4.0 clients
>
> On Tue, 20 May 2014 08:40:48 +0000 (UTC)
> Veli-Matti Lintu <[email protected]> wrote:
>
> > Hello,
> >
> > We are seeing soft lockup call traces on kerberised NFSv4.0 clients acting
> > as terminal servers serving multiple thin clients running graphical
> > desktop sessions on NFS home directories. We do not have a simple way to
> > reproduce the problem, but creating artificial load running multiple users
> > running applications like Firefox, LibreOffice, GIMP, etc. usually works.
> > On production systems high load has not been a requirement for soft
> > lockups to happen, though. This happens both on KVM virtual machines and
> > on real hardware.
> >
> > NFS server kernels 3.10 - 3.15-rc5 have been tested and on NFS client
> > kernels 3.12 - 3.15-rc5 have been tested.
> >
> > The NFS clients do mounts only from a single NFS server and there are two
> > mounts where the first one is done with auth=sys without an existing
> > krb5.keytab. This results in SETCLIENTID being called with auth=sys. The
> > user home directories are mounted with auth=krb5. Machine credentials are
> > available when the krb5 mount is done. The dumps show that callbacks use
> > auth=sys.
> >
> > If the NFS mount is replaced with a CIFS mount, no soft lockups happen.
> >
> > Turning off leases on NFS server has so far made the NFS clients stable and
> > no soft lockups have happened. The leases were disabled with "echo 0
> > >/proc/sys/fs/leases-enable" before starting the NFS server.
> >
> > Because reproducing the problem takes some time, dumpcap dumps are usually
> > several gigabytes in size. In the dumps there is one consistent sequence
> > that may tell something.
> >
> > Shortly (< 1min) before the soft lockups appear, the NFS server responds to
> > RENEW request with NFS4ERR_CB_PATH_DOWN, e.g.:
> >
> > 1171171 163.248014 10.249.0.2 -> 10.249.15.254 NFS 174 V4 Call RENEW CID:
> > 0x34d1
> > 1171172 163.248112 10.249.15.254 -> 10.249.0.2 NFS 114 V4 Reply (Call In
> > 1171171) RENEW
> > 1182967 223.407973 10.249.0.2 -> 10.249.15.254 NFS 174 V4 Call RENEW CID:
> > 0x34d1
> > 1182968 223.408059 10.249.15.254 -> 10.249.0.2 NFS 114 V4 Reply (Call In
> > 1182967) RENEW
> > 1223198 287.407968 10.249.0.2 -> 10.249.15.254 NFS 174 V4 Call RENEW CID:
> > 0x34d1
> > 1223199 287.408024 10.249.15.254 -> 10.249.0.2 NFS 114 V4 Reply (Call In
> > 1223198) RENEW
> > 1841475 347.568113 10.249.0.2 -> 10.249.15.254 NFS 174 V4 Call RENEW CID:
> > 0x34d1
> > 1841477 347.568139 10.249.15.254 -> 10.249.0.2 NFS 114 V4 Reply (Call In
> > 1841475) RENEW Status: NFS4ERR_CB_PATH_DOWN
> > 1841494 347.568913 10.249.0.2 -> 10.249.15.254 NFS 174 V4 Call RENEW CID:
> > 0x34d1
> > 1841495 347.568937 10.249.15.254 -> 10.249.0.2 NFS 114 V4 Reply (Call In
> > 1841494) RENEW Status: NFS4ERR_CB_PATH_DOWN
> >
> > After this the NFS client returns all the delegations which can mean
> > hundreds of DELEGRETURNS at once. Also a new SETCLIENTID call is done.
> >
> > Before the NFS4ERR_CB_PATH_DOWN there seems to be a CB_RECALL where the
> > client responds with NFS4ERR_BADHANDLE. I have seen also cases where the
> > NFS server sends a CB_RECALL for a delegation that was already returned a
> > few seconds before.
> >
> > Quick test with Trond's nfsd-devel branch caused a lot of bad sequence id
> > errors, so I could not run the same tests with that branch.
> >
> > NFS debugging is not my expertise, so any advice on how to debug this
> > further would be welcome. I'm more than willing to provide more
> > information and do testing on this.
> >
> >
> > An example call trace:
> >
> > [ 916.100013] BUG: soft lockup - CPU#3 stuck for 23s! [mozStorage #5:15492]
> > [ 916.100013] Modules linked in: cts nfsv4 cuse autofs4 deflate ctr
> > twofish_generic twofish_i586 twofish_common camellia_generic
> > serpent_sse2_i586 xts serpent_generic lrw gf128mul glue_helper ablk_helper
> > cryptd blowfish_generic blowfish_common cast5_generic cast_common
> > des_generic cmac xcbc rmd160 sha512_generic crypto_null af_key xfrm_algo
> > i2c_piix4 microcode virtio_balloon serio_raw mac_hid rpcsec_gss_krb5 nfsd
> > auth_rpcgss oid_registry nfs_acl nfs lockd parport_pc sunrpc ppdev fscache
> > lp parport binfmt_misc overlayfs btrfs xor raid6_pq nbd psmouse e1000
> > floppy
> > [ 916.100013] CPU: 3 PID: 15492 Comm: mozStorage #5 Not tainted 3.15.0-rc5
> > #6
> > [ 916.100013] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> > [ 916.100013] task: e1ad28b0 ti: df6ba000 task.ti: df6ba000
> > [ 916.100013] EIP: 0060:[<c109423c>] EFLAGS: 00000282 CPU: 3
> > [ 916.100013] EIP is at prepare_to_wait+0x4c/0x80
> > [ 916.100013] EAX: 00000282 EBX: e1ad28b0 ECX: 00000082 EDX: 00000282
> > [ 916.100013] ESI: e7b4a658 EDI: df6bbe80 EBP: df6bbe64 ESP: df6bbe50
> > [ 916.100013] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > [ 916.100013] CR0: 8005003b CR2: 08d30c30 CR3: 1f505000 CR4: 000006f0
> > [ 916.100013] Stack:
> > [ 916.100013] 00000082 00000082 df5d6a38 fffffe00 e7b4a658 df6bbe94
> > f854f167 df5d6a38
> > [ 916.100013] 00000000 00000000 e1ad28b0 c1094080 e7b4a65c e7b4a65c
> > e0026540 df5d6a20
> > [ 916.100013] e0026540 df6bbeb4 f8545168 0000000d dfc56728 f74beb60
> > 00000000 f74beb60
> > [ 916.100013] Call Trace:
> > [ 916.100013] [<f854f167>] nfs_iocounter_wait+0x87/0xb0 nfs
> > [ 916.100013] [<c1094080>] ? wake_atomic_t_function+0x50/0x50
> > [ 916.100013] [<f8545168>] do_unlk+0x48/0xb0 nfs
> > [ 916.100013] [<f85454b5>] nfs_lock+0x125/0x1a0 nfs
> > [ 916.100013] [<c10b5253>] ? ktime_get+0x53/0x120
> > [ 916.100013] [<f8545390>] ? nfs_flock+0xd0/0xd0 nfs
> > [ 916.100013] [<c11bcd9f>] vfs_lock_file+0x1f/0x50
> > [ 916.100013] [<c11bceb0>] do_lock_file_wait.part.19+0x30/0xb0
> > [ 916.100013] [<c164c05f>] ? __do_page_fault+0x21f/0x500
> > [ 916.100013] [<c11bdfd7>] fcntl_setlk64+0x107/0x210
> > [ 916.100013] [<c11870f2>] SyS_fcntl64+0xd2/0x100
> > [ 916.100013] [<c1648b8a>] syscall_call+0x7/0xb
> > [ 916.100013] [<c1640000>] ? add_new_disk+0x222/0x44b
> > [ 916.100013] Code: e8 4a 44 5b 00 8b 4d ec 3b 7b 0c 74 32 89 4d f0 8b 4d
> > f0 64 8b 1d b4 6f a5 c1 87 0b 89 4d f0 8b 55 f0 89 c2 89 f0 e8 84 45 5b 00
> > <8b> 5d f4 8b 75 f8 8b 7d fc 89 ec 5d c3 8d b4 26 00 00 00 00 8b

> I hit some problems a while back with kerberized NFSv4.0 callbacks. You
> may want to try these patches on the client that I posted in early
> April:
>
> [PATCH 0/3] nfs: fix v4.0 callback channel auth failures
>
> AFAIK, Trond hasn't merged those yet, but hopefully they'll make v3.16.
>
> There's also a companion nfs-utils patchset as well that has been
> merged into upstream nfs-utils:
>
> [PATCH v2 0/6] gssd: add the GSSAPI acceptor name to the info passed in
> downcall


We updated both the NFS server and the client to 3.15-rc5 + the above patches
and also updated nfs-utils to the newest git master from here:

git://git.linux-nfs.org/projects/steved/nfs-utils.git

nfs-utils needed a little patch for mounts to work on our systems (without
this the last write_buffer would always fail):


--- nfs-utils.orig/utils/gssd/gssd_proc.c 2014-05-21 17:35:18.429226526 +0300
+++ nfs-utils/utils/gssd/gssd_proc.c 2014-05-21 17:35:55.577246480 +0300
@@ -696,7 +696,7 @@
 	buf_size = sizeof(uid) + sizeof(timeout) + sizeof(pd->pd_seq_win) +
 		   sizeof(pd->pd_ctx_hndl.length) + pd->pd_ctx_hndl.length +
 		   sizeof(context_token->length) + context_token->length +
-		   acceptor->length;
+		   sizeof(acceptor->length) + acceptor->length;
 	p = buf = malloc(buf_size);
 	if (!buf)
 		goto out_err;



There was no change in behaviour with these patches and the call trace
was the same. So it seems that it does not matter whether the callbacks
use auth=sys or auth=krb5 for the soft lockups to happen.

Veli-Matti

2014-05-21 20:53:10

by Jeff Layton

Subject: Re: Soft lockups on kerberised NFSv4.0 clients

On Wed, 21 May 2014 14:55:49 +0000 (UTC)
Veli-Matti Lintu <[email protected]> wrote:

>
> ----- Original Message -----
> > From: "Jeff Layton" <[email protected]>
> > To: "Veli-Matti Lintu" <[email protected]>
> > Cc: [email protected], "Tuomas Räsänen" <[email protected]>
> > Sent: Tuesday, May 20, 2014 5:21:17 PM
> > Subject: Re: Soft lockups on kerberised NFSv4.0 clients
> >
> > On Tue, 20 May 2014 08:40:48 +0000 (UTC)
> > Veli-Matti Lintu <[email protected]> wrote:
> >
> > > Hello,
> > >
> > > We are seeing soft lockup call traces on kerberised NFSv4.0 clients acting
> > > as terminal servers serving multiple thin clients running graphical
> > > desktop sessions on NFS home directories. We do not have a simple way to
> > > reproduce the problem, but creating artificial load running multiple users
> > > running applications like Firefox, LibreOffice, GIMP, etc. usually works.
> > > On production systems high load has not been a requirement for soft
> > > lockups to happen, though. This happens both on KVM virtual machines and
> > > on real hardware.
> > >
> > > NFS server kernels 3.10 - 3.15-rc5 have been tested and on NFS client
> > > kernels 3.12 - 3.15-rc5 have been tested.
> > >
> > > The NFS clients do mounts only from a single NFS server and there are two
> > > mounts where the first one is done with auth=sys without an existing
> > > krb5.keytab. This results in SETCLIENTID being called with auth=sys. The
> > > user home directories are mounted with auth=krb5. Machine credentials are
> > > available when the krb5 mount is done. The dumps show that callbacks use
> > > auth=sys.
> > >
> > > If the NFS mount is replaced with a CIFS mount, no soft lockups happen.
> > >
> > > Turning off leases on NFS server has so far made the NFS clients stable and
> > > no soft lockups have happened. The leases were disabled with "echo 0
> > > >/proc/sys/fs/leases-enable" before starting the NFS server.
> > >
> > > Because reproducing the problem takes some time, dumpcap dumps are usually
> > > several gigabytes in size. In the dumps there is one consistent sequence
> > > that may tell something.
> > >
> > > Shortly (< 1min) before the soft lockups appear, the NFS server responds to
> > > RENEW request with NFS4ERR_CB_PATH_DOWN, e.g.:
> > >
> > > 1171171 163.248014 10.249.0.2 -> 10.249.15.254 NFS 174 V4 Call RENEW CID:
> > > 0x34d1
> > > 1171172 163.248112 10.249.15.254 -> 10.249.0.2 NFS 114 V4 Reply (Call In
> > > 1171171) RENEW
> > > 1182967 223.407973 10.249.0.2 -> 10.249.15.254 NFS 174 V4 Call RENEW CID:
> > > 0x34d1
> > > 1182968 223.408059 10.249.15.254 -> 10.249.0.2 NFS 114 V4 Reply (Call In
> > > 1182967) RENEW
> > > 1223198 287.407968 10.249.0.2 -> 10.249.15.254 NFS 174 V4 Call RENEW CID:
> > > 0x34d1
> > > 1223199 287.408024 10.249.15.254 -> 10.249.0.2 NFS 114 V4 Reply (Call In
> > > 1223198) RENEW
> > > 1841475 347.568113 10.249.0.2 -> 10.249.15.254 NFS 174 V4 Call RENEW CID:
> > > 0x34d1
> > > 1841477 347.568139 10.249.15.254 -> 10.249.0.2 NFS 114 V4 Reply (Call In
> > > 1841475) RENEW Status: NFS4ERR_CB_PATH_DOWN
> > > 1841494 347.568913 10.249.0.2 -> 10.249.15.254 NFS 174 V4 Call RENEW CID:
> > > 0x34d1
> > > 1841495 347.568937 10.249.15.254 -> 10.249.0.2 NFS 114 V4 Reply (Call In
> > > 1841494) RENEW Status: NFS4ERR_CB_PATH_DOWN
> > >
> > > After this the NFS client returns all the delegations which can mean
> > > hundreds of DELEGRETURNS at once. Also a new SETCLIENTID call is done.
> > >
> > > Before the NFS4ERR_CB_PATH_DOWN there seems to be a CB_RECALL where the
> > > client responds with NFS4ERR_BADHANDLE. I have seen also cases where the
> > > NFS server sends a CB_RECALL for a delegation that was already returned a
> > > few seconds before.
> > >
> > > Quick test with Trond's nfsd-devel branch caused a lot of bad sequence id
> > > errors, so I could not run the same tests with that branch.
> > >
> > > NFS debugging is not my expertise, so any advice on how to debug this
> > > further would be welcome. I'm more than willing to provide more
> > > information and do testing on this.
> > >
> > >
> > > An example call trace:
> > >
> > > [ 916.100013] BUG: soft lockup - CPU#3 stuck for 23s! [mozStorage #5:15492]
> > > [ 916.100013] Modules linked in: cts nfsv4 cuse autofs4 deflate ctr
> > > twofish_generic twofish_i586 twofish_common camellia_generic
> > > serpent_sse2_i586 xts serpent_generic lrw gf128mul glue_helper ablk_helper
> > > cryptd blowfish_generic blowfish_common cast5_generic cast_common
> > > des_generic cmac xcbc rmd160 sha512_generic crypto_null af_key xfrm_algo
> > > i2c_piix4 microcode virtio_balloon serio_raw mac_hid rpcsec_gss_krb5 nfsd
> > > auth_rpcgss oid_registry nfs_acl nfs lockd parport_pc sunrpc ppdev fscache
> > > lp parport binfmt_misc overlayfs btrfs xor raid6_pq nbd psmouse e1000
> > > floppy
> > > [ 916.100013] CPU: 3 PID: 15492 Comm: mozStorage #5 Not tainted 3.15.0-rc5
> > > #6
> > > [ 916.100013] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> > > [ 916.100013] task: e1ad28b0 ti: df6ba000 task.ti: df6ba000
> > > [ 916.100013] EIP: 0060:[<c109423c>] EFLAGS: 00000282 CPU: 3
> > > [ 916.100013] EIP is at prepare_to_wait+0x4c/0x80
> > > [ 916.100013] EAX: 00000282 EBX: e1ad28b0 ECX: 00000082 EDX: 00000282
> > > [ 916.100013] ESI: e7b4a658 EDI: df6bbe80 EBP: df6bbe64 ESP: df6bbe50
> > > [ 916.100013] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > > [ 916.100013] CR0: 8005003b CR2: 08d30c30 CR3: 1f505000 CR4: 000006f0
> > > [ 916.100013] Stack:
> > > [ 916.100013] 00000082 00000082 df5d6a38 fffffe00 e7b4a658 df6bbe94
> > > f854f167 df5d6a38
> > > [ 916.100013] 00000000 00000000 e1ad28b0 c1094080 e7b4a65c e7b4a65c
> > > e0026540 df5d6a20
> > > [ 916.100013] e0026540 df6bbeb4 f8545168 0000000d dfc56728 f74beb60
> > > 00000000 f74beb60
> > > [ 916.100013] Call Trace:
> > > [ 916.100013] [<f854f167>] nfs_iocounter_wait+0x87/0xb0 nfs
> > > [ 916.100013] [<c1094080>] ? wake_atomic_t_function+0x50/0x50
> > > [ 916.100013] [<f8545168>] do_unlk+0x48/0xb0 nfs
> > > [ 916.100013] [<f85454b5>] nfs_lock+0x125/0x1a0 nfs
> > > [ 916.100013] [<c10b5253>] ? ktime_get+0x53/0x120
> > > [ 916.100013] [<f8545390>] ? nfs_flock+0xd0/0xd0 nfs
> > > [ 916.100013] [<c11bcd9f>] vfs_lock_file+0x1f/0x50
> > > [ 916.100013] [<c11bceb0>] do_lock_file_wait.part.19+0x30/0xb0
> > > [ 916.100013] [<c164c05f>] ? __do_page_fault+0x21f/0x500
> > > [ 916.100013] [<c11bdfd7>] fcntl_setlk64+0x107/0x210
> > > [ 916.100013] [<c11870f2>] SyS_fcntl64+0xd2/0x100
> > > [ 916.100013] [<c1648b8a>] syscall_call+0x7/0xb
> > > [ 916.100013] [<c1640000>] ? add_new_disk+0x222/0x44b
> > > [ 916.100013] Code: e8 4a 44 5b 00 8b 4d ec 3b 7b 0c 74 32 89 4d f0 8b 4d
> > > f0 64 8b 1d b4 6f a5 c1 87 0b 89 4d f0 8b 55 f0 89 c2 89 f0 e8 84 45 5b 00
> > > <8b> 5d f4 8b 75 f8 8b 7d fc 89 ec 5d c3 8d b4 26 00 00 00 00 8b
>
> > I hit some problems a while back with kerberized NFSv4.0 callbacks. You
> > may want to try these patches on the client that I posted in early
> > April:
> >
> > [PATCH 0/3] nfs: fix v4.0 callback channel auth failures
> >
> > AFAIK, Trond hasn't merged those yet, but hopefully they'll make v3.16.
> >
> > There's also a companion nfs-utils patchset as well that has been
> > merged into upstream nfs-utils:
> >
> > [PATCH v2 0/6] gssd: add the GSSAPI acceptor name to the info passed in
> > downcall
>
>
> We updated both NFS server and client to 3.15-rc5 + above patches and also
> updated nfs-utils to newest git master from here:
>
> git://git.linux-nfs.org/projects/steved/nfs-utils.git
>
> nfs-utils needed a little patch for mounts to work in our systems (without
> this the last write_buffer would always fail):
>
>
> --- nfs-utils.orig/utils/gssd/gssd_proc.c 2014-05-21 17:35:18.429226526 +0300
> +++ nfs-utils/utils/gssd/gssd_proc.c 2014-05-21 17:35:55.577246480 +0300
> @@ -696,7 +696,7 @@
> buf_size = sizeof(uid) + sizeof(timeout) + sizeof(pd->pd_seq_win) +
> sizeof(pd->pd_ctx_hndl.length) + pd->pd_ctx_hndl.length +
> sizeof(context_token->length) + context_token->length +
> - acceptor->length;
> + sizeof(acceptor->length) + acceptor->length;
> p = buf = malloc(buf_size);
> if (!buf)
> goto out_err;
>
>

Well spotted. Care to spin up an "official" patch for that and send it
to [email protected] (and please cc linux-nfs as well)?

>
> There was no change to behaviour with these patches and the call trace
> was the same. So it seems like it does not matter whether callbacks
> use auth=sys or auth=krb5 for the soft lockups to happen.

Ok, now that I look closer at your stack trace, the problem appears to
be that the unlock code is waiting for the lock context's io_count to
drop to zero before allowing the unlock to proceed.

That likely means that there is some outstanding I/O that isn't
completing, but it's possible that the problem is that the CB_RECALL is
being ignored. This will probably require some analysis of wire captures.

In your earlier mail, you mentioned that the client was responding to
the CB_RECALL with NFS4ERR_BADHANDLE. Determining why that's happening
may be the best place to focus your efforts.

Now that I look, nfs4_callback_recall does this:

	res = htonl(NFS4ERR_BADHANDLE);
	inode = nfs_delegation_find_inode(cps->clp, &args->fh);
	if (inode == NULL)
		goto out;

So it looks like it's not finding the delegation for some reason.
You'll probably need to hunt down which open gave you the delegation in
the first place and then sanity check the CB_RECALL request to
determine whether it's the client or server that's insane here...
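
For reference, nfs_delegation_find_inode() essentially just walks the
client's per-server delegation lists under RCU and compares file handles,
roughly like the sketch below. This is paraphrased from memory, not the
exact upstream code, so don't take the details literally:

/* Paraphrased sketch, not the exact upstream code: walk every superblock
 * of this nfs_client and every delegation it holds, and return the inode
 * whose cached filehandle matches the one in the CB_RECALL. If nothing
 * matches, the callback reply ends up being NFS4ERR_BADHANDLE. */
static struct inode *find_delegated_inode(struct nfs_client *clp,
					  const struct nfs_fh *fhandle)
{
	struct nfs_server *server;
	struct nfs_delegation *delegation;
	struct inode *res = NULL;

	rcu_read_lock();
	list_for_each_entry_rcu(server, &clp->cl_superblocks, client_link) {
		list_for_each_entry_rcu(delegation, &server->delegations,
					super_list) {
			if (delegation->inode != NULL &&
			    nfs_compare_fh(fhandle,
					   &NFS_I(delegation->inode)->fh) == 0) {
				res = igrab(delegation->inode);
				break;
			}
		}
		if (res != NULL)
			break;
	}
	rcu_read_unlock();
	return res;
}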

--
Jeff Layton <[email protected]>

2014-06-02 10:01:13

by Tuomas Räsänen

Subject: Re: Soft lockups on kerberised NFSv4.0 clients

----- Original Message -----
> From: "Jeff Layton" <[email protected]>
>
.
.
.
> Ok, now that I look closer at your stack trace the problem appears to
> be that the unlock code is waiting for the lock context's io_count to
> drop to zero before allowing the unlock to proceed.
>
> That likely means that there is some outstanding I/O that isn't
> completing, but it's possible that the problem is the CB_RECALL is
> being ignored. This will probably require some analysis of wire captures.
>
> In your earlier mail, you mentioned that the client was responding to
> the CB_RECALL with NFS4ERR_BADHANDLE. Determining why that's happening
> may be the best place to focus your efforts.
>
> Now that I look, nfs4_callback_recall does this:
>
> res = htonl(NFS4ERR_BADHANDLE);
> inode = nfs_delegation_find_inode(cps->clp, &args->fh);
> if (inode == NULL)
> goto out;
>
> So it looks like it's not finding the delegation for some reason.
> You'll probably need to hunt down which open gave you the delegation in
> the first place and then sanity check the CB_RECALL request to
> determine whether it's the client or server that's insane here...
>

Speaking of insanity, I'll try to describe some of our findings in the hope that someone can help us get a better grasp of the issue.

The OPEN requests seem valid to me; there does not seem to be any real difference between OPENs granting RECALLable delegations and OPENs granting delegations that cause BADHANDLE to be returned when they are RECALLed. I don't have any ideas what to look for.. probably been staring at capture logs for too long...

BADHANDLE responses to CB_RECALLs seem to be fairly common in our environment and there is no clear link between those and the softlockups described earlier by Veli-Matti. BADHANDLEs can happen multiple times before the first softlockup. After the first softlockup, the system keeps experiencing lockups (with various tracebacks) at an increasing rate, so I guess only the very first trace is meaningful. And the very first traceback always seems to be the one posted by Veli-Matti in his first email.

The BADHANDLE situation is also quite volatile: if nfs_delegation_find_inode() is called again, a bit later, before returning from nfs4_callback_recall(), it returns a valid inode instead of NULL. What does this indicate? Is it somehow related to the nature of RCU?
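
For the record, this is roughly the instrumentation hack we used to see that. It is a debugging aid only, not a proposed fix; the retry, the msleep() and the printk() are our own additions on top of the nfs4_callback_recall() snippet quoted above:

	/* Debugging hack, not a fix: retry the delegation lookup once after
	 * a short delay to see whether the NULL result is only transient. */
	res = htonl(NFS4ERR_BADHANDLE);
	inode = nfs_delegation_find_inode(cps->clp, &args->fh);
	if (inode == NULL) {
		msleep(100);
		inode = nfs_delegation_find_inode(cps->clp, &args->fh);
		printk(KERN_INFO "CB_RECALL: delegation lookup retry %s\n",
		       inode != NULL ? "found the inode" : "still NULL");
		if (inode == NULL)
			goto out;
	}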

--
Tuomas

2014-06-17 13:49:45

by Tuomas Räsänen

Subject: Re: Soft lockups on kerberised NFSv4.0 clients

----- Original Message -----
> From: "Tuomas Räsänen" <[email protected]>
>
> The lockup mechnism seems to be as follows: the process (which is always
> firefox) is killed, and it tries to unlock the file (which is always a
> mmapped sqlite3 WAL index) which still has some pending IOs going on. The
> return value of nfs_wait_bit_killable() (-ERESTARTSYS from
> fatal_signal_pending(current)) is ignored and the process just keeps looṕing
> because io_count seems to be stuck at 1 (I still don't know why..).

I wrote a simple program which simulates the behavior described above
and causes softlockups (see jam.c at the bottom of this mail).

Here's what it does:
- creates and opens jamfile.dat (10M)
- locks the file with flock
- spawns N threads which all:
  - mmap the whole file and write to the map
- unlocks the file after spawning the threads

Sometimes the unlocking flock() call blocks for a while, waiting for pending
IOs [*]. If the process is killed during the unlock (sent SIGINT before the
program has printed 'unlock ok'), it seems to get stuck: the pending IOs are
not finished and the -ERESTARTSYS from nfs_wait_bit_killable() is not
handled, causing the task to loop inside __nfs_iocounter_wait()
indefinitely.

How to cause soft lockups:

1. Compile: gcc -pthread -o jam jam.c

2. Run ./jam

3. Press C-c shortly after starting the program, after 'unlock' but before
'unlock ok' is printed

4. You might need to repeat steps 2 and 3 a couple of times

[*]: Sometimes flock() seems to block for a *very* long time (forever?),
but sometimes only for a short period of time. But regarding this
problem, it does not matter: whenever the task is killed during the
unlock, the process freezes.

Applying the patch from my previous mail fixes the soft lockup issue,
because the task does not get into an infinite (or at least indefinite)
loop; an interruptible wait_on_bit() is used instead. But what are
its side effects? Is it a completely brain-dead idea?

jam.c:

#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAP_SIZE (sizeof(char) * 1024 * 1024 * 10)
#define THREADS 4

/* Each worker maps the whole file and dirties every byte of the mapping. */
void *work_on_file(void *const arg)
{
        size_t i;
        int fd;
        char *map;

        fd = *((int *) arg);
        map = (char *) mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) {
                perror("failed to mmap");
                return NULL;
        }

        printf("write begins\n");
        for (i = 0; i < MAP_SIZE; ++i) {
                map[i] = 'a';
        }
        printf("write ends\n");

        return NULL;
}

int main(void)
{
        int i;
        pthread_t *threads;
        int fd;

        fd = open("jamfile.dat", O_RDWR | O_CREAT, 0644);
        if (fd == -1) {
                perror("failed to open");
                return -1;
        }
        ftruncate(fd, MAP_SIZE);

        threads = malloc(sizeof(pthread_t) * THREADS);

        printf("lock\n");
        if (flock(fd, LOCK_EX) == -1) {
                perror("failed to lock");
                return -1;
        }
        printf("lock ok\n");

        /* Start the writers while the lock is still held... */
        for (i = 0; i < THREADS; ++i) {
                pthread_attr_t attr;
                pthread_attr_init(&attr);
                pthread_create(&threads[i], &attr, &work_on_file, &fd);
                pthread_attr_destroy(&attr);
        }

        /* ...and unlock while their I/O may still be in flight. */
        printf("unlock\n");
        if (flock(fd, LOCK_UN) == -1) {
                perror("failed to unlock");
                return -1;
        }
        printf("unlock ok\n");

        for (i = 0; i < THREADS; ++i) {
                pthread_join(threads[i], NULL);
        }

        free(threads);

        return close(fd);
}

--
Tuomas

2014-06-02 19:09:17

by Veli-Matti Lintu

Subject: Re: Soft lockups on kerberised NFSv4.0 clients


----- Original Message -----
> From: "Tuomas Räsänen" <[email protected]>
>
> ----- Original Message -----
> > From: "Jeff Layton" <[email protected]>
> .
> > Ok, now that I look closer at your stack trace the problem appears to
> > be that the unlock code is waiting for the lock context's io_count to
> > drop to zero before allowing the unlock to proceed.
> >
> > That likely means that there is some outstanding I/O that isn't
> > completing, but it's possible that the problem is the CB_RECALL is
> > being ignored. This will probably require some analysis of wire captures.
> >
> > In your earlier mail, you mentioned that the client was responding to
> > the CB_RECALL with NFS4ERR_BADHANDLE. Determining why that's happening
> > may be the best place to focus your efforts.
> >
> > Now that I look, nfs4_callback_recall does this:
> >
> > res = htonl(NFS4ERR_BADHANDLE);
> > inode = nfs_delegation_find_inode(cps->clp, &args->fh);
> > if (inode == NULL)
> > goto out;
> >
> > So it looks like it's not finding the delegation for some reason.
> > You'll probably need to hunt down which open gave you the delegation in
> > the first place and then sanity check the CB_RECALL request to
> > determine whether it's the client or server that's insane here...
> >
>
> Speaking of insanity, I'll try to describe some of our findings in hope
> someone helps us to get a better grasp of the issue.
>
> OPEN requests seem valid to me, there does not seem be any real difference
> between with OPENs granting RECALLable delegations and OPENs granting
> delegations which cause BADHANDLEs to be returned when RECALLed. I don't
> have any ideas what to look for.. probably been staring at capturelogs for
> too long...
>
> BADHANDLE resposes to CB_RECALLs seem to be fairly common in our environment
> and there is not clear link between those and the softlockups described
> describer earlier by Veli-Matti. BADHANDLEs can happen multiple times before
> the first softlockup. After the first softlockup, the system keeps
> experiencing lockups (with various tracebacks) with an increasing speed, so
> I guess only the very first trace is meaningful. And the very first
> traceback seems to always be the traceback posted by Veli-Matti in his first
> email.
>
> The BADHANDLE situation is also quite volatile: if
> nfs_delegation_find_inode() is called again, a bit later, before returning
> from nfs4_callback_recall(), it returns a valid inode instead of NULL. What
> does this indicate? Somehow related to the nature of RCU?

Adding to what Tuomas just wrote, there seem to be different sequences
that result in BADHANDLEs. Just by looking at the dumpcap dumps, it
looks like they all happen close to sillyrenames if I read them
correctly.

Here's one where the delegation is returned before the first CB_RECALL from
the server. Both CB_RECALLs result in NFS4ERR_BADHANDLE as the client
has already returned the delegation, and if I understand correctly there
seems to be no code in nfsd to check whether the recalled delegation is
still valid on the server when BADHANDLE is received. Because of this,
nfsd4_cb_recall_done() retries the request once with a 2 sec delay and then
marks the callback channel as down. The next time the client sends a
RENEW, it gets NFS4ERR_CB_PATH_DOWN, which triggers all the
delegations to be returned to the server.


static void nfsd4_cb_recall_done(struct rpc_task *task, void *calldata)
{
...
	if (dp->dl_retries--) {
		rpc_delay(task, 2*HZ);
		task->tk_status = 0;
		rpc_restart_call_prepare(task);
		return;
	}
	nfsd4_mark_cb_down(clp, task->tk_status);
	cb->cb_done = true;
}

15387748 12:01:08.759023 10.249.0.2 -> 10.249.15.254 NFS 254 V4 Call DELEGRETURN StateID:0xe21e
15387751 12:01:08.759077 10.249.15.254 -> 10.249.0.2 NFS 214 V4 Reply (Call In 15387748) DELEGRETURN
15387757 12:01:08.759158 10.249.15.254 -> 10.249.0.2 NFS 226 V1 CB_COMPOUND Call <EMPTY> CB_RECALL
15387770 12:01:08.759541 10.249.0.2 -> 10.249.15.254 NFS 114 V1 CB_COMPOUND Reply (Call In 15387757) <EMPTY> CB_RECALL
15467475 12:01:10.763189 10.249.15.254 -> 10.249.0.2 NFS 226 V1 CB_COMPOUND Call <EMPTY> CB_RECALL
15467510 12:01:10.764042 10.249.0.2 -> 10.249.15.254 NFS 114 V1 CB_COMPOUND Reply (Call In 15467475) <EMPTY> CB_RECALL
16412787 12:01:40.362237 10.249.0.2 -> 10.249.15.254 NFS 174 V4 Call RENEW CID: 0x7cff
16412788 12:01:40.362266 10.249.15.254 -> 10.249.0.2 NFS 114 V4 Reply (Call In 16412787) RENEW Status: NFS4ERR_CB_PATH_DOWN

And as Tuomas wrote, there is no clear link between these and the
softlockups, but recalling all delegations at once makes this a suspect
in my untrained eyes.
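
Just to make the missing check concrete, something like the fragment below is what I have in mind. delegation_already_returned() is purely hypothetical, nothing like it exists in nfsd, and I have no idea whether nfsd4_cb_recall_done() would even be the right place for it:

	/* Purely hypothetical fragment -- delegation_already_returned() does
	 * not exist in nfsd. The idea is only that a BADHANDLE reply for a
	 * delegation the client has already DELEGRETURNed should not be
	 * retried or counted as a callback channel failure. */
	if (task->tk_status != 0 && delegation_already_returned(clp, dp)) {
		cb->cb_done = true;
		return;
	}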


Happy hacking!

Veli-Matti

2014-06-09 10:10:03

by Tuomas Räsänen

Subject: Re: Soft lockups on kerberised NFSv4.0 clients

----- Original Message -----
> From: "Jeff Layton" <[email protected]>
>
> Ok, now that I look closer at your stack trace the problem appears to
> be that the unlock code is waiting for the lock context's io_count to
> drop to zero before allowing the unlock to proceed.
>
> That likely means that there is some outstanding I/O that isn't
> completing, but it's possible that the problem is the CB_RECALL is
> being ignored. This will probably require some analysis of wire captures.

Examining the wire captures did not lead to anything concrete, so I tried to approach the problem from another angle, by examining the actual symptom, the softlockup. As you said, the stack trace seems to indicate that there are some pending IOs on the file the process is trying to unlock, and the process keeps spinning inside __nfs_iocounter_wait().

static int
__nfs_iocounter_wait(struct nfs_io_counter *c)
{
	wait_queue_head_t *wq = bit_waitqueue(&c->flags, NFS_IO_INPROGRESS);
	DEFINE_WAIT_BIT(q, &c->flags, NFS_IO_INPROGRESS);
	int ret = 0;

	do {
		prepare_to_wait(wq, &q.wait, TASK_KILLABLE);
		set_bit(NFS_IO_INPROGRESS, &c->flags);
		if (atomic_read(&c->io_count) == 0)
			break;
		ret = nfs_wait_bit_killable(&c->flags);
	} while (atomic_read(&c->io_count) != 0);
	finish_wait(wq, &q.wait);
	return ret;
}

The lockup mechanism seems to be as follows: the process (which is always firefox) is killed, and it tries to unlock the file (which is always an mmapped sqlite3 WAL index) which still has some pending IOs going on. The return value of nfs_wait_bit_killable() (-ERESTARTSYS from fatal_signal_pending(current)) is ignored and the process just keeps looping because io_count seems to be stuck at 1 (I still don't know why..). This raised a few questions:

Why is the return value of nfs_wait_bit_killable() not handled? Should it be handled, and if yes, how?

Why is the whole iocounter wait not just implemented using wait_on_bit()?

I changed do_unlk() to use wait_on_bit() instead of nfs_iocounter_wait() and the softlockups seem to have disappeared:

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 284ca90..eb41b32 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -781,7 +781,11 @@ do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
 
 	l_ctx = nfs_get_lock_context(nfs_file_open_context(filp));
 	if (!IS_ERR(l_ctx)) {
-		status = nfs_iocounter_wait(&l_ctx->io_count);
+		struct nfs_io_counter *io_count = &l_ctx->io_count;
+		status = wait_on_bit(&io_count->flags,
+				     NFS_IO_INPROGRESS,
+				     nfs_wait_bit_killable,
+				     TASK_KILLABLE);
 		nfs_put_lock_context(l_ctx);
 		if (status < 0)
 			return status;
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 2ffebf2..6b9089c 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -87,6 +87,7 @@ nfs_page_free(struct nfs_page *p)
 
 static void
 nfs_iocounter_inc(struct nfs_io_counter *c)
 {
+	set_bit(NFS_IO_INPROGRESS, &c->flags);
 	atomic_inc(&c->io_count);
 }

Any thoughts? I really want to understand the issue at hand and to help fix it properly.

--
Tuomas

2014-09-03 07:09:12

by Tuomas Räsänen

Subject: Re: Soft lockups on kerberised NFSv4.0 clients

----- Original Message -----
> From: "Tuomas Räsänen" <[email protected]>
>
> ----- Original Message -----
> > From: "Jeff Layton" <[email protected]>
> >
> > Ok, now that I look closer at your stack trace the problem appears to
> > be that the unlock code is waiting for the lock context's io_count to
> > drop to zero before allowing the unlock to proceed.
> >
> > That likely means that there is some outstanding I/O that isn't
> > completing, but it's possible that the problem is the CB_RECALL is
> > being ignored. This will probably require some analysis of wire captures.
>
> The lockup mechnism seems to be as follows: the process (which is always
> firefox) is killed, and it tries to unlock the file (which is always a
> mmapped sqlite3 WAL index) which still has some pending IOs going on. The
> return value of nfs_wait_bit_killable() (-ERESTARTSYS from
> fatal_signal_pending(current)) is ignored and the process just keeps looṕing
> because io_count seems to be stuck at 1 (I still don't know why..). This
> raised few questions:
>
> Why the return value of nfs_wait_bit_killable() is not handled? Should it be
> handled and if yes, how?
>
> Why the whole iocounter wait is not just implemented using wait_on_bit()?
>
> I changed do_unlk() to use wait_on_bit() instead of nfs_iocounter_wait() and
> softlockups seem to have disappeared:
>
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index 284ca90..eb41b32 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -781,7 +781,11 @@ do_unlk(struct file *filp, int cmd, struct file_lock
> *fl, int is_local)
>
> l_ctx = nfs_get_lock_context(nfs_file_open_context(filp));
> if (!IS_ERR(l_ctx)) {
> - status = nfs_iocounter_wait(&l_ctx->io_count);
> + struct nfs_io_counter *io_count = &l_ctx->io_count;
> + status = wait_on_bit(&io_count->flags,
> + NFS_IO_INPROGRESS,
> + nfs_wait_bit_killable,
> + TASK_KILLABLE);
> nfs_put_lock_context(l_ctx);
> if (status < 0)
> return status;
> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> index 2ffebf2..6b9089c 100644
> --- a/fs/nfs/pagelist.c
> +++ b/fs/nfs/pagelist.c
> @@ -87,6 +87,7 @@ nfs_page_free(struct nfs_page *p)
> static void
> nfs_iocounter_inc(struct nfs_io_counter *c)
> {
> + set_bit(NFS_IO_INPROGRESS, &c->flags);
> atomic_inc(&c->io_count);
> }
>
> Any thoughts? I really want to understand the issue at hand and to help
> fixing it properly.

The same kind of patch was proposed by David Jeffery in http://www.spinics.net/lists/linux-nfs/msg45806.html and the discussion in that thread answered a lot of my questions.

The proposed patch was not accepted, but another patch from David also fixes the softlockup symptom (as tested with jam.c): http://www.spinics.net/lists/linux-nfs/msg45807.html

Case closed.

--
Tuomas