Hi,
We see random NFSv4.0 Linux clients getting stuck accessing an NFS server
that has just failed over. Specifically, this happens with NetApp filers
during ONTAP upgrades.
On an affected client, all I/O against that particular NFS server is stuck,
while access to other NFS servers is fine. Rebooting the client fixes the issue.
This weekend, it happened again and I managed to do some debugging.
btw: I anonymized/changed some data below (hostnames, IP addresses, etc.)
The affected client was sending RENEW operations to the NetApp that had just
been upgraded and getting NFS4ERR_STALE_CLIENTID in response, but then the
client did not try to recover (SETCLIENTID, etc.); instead it just sent
another RENEW op, and it continued like this in a loop:
[root@client root]# tshark -Y nfs host BADnfssrv1
Running as user "root" and group "root". This could be dangerous.
Capturing on 'eth0'
1 0.000000000 10.100.139.63 -> 10.200.0.10 NFS 230 V4 Call RENEW CID:
0xddb4
2 0.074544014 10.200.0.10 -> 10.100.139.63 NFS 182 V4 Reply (Call In 1)
RENEW Status: NFS4ERR_STALE_CLIENTID
5 5.087870527 10.100.139.63 -> 10.200.0.10 NFS 230 V4 Call RENEW CID:
0xddb4
6 5.162370462 10.200.0.10 -> 10.100.139.63 NFS 182 V4 Reply (Call In 5)
RENEW Status: NFS4ERR_STALE_CLIENTID
9 10.175844705 10.100.139.63 -> 10.200.0.10 NFS 230 V4 Call RENEW CID:
0xddb4
10 10.250376101 10.200.0.10 -> 10.100.139.63 NFS 182 V4 Reply (Call In 9)
RENEW Status: NFS4ERR_STALE_CLIENTID
13 15.263984093 10.100.139.63 -> 10.200.0.10 NFS 230 V4 Call RENEW CID:
0xddb4
14 15.338614422 10.200.0.10 -> 10.100.139.63 NFS 182 V4 Reply (Call In 13)
RENEW Status: NFS4ERR_STALE_CLIENTID
...
Traffic to other NFS servers from the same client was working fine.
There was a similar issue reported here back in March by James Pearson; see
the thread with the subject "Stuck NFSv4 mounts of Isilon filer with repeated
NFS4ERR_STALE_CLIENTID errors". I've managed to gather more data though, so
hopefully this will be useful.
This specific client is running:
[root@client ~]$ cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)
[root@client ~]$ uname -a
Linux client.company.domain 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4
01:06:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
We have hit the issue before on more recent CentOS versions as well.
All NFS filesystems are mounted with vers=4.0,sec=krb5.
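For reference, a typical mount looks roughly like this (the export path and
mount point here are made up for illustration):

mount -t nfs -o vers=4.0,sec=krb5 BADnfssrv1.company.domain:/vol/data /mnt/data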
An example of a hanging process, in this case df (stack collected via sysrq-t):
Jun 6 19:31:35 client kernel: [10658517.222819] df D
ffff8802173d1fa0 0 20131 19284 0x00000086
Jun 6 19:31:35 client kernel: [10658517.222837] Call Trace:
Jun 6 19:31:35 client kernel: [10658517.222868] [<ffffffff816ab6d9>]
schedule+0x29/0x70
Jun 6 19:31:35 client kernel: [10658517.222879] [<ffffffffc03367f8>]
gss_cred_init+0x2b8/0x3f0 [auth_rpcgss]
Jun 6 19:31:35 client kernel: [10658517.222885] [<ffffffff810b34b0>] ?
wake_up_atomic_t+0x30/0x30
Jun 6 19:31:35 client kernel: [10658517.222924] [<ffffffffc02eebe9>]
rpcauth_lookup_credcache+0x219/0x2b0 [sunrpc]
Jun 6 19:31:35 client kernel: [10658517.222989] [<ffffffffc0334623>]
gss_lookup_cred+0x13/0x20 [auth_rpcgss]
Jun 6 19:31:35 client kernel: [10658517.223150] [<ffffffffc02ee2db>]
rpcauth_lookupcred+0x8b/0xd0 [sunrpc]
Jun 6 19:31:35 client kernel: [10658517.223232] [<ffffffffc02ef0dd>]
rpcauth_refreshcred+0x14d/0x1d0 [sunrpc]
Jun 6 19:31:35 client kernel: [10658517.223285] [<ffffffffc02e299a>] ?
xprt_lock_and_alloc_slot+0x6a/0x80 [sunrpc]
Jun 6 19:31:35 client kernel: [10658517.223420] [<ffffffffc02dddf0>] ?
call_bc_transmit+0x1b0/0x1b0 [sunrpc]
Jun 6 19:31:35 client kernel: [10658517.223558] [<ffffffffc02dddf0>] ?
call_bc_transmit+0x1b0/0x1b0 [sunrpc]
Jun 6 19:31:35 client kernel: [10658517.223627] [<ffffffffc02de320>] ?
call_retry_reserve+0x60/0x60 [sunrpc]
Jun 6 19:31:35 client kernel: [10658517.223721] [<ffffffffc02de320>] ?
call_retry_reserve+0x60/0x60 [sunrpc]
Jun 6 19:31:35 client kernel: [10658517.223779] [<ffffffffc02de35c>]
call_refresh+0x3c/0x60 [sunrpc]
Jun 6 19:31:35 client kernel: [10658517.223821] [<ffffffffc02ec017>]
__rpc_execute+0x97/0x410 [sunrpc]
Jun 6 19:31:35 client kernel: [10658517.223827] [<ffffffff810b3475>] ?
wake_up_bit+0x25/0x30
Jun 6 19:31:35 client kernel: [10658517.223867] [<ffffffffc02ed7b8>]
rpc_execute+0x68/0xb0 [sunrpc]
Jun 6 19:31:35 client kernel: [10658517.223945] [<ffffffffc02dd776>]
rpc_run_task+0xf6/0x150 [sunrpc]
Jun 6 19:31:35 client kernel: [10658517.224029] [<ffffffffc04d28c3>]
nfs4_call_sync_sequence+0x63/0xa0 [nfsv4]
Jun 6 19:31:35 client kernel: [10658517.224181] [<ffffffffc04d394c>]
_nfs4_proc_getattr+0xcc/0xf0 [nfsv4]
Jun 6 19:31:35 client kernel: [10658517.224238] [<ffffffffc04df8f2>]
nfs4_proc_getattr+0x72/0xf0 [nfsv4]
Jun 6 19:31:35 client kernel: [10658517.224305] [<ffffffffc049c43f>]
__nfs_revalidate_inode+0xbf/0x310 [nfs]
Jun 6 19:31:35 client kernel: [10658517.224350] [<ffffffffc049ca55>]
nfs_getattr+0x95/0x250 [nfs]
Jun 6 19:31:35 client kernel: [10658517.224356] [<ffffffff8120872f>] ?
cp_new_stat+0x14f/0x180
Jun 6 19:31:35 client kernel: [10658517.224363] [<ffffffff81208106>]
vfs_getattr+0x46/0x80
Jun 6 19:31:35 client kernel: [10658517.224369] [<ffffffff81208235>]
vfs_fstatat+0x75/0xc0
Jun 6 19:31:35 client kernel: [10658517.224375] [<ffffffff8120878e>]
SYSC_newstat+0x2e/0x60
Jun 6 19:31:35 client kernel: [10658517.224381] [<ffffffff811212c6>] ?
__audit_syscall_exit+0x1e6/0x280
Jun 6 19:31:35 client kernel: [10658517.224387] [<ffffffff81208a6e>]
SyS_newstat+0xe/0x10
Jun 6 19:31:35 client kernel: [10658517.224393] [<ffffffff816b89fd>]
system_call_fastpath+0x16/0x1b
Same stack for the process 90 minutes later:
Jun 6 20:58:01 client kernel: [10663703.685238] df D
ffff8802173d1fa0 0 20131 19284 0x00000086
Jun 6 20:58:01 client kernel: [10663703.685242] Call Trace:
Jun 6 20:58:01 client kernel: [10663703.685247] [<ffffffff816ab6d9>]
schedule+0x29/0x70
Jun 6 20:58:01 client kernel: [10663703.685257] [<ffffffffc03367f8>]
gss_cred_init+0x2b8/0x3f0 [auth_rpcgss]
Jun 6 20:58:01 client kernel: [10663703.685262] [<ffffffff810b34b0>] ?
wake_up_atomic_t+0x30/0x30
Jun 6 20:58:01 client kernel: [10663703.685297] [<ffffffffc02eebe9>]
rpcauth_lookup_credcache+0x219/0x2b0 [sunrpc]
Jun 6 20:58:01 client kernel: [10663703.685391] [<ffffffffc0334623>]
gss_lookup_cred+0x13/0x20 [auth_rpcgss]
Jun 6 20:58:01 client kernel: [10663703.685560] [<ffffffffc02ee2db>]
rpcauth_lookupcred+0x8b/0xd0 [sunrpc]
Jun 6 20:58:01 client kernel: [10663703.685597] [<ffffffffc02ef0dd>]
rpcauth_refreshcred+0x14d/0x1d0 [sunrpc]
Jun 6 20:58:01 client kernel: [10663703.685836] [<ffffffffc02e299a>] ?
xprt_lock_and_alloc_slot+0x6a/0x80 [sunrpc]
Jun 6 20:58:01 client kernel: [10663703.685888] [<ffffffffc02dddf0>] ?
call_bc_transmit+0x1b0/0x1b0 [sunrpc]
Jun 6 20:58:01 client kernel: [10663703.685997] [<ffffffffc02dddf0>] ?
call_bc_transmit+0x1b0/0x1b0 [sunrpc]
Jun 6 20:58:01 client kernel: [10663703.686179] [<ffffffffc02de320>] ?
call_retry_reserve+0x60/0x60 [sunrpc]
Jun 6 20:58:01 client kernel: [10663703.686227] [<ffffffffc02de320>] ?
call_retry_reserve+0x60/0x60 [sunrpc]
Jun 6 20:58:01 client kernel: [10663703.686415] [<ffffffffc02de35c>]
call_refresh+0x3c/0x60 [sunrpc]
Jun 6 20:58:01 client kernel: [10663703.686514] [<ffffffffc02ec017>]
__rpc_execute+0x97/0x410 [sunrpc]
Jun 6 20:58:01 client kernel: [10663703.686519] [<ffffffff810b3475>] ?
wake_up_bit+0x25/0x30
Jun 6 20:58:01 client kernel: [10663703.686556] [<ffffffffc02ed7b8>]
rpc_execute+0x68/0xb0 [sunrpc]
Jun 6 20:58:01 client kernel: [10663703.686712] [<ffffffffc02dd776>]
rpc_run_task+0xf6/0x150 [sunrpc]
Jun 6 20:58:01 client kernel: [10663703.686852] [<ffffffffc04d28c3>]
nfs4_call_sync_sequence+0x63/0xa0 [nfsv4]
Jun 6 20:58:01 client kernel: [10663703.686887] [<ffffffffc04d394c>]
_nfs4_proc_getattr+0xcc/0xf0 [nfsv4]
Jun 6 20:58:01 client kernel: [10663703.687127] [<ffffffffc04df8f2>]
nfs4_proc_getattr+0x72/0xf0 [nfsv4]
Jun 6 20:58:01 client kernel: [10663703.687153] [<ffffffffc049c43f>]
__nfs_revalidate_inode+0xbf/0x310 [nfs]
Jun 6 20:58:01 client kernel: [10663703.687403] [<ffffffffc049ca55>]
nfs_getattr+0x95/0x250 [nfs]
Jun 6 20:58:01 client kernel: [10663703.687409] [<ffffffff8120872f>] ?
cp_new_stat+0x14f/0x180
Jun 6 20:58:01 client kernel: [10663703.687414] [<ffffffff81208106>]
vfs_getattr+0x46/0x80
Jun 6 20:58:01 client kernel: [10663703.687419] [<ffffffff81208235>]
vfs_fstatat+0x75/0xc0
Jun 6 20:58:01 client kernel: [10663703.687425] [<ffffffff8120878e>]
SYSC_newstat+0x2e/0x60
Jun 6 20:58:01 client kernel: [10663703.687430] [<ffffffff811212c6>] ?
__audit_syscall_exit+0x1e6/0x280
Jun 6 20:58:01 client kernel: [10663703.687436] [<ffffffff81208a6e>]
SyS_newstat+0xe/0x10
Jun 6 20:58:01 client kernel: [10663703.687440] [<ffffffff816b89fd>]
system_call_fastpath+0x16/0x1b
The client doesn't call nfs4_recovery_handle_error():
[root@client root]# stap -e 'probe module("nfsv4").function("nfs4_recovery_handle_error"){print_backtrace()}'
I waited a couple of minutes - no output, which is strange...
The affected NFS server is BADnfssrv1 [10.200.0.10].
[root@client root]# stap -e 'probe module("nfsv4").function("nfs4_renew_state") \
    { printf("%s %s least_time: %d secs_to_renew: %d server: %s\n", ctime(gettimeofday_s()), probefunc(), $clp->cl_lease_time/HZ(), (jiffies()-$clp->cl_last_renewal)/HZ(), kernel_string($clp->cl_hostname)) }'
Sat Jun 6 14:03:07 2020 nfs4_renew_state least_time: 30 secs_to_renew: 20
server: nfsserv1.company.domain
Sat Jun 6 14:03:10 2020 nfs4_renew_state least_time: 30 secs_to_renew: 5826
server: BADnfssrv1.company.domain
Sat Jun 6 14:03:14 2020 nfs4_renew_state least_time: 30 secs_to_renew: 20
server: nfsserv2.company.domain
Sat Jun 6 14:03:15 2020 nfs4_renew_state least_time: 30 secs_to_renew: 5831
server: BADnfssrv1.company.domain
Sat Jun 6 14:03:20 2020 nfs4_renew_state least_time: 30 secs_to_renew: 20
server: lovpesgfs1.company.domain
Sat Jun 6 14:03:20 2020 nfs4_renew_state least_time: 30 secs_to_renew: 5836
server: BADnfssrv1.company.domain
Sat Jun 6 14:03:25 2020 nfs4_renew_state least_time: 30 secs_to_renew: 5841
server: BADnfssrv1.company.domain
Sat Jun 6 14:03:27 2020 nfs4_renew_state least_time: 30 secs_to_renew: 20
server: nfsserv1.company.domain
Sat Jun 6 14:03:31 2020 nfs4_renew_state least_time: 30 secs_to_renew: 5847
server: BADnfssrv1.company.domain
Sat Jun 6 14:03:34 2020 nfs4_renew_state least_time: 30 secs_to_renew: 20
server: nfsserv2.company.domain
Sat Jun 6 14:03:36 2020 nfs4_renew_state least_time: 30 secs_to_renew: 5852
server: BADnfssrv1.company.domain
...
Notice that the last time BADnfssrv1's lease was renewed was over 90 minutes
ago; the other NFS servers are fine.
The (clustered) NetApp ONTAP upgrade was executed about 90 minutes ago.
[root@client root]# stap -e 'probe module("nfsv4").function("nfs4_proc_async_renew") \
    { printf("%s %s %d\n", ctime(gettimeofday_s()), kernel_string($clp->cl_hostname), $renew_flags) }'
Sat Jun 6 14:13:05 2020 BADnfssrv1.company.domain 1
Sat Jun 6 14:13:08 2020 nfsserv1.company.domain 1
Sat Jun 6 14:13:11 2020 BADnfssrv1.company.domain 1
...
The function nfs4_renew_done() in linux-3.10.0-693.11.6.el7 is:
static void nfs4_renew_done(struct rpc_task *task, void *calldata)
{
        struct nfs4_renewdata *data = calldata;
        struct nfs_client *clp = data->client;
        unsigned long timestamp = data->timestamp;

        trace_nfs4_renew_async(clp, task->tk_status);
        switch (task->tk_status) {
        case 0:
                break;
        case -NFS4ERR_LEASE_MOVED:
                nfs4_schedule_lease_moved_recovery(clp);
                break;
        default:
                /* Unless we're shutting down, schedule state recovery! */
                if (test_bit(NFS_CS_RENEWD, &clp->cl_res_state) == 0)
                        return;
                if (task->tk_status != NFS4ERR_CB_PATH_DOWN) {
                        nfs4_schedule_lease_recovery(clp);
                        return;
                }
                nfs4_schedule_path_down_recovery(clp);
        }
        do_renew_lease(clp, timestamp);
}
[root@client root]# stap -e 'probe module("nfsv4").function("nfs4_renew_done") \
    { clp=$data->client; printf("%s %s tk_status: %d\n", ctime(gettimeofday_s()), kernel_string(clp->cl_hostname), $task->tk_status) } \
  probe module("nfsv4").function("nfs4_schedule_lease_recovery") \
    { printf("%s nfs4_schedule_lease_recovery(): %s cl_state: %d\n", ctime(gettimeofday_s()), kernel_string($clp->cl_hostname), $clp->cl_state) } \
  probe module("nfsv4").function("do_renew_lease") \
    { printf("%s do_renew_lease(): %s\n", ctime(gettimeofday_s()), kernel_string($clp->cl_hostname)) }'
Sat Jun 6 14:42:06 2020 BADnfssrv1.company.domain tk_status: -10022
Sat Jun 6 14:42:06 2020 nfs4_schedule_lease_recovery():
BADnfssrv1.company.domain cl_state: 5
Sat Jun 6 14:42:11 2020 nfsserv1.company.domain tk_status: 0
Sat Jun 6 14:42:11 2020 do_renew_lease(): nfsserv1.company.domain
Sat Jun 6 14:42:11 2020 BADnfssrv1.company.domain tk_status: -10022
Sat Jun 6 14:42:11 2020 nfs4_schedule_lease_recovery():
BADnfssrv1.company.domain cl_state: 5
Sat Jun 6 14:42:16 2020 BADnfssrv1.company.domain tk_status: -10022
Sat Jun 6 14:42:16 2020 nfs4_schedule_lease_recovery():
BADnfssrv1.company.domain cl_state: 5
Sat Jun 6 14:42:18 2020 nfsserv2.company.domain tk_status: 0
Sat Jun 6 14:42:18 2020 do_renew_lease(): nfsserv2.company.domain
Sat Jun 6 14:42:21 2020 BADnfssrv1.company.domain tk_status: -10022
Sat Jun 6 14:42:21 2020 nfs4_schedule_lease_recovery():
BADnfssrv1.company.domain cl_state: 5
^C
The -10022 is NFS4ERR_STALE_CLIENTID, as seen in the network dump.
nfs4_schedule_lease_recovery() calls nfs4_schedule_state_manager():
void nfs4_schedule_state_manager(struct nfs_client *clp)
{
        struct task_struct *task;
        char buf[INET6_ADDRSTRLEN + sizeof("-manager") + 1];

        if (test_and_set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state) != 0)
                return;
        ...
NFS4CLNT_MANAGER_RUNNING is bit 0, and cl_state == 5 (binary 101) has bit 0
set, so the test_and_set_bit() returns non-zero and
nfs4_schedule_state_manager() returns without doing anything.
Let's get the stack of the 10.200.0.10-man(ager) kthread:
[root@client root]# echo t >/proc/sysrq-trigger
[root@client root]# less /var/log/messages
...
Jun 6 19:31:35 client kernel: [10658517.225231] 10.200.0.10-man S
ffff880229e7af70 0 5424 2 0x00000080
Jun 6 19:31:35 client kernel: [10658517.225236] Call Trace:
Jun 6 19:31:35 client kernel: [10658517.225277] [<ffffffffc02dffb0>] ?
rpc_release_client+0x90/0xb0 [sunrpc]
Jun 6 19:31:35 client kernel: [10658517.225309] [<ffffffff816ab6d9>]
schedule+0x29/0x70
Jun 6 19:31:35 client kernel: [10658517.225355] [<ffffffff816a90e9>]
schedule_timeout+0x239/0x2c0
Jun 6 19:31:35 client kernel: [10658517.225386] [<ffffffff811e0922>] ?
kmem_cache_free+0x1e2/0x200
Jun 6 19:31:35 client kernel: [10658517.225437] [<ffffffff81187177>] ?
mempool_free_slab+0x17/0x20
Jun 6 19:31:35 client kernel: [10658517.225469] [<ffffffff81187579>] ?
mempool_free+0x49/0x90
Jun 6 19:31:35 client kernel: [10658517.225513] [<ffffffff816ac441>]
wait_for_completion_interruptible+0x131/0x1a0
Jun 6 19:31:35 client kernel: [10658517.225543] [<ffffffff810c6440>] ?
wake_up_state+0x20/0x20
Jun 6 19:31:35 client kernel: [10658517.225590] [<ffffffffc04f0a70>]
nfs4_drain_slot_tbl+0x60/0x70 [nfsv4]
Jun 6 19:31:35 client kernel: [10658517.225650] [<ffffffffc04f0a97>]
nfs4_begin_drain_session.isra.11+0x17/0x40 [nfsv4]
Jun 6 19:31:35 client kernel: [10658517.225685] [<ffffffffc04f137f>]
nfs4_establish_lease+0x2f/0x80 [nfsv4]
Jun 6 19:31:35 client kernel: [10658517.225727] [<ffffffffc04f2818>]
nfs4_state_manager+0x1d8/0x8c0 [nfsv4]
Jun 6 19:31:35 client kernel: [10658517.225764] [<ffffffffc04f2f00>] ?
nfs4_state_manager+0x8c0/0x8c0 [nfsv4]
Jun 6 19:31:35 client kernel: [10658517.225854] [<ffffffffc04f2f1f>]
nfs4_run_state_manager+0x1f/0x40 [nfsv4]
Jun 6 19:31:35 client kernel: [10658517.225912] [<ffffffff810b252f>]
kthread+0xcf/0xe0
Jun 6 19:31:35 client kernel: [10658517.225976] [<ffffffff810b2460>] ?
insert_kthread_work+0x40/0x40
Jun 6 19:31:35 client kernel: [10658517.226000] [<ffffffff816b8798>]
ret_from_fork+0x58/0x90
Jun 6 19:31:35 client kernel: [10658517.226045] [<ffffffff810b2460>] ?
insert_kthread_work+0x40/0x40
...
I triggered sysrq-t again a couple of minutes later, and then again later,
and each time I got exactly the same stack as above for the same pid (5424).
Notice that the kthread name is the server's IP plus "-man(ager)", and that
IP matches the NFS server we are having the issue with on this client.
So, if I'm reading it correctly, what is happening is that the client sends a
RENEW op, gets NFS4ERR_STALE_CLIENTID (-10022), and schedules lease recovery,
which sets NFS4CLNT_CHECK_LEASE (in nfs4_schedule_lease_recovery()) and tries
to run nfs4_schedule_state_manager(). That ends up doing nothing, as the
state-manager kthread is already running and should deal with the
NFS4CLNT_CHECK_LEASE bit being set. However, since nfs4_state_manager() is
stuck draining the slot table, it never processes NFS4CLNT_CHECK_LEASE. This
explains why the network dump shows the client looping on RENEW ops being
sent, getting the error, and never recovering (never establishing a new
client ID, etc.). It also explains why other processes issuing any I/O to a
filesystem served by that server are stuck.
Given the stack, it is stuck here in wait_for_completion_interruptible()
(also notice the rpc_release_client() at the top of the stack):
static int nfs4_drain_slot_tbl(struct nfs4_slot_table *tbl)
{
        set_bit(NFS4_SLOT_TBL_DRAINING, &tbl->slot_tbl_state);
        spin_lock(&tbl->slot_tbl_lock);
        if (tbl->highest_used_slotid != NFS4_NO_SLOT) {
                INIT_COMPLETION(tbl->complete);
                spin_unlock(&tbl->slot_tbl_lock);
                return wait_for_completion_interruptible(&tbl->complete);
        }
        spin_unlock(&tbl->slot_tbl_lock);
        return 0;
}
I ran kill -9 5424, which fixed the issue immediately; as the capture below
shows, RENEW now succeeds:
[root@client root]# tshark -Y nfs host BADnfssrv1
Running as user "root" and group "root". This could be dangerous.
Capturing on 'eth0'
2 19.842157633 10.100.139.63 -> 10.200.0.10 NFS 230 V4 Call RENEW CID:
0x6428
3 19.916744562 10.200.0.10 -> 10.100.139.63 NFS 182 V4 Reply (Call In 2)
RENEW
6 39.874158462 10.100.139.63 -> 10.200.0.10 NFS 230 V4 Call RENEW CID:
0x6428
7 39.948760040 10.200.0.10 -> 10.100.139.63 NFS 182 V4 Reply (Call In 6)
RENEW
10 59.906166646 10.100.139.63 -> 10.200.0.10 NFS 230 V4 Call RENEW CID:
0x6428
11 59.980762920 10.200.0.10 -> 10.100.139.63 NFS 182 V4 Reply (Call In 10)
RENEW
...
All the stuck df processes, etc. resumed as well, and new I/O is working fine
against the file server.
I wasn't sure whether this left something in an undefined state, though, so I
scheduled a client reboot just in case, although it seemed to be working fine.
There is another PID, 26378, which I noticed in the sysrq-t output and which
might be the culprit here:
Jun 6 19:31:35 client kernel: [10658517.214810] nfsv4.0-svc S
ffff880234b96eb0 0 26378 2 0x00000080
Jun 6 19:31:35 client kernel: [10658517.214814] Call Trace:
Jun 6 19:31:35 client kernel: [10658517.214820] [<ffffffff8156afaf>] ?
sock_destroy_inode+0x2f/0x40
Jun 6 19:31:35 client kernel: [10658517.214825] [<ffffffff8121f5a8>] ?
destroy_inode+0x38/0x60
Jun 6 19:31:35 client kernel: [10658517.214830] [<ffffffff816ab6d9>]
schedule+0x29/0x70
Jun 6 19:31:35 client kernel: [10658517.214835] [<ffffffff816a90e9>]
schedule_timeout+0x239/0x2c0
Jun 6 19:31:35 client kernel: [10658517.214880] [<ffffffffc02ffd61>] ?
svc_xprt_free+0x71/0x90 [sunrpc]
Jun 6 19:31:35 client kernel: [10658517.214915] [<ffffffffc03018d1>]
svc_recv+0x971/0xb60 [sunrpc]
Jun 6 19:31:35 client kernel: [10658517.214950] [<ffffffffc04f7d40>] ?
nfs_callback_authenticate+0x60/0x60 [nfsv4]
Jun 6 19:31:35 client kernel: [10658517.214984] [<ffffffffc04f7d40>] ?
nfs_callback_authenticate+0x60/0x60 [nfsv4]
Jun 6 19:31:35 client kernel: [10658517.215019] [<ffffffffc04f7d73>]
nfs4_callback_svc+0x33/0x60 [nfsv4]
Jun 6 19:31:35 client kernel: [10658517.215024] [<ffffffff810b252f>]
kthread+0xcf/0xe0
Jun 6 19:31:35 client kernel: [10658517.215030] [<ffffffff810b2460>] ?
insert_kthread_work+0x40/0x40
Jun 6 19:31:35 client kernel: [10658517.215035] [<ffffffff816b8798>]
ret_from_fork+0x58/0x90
Jun 6 19:31:35 client kernel: [10658517.215041] [<ffffffff810b2460>] ?
insert_kthread_work+0x40/0x40
It is showing up, with the same stack, in every sysrq/t output collected,
even almost 90 minutes later:
Jun 6 20:58:01 client kernel: [10663703.676521] nfsv4.0-svc S
ffff880234b96eb0 0 26378 2 0x00000080
Jun 6 20:58:01 client kernel: [10663703.676525] Call Trace:
Jun 6 20:58:01 client kernel: [10663703.676531] [<ffffffff8156afaf>] ?
sock_destroy_inode+0x2f/0x40
Jun 6 20:58:01 client kernel: [10663703.676536] [<ffffffff8121f5a8>] ?
destroy_inode+0x38/0x60
Jun 6 20:58:01 client kernel: [10663703.676541] [<ffffffff816ab6d9>]
schedule+0x29/0x70
Jun 6 20:58:01 client kernel: [10663703.676545] [<ffffffff816a90e9>]
schedule_timeout+0x239/0x2c0
Jun 6 20:58:01 client kernel: [10663703.676550] [<ffffffff811e0e03>] ?
kfree+0x103/0x140
Jun 6 20:58:01 client kernel: [10663703.676595] [<ffffffffc02ffd61>] ?
svc_xprt_free+0x71/0x90 [sunrpc]
Jun 6 20:58:01 client kernel: [10663703.676633] [<ffffffffc03018d1>]
svc_recv+0x971/0xb60 [sunrpc]
Jun 6 20:58:01 client kernel: [10663703.676676] [<ffffffffc04f7d40>] ?
nfs_callback_authenticate+0x60/0x60 [nfsv4]
Jun 6 20:58:01 client kernel: [10663703.676709] [<ffffffffc04f7d40>] ?
nfs_callback_authenticate+0x60/0x60 [nfsv4]
Jun 6 20:58:01 client kernel: [10663703.676745] [<ffffffffc04f7d73>]
nfs4_callback_svc+0x33/0x60 [nfsv4]
Jun 6 20:58:01 client kernel: [10663703.676750] [<ffffffff810b252f>]
kthread+0xcf/0xe0
Jun 6 20:58:01 client kernel: [10663703.676756] [<ffffffff810b2460>] ?
insert_kthread_work+0x40/0x40
Jun 6 20:58:01 client kernel: [10663703.676761] [<ffffffff816b8798>]
ret_from_fork+0x58/0x90
Jun 6 20:58:01 client kernel: [10663703.676766] [<ffffffff810b2460>] ?
insert_kthread_work+0x40/0x40
It looks like a deadlock that depends on timing during NFS server failovers.
Hopefully the above will be useful to someone more familiar with the actual
implementation.
I'm happy to debug it further next time it happens - open to any ideas on
what to look for.
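One thing I could try next time (an untested sketch in the same stap style as
above; it assumes nfs4_free_slot() is present and probe-able in this kernel's
nfsv4 module) is to watch the slot-table drain directly, to see whether
highest_used_slotid ever drops while the manager thread is waiting:

[root@client root]# stap -e 'probe module("nfsv4").function("nfs4_drain_slot_tbl") \
    { printf("%s drain start, highest_used_slotid: %d\n", ctime(gettimeofday_s()), $tbl->highest_used_slotid) } \
  probe module("nfsv4").function("nfs4_drain_slot_tbl").return \
    { printf("%s drain done\n", ctime(gettimeofday_s())) } \
  probe module("nfsv4").function("nfs4_free_slot") \
    { printf("%s nfs4_free_slot(), highest_used_slotid: %d\n", ctime(gettimeofday_s()), $tbl->highest_used_slotid) }'

If the drain never reports "done" and no nfs4_free_slot() calls show up for
that server, that would confirm some outstanding request is still holding a
slot that the drain is waiting on.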
I briefly went through the git changelog, and maybe one of the fixes below
addresses the issue:
commit 6221f1d9b63fed6260273e59a2b89ab30537a811
Author: Chuck Lever <[email protected]>
Date: Fri Apr 17 12:40:31 2020 -0400
SUNRPC: Fix backchannel RPC soft lockups
...
commit 67e7b52d44e3d539dfbfcd866c3d3d69da23a909
Author: Trond Myklebust <[email protected]>
Date: Wed Aug 7 07:31:27 2019 -0400
NFSv4: Ensure state recovery handles ETIMEDOUT correctly
...
commit 76ee03540f316948c3bc89fc76ded86c88e887a5
Author: Anna Schumaker <[email protected]>
Date: Tue Jan 10 16:49:31 2017 -0500
NFS: Check if the slot table is draining from nfs4_setup_sequence()
...
commit 6994cdd798568a0ddb8e0a85e2af24dbe655c341
Author: Anna Schumaker <[email protected]>
Date: Tue Jan 10 16:13:27 2017 -0500
NFS: Lock the slot table from a single place during setup sequence
Rather than implementing this twice for NFS v4.0 and v4.1
...
--
Robert Milkowski
> On Jun 8, 2020, at 8:34 AM, Robert Milkowski <[email protected]> wrote:
>
> Hi,
>
> We see random NFSv4.0 Linux clients getting stuck accessing an NFS server
> that has just failed over. Specifically, this happens with NetApp filers
> during ONTAP upgrades.
> On an affected client, all I/O against that particular NFS server is stuck,
> while access to other NFS servers is fine. Rebooting the client fixes the issue.
>
>
> This weekend, it happened again and I managed to do some debugging.
>
> btw: I anonymized/changed some data below (hostnames, IP addresses, etc.)
>
> The affected client was sending RENEW operations to the NetApp that had just
> been upgraded and getting NFS4ERR_STALE_CLIENTID in response, but then the
> client did not try to recover (SETCLIENTID, etc.); instead it just sent
> another RENEW op, and it continued like this in a loop:
>
> [root@client root]# tshark -Y nfs host BADnfssrv1
> Running as user "root" and group "root". This could be dangerous.
> Capturing on 'eth0'
> 1 0.000000000 10.100.139.63 -> 10.200.0.10 NFS 230 V4 Call RENEW CID:
> 0xddb4
> 2 0.074544014 10.200.0.10 -> 10.100.139.63 NFS 182 V4 Reply (Call In 1)
> RENEW Status: NFS4ERR_STALE_CLIENTID
> 5 5.087870527 10.100.139.63 -> 10.200.0.10 NFS 230 V4 Call RENEW CID:
> 0xddb4
> 6 5.162370462 10.200.0.10 -> 10.100.139.63 NFS 182 V4 Reply (Call In 5)
> RENEW Status: NFS4ERR_STALE_CLIENTID
> 9 10.175844705 10.100.139.63 -> 10.200.0.10 NFS 230 V4 Call RENEW CID:
> 0xddb4
> 10 10.250376101 10.200.0.10 -> 10.100.139.63 NFS 182 V4 Reply (Call In 9)
> RENEW Status: NFS4ERR_STALE_CLIENTID
> 13 15.263984093 10.100.139.63 -> 10.200.0.10 NFS 230 V4 Call RENEW CID:
> 0xddb4
> 14 15.338614422 10.200.0.10 -> 10.100.139.63 NFS 182 V4 Reply (Call In 13)
> RENEW Status: NFS4ERR_STALE_CLIENTID
> ...
>
> Traffic to other NFS servers from the same client was working fine.
>
> There was a similar issue reported here back in March by James Pearson; see
> the thread with the subject "Stuck NFSv4 mounts of Isilon filer with repeated
> NFS4ERR_STALE_CLIENTID errors". I've managed to gather more data though, so
> hopefully this will be useful.
>
>
> This specific client is running:
>
> [root@client ~]$ cat /etc/redhat-release
> CentOS Linux release 7.3.1611 (Core)
>
> [root@client ~]$ uname -a
> Linux client.company.domain 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4
> 01:06:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Hi Robert-
The usual course of action for bugs in distributor kernels is to work
directly with the distributor. Upstream developers don't generally have
access to or expertise with those code bases. 7.3 is an older kernel,
and it's possible that upstream has addressed this issue and CentOS
has pulled that fix into a newer release.
The troubleshooting below is common to this class of bugs. To get to
the root cause, you will need to observe the very beginning of the
loop, to see what condition triggers it.
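For example, a rolling capture left running across the maintenance window
should catch it (the interface name, output path, and ring-buffer sizes here
are only placeholders):

tshark -i eth0 -b filesize:102400 -b files:20 -w /var/tmp/renew-loop.pcap host BADnfssrv1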
> ...
--
Chuck Lever
>
> The usual course of action for bugs in distributor kernels is to work
> directly with the distributor. Upstream developers don't generally have
> access to or expertise with those code bases. 7.3 is an older kernel,
> and it's possible that upstream has addressed this issue and CentOS has
> pulled that fix into a newer release.
>
I was hoping someone here might get back with "hey, this has been fixed by
commit...".
We also saw it on CentOS 7.6.
I will try to reproduce it, and once I can, I'll also try to reproduce it
against an upstream kernel.
--
Robert Milkowski
On Wed, 10 Jun 2020 at 17:24, Robert Milkowski <[email protected]> wrote:
>
> >
> > The usual course of action for bugs in distributor kernels is to work
> > directly with the distributor. Upstream developers don't generally have
> > access to or expertise with those code bases. 7.3 is an older kernel,
> > and it's possible that upstream has addressed this issue and CentOS has
> > pulled that fix into a newer release.
> >
>
>
> I was hoping someone here might get back with "hey, this has been fixed by
> commit...".
> We also saw it on CentOS 7.6.
>
> I will try to reproduce it, and once I can, I'll also try to reproduce it
> against an upstream kernel.
I haven't been monitoring this list recently, but we are still having
the same problem with CentOS 7.7 clients and Isilon filers.
I've just 'fixed' one client with the issue in the way you described
earlier in this thread: finding the process stuck in nfs4_state_manager
via a stack trace and killing that pid with -9, after which all the
stuck mounts pop back to life.
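Roughly, what I do to find it is the following (just a sketch; it relies on
the "<server-IP>-man..." kthread naming and the /proc/<pid>/stack interface
seen earlier in this thread):

for p in $(ps -eo pid,comm --no-headers | awk '$2 ~ /-man/ {print $1}'); do
    grep -q nfs4_state_manager /proc/$p/stack && echo "stuck state manager: pid $p"
done

and then kill -9 the reported pid.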
Have you reported this issue via the Red Hat Bugzilla?
Thanks
James Pearson
On Wed, 24 Jun 2020 at 09:54, James Pearson <[email protected]> wrote:
>
> On Wed, 10 Jun 2020 at 17:24, Robert Milkowski <[email protected]> wrote:
> >
> > I was hoping someone here might get back with "hey, this has been fixed by
> > commit...".
> > We also saw it on CentOS 7.6.
> >
> > I will try to reproduce it, and once I can, I'll also try to reproduce it
> > against an upstream kernel.
>
> I haven't been monitoring this list recently, but we are still having
> the same problem with CentOS 7.7 clients and Isilon filers.
>
> I've just 'fixed' one client with the issue in the way you described
> earlier in this thread: finding the process stuck in nfs4_state_manager
> via a stack trace and killing that pid with -9, after which all the
> stuck mounts pop back to life.
I have tcpdumps, taken on different clients, of when this
RENEW/NFS4ERR_STALE_CLIENTID loop starts, and all show similar behaviour.
From one of these tcpdumps:
27803 2020-06-23 13:40:36 10.78.202.217 -> 10.78.196.220 NFS 194 V4
Call RENEW CID: 0x3091
27804 2020-06-23 13:40:36 10.78.196.220 -> 10.78.202.217 NFS 114 V4
Reply (Call In 27803) RENEW
27815 2020-06-23 13:41:06 10.78.202.217 -> 10.78.196.220 NFS 194 V4
Call RENEW CID: 0x3091
27816 2020-06-23 13:41:06 10.78.196.220 -> 10.78.202.217 NFS 114 V4
Reply (Call In 27815) RENEW
27827 2020-06-23 13:41:36 10.78.202.217 -> 10.78.196.220 NFS 194 V4
Call RENEW CID: 0x3091
27828 2020-06-23 13:41:36 10.78.196.220 -> 10.78.202.217 NFS 114 V4
Reply (Call In 27827) RENEW
27856 2020-06-23 13:42:06 10.78.202.217 -> 10.78.196.220 NFS 194 V4
Call RENEW CID: 0x3091
27857 2020-06-23 13:42:06 10.78.196.220 -> 10.78.202.217 NFS 114 V4
Reply (Call In 27856) RENEW
28071 2020-06-23 13:42:29 10.78.202.217 -> 10.78.196.220 NFS 110 V4
NULL Call[Malformed Packet]
28073 2020-06-23 13:42:29 10.78.196.220 -> 10.78.202.217 NFS 94 V4
NULL Reply (Call In 28071)[Malformed Packet]
28082 2020-06-23 13:42:29 10.78.202.217 -> 10.78.196.220 NFS 110 V4
NULL Call[Malformed Packet]
28084 2020-06-23 13:42:29 10.78.196.220 -> 10.78.202.217 NFS 94 V4
NULL Reply (Call In 28082)[Malformed Packet]
28086 2020-06-23 13:42:29 10.78.202.217 -> 10.78.196.220 NFS 370 V4
Call EXCHANGE_ID
28087 2020-06-23 13:42:29 10.78.196.220 -> 10.78.202.217 NFS 106 V4
Reply (Call In 28086) Status: NFS4ERR_MINOR_VERS_MISMATCH
28092 2020-06-23 13:42:29 10.78.202.217 -> 10.78.196.220 NFS 206 V4
Call PUTROOTFH | GETATTR
28093 2020-06-23 13:42:29 10.78.196.220 -> 10.78.202.217 NFS 342 V4
Reply (Call In 28092) PUTROOTFH | GETATTR
28095 2020-06-23 13:42:29 10.78.202.217 -> 10.78.196.220 NFS 258 V4
Call GETATTR FH: 0x2e47f02d
28096 2020-06-23 13:42:29 10.78.196.220 -> 10.78.202.217 NFS 162 V4
Reply (Call In 28095) GETATTR
28097 2020-06-23 13:42:29 10.78.202.217 -> 10.78.196.220 NFS 262 V4
Call GETATTR FH: 0x2e47f02d
28098 2020-06-23 13:42:29 10.78.196.220 -> 10.78.202.217 NFS 178 V4
Reply (Call In 28097) GETATTR
28099 2020-06-23 13:42:29 10.78.202.217 -> 10.78.196.220 NFS 258 V4
Call GETATTR FH: 0x2e47f02d
28100 2020-06-23 13:42:29 10.78.196.220 -> 10.78.202.217 NFS 162 V4
Reply (Call In 28099) GETATTR
...
39660 2020-06-23 13:42:53 10.78.202.217 -> 10.78.196.220 NFS 346 V4
Call LOOKUP DH: 0x19cf3646/shared
39661 2020-06-23 13:42:53 10.78.196.220 -> 10.78.202.217 NFS 342 V4
Reply (Call In 39660) LOOKUP
39663 2020-06-23 13:42:53 10.78.202.217 -> 10.78.196.220 NFS 314 V4
Call READLINK
39665 2020-06-23 13:42:53 10.78.196.220 -> 10.78.202.217 NFS 162 V4
Reply (Call In 39663) READLINK
44182 2020-06-23 13:42:56 10.78.202.217 -> 10.78.196.220 NFS 326 V4
Call GETATTR FH: 0xff288ce9
44183 2020-06-23 13:42:56 10.78.196.220 -> 10.78.202.217 NFS 266 V4
Reply (Call In 44182) GETATTR
68152 2020-06-23 13:42:59 10.78.202.217 -> 10.78.196.220 NFS 194 V4
Call RENEW CID: 0x3091
68153 2020-06-23 13:42:59 10.78.196.220 -> 10.78.202.217 NFS 114 V4
Reply (Call In 68152) RENEW Status: NFS4ERR_STALE_CLIENTID
68155 2020-06-23 13:42:59 10.78.202.217 -> 10.78.196.220 NFS 194 V4
Call RENEW CID: 0x3091
68156 2020-06-23 13:42:59 10.78.196.220 -> 10.78.202.217 NFS 114 V4
Reply (Call In 68155) RENEW Status: NFS4ERR_STALE_CLIENTID
73012 2020-06-23 13:43:04 10.78.202.217 -> 10.78.196.220 NFS 194 V4
Call RENEW CID: 0x3091
73013 2020-06-23 13:43:04 10.78.196.220 -> 10.78.202.217 NFS 114 V4
Reply (Call In 73012) RENEW Status: NFS4ERR_STALE_CLIENTID
73270 2020-06-23 13:43:09 10.78.202.217 -> 10.78.196.220 NFS 194 V4
Call RENEW CID: 0x3091
73271 2020-06-23 13:43:09 10.78.196.220 -> 10.78.202.217 NFS 114 V4
Reply (Call In 73270) RENEW Status: NFS4ERR_STALE_CLIENTID
73273 2020-06-23 13:43:14 10.78.202.217 -> 10.78.196.220 NFS 194 V4
Call RENEW CID: 0x3091
73274 2020-06-23 13:43:14 10.78.196.220 -> 10.78.202.217 NFS 114 V4
Reply (Call In 73273) RENEW Status: NFS4ERR_STALE_CLIENTID
73276 2020-06-23 13:43:19 10.78.202.217 -> 10.78.196.220 NFS 194 V4
Call RENEW CID: 0x3091
73277 2020-06-23 13:43:19 10.78.196.220 -> 10.78.202.217 NFS 114 V4
Reply (Call In 73276) RENEW Status: NFS4ERR_STALE_CLIENTID
73279 2020-06-23 13:43:24 10.78.202.217 -> 10.78.196.220 NFS 194 V4
Call RENEW CID: 0x3091
73280 2020-06-23 13:43:24 10.78.196.220 -> 10.78.202.217 NFS 114 V4
Reply (Call In 73279) RENEW Status: NFS4ERR_STALE_CLIENTID
i.e. all clients have:
28071 2020-06-23 13:42:29 10.78.202.217 -> 10.78.196.220 NFS 110 V4
NULL Call[Malformed Packet]
28073 2020-06-23 13:42:29 10.78.196.220 -> 10.78.202.217 NFS 94 V4
NULL Reply (Call In 28071)[Malformed Packet]
28082 2020-06-23 13:42:29 10.78.202.217 -> 10.78.196.220 NFS 110 V4
NULL Call[Malformed Packet]
28084 2020-06-23 13:42:29 10.78.196.220 -> 10.78.202.217 NFS 94 V4
NULL Reply (Call In 28082)[Malformed Packet]
28086 2020-06-23 13:42:29 10.78.202.217 -> 10.78.196.220 NFS 370 V4
Call EXCHANGE_ID
28087 2020-06-23 13:42:29 10.78.196.220 -> 10.78.202.217 NFS 106 V4
Reply (Call In 28086) Status: NFS4ERR_MINOR_VERS_MISMATCH
then 30 seconds later, the RENEW/NFS4ERR_STALE_CLIENTID loop starts:
68152 2020-06-23 13:42:59 10.78.202.217 -> 10.78.196.220 NFS 194 V4
Call RENEW CID: 0x3091
68153 2020-06-23 13:42:59 10.78.196.220 -> 10.78.202.217 NFS 114 V4
Reply (Call In 68152) RENEW Status: NFS4ERR_STALE_CLIENTID
68155 2020-06-23 13:42:59 10.78.202.217 -> 10.78.196.220 NFS 194 V4
Call RENEW CID: 0x3091
68156 2020-06-23 13:42:59 10.78.196.220 -> 10.78.202.217 NFS 114 V4
Reply (Call In 68155) RENEW Status: NFS4ERR_STALE_CLIENTID
Does anyone have any idea if the above might point to something that
has already been fixed in the mainline kernel?
Thanks
James Pearson