2021-03-08 18:17:45

by Jason Breitman

Subject: NFS Mount Hangs

Issue
NFSv4 mounts periodically hang on the NFS Client.

During a hang, the affected NFS Client can still manually mount from a different NFS Server.
Also, other NFS Clients are successfully mounting from the NFS Server in question.
Rebooting the NFS Client appears to be the only solution.

I believe this issue has been discussed in the past, so I have included an article that matches my symptoms.
I do not see a case statement for FIN_WAIT2 at https://elixir.bootlin.com/linux/v4.19.171/source/net/sunrpc/xprtsock.c.

NFS Client
OS: Debian Buster 10.8
Kernel: 4.19.171-2
Protocol: NFSv4 with Kerberos Security
Mount Options: nfs-server.domain.com:/data /mnt/data nfs4 lookupcache=pos,noresvport,sec=krb5,hard,rsize=1048576,wsize=1048576 00

Output from the NFS Client when the issue occurs
# netstat -an | grep NFS.Server.IP.X
tcp 0 0 NFS.Client.IP.X:46896 NFS.Server.IP.X:2049 FIN_WAIT2

# cat /sys/kernel/debug/sunrpc/rpc_xprt/*/info
netid: tcp
addr: NFS.Server.IP.X
port: 2049
state: 0x51
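
For readers who want to interpret the `state:` value, here is a minimal sketch that decodes it as a bitmask. The bit numbering is an assumption based on the XPRT_* definitions in include/linux/sunrpc/xprt.h as recalled for the 4.19 series; check it against your own kernel tree. The helper name `decode_xprt_state` is hypothetical.

```python
# Assumed bit positions for the sunrpc transport state word
# (XPRT_* in include/linux/sunrpc/xprt.h, 4.19 era; verify locally).
XPRT_STATE_BITS = {
    0: "XPRT_LOCKED",
    1: "XPRT_CONNECTED",
    2: "XPRT_CONNECTING",
    3: "XPRT_CLOSE_WAIT",
    4: "XPRT_BOUND",
    5: "XPRT_BINDING",
    6: "XPRT_CLOSING",
    9: "XPRT_CONGESTED",
}


def decode_xprt_state(value):
    """Return the names of the bits set in a transport state word."""
    names = []
    bit = 0
    while value >> bit:
        if value & (1 << bit):
            # Unknown positions are reported generically rather than guessed.
            names.append(XPRT_STATE_BITS.get(bit, f"bit{bit}"))
        bit += 1
    return names


if __name__ == "__main__":
    print(decode_xprt_state(0x51))
```

Under that assumed numbering, 0x51 has bits 0, 4 and 6 set and the connected bit clear, which would be consistent with the "is not connected" messages in the syslog output.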

syslog
Mar 4 10:29:27 hostname kernel: [437414.131978] -pid- flgs status -client- --rqstp- -timeout ---ops--
Mar 4 10:29:27 hostname kernel: [437414.133158] 57419 40a1 0 9b723c73 143cfadf 30000 4ca953b5 nfsv4 OPEN_NOATTR a:call_connect_status [sunrpc] q:xprt_pending
Mar 4 10:29:27 hostname kernel: [437414.135211] 57420 4081 -11 9b723c73 (null) 0 4ca953b5 nfsv4 OPEN_NOATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.137250] 57421 4081 -11 9b723c73 (null) 0 4ca953b5 nfsv4 OPEN_NOATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.139345] 57422 4081 -11 9b723c73 (null) 0 4ca953b5 nfsv4 OPEN_NOATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.141496] 57423 4081 -11 9b723c73 (null) 0 4ca953b5 nfsv4 OPEN_NOATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.143712] 57424 4081 -11 9b723c73 (null) 0 4ca953b5 nfsv4 OPEN_NOATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.145940] 57425 4081 -11 9b723c73 (null) 0 4ca953b5 nfsv4 OPEN_NOATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.148227] 57426 4081 -11 9b723c73 (null) 0 4ca953b5 nfsv4 OPEN_NOATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.150575] 57427 4081 -11 9b723c73 (null) 0 4ca953b5 nfsv4 OPEN_NOATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.152938] 57428 4080 -11 9b723c73 (null) 0 fb0400d nfsv4 LOOKUP a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.154478] 57433 4080 -11 27bf33c1 (null) 0 fb0400d nfsv4 GETATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.156023] 57434 4080 -11 27bf33c1 (null) 0 fb0400d nfsv4 GETATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.157549] 57435 4080 -11 27bf33c1 (null) 0 fb0400d nfsv4 GETATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.159073] 57436 4080 -11 27bf33c1 (null) 0 fb0400d nfsv4 GETATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.160587] 57437 4080 -11 27bf33c1 (null) 0 fb0400d nfsv4 GETATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.162094] 57438 4080 -11 27bf33c1 (null) 0 fb0400d nfsv4 GETATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.163597] 57431 4080 -11 3118865a (null) 0 fb0400d nfsv4 GETATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.165100] 57432 4080 -11 3118865a (null) 0 fb0400d nfsv4 GETATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.166598] 57439 4080 -11 3118865a (null) 0 fb0400d nfsv4 GETATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.168088] 57440 4080 -11 3118865a (null) 0 fb0400d nfsv4 GETATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.169573] 57441 4080 -11 3118865a (null) 0 fb0400d nfsv4 GETATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.171058] 57442 4080 -11 3118865a (null) 0 fb0400d nfsv4 GETATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.172532] 57443 4080 -11 3118865a (null) 0 fb0400d nfsv4 GETATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.173991] 57444 4080 -11 3118865a (null) 0 fb0400d nfsv4 GETATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.175452] 57445 4080 -11 3118865a (null) 0 fb0400d nfsv4 GETATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.176906] 57446 4080 -11 3118865a (null) 0 fb0400d nfsv4 GETATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.178349] 57447 4080 -11 3118865a (null) 0 fb0400d nfsv4 GETATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.179792] 57448 4080 -11 3118865a (null) 0 fb0400d nfsv4 GETATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.181227] 57449 4080 -11 3118865a (null) 0 fb0400d nfsv4 GETATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.182655] 57450 4080 -11 3118865a (null) 0 fb0400d nfsv4 GETATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.184081] 57451 4080 -11 3118865a (null) 0 fb0400d nfsv4 GETATTR a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.185494] 57418 4880 -11 d42d6144 ab8b1696 0 fb0400d nfsv4 STATFS a:call_connect_status [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.186905] 57430 4080 -11 d42d6144 (null) 0 fb0400d nfsv4 STATFS a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:27 hostname kernel: [437414.188310] 57429 5281 -11 907fb25c (null) 0 5fa6554c nfsv4 SEQUENCE a:call_reserveresult [sunrpc] q:xprt_sending
Mar 4 10:29:30 hostname kernel: [437417.110517] RPC: 57419 xprt_connect_status: connect attempt timed out
Mar 4 10:29:30 hostname kernel: [437417.112172] RPC: 57419 call_connect_status (status -110)
Mar 4 10:29:30 hostname kernel: [437417.113337] RPC: 57419 call_timeout (major)
Mar 4 10:29:30 hostname kernel: [437417.114385] RPC: 57419 call_bind (status 0)
Mar 4 10:29:30 hostname kernel: [437417.115402] RPC: 57419 call_connect xprt 00000000e061831b is not connected
Mar 4 10:29:30 hostname kernel: [437417.116547] RPC: 57419 xprt_connect xprt 00000000e061831b is not connected
Mar 4 10:30:31 hostname kernel: [437478.551090] RPC: 57419 xprt_connect_status: connect attempt timed out
Mar 4 10:30:31 hostname kernel: [437478.552396] RPC: 57419 call_connect_status (status -110)
Mar 4 10:30:31 hostname kernel: [437478.553417] RPC: 57419 call_timeout (minor)
Mar 4 10:30:31 hostname kernel: [437478.554327] RPC: 57419 call_bind (status 0)
Mar 4 10:30:31 hostname kernel: [437478.555220] RPC: 57419 call_connect xprt 00000000e061831b is not connected
Mar 4 10:30:31 hostname kernel: [437478.556254] RPC: 57419 xprt_connect xprt 00000000e061831b is not connected
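
The task dump above is easier to read when grouped by the action each task is waiting in and the queue it sits on. The following is a rough sketch that does this for log lines in the format shown; the function names are hypothetical, and the errno table only covers the two status values that appear in this log (Linux numbering assumed).

```python
from collections import Counter

# Linux errno names for the status values seen in this log (assumption: Linux numbering).
LINUX_ERRNO = {0: "OK", 11: "EAGAIN", 110: "ETIMEDOUT"}


def parse_task(line):
    """Parse one RPC task line from the dump; return None for any other line."""
    _, sep, rest = line.partition("] ")
    fields = rest.split()
    if not sep or len(fields) < 4 or not fields[0].isdigit():
        return None  # header line, "RPC: ..." debug line, or unrelated text
    action = next((f[2:] for f in fields if f.startswith("a:")), None)
    queue = next((f[2:] for f in fields if f.startswith("q:")), None)
    if action is None or queue is None:
        return None
    status = int(fields[2])
    return {
        "pid": int(fields[0]),
        "status": status,
        "errno": LINUX_ERRNO.get(abs(status), "unknown"),
        "proc": fields[fields.index("a:" + action) - 1],
        "action": action,
        "queue": queue,
    }


def summarize(lines):
    """Count tasks per (action, queue) pair."""
    tasks = (parse_task(line) for line in lines)
    return Counter((t["action"], t["queue"]) for t in tasks if t)


if __name__ == "__main__":
    import sys
    for (action, queue), count in summarize(sys.stdin).most_common():
        print(f"{count:4d}  {action}  {queue}")
```

Fed with the syslog excerpt on standard input, this should show nearly all tasks sitting in call_reserveresult on xprt_sending, with task 57419 in call_connect_status on xprt_pending. That would match the later lines in which task 57419 keeps retrying the connect while the others wait behind it.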

Reference Article
https://patchwork.kernel.org/project/linux-nfs/patch/[email protected]/


Jason Breitman






2021-03-08 22:17:42

by Trond Myklebust

Subject: Re: NFS Mount Hangs

On Mon, 2021-03-08 at 13:16 -0500, Jason Breitman wrote:
> Issue
> NFSv4 mounts periodically hang on the NFS Client.
>
> During this time, it is possible to manually mount from another NFS
> Server on the NFS Client having issues.
> Also, other NFS Clients are successfully mounting from the NFS Server
> in question.
> Rebooting the NFS Client appears to be the only solution.
>
> I believe this issue has been discussed in the past so I included an
> article that matched my symptoms.
> I do not see a case statement for FIN_WAIT2 at
> https://elixir.bootlin.com/linux/v4.19.171/source/net/sunrpc/xprtsock.c
> .
>
> NFS Client
> OS:             Debian Buster 10.8
> Kernel: 4.19.171-2
> Protocol:       NFSv4 with Kerberos Security
> Mount Options:  nfs-
> server.domain.com:/data     /mnt/data       nfs4    lookupcache=pos,n
> oresvport,sec=krb5,hard,rsize=1048576,wsize=1048576    00
>
> Output from the NFS Client when the issue occurs
> # netstat -an | grep NFS.Server.IP.X
> tcp        0      0 NFS.Client.IP.X:46896     
> NFS.Server.IP.X:2049       FIN_WAIT2
>

Your client has closed the connection, and is waiting for the server to
close the connection on its side.
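
The half-closed situation described here can be reproduced with ordinary sockets. This is only an illustrative sketch on the loopback interface, not the NFS client code: the local listener is a hypothetical stand-in for the server, and the function name is made up for the example.

```python
import socket


def half_close_demo():
    """Show that a connection stays half-open until the peer also closes."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # hypothetical local stand-in for the server
    listener.listen(1)
    client = socket.create_connection(listener.getsockname(), timeout=5)
    server, _ = listener.accept()
    server.settimeout(5)
    try:
        # Client sends its FIN: FIN_WAIT1, then FIN_WAIT2 once the FIN is ACKed.
        client.shutdown(socket.SHUT_WR)
        # Server side reads end-of-file and sits in CLOSE_WAIT.
        eof = server.recv(16)
        # The server has not closed yet, so it can still send data.
        server.sendall(b"late")
        late = b""
        while len(late) < 4:
            chunk = client.recv(4 - len(late))
            if not chunk:
                break
            late += chunk
        # Only when the server closes does the client leave FIN_WAIT2.
        server.close()
        final = client.recv(16)
    finally:
        client.close()
        server.close()
        listener.close()
    return eof, late, final


if __name__ == "__main__":
    print(half_close_demo())
```

Between the `shutdown` call and `server.close()`, the client end is in the same state as the netstat line quoted above. If the server never performed its close, the client end would remain in FIN_WAIT2.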

I'd suggest using a newer kernel, or else getting someone in the Debian
project to fix theirs.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2021-03-08 22:23:36

by Jason Breitman

Subject: Re: NFS Mount Hangs

Thank you.
Do you know which kernel version is fixed?
If not, is there something I can look for in the source code that will indicate that the kernel is fixed?

Jason Breitman





2021-03-09 01:54:59

by Trond Myklebust

Subject: Re: NFS Mount Hangs

On Mon, 2021-03-08 at 17:22 -0500, Jason Breitman wrote:
> Thank you.
> Do you know which kernel version is fixed?
> If not, is there something I can look for in the source code that
> will indicate that the kernel is fixed.
>

As I said, the problem here is that the server doesn't seem to be
closing the socket when the client does. You didn't actually tell us
which server you are using, so I don't know how to answer that
question.



--
Trond Myklebust
CTO, Hammerspace Inc
4984 El Camino Real, Suite 208
Los Altos, CA 94022

http://www.hammer.space

2021-03-09 03:34:35

by Jason Breitman

Subject: Re: NFS Mount Hangs

The server is running FreeBSD 12.1-RELEASE-p5.

Jason Breitman



2021-03-10 15:53:40

by Jason Breitman

Subject: Re: NFS Mount Hangs

Do you know of a FreeBSD 12.X kernel that has the fix you suggested, or is there a kernel setting that I can apply to eliminate the issue?

Jason Breitman

