2021-05-17 03:18:01

by Michael Wakabayashi

[permalink] [raw]
Subject: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine

Hi,

We're seeing what looks like an NFSv4 issue.

Mounting an NFS server that is down (ping to this NFS server's IP address does not respond) will block _all_ other NFS mount attempts even if the NFS servers are available and working properly (these subsequent mounts hang).

If I kill the NFS mount process that's trying to mount the dead NFS server, the NFS mounts that were blocked will immediately unblock and mount successfully, which suggests the first mount command is blocking the other mount commands.


I verified this behavior using a newly built mount.nfs command from the recent nfs-utils 2.5.3 package installed on a recent version of Ubuntu Cloud Image 21.04:
* https://sourceforge.net/projects/nfs/files/nfs-utils/2.5.3/
* https://cloud-images.ubuntu.com/releases/hirsute/release-20210513/ubuntu-21.04-server-cloudimg-amd64.ova


The reason this looks like it is specific to NFSv4 is from the following output showing "vers=4.2":
> $ strace /sbin/mount.nfs <unreachable-IP-address>:/path /tmp/mnt
> [ ... cut ... ]
> mount("<unreadhable-IP-address>:/path", "/tmp/mnt", "nfs", 0, "vers=4.2,addr=<unreachable-IP-address>,clien"...^C^Z

Also, if I try the same mount.nfs commands but specifying NFSv3, the mount to the dead NFS server hangs, but the mounts to the operational NFS servers do not block and mount successfully; this bug doesn't happen when using NFSv3.


We reported this issue under util-linux here:
https://github.com/karelzak/util-linux/issues/1309
[mounting nfs server which is down blocks all other nfs mounts on same machine #1309]

I also found an older bug on this mailing list that had similar symptoms (but could not tell if it was the same problem or not):
https://patchwork.kernel.org/project/linux-nfs/patch/[email protected]/
[[PATCH/RFC] NFSv4: don't let hanging mounts block other mounts]

Thanks, Mike


2021-05-19 20:21:22

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine

On Sun, May 16, 2021 at 11:18 PM Michael Wakabayashi
<[email protected]> wrote:
>
> Hi,
>
> We're seeing what looks like an NFSv4 issue.
>
> Mounting an NFS server that is down (ping to this NFS server's IP address does not respond) will block _all_ other NFS mount attempts even if the NFS servers are available and working properly (these subsequent mounts hang).
>
> If I kill the NFS mount process that's trying to mount the dead NFS server, the NFS mounts that were blocked will immediately unblock and mount successfully, which suggests the first mount command is blocking the other mount commands.
>
>
> I verified this behavior using a newly built mount.nfs command from the recent nfs-utils 2.5.3 package installed on a recent version of Ubuntu Cloud Image 21.04:
> * https://sourceforge.net/projects/nfs/files/nfs-utils/2.5.3/
> * https://cloud-images.ubuntu.com/releases/hirsute/release-20210513/ubuntu-21.04-server-cloudimg-amd64.ova
>
>
> The reason this looks like it is specific to NFSv4 is from the following output showing "vers=4.2":
> > $ strace /sbin/mount.nfs <unreachable-IP-address>:/path /tmp/mnt
> > [ ... cut ... ]
> > mount("<unreadhable-IP-address>:/path", "/tmp/mnt", "nfs", 0, "vers=4.2,addr=<unreachable-IP-address>,clien"...^C^Z
>
> Also, if I try the same mount.nfs commands but specifying NFSv3, the mount to the dead NFS server hangs, but the mounts to the operational NFS servers do not block and mount successfully; this bug doesn't happen when using NFSv3.
>
>
> We reported this issue under util-linux here:
> https://github.com/karelzak/util-linux/issues/1309
> [mounting nfs server which is down blocks all other nfs mounts on same machine #1309]
>
> I also found an older bug on this mailing list that had similar symptoms (but could not tell if it was the same problem or not):
> https://patchwork.kernel.org/project/linux-nfs/patch/[email protected]/
> [[PATCH/RFC] NFSv4: don't let hanging mounts block other mounts]
>
> Thanks, Mike

Hi Mike,

This is not a helpful reply but I was curious if I could reproduce
your issue but was not successful. I'm able to initiate a mount to an
unreachable-IP-address which hangs and then do another mount to an
existing server without issues. Ubuntu 21.04 seems to be 5.11 based so
I tried upstream 5.11 and I tried the latest upstream nfs-utils
(instead of what my distro has which was an older version).

To debug, perhaps get an output of the nfs4 and sunrpc tracepoints.
Or also get output from dmesg after doing “echo t >
/proc/sysrq-trigger” to see where the mounts are hanging.

2021-05-20 09:57:06

by Michael Wakabayashi

[permalink] [raw]
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine

Hi Orna,

Thank you for looking.

I spent a couple of hours trying to get various
SystemTap?NFS scripts working but mostly got errors.

For example:
> [email protected]:~/src/systemtap-scripts/tracepoints# stap nfs4_fsinfo.stp
>?semantic error: unable to find tracepoint variable '$status' (alternatives: $$parms, $$vars, $task, $$name): identifier '$status' at nfs4_fsinfo.stp:7:11
>?? ? ? ? source: terror = $status
>?? ? ? ? ? ? ? ? ? ? ? ? ^
>?Pass 2: analysis failed. ?[man error::pass2]

If you have any stap scripts that work on Ubuntu
that you'd?like me to run?or have pointers on how
to setup my Ubuntu?environment to run them
successfully,?please let me know and I can try again..


Here's the call trace for the mount.nfs command
mounting the bad NFS server/10.1.1.1:

[Thu May 20 08:53:35 2021] task:mount.nfs ? ? ? state:D stack: ? ?0 pid:13903 ppid: 13900 flags:0x00004000
[Thu May 20 08:53:35 2021] Call Trace:
[Thu May 20 08:53:35 2021] ?? rpc_init_task+0x150/0x150 [sunrpc]
[Thu May 20 08:53:35 2021] ?__schedule+0x23d/0x670
[Thu May 20 08:53:35 2021] ?? rpc_init_task+0x150/0x150 [sunrpc]
[Thu May 20 08:53:35 2021] ?schedule+0x4f/0xc0
[Thu May 20 08:53:35 2021] ?rpc_wait_bit_killable+0x25/0xb0 [sunrpc]
[Thu May 20 08:53:35 2021] ?__wait_on_bit+0x33/0xa0
[Thu May 20 08:53:35 2021] ?? call_reserveresult+0xa0/0xa0 [sunrpc]
[Thu May 20 08:53:35 2021] ?out_of_line_wait_on_bit+0x8d/0xb0
[Thu May 20 08:53:35 2021] ?? var_wake_function+0x30/0x30
[Thu May 20 08:53:35 2021] ?__rpc_execute+0xd4/0x290 [sunrpc]
[Thu May 20 08:53:35 2021] ?rpc_execute+0x5e/0x80 [sunrpc]
[Thu May 20 08:53:35 2021] ?rpc_run_task+0x13d/0x180 [sunrpc]
[Thu May 20 08:53:35 2021] ?rpc_call_sync+0x51/0xa0 [sunrpc]
[Thu May 20 08:53:35 2021] ?rpc_create_xprt+0x177/0x1c0 [sunrpc]
[Thu May 20 08:53:35 2021] ?rpc_create+0x11f/0x220 [sunrpc]
[Thu May 20 08:53:35 2021] ?? __memcg_kmem_charge+0x7d/0xf0
[Thu May 20 08:53:35 2021] ?? _cond_resched+0x1a/0x50
[Thu May 20 08:53:35 2021] ?nfs_create_rpc_client+0x13a/0x180 [nfs]
[Thu May 20 08:53:35 2021] ?nfs4_init_client+0x205/0x290 [nfsv4]
[Thu May 20 08:53:35 2021] ?? __fscache_acquire_cookie+0x10a/0x210 [fscache]
[Thu May 20 08:53:35 2021] ?? nfs_fscache_get_client_cookie+0xa9/0x120 [nfs]
[Thu May 20 08:53:35 2021] ?? nfs_match_client+0x37/0x2a0 [nfs]
[Thu May 20 08:53:35 2021] ?nfs_get_client+0x14d/0x190 [nfs]
[Thu May 20 08:53:35 2021] ?nfs4_set_client+0xd3/0x120 [nfsv4]
[Thu May 20 08:53:35 2021] ?nfs4_init_server+0xf8/0x270 [nfsv4]
[Thu May 20 08:53:35 2021] ?nfs4_create_server+0x58/0xa0 [nfsv4]
[Thu May 20 08:53:35 2021] ?nfs4_try_get_tree+0x3a/0xc0 [nfsv4]
[Thu May 20 08:53:35 2021] ?nfs_get_tree+0x38/0x50 [nfs]
[Thu May 20 08:53:35 2021] ?vfs_get_tree+0x2a/0xc0
[Thu May 20 08:53:35 2021] ?do_new_mount+0x14b/0x1a0
[Thu May 20 08:53:35 2021] ?path_mount+0x1d4/0x4e0
[Thu May 20 08:53:35 2021] ?__x64_sys_mount+0x108/0x140
[Thu May 20 08:53:35 2021] ?do_syscall_64+0x38/0x90
[Thu May 20 08:53:35 2021] ?entry_SYSCALL_64_after_hwframe+0x44/0xa9


Here's the call trace for the mount.nfs command
mounting an available NFS server/10.188.76.67 (which was
blocked by the first mount.nfs command above):
[Thu May 20 08:53:35 2021] task:mount.nfs ? ? ? state:D stack: ? ?0 pid:13910 ppid: 13907 flags:0x00004000
[Thu May 20 08:53:35 2021] Call Trace:
[Thu May 20 08:53:35 2021] ?__schedule+0x23d/0x670
[Thu May 20 08:53:35 2021] ?schedule+0x4f/0xc0
[Thu May 20 08:53:35 2021] ?nfs_wait_client_init_complete+0x5a/0x90 [nfs]
[Thu May 20 08:53:35 2021] ?? wait_woken+0x80/0x80
[Thu May 20 08:53:35 2021] ?nfs_match_client+0x1de/0x2a0 [nfs]
[Thu May 20 08:53:35 2021] ?? pcpu_block_update_hint_alloc+0xcc/0x2d0
[Thu May 20 08:53:35 2021] ?nfs_get_client+0x62/0x190 [nfs]
[Thu May 20 08:53:35 2021] ?nfs4_set_client+0xd3/0x120 [nfsv4]
[Thu May 20 08:53:35 2021] ?nfs4_init_server+0xf8/0x270 [nfsv4]
[Thu May 20 08:53:35 2021] ?nfs4_create_server+0x58/0xa0 [nfsv4]
[Thu May 20 08:53:35 2021] ?nfs4_try_get_tree+0x3a/0xc0 [nfsv4]
[Thu May 20 08:53:35 2021] ?nfs_get_tree+0x38/0x50 [nfs]
[Thu May 20 08:53:35 2021] ?vfs_get_tree+0x2a/0xc0
[Thu May 20 08:53:35 2021] ?do_new_mount+0x14b/0x1a0
[Thu May 20 08:53:35 2021] ?path_mount+0x1d4/0x4e0
[Thu May 20 08:53:35 2021] ?__x64_sys_mount+0x108/0x140
[Thu May 20 08:53:35 2021] ?do_syscall_64+0x38/0x90
[Thu May 20 08:53:35 2021] ?entry_SYSCALL_64_after_hwframe+0x44/0xa9

I've pasted the entire dmesg output?here:?https://pastebin.com/90QJyAL9


This is the command I ran to mount an unreachable NFS server:
date; time strace mount.nfs 10.1.1.1:/nopath /tmp/mnt.dead; date
The strace log:?https://pastebin.com/5yVhm77u

This is the command I ran to mount the available NFS server:
date; time strace mount.nfs 10.188.76.67:/ /tmp/mnt.alive ; date
The strace log:?https://pastebin.com/kTimQ6vH

The procedure:
- run dmesg -C to clear dmesg logs
- run mount.nfs on 10.1.1.1 (this IP address is down/not responding to ping) which hung
- run mount.nfs on 10.188.76.67? which also hung
- "echo t >?/proc/sysrq-trigger" to dump the call traces for hung processes
- dmesg -T > dmesg.log to save the dmesg logs
- control-Z the mount.nfs command to 10.1.1.1
- "kill -9 %1" in the terminal to kill the mount.nfs to 10.1.1.1
- mount.nfs to 10.188.76.67 immediately mounts successfully
after the first mount is killed (we can see this by the timestamp in the logs files)


Thanks, Mike



From: Olga Kornievskaia <[email protected]>
Sent: Wednesday, May 19, 2021 12:15 PM
To: Michael Wakabayashi <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine
?
On Sun, May 16, 2021 at 11:18 PM Michael Wakabayashi
<[email protected]> wrote:
>
> Hi,
>
> We're seeing what looks like an NFSv4 issue.
>
> Mounting an NFS server that is down (ping to this NFS server's IP address does not respond) will block _all_ other NFS mount attempts even if the NFS servers are available and working properly (these subsequent mounts hang).
>
> If I kill the NFS mount process that's trying to mount the dead NFS server, the NFS mounts that were blocked will immediately unblock and mount successfully, which suggests the first mount command is blocking the other mount commands.
>
>
> I verified this behavior using a newly built mount.nfs command from the recent nfs-utils 2.5.3 package installed on a recent version of Ubuntu Cloud Image 21.04:
> * https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsourceforge.net%2Fprojects%2Fnfs%2Ffiles%2Fnfs-utils%2F2.5.3%2F&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7Cfe9df245d11945bd70fd08d91afa7565%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637570485288219912%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=90wWL%2FDqjJMsdlFDxF3hlmyhuS86VwNrtOD%2BLTGxY20%3D&amp;reserved=0
> * https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcloud-images.ubuntu.com%2Freleases%2Fhirsute%2Frelease-20210513%2Fubuntu-21.04-server-cloudimg-amd64.ova&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7Cfe9df245d11945bd70fd08d91afa7565%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637570485288219912%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=iWIB0PuQ1HiOpFGmoViTmzreirD8EJRAkG%2BOw57QTKs%3D&amp;reserved=0
>
>
> The reason this looks like it is specific to NFSv4 is from the following output showing "vers=4.2":
> > $ strace /sbin/mount.nfs <unreachable-IP-address>:/path /tmp/mnt
> > [ ... cut ... ]
> > mount("<unreadhable-IP-address>:/path", "/tmp/mnt", "nfs", 0, "vers=4.2,addr=<unreachable-IP-address>,clien"...^C^Z
>
> Also, if I try the same mount.nfs commands but specifying NFSv3, the mount to the dead NFS server hangs, but the mounts to the operational NFS servers do not block and mount successfully; this bug doesn't happen when using NFSv3.
>
>
> We reported this issue under util-linux here:
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fkarelzak%2Futil-linux%2Fissues%2F1309&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7Cfe9df245d11945bd70fd08d91afa7565%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637570485288219912%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=WLxFM2Ls5PodPjrvA%2FZninvvHF6LlrO9ywSEMwgcR50%3D&amp;reserved=0
> [mounting nfs server which is down blocks all other nfs mounts on same machine #1309]
>
> I also found an older bug on this mailing list that had similar symptoms (but could not tell if it was the same problem or not):
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.kernel.org%2Fproject%2Flinux-nfs%2Fpatch%2F87vaori26c.fsf%40notabene.neil.brown.name%2F&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7Cfe9df245d11945bd70fd08d91afa7565%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637570485288219912%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=Y6A61VJQ6IDwsvUjjBc%2Fjrf80rvGSkaIjc0UhWRQ9kk%3D&amp;reserved=0
> [[PATCH/RFC] NFSv4: don't let hanging mounts block other mounts]
>
> Thanks, Mike

Hi Mike,

This is not a helpful reply but I was curious if I could reproduce
your issue but was not successful. I'm able to initiate a mount to an
unreachable-IP-address which hangs and then do another mount to an
existing server without issues. Ubuntu 21.04 seems to be 5.11 based so
I tried upstream 5.11 and I tried the latest upstream nfs-utils
(instead of what my distro has which was an older version).

To debug, perhaps get an output of the nfs4 and sunrpc tracepoints.
Or also get output from dmesg after doing ?echo t >
/proc/sysrq-trigger? to see where the mounts are hanging.

2021-05-20 10:47:20

by Michael Wakabayashi

[permalink] [raw]
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine

Hi Olga,

If you are able to run privileged Docker containers
you might be able to reproduce the issue running
the following docker commands.

docker pull ubuntu:hirsute-20210514 # ubuntu hirsute is the latest version of ubuntu

# Run this to find the id of the ubuntu image ?which is needed in the next command
docker image ls # image id looks like "274cadba4412"

# Run the ubuntu container and start a bash shell.
# Replace <ubuntu_hirsute_image_id> with your ubuntu image id from the previous step.
docker container run --rm -it --privileged <ubuntu_hirsute_image_id> /bin/bash


# You should be inside the ubuntu container now and can run these Linux commands
apt-get update # this is needed, otherwise the next command fails

# This installs mount.nfs. Answer the two questions about geographic area and city.
# Ignore all the debconf error message (readline, dialog, frontend, etc)
apt install -y nfs-common

# execute mount commands
mkdir /tmp/mnt1 /tmp/mnt2
mount.nfs 1.1.1.1:/ /tmp/mnt1 &
mount.nfs <accessible-nfs-host:path> /tmp/mnt2 &
jobs ?# shows both mounts are hung

Thanks, Mike


From: Michael Wakabayashi <[email protected]>
Sent: Thursday, May 20, 2021 2:51 AM
To: Olga Kornievskaia <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine
?
Hi Orna,

Thank you for looking.

I spent a couple of hours trying to get various
SystemTap?NFS scripts working but mostly got errors.

For example:
> [email protected]:~/src/systemtap-scripts/tracepoints# stap nfs4_fsinfo.stp
>?semantic error: unable to find tracepoint variable '$status' (alternatives: $$parms, $$vars, $task, $$name): identifier '$status' at nfs4_fsinfo.stp:7:11
>?? ? ? ? source: terror = $status
>?? ? ? ? ? ? ? ? ? ? ? ? ^
>?Pass 2: analysis failed. ?[man error::pass2]

If you have any stap scripts that work on Ubuntu
that you'd?like me to run?or have pointers on how
to setup my Ubuntu?environment to run them
successfully,?please let me know and I can try again..


Here's the call trace for the mount.nfs command
mounting the bad NFS server/10.1.1.1:

[Thu May 20 08:53:35 2021] task:mount.nfs ? ? ? state:D stack: ? ?0 pid:13903 ppid: 13900 flags:0x00004000
[Thu May 20 08:53:35 2021] Call Trace:
[Thu May 20 08:53:35 2021] ?? rpc_init_task+0x150/0x150 [sunrpc]
[Thu May 20 08:53:35 2021] ?__schedule+0x23d/0x670
[Thu May 20 08:53:35 2021] ?? rpc_init_task+0x150/0x150 [sunrpc]
[Thu May 20 08:53:35 2021] ?schedule+0x4f/0xc0
[Thu May 20 08:53:35 2021] ?rpc_wait_bit_killable+0x25/0xb0 [sunrpc]
[Thu May 20 08:53:35 2021] ?__wait_on_bit+0x33/0xa0
[Thu May 20 08:53:35 2021] ?? call_reserveresult+0xa0/0xa0 [sunrpc]
[Thu May 20 08:53:35 2021] ?out_of_line_wait_on_bit+0x8d/0xb0
[Thu May 20 08:53:35 2021] ?? var_wake_function+0x30/0x30
[Thu May 20 08:53:35 2021] ?__rpc_execute+0xd4/0x290 [sunrpc]
[Thu May 20 08:53:35 2021] ?rpc_execute+0x5e/0x80 [sunrpc]
[Thu May 20 08:53:35 2021] ?rpc_run_task+0x13d/0x180 [sunrpc]
[Thu May 20 08:53:35 2021] ?rpc_call_sync+0x51/0xa0 [sunrpc]
[Thu May 20 08:53:35 2021] ?rpc_create_xprt+0x177/0x1c0 [sunrpc]
[Thu May 20 08:53:35 2021] ?rpc_create+0x11f/0x220 [sunrpc]
[Thu May 20 08:53:35 2021] ?? __memcg_kmem_charge+0x7d/0xf0
[Thu May 20 08:53:35 2021] ?? _cond_resched+0x1a/0x50
[Thu May 20 08:53:35 2021] ?nfs_create_rpc_client+0x13a/0x180 [nfs]
[Thu May 20 08:53:35 2021] ?nfs4_init_client+0x205/0x290 [nfsv4]
[Thu May 20 08:53:35 2021] ?? __fscache_acquire_cookie+0x10a/0x210 [fscache]
[Thu May 20 08:53:35 2021] ?? nfs_fscache_get_client_cookie+0xa9/0x120 [nfs]
[Thu May 20 08:53:35 2021] ?? nfs_match_client+0x37/0x2a0 [nfs]
[Thu May 20 08:53:35 2021] ?nfs_get_client+0x14d/0x190 [nfs]
[Thu May 20 08:53:35 2021] ?nfs4_set_client+0xd3/0x120 [nfsv4]
[Thu May 20 08:53:35 2021] ?nfs4_init_server+0xf8/0x270 [nfsv4]
[Thu May 20 08:53:35 2021] ?nfs4_create_server+0x58/0xa0 [nfsv4]
[Thu May 20 08:53:35 2021] ?nfs4_try_get_tree+0x3a/0xc0 [nfsv4]
[Thu May 20 08:53:35 2021] ?nfs_get_tree+0x38/0x50 [nfs]
[Thu May 20 08:53:35 2021] ?vfs_get_tree+0x2a/0xc0
[Thu May 20 08:53:35 2021] ?do_new_mount+0x14b/0x1a0
[Thu May 20 08:53:35 2021] ?path_mount+0x1d4/0x4e0
[Thu May 20 08:53:35 2021] ?__x64_sys_mount+0x108/0x140
[Thu May 20 08:53:35 2021] ?do_syscall_64+0x38/0x90
[Thu May 20 08:53:35 2021] ?entry_SYSCALL_64_after_hwframe+0x44/0xa9


Here's the call trace for the mount.nfs command
mounting an available NFS server/10.188.76.67 (which was
blocked by the first mount.nfs command above):
[Thu May 20 08:53:35 2021] task:mount.nfs ? ? ? state:D stack: ? ?0 pid:13910 ppid: 13907 flags:0x00004000
[Thu May 20 08:53:35 2021] Call Trace:
[Thu May 20 08:53:35 2021] ?__schedule+0x23d/0x670
[Thu May 20 08:53:35 2021] ?schedule+0x4f/0xc0
[Thu May 20 08:53:35 2021] ?nfs_wait_client_init_complete+0x5a/0x90 [nfs]
[Thu May 20 08:53:35 2021] ?? wait_woken+0x80/0x80
[Thu May 20 08:53:35 2021] ?nfs_match_client+0x1de/0x2a0 [nfs]
[Thu May 20 08:53:35 2021] ?? pcpu_block_update_hint_alloc+0xcc/0x2d0
[Thu May 20 08:53:35 2021] ?nfs_get_client+0x62/0x190 [nfs]
[Thu May 20 08:53:35 2021] ?nfs4_set_client+0xd3/0x120 [nfsv4]
[Thu May 20 08:53:35 2021] ?nfs4_init_server+0xf8/0x270 [nfsv4]
[Thu May 20 08:53:35 2021] ?nfs4_create_server+0x58/0xa0 [nfsv4]
[Thu May 20 08:53:35 2021] ?nfs4_try_get_tree+0x3a/0xc0 [nfsv4]
[Thu May 20 08:53:35 2021] ?nfs_get_tree+0x38/0x50 [nfs]
[Thu May 20 08:53:35 2021] ?vfs_get_tree+0x2a/0xc0
[Thu May 20 08:53:35 2021] ?do_new_mount+0x14b/0x1a0
[Thu May 20 08:53:35 2021] ?path_mount+0x1d4/0x4e0
[Thu May 20 08:53:35 2021] ?__x64_sys_mount+0x108/0x140
[Thu May 20 08:53:35 2021] ?do_syscall_64+0x38/0x90
[Thu May 20 08:53:35 2021] ?entry_SYSCALL_64_after_hwframe+0x44/0xa9

I've pasted the entire dmesg output?here:?https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpastebin.com%2F90QJyAL9&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C2759b06e70634987fce108d91b754e5f%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637571012932297691%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=xpENxRqSdn4aDPwB890%2Bh1RpB4dD2DfpnVGQeM6n0Io%3D&amp;reserved=0


This is the command I ran to mount an unreachable NFS server:
date; time strace mount.nfs 10.1.1.1:/nopath /tmp/mnt.dead; date
The strace log:?https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpastebin.com%2F5yVhm77u&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C2759b06e70634987fce108d91b754e5f%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637571012932297691%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=1uaPa3RDcR32%2Buh6NQd2F7wq5%2BvYZ0dnWcGve0bSj%2FQ%3D&amp;reserved=0

This is the command I ran to mount the available NFS server:
date; time strace mount.nfs 10.188.76.67:/ /tmp/mnt.alive ; date
The strace log:?https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpastebin.com%2FkTimQ6vH&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C2759b06e70634987fce108d91b754e5f%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637571012932297691%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=1YwGfM0u1zQsWmOuZX6FJjYwlf3e01cMpJCDtK8JRI0%3D&amp;reserved=0

The procedure:
- run dmesg -C to clear dmesg logs
- run mount.nfs on 10.1.1.1 (this IP address is down/not responding to ping) which hung
- run mount.nfs on 10.188.76.67? which also hung
- "echo t >?/proc/sysrq-trigger" to dump the call traces for hung processes
- dmesg -T > dmesg.log to save the dmesg logs
- control-Z the mount.nfs command to 10.1.1.1
- "kill -9 %1" in the terminal to kill the mount.nfs to 10.1.1.1
- mount.nfs to 10.188.76.67 immediately mounts successfully
? after the first mount is killed (we can see this by the timestamp in the logs files)


Thanks, Mike



From: Olga Kornievskaia <[email protected]>
Sent: Wednesday, May 19, 2021 12:15 PM
To: Michael Wakabayashi <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine
?
On Sun, May 16, 2021 at 11:18 PM Michael Wakabayashi
<[email protected]> wrote:
>
> Hi,
>
> We're seeing what looks like an NFSv4 issue.
>
> Mounting an NFS server that is down (ping to this NFS server's IP address does not respond) will block _all_ other NFS mount attempts even if the NFS servers are available and working properly (these subsequent mounts hang).
>
> If I kill the NFS mount process that's trying to mount the dead NFS server, the NFS mounts that were blocked will immediately unblock and mount successfully, which suggests the first mount command is blocking the other mount commands.
>
>
> I verified this behavior using a newly built mount.nfs command from the recent nfs-utils 2.5.3 package installed on a recent version of Ubuntu Cloud Image 21.04:
> * https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsourceforge.net%2Fprojects%2Fnfs%2Ffiles%2Fnfs-utils%2F2.5.3%2F&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C2759b06e70634987fce108d91b754e5f%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637571012932297691%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=M6sP%2BH1XO2Ojhe0WOSCuFqDQDKJoOUdi2xR6dO5vkjQ%3D&amp;reserved=0
> * https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcloud-images.ubuntu.com%2Freleases%2Fhirsute%2Frelease-20210513%2Fubuntu-21.04-server-cloudimg-amd64.ova&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C2759b06e70634987fce108d91b754e5f%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637571012932307646%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=1iJYjPv5CHpQ6VJ0DRPvHXvbiqQZgaE9pt7VeGNKq0U%3D&amp;reserved=0
>
>
> The reason this looks like it is specific to NFSv4 is from the following output showing "vers=4.2":
> > $ strace /sbin/mount.nfs <unreachable-IP-address>:/path /tmp/mnt
> > [ ... cut ... ]
> > mount("<unreadhable-IP-address>:/path", "/tmp/mnt", "nfs", 0, "vers=4.2,addr=<unreachable-IP-address>,clien"...^C^Z
>
> Also, if I try the same mount.nfs commands but specifying NFSv3, the mount to the dead NFS server hangs, but the mounts to the operational NFS servers do not block and mount successfully; this bug doesn't happen when using NFSv3.
>
>
> We reported this issue under util-linux here:
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fkarelzak%2Futil-linux%2Fissues%2F1309&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C2759b06e70634987fce108d91b754e5f%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637571012932307646%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=9N5DFtCJykN%2F3YigbajvZrBt6T2jpfGPyY6QujZDS2o%3D&amp;reserved=0
> [mounting nfs server which is down blocks all other nfs mounts on same machine #1309]
>
> I also found an older bug on this mailing list that had similar symptoms (but could not tell if it was the same problem or not):
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.kernel.org%2Fproject%2Flinux-nfs%2Fpatch%2F87vaori26c.fsf%40notabene.neil.brown.name%2F&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C2759b06e70634987fce108d91b754e5f%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637571012932307646%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=Z8agzK7vCLHVnY8yzJ23D3aBrWF0jsD7SYxwT3mtqjg%3D&amp;reserved=0
> [[PATCH/RFC] NFSv4: don't let hanging mounts block other mounts]
>
> Thanks, Mike

Hi Mike,

This is not a helpful reply but I was curious if I could reproduce
your issue but was not successful. I'm able to initiate a mount to an
unreachable-IP-address which hangs and then do another mount to an
existing server without issues. Ubuntu 21.04 seems to be 5.11 based so
I tried upstream 5.11 and I tried the latest upstream nfs-utils
(instead of what my distro has which was an older version).

To debug, perhaps get an output of the nfs4 and sunrpc tracepoints.
Or also get output from dmesg after doing ?echo t >
/proc/sysrq-trigger? to see where the mounts are hanging.

2021-05-21 06:50:26

by Steve Dickson

[permalink] [raw]
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine

Hey.

On 5/19/21 3:15 PM, Olga Kornievskaia wrote:
> On Sun, May 16, 2021 at 11:18 PM Michael Wakabayashi
> <[email protected]> wrote:
>>
>> Hi,
>>
>> We're seeing what looks like an NFSv4 issue.
>>
>> Mounting an NFS server that is down (ping to this NFS server's IP address does not respond) will block _all_ other NFS mount attempts even if the NFS servers are available and working properly (these subsequent mounts hang).
>>
>> If I kill the NFS mount process that's trying to mount the dead NFS server, the NFS mounts that were blocked will immediately unblock and mount successfully, which suggests the first mount command is blocking the other mount commands.
>>
>>
>> I verified this behavior using a newly built mount.nfs command from the recent nfs-utils 2.5.3 package installed on a recent version of Ubuntu Cloud Image 21.04:
>> * https://sourceforge.net/projects/nfs/files/nfs-utils/2.5.3/
>> * https://cloud-images.ubuntu.com/releases/hirsute/release-20210513/ubuntu-21.04-server-cloudimg-amd64.ova
>>
>>
>> The reason this looks like it is specific to NFSv4 is from the following output showing "vers=4.2":
>>> $ strace /sbin/mount.nfs <unreachable-IP-address>:/path /tmp/mnt
>>> [ ... cut ... ]
>>> mount("<unreadhable-IP-address>:/path", "/tmp/mnt", "nfs", 0, "vers=4.2,addr=<unreachable-IP-address>,clien"...^C^Z
>>
>> Also, if I try the same mount.nfs commands but specifying NFSv3, the mount to the dead NFS server hangs, but the mounts to the operational NFS servers do not block and mount successfully; this bug doesn't happen when using NFSv3.
>>
>>
>> We reported this issue under util-linux here:
>> https://github.com/karelzak/util-linux/issues/1309
>> [mounting nfs server which is down blocks all other nfs mounts on same machine #1309]
>>
>> I also found an older bug on this mailing list that had similar symptoms (but could not tell if it was the same problem or not):
>> https://patchwork.kernel.org/project/linux-nfs/patch/[email protected]/
>> [[PATCH/RFC] NFSv4: don't let hanging mounts block other mounts]
>>
>> Thanks, Mike
>
> Hi Mike,
>
> This is not a helpful reply but I was curious if I could reproduce
> your issue but was not successful. I'm able to initiate a mount to an
> unreachable-IP-address which hangs and then do another mount to an
> existing server without issues. Ubuntu 21.04 seems to be 5.11 based so
> I tried upstream 5.11 and I tried the latest upstream nfs-utils
> (instead of what my distro has which was an older version).
>
> To debug, perhaps get an output of the nfs4 and sunrpc tracepoints.
> Or also get output from dmesg after doing “echo t >
> /proc/sysrq-trigger” to see where the mounts are hanging.
>
It looks like Mike is correct... The first process (mount 1.1.1.1:/mnt) is
hung in trying the connection:

PID: 3394 TASK: ffff9da8c42734c0 CPU: 0 COMMAND: "mount.nfs"
#0 [ffffb44780f638c8] __schedule at ffffffff82d7959d
#1 [ffffb44780f63950] schedule at ffffffff82d79f2b
#2 [ffffb44780f63968] rpc_wait_bit_killable at ffffffffc05265ce [sunrpc]
#3 [ffffb44780f63980] __wait_on_bit at ffffffff82d7a4ba
#4 [ffffb44780f639b8] out_of_line_wait_on_bit at ffffffff82d7a5a6
#5 [ffffb44780f63a00] __rpc_execute at ffffffffc052fc8a [sunrpc]
#6 [ffffb44780f63a48] rpc_execute at ffffffffc05305a2 [sunrpc]
#7 [ffffb44780f63a68] rpc_run_task at ffffffffc05164e4 [sunrpc]
#8 [ffffb44780f63aa8] rpc_call_sync at ffffffffc0516573 [sunrpc]
#9 [ffffb44780f63b00] rpc_create_xprt at ffffffffc051672e [sunrpc]
#10 [ffffb44780f63b40] rpc_create at ffffffffc0516881 [sunrpc]
#11 [ffffb44780f63be8] nfs_create_rpc_client at ffffffffc0972319 [nfs]
#12 [ffffb44780f63c80] nfs4_init_client at ffffffffc0a17882 [nfsv4]
#13 [ffffb44780f63d70] nfs4_set_client at ffffffffc0a16ef8 [nfsv4]
#14 [ffffb44780f63de8] nfs4_create_server at ffffffffc0a188d8 [nfsv4]
#15 [ffffb44780f63e60] nfs4_try_get_tree at ffffffffc0a0bf69 [nfsv4]
#16 [ffffb44780f63e80] vfs_get_tree at ffffffff823b6068
#17 [ffffb44780f63ea0] path_mount at ffffffff823e3d8f
#18 [ffffb44780f63ef8] __x64_sys_mount at ffffffff823e45a3
#19 [ffffb44780f63f38] do_syscall_64 at ffffffff82d6aa50
#20 [ffffb44780f63f50] entry_SYSCALL_64_after_hwframe at ffffffff82e0007c

The second mount is hung up in nfs_match_client()/nfs_wait_client_init_complete
waiting for the first process to compile
nfs_match_client:

/* If a client is still initializing then we need to wait */
if (clp->cl_cons_state > NFS_CS_READY) {
refcount_inc(&clp->cl_count);
spin_unlock(&nn->nfs_client_lock);
error = nfs_wait_client_init_complete(clp);
nfs_put_client(clp);
spin_lock(&nn->nfs_client_lock);
if (error < 0)
return ERR_PTR(error);
goto again;
}

This code came in with commit c156618e15101 which fixed
a deadlock in nfs client initialization.

steved.

2021-05-21 08:42:29

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine

Hi Mike,

Ok so I can reproduce it but the scenario is not exact what you are
stating. This is not an unreachable server. It requires that the
server is reachable but not responsive.

Using iptables I can say drop packets to port 2049 on one machine.
Then I mount that IP from the client. Your stack requires an "active"
but not responsive connection. And indeed until the 1st mount times
out (which it does), no other mounts go through.

But I think I agree with the previous comment that Trond made about
the need to wait for any clients that are in the middle initializing
until making any matches with it. I think a match can be made but the
initializing client can fail the initialization (based on what the
server returns).

My conclusion is: an unresponsive server will block other mounts but
only until timeout is reached.

On Thu, May 20, 2021 at 6:43 AM Michael Wakabayashi
<[email protected]> wrote:
>
> Hi Olga,
>
> If you are able to run privileged Docker containers
> you might be able to reproduce the issue running
> the following docker commands.
>
> docker pull ubuntu:hirsute-20210514 # ubuntu hirsute is the latest version of ubuntu
>
> # Run this to find the id of the ubuntu image which is needed in the next command
> docker image ls # image id looks like "274cadba4412"
>
> # Run the ubuntu container and start a bash shell.
> # Replace <ubuntu_hirsute_image_id> with your ubuntu image id from the previous step.
> docker container run --rm -it --privileged <ubuntu_hirsute_image_id> /bin/bash
>
>
> # You should be inside the ubuntu container now and can run these Linux commands
> apt-get update # this is needed, otherwise the next command fails
>
> # This installs mount.nfs. Answer the two questions about geographic area and city.
> # Ignore all the debconf error message (readline, dialog, frontend, etc)
> apt install -y nfs-common
>
> # execute mount commands
> mkdir /tmp/mnt1 /tmp/mnt2
> mount.nfs 1.1.1.1:/ /tmp/mnt1 &
> mount.nfs <accessible-nfs-host:path> /tmp/mnt2 &
> jobs # shows both mounts are hung
>
> Thanks, Mike
>
>
> From: Michael Wakabayashi <[email protected]>
> Sent: Thursday, May 20, 2021 2:51 AM
> To: Olga Kornievskaia <[email protected]>
> Cc: [email protected] <[email protected]>
> Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine
>
> Hi Orna,
>
> Thank you for looking.
>
> I spent a couple of hours trying to get various
> SystemTap NFS scripts working but mostly got errors.
>
> For example:
> > [email protected]:~/src/systemtap-scripts/tracepoints# stap nfs4_fsinfo.stp
> > semantic error: unable to find tracepoint variable '$status' (alternatives: $$parms, $$vars, $task, $$name): identifier '$status' at nfs4_fsinfo.stp:7:11
> > source: terror = $status
> > ^
> > Pass 2: analysis failed. [man error::pass2]
>
> If you have any stap scripts that work on Ubuntu
> that you'd like me to run or have pointers on how
> to setup my Ubuntu environment to run them
> successfully, please let me know and I can try again..
>
>
> Here's the call trace for the mount.nfs command
> mounting the bad NFS server/10.1.1.1:
>
> [Thu May 20 08:53:35 2021] task:mount.nfs state:D stack: 0 pid:13903 ppid: 13900 flags:0x00004000
> [Thu May 20 08:53:35 2021] Call Trace:
> [Thu May 20 08:53:35 2021] ? rpc_init_task+0x150/0x150 [sunrpc]
> [Thu May 20 08:53:35 2021] __schedule+0x23d/0x670
> [Thu May 20 08:53:35 2021] ? rpc_init_task+0x150/0x150 [sunrpc]
> [Thu May 20 08:53:35 2021] schedule+0x4f/0xc0
> [Thu May 20 08:53:35 2021] rpc_wait_bit_killable+0x25/0xb0 [sunrpc]
> [Thu May 20 08:53:35 2021] __wait_on_bit+0x33/0xa0
> [Thu May 20 08:53:35 2021] ? call_reserveresult+0xa0/0xa0 [sunrpc]
> [Thu May 20 08:53:35 2021] out_of_line_wait_on_bit+0x8d/0xb0
> [Thu May 20 08:53:35 2021] ? var_wake_function+0x30/0x30
> [Thu May 20 08:53:35 2021] __rpc_execute+0xd4/0x290 [sunrpc]
> [Thu May 20 08:53:35 2021] rpc_execute+0x5e/0x80 [sunrpc]
> [Thu May 20 08:53:35 2021] rpc_run_task+0x13d/0x180 [sunrpc]
> [Thu May 20 08:53:35 2021] rpc_call_sync+0x51/0xa0 [sunrpc]
> [Thu May 20 08:53:35 2021] rpc_create_xprt+0x177/0x1c0 [sunrpc]
> [Thu May 20 08:53:35 2021] rpc_create+0x11f/0x220 [sunrpc]
> [Thu May 20 08:53:35 2021] ? __memcg_kmem_charge+0x7d/0xf0
> [Thu May 20 08:53:35 2021] ? _cond_resched+0x1a/0x50
> [Thu May 20 08:53:35 2021] nfs_create_rpc_client+0x13a/0x180 [nfs]
> [Thu May 20 08:53:35 2021] nfs4_init_client+0x205/0x290 [nfsv4]
> [Thu May 20 08:53:35 2021] ? __fscache_acquire_cookie+0x10a/0x210 [fscache]
> [Thu May 20 08:53:35 2021] ? nfs_fscache_get_client_cookie+0xa9/0x120 [nfs]
> [Thu May 20 08:53:35 2021] ? nfs_match_client+0x37/0x2a0 [nfs]
> [Thu May 20 08:53:35 2021] nfs_get_client+0x14d/0x190 [nfs]
> [Thu May 20 08:53:35 2021] nfs4_set_client+0xd3/0x120 [nfsv4]
> [Thu May 20 08:53:35 2021] nfs4_init_server+0xf8/0x270 [nfsv4]
> [Thu May 20 08:53:35 2021] nfs4_create_server+0x58/0xa0 [nfsv4]
> [Thu May 20 08:53:35 2021] nfs4_try_get_tree+0x3a/0xc0 [nfsv4]
> [Thu May 20 08:53:35 2021] nfs_get_tree+0x38/0x50 [nfs]
> [Thu May 20 08:53:35 2021] vfs_get_tree+0x2a/0xc0
> [Thu May 20 08:53:35 2021] do_new_mount+0x14b/0x1a0
> [Thu May 20 08:53:35 2021] path_mount+0x1d4/0x4e0
> [Thu May 20 08:53:35 2021] __x64_sys_mount+0x108/0x140
> [Thu May 20 08:53:35 2021] do_syscall_64+0x38/0x90
> [Thu May 20 08:53:35 2021] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
>
> Here's the call trace for the mount.nfs command
> mounting an available NFS server/10.188.76.67 (which was
> blocked by the first mount.nfs command above):
> [Thu May 20 08:53:35 2021] task:mount.nfs state:D stack: 0 pid:13910 ppid: 13907 flags:0x00004000
> [Thu May 20 08:53:35 2021] Call Trace:
> [Thu May 20 08:53:35 2021] __schedule+0x23d/0x670
> [Thu May 20 08:53:35 2021] schedule+0x4f/0xc0
> [Thu May 20 08:53:35 2021] nfs_wait_client_init_complete+0x5a/0x90 [nfs]
> [Thu May 20 08:53:35 2021] ? wait_woken+0x80/0x80
> [Thu May 20 08:53:35 2021] nfs_match_client+0x1de/0x2a0 [nfs]
> [Thu May 20 08:53:35 2021] ? pcpu_block_update_hint_alloc+0xcc/0x2d0
> [Thu May 20 08:53:35 2021] nfs_get_client+0x62/0x190 [nfs]
> [Thu May 20 08:53:35 2021] nfs4_set_client+0xd3/0x120 [nfsv4]
> [Thu May 20 08:53:35 2021] nfs4_init_server+0xf8/0x270 [nfsv4]
> [Thu May 20 08:53:35 2021] nfs4_create_server+0x58/0xa0 [nfsv4]
> [Thu May 20 08:53:35 2021] nfs4_try_get_tree+0x3a/0xc0 [nfsv4]
> [Thu May 20 08:53:35 2021] nfs_get_tree+0x38/0x50 [nfs]
> [Thu May 20 08:53:35 2021] vfs_get_tree+0x2a/0xc0
> [Thu May 20 08:53:35 2021] do_new_mount+0x14b/0x1a0
> [Thu May 20 08:53:35 2021] path_mount+0x1d4/0x4e0
> [Thu May 20 08:53:35 2021] __x64_sys_mount+0x108/0x140
> [Thu May 20 08:53:35 2021] do_syscall_64+0x38/0x90
> [Thu May 20 08:53:35 2021] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> I've pasted the entire dmesg output here: https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpastebin.com%2F90QJyAL9&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C2759b06e70634987fce108d91b754e5f%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637571012932297691%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=xpENxRqSdn4aDPwB890%2Bh1RpB4dD2DfpnVGQeM6n0Io%3D&amp;reserved=0
>
>
> This is the command I ran to mount an unreachable NFS server:
> date; time strace mount.nfs 10.1.1.1:/nopath /tmp/mnt.dead; date
> The strace log: https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpastebin.com%2F5yVhm77u&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C2759b06e70634987fce108d91b754e5f%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637571012932297691%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=1uaPa3RDcR32%2Buh6NQd2F7wq5%2BvYZ0dnWcGve0bSj%2FQ%3D&amp;reserved=0
>
> This is the command I ran to mount the available NFS server:
> date; time strace mount.nfs 10.188.76.67:/ /tmp/mnt.alive ; date
> The strace log: https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpastebin.com%2FkTimQ6vH&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C2759b06e70634987fce108d91b754e5f%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637571012932297691%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=1YwGfM0u1zQsWmOuZX6FJjYwlf3e01cMpJCDtK8JRI0%3D&amp;reserved=0
>
> The procedure:
> - run dmesg -C to clear dmesg logs
> - run mount.nfs on 10.1.1.1 (this IP address is down/not responding to ping) which hung
> - run mount.nfs on 10.188.76.67 which also hung
> - "echo t > /proc/sysrq-trigger" to dump the call traces for hung processes
> - dmesg -T > dmesg.log to save the dmesg logs
> - control-Z the mount.nfs command to 10.1.1.1
> - "kill -9 %1" in the terminal to kill the mount.nfs to 10.1.1.1
> - mount.nfs to 10.188.76.67 immediately mounts successfully
> after the first mount is killed (we can see this by the timestamp in the logs files)
>
>
> Thanks, Mike
>
>
>
> From: Olga Kornievskaia <[email protected]>
> Sent: Wednesday, May 19, 2021 12:15 PM
> To: Michael Wakabayashi <[email protected]>
> Cc: [email protected] <[email protected]>
> Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine
>
> On Sun, May 16, 2021 at 11:18 PM Michael Wakabayashi
> <[email protected]> wrote:
> >
> > Hi,
> >
> > We're seeing what looks like an NFSv4 issue.
> >
> > Mounting an NFS server that is down (ping to this NFS server's IP address does not respond) will block _all_ other NFS mount attempts even if the NFS servers are available and working properly (these subsequent mounts hang).
> >
> > If I kill the NFS mount process that's trying to mount the dead NFS server, the NFS mounts that were blocked will immediately unblock and mount successfully, which suggests the first mount command is blocking the other mount commands.
> >
> >
> > I verified this behavior using a newly built mount.nfs command from the recent nfs-utils 2.5.3 package installed on a recent version of Ubuntu Cloud Image 21.04:
> > * https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsourceforge.net%2Fprojects%2Fnfs%2Ffiles%2Fnfs-utils%2F2.5.3%2F&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C2759b06e70634987fce108d91b754e5f%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637571012932297691%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=M6sP%2BH1XO2Ojhe0WOSCuFqDQDKJoOUdi2xR6dO5vkjQ%3D&amp;reserved=0
> > * https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcloud-images.ubuntu.com%2Freleases%2Fhirsute%2Frelease-20210513%2Fubuntu-21.04-server-cloudimg-amd64.ova&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C2759b06e70634987fce108d91b754e5f%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637571012932307646%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=1iJYjPv5CHpQ6VJ0DRPvHXvbiqQZgaE9pt7VeGNKq0U%3D&amp;reserved=0
> >
> >
> > The reason this looks like it is specific to NFSv4 is from the following output showing "vers=4.2":
> > > $ strace /sbin/mount.nfs <unreachable-IP-address>:/path /tmp/mnt
> > > [ ... cut ... ]
> > > mount("<unreadhable-IP-address>:/path", "/tmp/mnt", "nfs", 0, "vers=4.2,addr=<unreachable-IP-address>,clien"...^C^Z
> >
> > Also, if I try the same mount.nfs commands but specifying NFSv3, the mount to the dead NFS server hangs, but the mounts to the operational NFS servers do not block and mount successfully; this bug doesn't happen when using NFSv3.
> >
> >
> > We reported this issue under util-linux here:
> > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fkarelzak%2Futil-linux%2Fissues%2F1309&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C2759b06e70634987fce108d91b754e5f%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637571012932307646%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=9N5DFtCJykN%2F3YigbajvZrBt6T2jpfGPyY6QujZDS2o%3D&amp;reserved=0
> > [mounting nfs server which is down blocks all other nfs mounts on same machine #1309]
> >
> > I also found an older bug on this mailing list that had similar symptoms (but could not tell if it was the same problem or not):
> > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.kernel.org%2Fproject%2Flinux-nfs%2Fpatch%2F87vaori26c.fsf%40notabene.neil.brown.name%2F&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C2759b06e70634987fce108d91b754e5f%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637571012932307646%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=Z8agzK7vCLHVnY8yzJ23D3aBrWF0jsD7SYxwT3mtqjg%3D&amp;reserved=0
> > [[PATCH/RFC] NFSv4: don't let hanging mounts block other mounts]
> >
> > Thanks, Mike
>
> Hi Mike,
>
> This is not a helpful reply but I was curious if I could reproduce
> your issue but was not successful. I'm able to initiate a mount to an
> unreachable-IP-address which hangs and then do another mount to an
> existing server without issues. Ubuntu 21.04 seems to be 5.11 based so
> I tried upstream 5.11 and I tried the latest upstream nfs-utils
> (instead of what my distro has which was an older version).
>
> To debug, perhaps get an output of the nfs4 and sunrpc tracepoints.
> Or also get output from dmesg after doing “echo t >
> /proc/sysrq-trigger” to see where the mounts are hanging.

2021-05-21 20:27:20

by Michael Wakabayashi

[permalink] [raw]
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine

Hi Steve and Olga,

We run multiple Kubernetes clusters.
These clusters are composed of hundreds of Kubernetes nodes.
Any of these nodes can NFS mount on behalf of the containers running on these nodes.
We've seen several times in the past few months an NFS mount hang, and then several hundred up to several thousand NFS mounts blocked by this hung NFS mount processes (we have many "testing" workloads that access NFS).

Having several hundred NFS mounts blocked on a node causes the Kubernetes?node to become unstable and **severely** degrades service.

We did not expect a hung NFS mount to block every other NFS mount, especially when the other mounts are unrelated and otherwise working properly/healthy.

Can this behavior be changed?

Thanks, Mike



From: Olga Kornievskaia <[email protected]>
Sent: Thursday, May 20, 2021 4:51 PM
To: Michael Wakabayashi <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine
?
Hi Mike,

Ok so I can reproduce it but the scenario is not exact what you are
stating. This is not an unreachable server. It requires that the
server is reachable but not responsive.

Using iptables I can say drop packets to port 2049 on one machine.
Then I mount that IP from the client. Your stack requires an "active"
but not responsive connection. And indeed until the 1st mount times
out (which it does), no other mounts go through.

But I think I agree with the previous comment that Trond made about
the need to wait for any clients that are in the middle initializing
until making any matches with it. I think a match can be made but the
initializing client can fail the initialization (based on what the
server returns).

My conclusion is: an unresponsive server will block other mounts but
only until timeout is reached.

On Thu, May 20, 2021 at 6:43 AM Michael Wakabayashi
<[email protected]> wrote:
>
> Hi Olga,
>
> If you are able to run privileged Docker containers
> you might be able to reproduce the issue running
> the following docker commands.
>
> docker pull ubuntu:hirsute-20210514 # ubuntu hirsute is the latest version of ubuntu
>
> # Run this to find the id of the ubuntu image? which is needed in the next command
> docker image ls # image id looks like "274cadba4412"
>
> # Run the ubuntu container and start a bash shell.
> # Replace <ubuntu_hirsute_image_id> with your ubuntu image id from the previous step.
> docker container run --rm -it --privileged <ubuntu_hirsute_image_id> /bin/bash
>
>
> # You should be inside the ubuntu container now and can run these Linux commands
> apt-get update # this is needed, otherwise the next command fails
>
> # This installs mount.nfs. Answer the two questions about geographic area and city.
> # Ignore all the debconf error message (readline, dialog, frontend, etc)
> apt install -y nfs-common
>
> # execute mount commands
> mkdir /tmp/mnt1 /tmp/mnt2
> mount.nfs 1.1.1.1:/ /tmp/mnt1 &
> mount.nfs <accessible-nfs-host:path> /tmp/mnt2 &
> jobs? # shows both mounts are hung
>
> Thanks, Mike
>
>
> From: Michael Wakabayashi <[email protected]>
> Sent: Thursday, May 20, 2021 2:51 AM
> To: Olga Kornievskaia <[email protected]>
> Cc: [email protected] <[email protected]>
> Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine
>
> Hi Orna,
>
> Thank you for looking.
>
> I spent a couple of hours trying to get various
> SystemTap NFS scripts working but mostly got errors.
>
> For example:
> > [email protected]:~/src/systemtap-scripts/tracepoints# stap nfs4_fsinfo.stp
> > semantic error: unable to find tracepoint variable '$status' (alternatives: $$parms, $$vars, $task, $$name): identifier '$status' at nfs4_fsinfo.stp:7:11
> >???????? source: terror = $status
> >???????????????????????? ^
> > Pass 2: analysis failed.? [man error::pass2]
>
> If you have any stap scripts that work on Ubuntu
> that you'd like me to run or have pointers on how
> to setup my Ubuntu environment to run them
> successfully, please let me know and I can try again..
>
>
> Here's the call trace for the mount.nfs command
> mounting the bad NFS server/10.1.1.1:
>
> [Thu May 20 08:53:35 2021] task:mount.nfs?????? state:D stack:??? 0 pid:13903 ppid: 13900 flags:0x00004000
> [Thu May 20 08:53:35 2021] Call Trace:
> [Thu May 20 08:53:35 2021]? ? rpc_init_task+0x150/0x150 [sunrpc]
> [Thu May 20 08:53:35 2021]? __schedule+0x23d/0x670
> [Thu May 20 08:53:35 2021]? ? rpc_init_task+0x150/0x150 [sunrpc]
> [Thu May 20 08:53:35 2021]? schedule+0x4f/0xc0
> [Thu May 20 08:53:35 2021]? rpc_wait_bit_killable+0x25/0xb0 [sunrpc]
> [Thu May 20 08:53:35 2021]? __wait_on_bit+0x33/0xa0
> [Thu May 20 08:53:35 2021]? ? call_reserveresult+0xa0/0xa0 [sunrpc]
> [Thu May 20 08:53:35 2021]? out_of_line_wait_on_bit+0x8d/0xb0
> [Thu May 20 08:53:35 2021]? ? var_wake_function+0x30/0x30
> [Thu May 20 08:53:35 2021]? __rpc_execute+0xd4/0x290 [sunrpc]
> [Thu May 20 08:53:35 2021]? rpc_execute+0x5e/0x80 [sunrpc]
> [Thu May 20 08:53:35 2021]? rpc_run_task+0x13d/0x180 [sunrpc]
> [Thu May 20 08:53:35 2021]? rpc_call_sync+0x51/0xa0 [sunrpc]
> [Thu May 20 08:53:35 2021]? rpc_create_xprt+0x177/0x1c0 [sunrpc]
> [Thu May 20 08:53:35 2021]? rpc_create+0x11f/0x220 [sunrpc]
> [Thu May 20 08:53:35 2021]? ? __memcg_kmem_charge+0x7d/0xf0
> [Thu May 20 08:53:35 2021]? ? _cond_resched+0x1a/0x50
> [Thu May 20 08:53:35 2021]? nfs_create_rpc_client+0x13a/0x180 [nfs]
> [Thu May 20 08:53:35 2021]? nfs4_init_client+0x205/0x290 [nfsv4]
> [Thu May 20 08:53:35 2021]? ? __fscache_acquire_cookie+0x10a/0x210 [fscache]
> [Thu May 20 08:53:35 2021]? ? nfs_fscache_get_client_cookie+0xa9/0x120 [nfs]
> [Thu May 20 08:53:35 2021]? ? nfs_match_client+0x37/0x2a0 [nfs]
> [Thu May 20 08:53:35 2021]? nfs_get_client+0x14d/0x190 [nfs]
> [Thu May 20 08:53:35 2021]? nfs4_set_client+0xd3/0x120 [nfsv4]
> [Thu May 20 08:53:35 2021]? nfs4_init_server+0xf8/0x270 [nfsv4]
> [Thu May 20 08:53:35 2021]? nfs4_create_server+0x58/0xa0 [nfsv4]
> [Thu May 20 08:53:35 2021]? nfs4_try_get_tree+0x3a/0xc0 [nfsv4]
> [Thu May 20 08:53:35 2021]? nfs_get_tree+0x38/0x50 [nfs]
> [Thu May 20 08:53:35 2021]? vfs_get_tree+0x2a/0xc0
> [Thu May 20 08:53:35 2021]? do_new_mount+0x14b/0x1a0
> [Thu May 20 08:53:35 2021]? path_mount+0x1d4/0x4e0
> [Thu May 20 08:53:35 2021]? __x64_sys_mount+0x108/0x140
> [Thu May 20 08:53:35 2021]? do_syscall_64+0x38/0x90
> [Thu May 20 08:53:35 2021]? entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
>
> Here's the call trace for the mount.nfs command
> mounting an available NFS server/10.188.76.67 (which was
> blocked by the first mount.nfs command above):
> [Thu May 20 08:53:35 2021] task:mount.nfs?????? state:D stack:??? 0 pid:13910 ppid: 13907 flags:0x00004000
> [Thu May 20 08:53:35 2021] Call Trace:
> [Thu May 20 08:53:35 2021]? __schedule+0x23d/0x670
> [Thu May 20 08:53:35 2021]? schedule+0x4f/0xc0
> [Thu May 20 08:53:35 2021]? nfs_wait_client_init_complete+0x5a/0x90 [nfs]
> [Thu May 20 08:53:35 2021]? ? wait_woken+0x80/0x80
> [Thu May 20 08:53:35 2021]? nfs_match_client+0x1de/0x2a0 [nfs]
> [Thu May 20 08:53:35 2021]? ? pcpu_block_update_hint_alloc+0xcc/0x2d0
> [Thu May 20 08:53:35 2021]? nfs_get_client+0x62/0x190 [nfs]
> [Thu May 20 08:53:35 2021]? nfs4_set_client+0xd3/0x120 [nfsv4]
> [Thu May 20 08:53:35 2021]? nfs4_init_server+0xf8/0x270 [nfsv4]
> [Thu May 20 08:53:35 2021]? nfs4_create_server+0x58/0xa0 [nfsv4]
> [Thu May 20 08:53:35 2021]? nfs4_try_get_tree+0x3a/0xc0 [nfsv4]
> [Thu May 20 08:53:35 2021]? nfs_get_tree+0x38/0x50 [nfs]
> [Thu May 20 08:53:35 2021]? vfs_get_tree+0x2a/0xc0
> [Thu May 20 08:53:35 2021]? do_new_mount+0x14b/0x1a0
> [Thu May 20 08:53:35 2021]? path_mount+0x1d4/0x4e0
> [Thu May 20 08:53:35 2021]? __x64_sys_mount+0x108/0x140
> [Thu May 20 08:53:35 2021]? do_syscall_64+0x38/0x90
> [Thu May 20 08:53:35 2021]? entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> I've pasted the entire dmesg output here: https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpastebin.com%2F90QJyAL9&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C64c924b4b7054405307108d91bea44fe%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637571515274079272%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=IRCXJjIXojFm1xaFuSy11LW0QFbMN04C%2BjzTwI3FCVM%3D&amp;reserved=0
>
>
> This is the command I ran to mount an unreachable NFS server:
> date; time strace mount.nfs 10.1.1.1:/nopath /tmp/mnt.dead; date
> The strace log: https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpastebin.com%2F5yVhm77u&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C64c924b4b7054405307108d91bea44fe%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637571515274079272%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=NMI42XLcCZiECy5VUAQ7VyAWepTVLX22%2BBnBWaDOD4s%3D&amp;reserved=0
>
> This is the command I ran to mount the available NFS server:
> date; time strace mount.nfs 10.188.76.67:/ /tmp/mnt.alive ; date
> The strace log: https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpastebin.com%2FkTimQ6vH&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C64c924b4b7054405307108d91bea44fe%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637571515274089229%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=Jy7dGBjQrViwXNJFYowg7x%2BLJZjxf6PUwvEQV7Ql5JM%3D&amp;reserved=0
>
> The procedure:
> - run dmesg -C to clear dmesg logs
> - run mount.nfs on 10.1.1.1 (this IP address is down/not responding to ping) which hung
> - run mount.nfs on 10.188.76.67? which also hung
> - "echo t > /proc/sysrq-trigger" to dump the call traces for hung processes
> - dmesg -T > dmesg.log to save the dmesg logs
> - control-Z the mount.nfs command to 10.1.1.1
> - "kill -9 %1" in the terminal to kill the mount.nfs to 10.1.1.1
> - mount.nfs to 10.188.76.67 immediately mounts successfully
>?? after the first mount is killed (we can see this by the timestamp in the logs files)
>
>
> Thanks, Mike
>
>
>
> From: Olga Kornievskaia <[email protected]>
> Sent: Wednesday, May 19, 2021 12:15 PM
> To: Michael Wakabayashi <[email protected]>
> Cc: [email protected] <[email protected]>
> Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine
>
> On Sun, May 16, 2021 at 11:18 PM Michael Wakabayashi
> <[email protected]> wrote:
> >
> > Hi,
> >
> > We're seeing what looks like an NFSv4 issue.
> >
> > Mounting an NFS server that is down (ping to this NFS server's IP address does not respond) will block _all_ other NFS mount attempts even if the NFS servers are available and working properly (these subsequent mounts hang).
> >
> > If I kill the NFS mount process that's trying to mount the dead NFS server, the NFS mounts that were blocked will immediately unblock and mount successfully, which suggests the first mount command is blocking the other mount commands.
> >
> >
> > I verified this behavior using a newly built mount.nfs command from the recent nfs-utils 2.5.3 package installed on a recent version of Ubuntu Cloud Image 21.04:
> > * https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsourceforge.net%2Fprojects%2Fnfs%2Ffiles%2Fnfs-utils%2F2.5.3%2F&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C64c924b4b7054405307108d91bea44fe%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637571515274089229%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=9U%2FiCu0sH6gKkMk8Jq4nM6B5Mwhx7LYYb8%2FtvtezoCY%3D&amp;reserved=0
> > * https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcloud-images.ubuntu.com%2Freleases%2Fhirsute%2Frelease-20210513%2Fubuntu-21.04-server-cloudimg-amd64.ova&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C64c924b4b7054405307108d91bea44fe%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637571515274089229%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=58zX0jg4xJrwxb%2BDtw5uzyOxhEvdxR25bYka65XNdyI%3D&amp;reserved=0
> >
> >
> > The reason this looks like it is specific to NFSv4 is from the following output showing "vers=4.2":
> > > $ strace /sbin/mount.nfs <unreachable-IP-address>:/path /tmp/mnt
> > > [ ... cut ... ]
> > > mount("<unreadhable-IP-address>:/path", "/tmp/mnt", "nfs", 0, "vers=4.2,addr=<unreachable-IP-address>,clien"...^C^Z
> >
> > Also, if I try the same mount.nfs commands but specifying NFSv3, the mount to the dead NFS server hangs, but the mounts to the operational NFS servers do not block and mount successfully; this bug doesn't happen when using NFSv3.
> >
> >
> > We reported this issue under util-linux here:
> > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fkarelzak%2Futil-linux%2Fissues%2F1309&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C64c924b4b7054405307108d91bea44fe%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637571515274089229%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=vzhKPE8kaesr1jGRCghvn%2Be84%2FUXXXj7CbivwIrpLUE%3D&amp;reserved=0
> > [mounting nfs server which is down blocks all other nfs mounts on same machine #1309]
> >
> > I also found an older bug on this mailing list that had similar symptoms (but could not tell if it was the same problem or not):
> > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.kernel.org%2Fproject%2Flinux-nfs%2Fpatch%2F87vaori26c.fsf%40notabene.neil.brown.name%2F&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C64c924b4b7054405307108d91bea44fe%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637571515274089229%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=4kr9PxATC2Sv74njqJcajMGterOIIKCvtrZQHgGkqPE%3D&amp;reserved=0
> > [[PATCH/RFC] NFSv4: don't let hanging mounts block other mounts]
> >
> > Thanks, Mike
>
> Hi Mike,
>
> This is not a helpful reply but I was curious if I could reproduce
> your issue but was not successful. I'm able to initiate a mount to an
> unreachable-IP-address which hangs and then do another mount to an
> existing server without issues. Ubuntu 21.04 seems to be 5.11 based so
> I tried upstream 5.11 and I tried the latest upstream nfs-utils
> (instead of what my distro has which was an older version).
>
> To debug, perhaps get an output of the nfs4 and sunrpc tracepoints.
> Or also get output from dmesg after doing ?echo t >
> /proc/sysrq-trigger? to see where the mounts are hanging.

2021-05-21 20:27:58

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine

On Fri, May 21, 2021 at 3:09 PM Michael Wakabayashi
<[email protected]> wrote:
>
> > This code came in with commit c156618e15101 which fixed a deadlock in nfs client initialization.
>
> > My conclusion is: an unresponsive server will block other mounts but only until timeout is reached.
>
> Hi Steve and Olga,
>
> We run multiple Kubernetes clusters.
> These clusters are composed of hundreds of Kubernetes nodes.
> Any of these nodes can NFS mount on behalf of the containers running on these nodes.
> We've seen several times in the past few months an NFS mount hang, and then several hundred up to several thousand NFS mounts blocked by this hung NFS mount processes (we have many "testing" workloads that access NFS).
> Having several hundred NFS mounts blocked on a node causes the Kubernetes node to become unstable and severely degrades service.
>
> We did not expect a hung NFS mount to block every other NFS mount, especially when the other mounts are unrelated and otherwise working properly.
>
> Can this behavior be changed?

Hi Michael,

I'm not sure if the design can be changed. But I would like to know
what's a legitimate reason for a machine to have a listening but not
responsive port 2049 (I'm sorry I don't particularly care for the
explanation of "because this is how things currently work in
containers, Kubernetes"). Seems like the problem should be fixed
there. There is no issue if a mount goes to an IP that has nothing
listening on port 2049.

Again I have no comments on the design change: or rather my comment
was I don't see a way. If you have 2 parallel clients initializing and
the goal is to have at most one client if both are the same, then I
don't see a way besides serializing it as it's done now.

>
> Thanks, Mike
>
>
> ________________________________
> From: Steve Dickson <[email protected]>
> Sent: Thursday, May 20, 2021 11:42 AM
> To: Olga Kornievskaia <[email protected]>; Michael Wakabayashi <[email protected]>
> Cc: [email protected] <[email protected]>
> Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine
>
> Hey.
>
> On 5/19/21 3:15 PM, Olga Kornievskaia wrote:
> > On Sun, May 16, 2021 at 11:18 PM Michael Wakabayashi
> > <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> We're seeing what looks like an NFSv4 issue.
> >>
> >> Mounting an NFS server that is down (ping to this NFS server's IP address does not respond) will block _all_ other NFS mount attempts even if the NFS servers are available and working properly (these subsequent mounts hang).
> >>
> >> If I kill the NFS mount process that's trying to mount the dead NFS server, the NFS mounts that were blocked will immediately unblock and mount successfully, which suggests the first mount command is blocking the other mount commands.
> >>
> >>
> >> I verified this behavior using a newly built mount.nfs command from the recent nfs-utils 2.5.3 package installed on a recent version of Ubuntu Cloud Image 21.04:
> >> * https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsourceforge.net%2Fprojects%2Fnfs%2Ffiles%2Fnfs-utils%2F2.5.3%2F&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7Cb6738a1c16d84aaeea4708d91bbeae36%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637571328052661113%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=9hzJL1yA2EfyWsfuMOhH%2FpMUF%2BneUy3Q2XgCazi3CQw%3D&amp;reserved=0
> >> * https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcloud-images.ubuntu.com%2Freleases%2Fhirsute%2Frelease-20210513%2Fubuntu-21.04-server-cloudimg-amd64.ova&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7Cb6738a1c16d84aaeea4708d91bbeae36%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637571328052661113%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=JbfU1Ym%2B6ZTJUwzPZlPpZtk8Kc0TLfSY7aDDJWr5AuU%3D&amp;reserved=0
> >>
> >>
> >> The reason this looks like it is specific to NFSv4 is from the following output showing "vers=4.2":
> >>> $ strace /sbin/mount.nfs <unreachable-IP-address>:/path /tmp/mnt
> >>> [ ... cut ... ]
> >>> mount("<unreadhable-IP-address>:/path", "/tmp/mnt", "nfs", 0, "vers=4.2,addr=<unreachable-IP-address>,clien"...^C^Z
> >>
> >> Also, if I try the same mount.nfs commands but specifying NFSv3, the mount to the dead NFS server hangs, but the mounts to the operational NFS servers do not block and mount successfully; this bug doesn't happen when using NFSv3.
> >>
> >>
> >> We reported this issue under util-linux here:
> >> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fkarelzak%2Futil-linux%2Fissues%2F1309&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7Cb6738a1c16d84aaeea4708d91bbeae36%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637571328052661113%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=9dDoFaVP%2FjW6PPZTopYItCWTL7G8Fp62PKPf7ZtCWqs%3D&amp;reserved=0
> >> [mounting nfs server which is down blocks all other nfs mounts on same machine #1309]
> >>
> >> I also found an older bug on this mailing list that had similar symptoms (but could not tell if it was the same problem or not):
> >> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.kernel.org%2Fproject%2Flinux-nfs%2Fpatch%2F87vaori26c.fsf%40notabene.neil.brown.name%2F&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7Cb6738a1c16d84aaeea4708d91bbeae36%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637571328052661113%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=u0F20wNC6L40aTwYKtF2GD7eAzlrgotrPvGED%2FWwlzc%3D&amp;reserved=0
> >> [[PATCH/RFC] NFSv4: don't let hanging mounts block other mounts]
> >>
> >> Thanks, Mike
> >
> > Hi Mike,
> >
> > This is not a helpful reply but I was curious if I could reproduce
> > your issue but was not successful. I'm able to initiate a mount to an
> > unreachable-IP-address which hangs and then do another mount to an
> > existing server without issues. Ubuntu 21.04 seems to be 5.11 based so
> > I tried upstream 5.11 and I tried the latest upstream nfs-utils
> > (instead of what my distro has which was an older version).
> >
> > To debug, perhaps get an output of the nfs4 and sunrpc tracepoints.
> > Or also get output from dmesg after doing “echo t >
> > /proc/sysrq-trigger” to see where the mounts are hanging.
> >
> It looks like Mike is correct... The first process (mount 1.1.1.1:/mnt) is
> hung in trying the connection:
>
> PID: 3394 TASK: ffff9da8c42734c0 CPU: 0 COMMAND: "mount.nfs"
> #0 [ffffb44780f638c8] __schedule at ffffffff82d7959d
> #1 [ffffb44780f63950] schedule at ffffffff82d79f2b
> #2 [ffffb44780f63968] rpc_wait_bit_killable at ffffffffc05265ce [sunrpc]
> #3 [ffffb44780f63980] __wait_on_bit at ffffffff82d7a4ba
> #4 [ffffb44780f639b8] out_of_line_wait_on_bit at ffffffff82d7a5a6
> #5 [ffffb44780f63a00] __rpc_execute at ffffffffc052fc8a [sunrpc]
> #6 [ffffb44780f63a48] rpc_execute at ffffffffc05305a2 [sunrpc]
> #7 [ffffb44780f63a68] rpc_run_task at ffffffffc05164e4 [sunrpc]
> #8 [ffffb44780f63aa8] rpc_call_sync at ffffffffc0516573 [sunrpc]
> #9 [ffffb44780f63b00] rpc_create_xprt at ffffffffc051672e [sunrpc]
> #10 [ffffb44780f63b40] rpc_create at ffffffffc0516881 [sunrpc]
> #11 [ffffb44780f63be8] nfs_create_rpc_client at ffffffffc0972319 [nfs]
> #12 [ffffb44780f63c80] nfs4_init_client at ffffffffc0a17882 [nfsv4]
> #13 [ffffb44780f63d70] nfs4_set_client at ffffffffc0a16ef8 [nfsv4]
> #14 [ffffb44780f63de8] nfs4_create_server at ffffffffc0a188d8 [nfsv4]
> #15 [ffffb44780f63e60] nfs4_try_get_tree at ffffffffc0a0bf69 [nfsv4]
> #16 [ffffb44780f63e80] vfs_get_tree at ffffffff823b6068
> #17 [ffffb44780f63ea0] path_mount at ffffffff823e3d8f
> #18 [ffffb44780f63ef8] __x64_sys_mount at ffffffff823e45a3
> #19 [ffffb44780f63f38] do_syscall_64 at ffffffff82d6aa50
> #20 [ffffb44780f63f50] entry_SYSCALL_64_after_hwframe at ffffffff82e0007c
>
> The second mount is hung up in nfs_match_client()/nfs_wait_client_init_complete
> waiting for the first process to compile
> nfs_match_client:
>
> /* If a client is still initializing then we need to wait */
> if (clp->cl_cons_state > NFS_CS_READY) {
> refcount_inc(&clp->cl_count);
> spin_unlock(&nn->nfs_client_lock);
> error = nfs_wait_client_init_complete(clp);
> nfs_put_client(clp);
> spin_lock(&nn->nfs_client_lock);
> if (error < 0)
> return ERR_PTR(error);
> goto again;
> }
>
> This code came in with commit c156618e15101 which fixed
> a deadlock in nfs client initialization.
>
> steved.
>

2021-05-21 20:31:54

by Michael Wakabayashi

[permalink] [raw]
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine

Hi Olga,

> But I would like to know what's a legitimate reason for a machine to
> have a listening but not responsive port 2049

The NFS server in this case was decommissioned and taken offline.
The user's automated tests was not updated and still trying to
mount this offline NFS server.

We refer to this decommissioned server as being unreachable.
Maybe it's a difference in terminology, but for us if the IP address does not
respond to ping (as in this case), we refer to it as being unreachable.
Other tools use this same terminology. For example "fping":
$ fping 2.2.2.2
2.2.2.2 is unreachable

We can't really prevent users from making mistakes.
* Users will continue to accidentally mount decommissioned servers.
* Users will continue to mount the wrong IP addresses in their tests and elsewhere.
And when these situation occur, it will block valid NFS mounts.

Should I be prevented from mounting NFS shares because
someone else mistyped the NFS server name in their mount command?

From a user perspective, it's not clear why a mount of a
decommissioned(and therefore down) NFS server is blocking
mounts of every other valid NFS server?
Shouldn't these valid NFS servers be allowed to mount?

Thanks, Mike





From: Olga Kornievskaia <[email protected]>
Sent: Friday, May 21, 2021 12:35 PM
To: Michael Wakabayashi <[email protected]>
Cc: Steve Dickson <[email protected]>; [email protected] <[email protected]>
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine
?
On Fri, May 21, 2021 at 3:09 PM Michael Wakabayashi
<[email protected]> wrote:
>
> > This code came in with commit c156618e15101 which fixed a deadlock in nfs client initialization.
>
> > My conclusion is: an unresponsive server will block other mounts but only until timeout is reached.
>
> Hi Steve and Olga,
>
> We run multiple Kubernetes clusters.
> These clusters are composed of hundreds of Kubernetes nodes.
> Any of these nodes can NFS mount on behalf of the containers running on these nodes.
> We've seen several times in the past few months an NFS mount hang, and then several hundred up to several thousand NFS mounts blocked by this hung NFS mount processes (we have many "testing" workloads that access NFS).
> Having several hundred NFS mounts blocked on a node causes the Kubernetes node to become unstable and severely degrades service.
>
> We did not expect a hung NFS mount to block every other NFS mount, especially when the other mounts are unrelated and otherwise working properly.
>
> Can this behavior be changed?

Hi Michael,

I'm not sure if the design can be changed. But I would like to know
what's a legitimate reason for a machine to have a listening but not
responsive port 2049 (I'm sorry I don't particularly care for the
explanation of "because this is how things currently work in
containers, Kubernetes"). Seems like the problem should be fixed
there. There is no issue if a mount goes to an IP that has nothing
listening on port 2049.

Again I have no comments on the design change: or rather my comment
was I don't see a way. If you have 2 parallel clients initializing and
the goal is to have at most one client if both are the same, then I
don't see a way besides serializing it as it's done now.

>
> Thanks, Mike
>
>
> ________________________________
> From: Steve Dickson <[email protected]>
> Sent: Thursday, May 20, 2021 11:42 AM
> To: Olga Kornievskaia <[email protected]>; Michael Wakabayashi <[email protected]>
> Cc: [email protected] <[email protected]>
> Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine
>
> Hey.
>
> On 5/19/21 3:15 PM, Olga Kornievskaia wrote:
> > On Sun, May 16, 2021 at 11:18 PM Michael Wakabayashi
> > <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> We're seeing what looks like an NFSv4 issue.
> >>
> >> Mounting an NFS server that is down (ping to this NFS server's IP address does not respond) will block _all_ other NFS mount attempts even if the NFS servers are available and working properly (these subsequent mounts hang).
> >>
> >> If I kill the NFS mount process that's trying to mount the dead NFS server, the NFS mounts that were blocked will immediately unblock and mount successfully, which suggests the first mount command is blocking the other mount commands.
> >>
> >>
> >> I verified this behavior using a newly built mount.nfs command from the recent nfs-utils 2.5.3 package installed on a recent version of Ubuntu Cloud Image 21.04:
> >> * https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsourceforge.net%2Fprojects%2Fnfs%2Ffiles%2Fnfs-utils%2F2.5.3%2F&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C254806799e3f45388def08d91c8f9c45%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637572225410414697%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=qWsYLeSLC0k89%2FHJGqhMlBnEvGR%2Bdqxve4n56bww%2Bnk%3D&amp;reserved=0
> >> * https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcloud-images.ubuntu.com%2Freleases%2Fhirsute%2Frelease-20210513%2Fubuntu-21.04-server-cloudimg-amd64.ova&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C254806799e3f45388def08d91c8f9c45%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637572225410414697%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=5sHt34ZBtsa7MjRP0RbymhbjOn%2FT5X5JUWvIQV93PUU%3D&amp;reserved=0
> >>
> >>
> >> The reason this looks like it is specific to NFSv4 is from the following output showing "vers=4.2":
> >>> $ strace /sbin/mount.nfs <unreachable-IP-address>:/path /tmp/mnt
> >>> [ ... cut ... ]
> >>> mount("<unreadhable-IP-address>:/path", "/tmp/mnt", "nfs", 0, "vers=4.2,addr=<unreachable-IP-address>,clien"...^C^Z
> >>
> >> Also, if I try the same mount.nfs commands but specifying NFSv3, the mount to the dead NFS server hangs, but the mounts to the operational NFS servers do not block and mount successfully; this bug doesn't happen when using NFSv3.
> >>
> >>
> >> We reported this issue under util-linux here:
> >> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fkarelzak%2Futil-linux%2Fissues%2F1309&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C254806799e3f45388def08d91c8f9c45%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637572225410414697%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=UrD%2FaBX2S4Qq7CrgltIc9lEzA8oEQQn0srMXtrq%2B6CE%3D&amp;reserved=0
> >> [mounting nfs server which is down blocks all other nfs mounts on same machine #1309]
> >>
> >> I also found an older bug on this mailing list that had similar symptoms (but could not tell if it was the same problem or not):
> >> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.kernel.org%2Fproject%2Flinux-nfs%2Fpatch%2F87vaori26c.fsf%40notabene.neil.brown.name%2F&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C254806799e3f45388def08d91c8f9c45%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637572225410414697%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=lBAE9wZbykGQ8VPH6YsAtuaIpMpDcAtqsxVNDV%2BaNTk%3D&amp;reserved=0
> >> [[PATCH/RFC] NFSv4: don't let hanging mounts block other mounts]
> >>
> >> Thanks, Mike
> >
> > Hi Mike,
> >
> > This is not a helpful reply but I was curious if I could reproduce
> > your issue but was not successful. I'm able to initiate a mount to an
> > unreachable-IP-address which hangs and then do another mount to an
> > existing server without issues. Ubuntu 21.04 seems to be 5.11 based so
> > I tried upstream 5.11 and I tried the latest upstream nfs-utils
> > (instead of what my distro has which was an older version).
> >
> > To debug, perhaps get an output of the nfs4 and sunrpc tracepoints.
> > Or also get output from dmesg after doing ?echo t >
> > /proc/sysrq-trigger? to see where the mounts are hanging.
> >
> It looks like Mike is correct... The first process (mount 1.1.1.1:/mnt) is
> hung in trying the connection:
>
> PID: 3394?? TASK: ffff9da8c42734c0? CPU: 0?? COMMAND: "mount.nfs"
>? #0 [ffffb44780f638c8] __schedule at ffffffff82d7959d
>? #1 [ffffb44780f63950] schedule at ffffffff82d79f2b
>? #2 [ffffb44780f63968] rpc_wait_bit_killable at ffffffffc05265ce [sunrpc]
>? #3 [ffffb44780f63980] __wait_on_bit at ffffffff82d7a4ba
>? #4 [ffffb44780f639b8] out_of_line_wait_on_bit at ffffffff82d7a5a6
>? #5 [ffffb44780f63a00] __rpc_execute at ffffffffc052fc8a [sunrpc]
>? #6 [ffffb44780f63a48] rpc_execute at ffffffffc05305a2 [sunrpc]
>? #7 [ffffb44780f63a68] rpc_run_task at ffffffffc05164e4 [sunrpc]
>? #8 [ffffb44780f63aa8] rpc_call_sync at ffffffffc0516573 [sunrpc]
>? #9 [ffffb44780f63b00] rpc_create_xprt at ffffffffc051672e [sunrpc]
> #10 [ffffb44780f63b40] rpc_create at ffffffffc0516881 [sunrpc]
> #11 [ffffb44780f63be8] nfs_create_rpc_client at ffffffffc0972319 [nfs]
> #12 [ffffb44780f63c80] nfs4_init_client at ffffffffc0a17882 [nfsv4]
> #13 [ffffb44780f63d70] nfs4_set_client at ffffffffc0a16ef8 [nfsv4]
> #14 [ffffb44780f63de8] nfs4_create_server at ffffffffc0a188d8 [nfsv4]
> #15 [ffffb44780f63e60] nfs4_try_get_tree at ffffffffc0a0bf69 [nfsv4]
> #16 [ffffb44780f63e80] vfs_get_tree at ffffffff823b6068
> #17 [ffffb44780f63ea0] path_mount at ffffffff823e3d8f
> #18 [ffffb44780f63ef8] __x64_sys_mount at ffffffff823e45a3
> #19 [ffffb44780f63f38] do_syscall_64 at ffffffff82d6aa50
> #20 [ffffb44780f63f50] entry_SYSCALL_64_after_hwframe at ffffffff82e0007c
>
> The second mount is hung up in nfs_match_client()/nfs_wait_client_init_complete
> waiting for the first process to compile
> nfs_match_client:
>
>??????? /* If a client is still initializing then we need to wait */
>???????? if (clp->cl_cons_state > NFS_CS_READY) {
>???????????? refcount_inc(&clp->cl_count);
>???????????? spin_unlock(&nn->nfs_client_lock);
>???????????? error = nfs_wait_client_init_complete(clp);
>???????????? nfs_put_client(clp);
>???????????? spin_lock(&nn->nfs_client_lock);
>???????????? if (error < 0)
>???????????????? return ERR_PTR(error);
>???????????? goto again;
>???????? }
>
> This code came in with commit c156618e15101 which fixed
> a deadlock in nfs client initialization.
>
> steved.
>

2021-05-21 21:07:09

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine

Hi Michael,

On Fri, May 21, 2021 at 4:31 PM Michael Wakabayashi
<[email protected]> wrote:
>
> Hi Olga,
>
> > But I would like to know what's a legitimate reason for a machine to
> > have a listening but not responsive port 2049
>
> The NFS server in this case was decommissioned and taken offline.

You say that it's taken offline. If it's offline there shouldn't be
anything listening on port 2049. I was only able to reproduce the
problem when a client is able to send a SYN to the server and not
getting a reply back. If the server is offline, there will always be a
reply back (RST or something of the sorts). Client tries a bit but it
never gets stuck in the rpc_execute() state because it would get a
reply from the TCP layer. Your stack is where there is no TCP reply
from the server.

> The user's automated tests was not updated and still trying to
> mount this offline NFS server.
>
> We refer to this decommissioned server as being unreachable.
> Maybe it's a difference in terminology, but for us if the IP address does not
> respond to ping (as in this case), we refer to it as being unreachable.
> Other tools use this same terminology. For example "fping":
> $ fping 2.2.2.2
> 2.2.2.2 is unreachable
>
> We can't really prevent users from making mistakes.
> * Users will continue to accidentally mount decommissioned servers.
> * Users will continue to mount the wrong IP addresses in their tests and elsewhere.
> And when these situation occur, it will block valid NFS mounts.
>
> Should I be prevented from mounting NFS shares because
> someone else mistyped the NFS server name in their mount command?
>
> From a user perspective, it's not clear why a mount of a
> decommissioned(and therefore down) NFS server is blocking
> mounts of every other valid NFS server?
> Shouldn't these valid NFS servers be allowed to mount?
>
> Thanks, Mike
>
>
>
>
>
> From: Olga Kornievskaia <[email protected]>
> Sent: Friday, May 21, 2021 12:35 PM
> To: Michael Wakabayashi <[email protected]>
> Cc: Steve Dickson <[email protected]>; [email protected] <[email protected]>
> Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine
>
> On Fri, May 21, 2021 at 3:09 PM Michael Wakabayashi
> <[email protected]> wrote:
> >
> > > This code came in with commit c156618e15101 which fixed a deadlock in nfs client initialization.
> >
> > > My conclusion is: an unresponsive server will block other mounts but only until timeout is reached.
> >
> > Hi Steve and Olga,
> >
> > We run multiple Kubernetes clusters.
> > These clusters are composed of hundreds of Kubernetes nodes.
> > Any of these nodes can NFS mount on behalf of the containers running on these nodes.
> > We've seen several times in the past few months an NFS mount hang, and then several hundred up to several thousand NFS mounts blocked by this hung NFS mount processes (we have many "testing" workloads that access NFS).
> > Having several hundred NFS mounts blocked on a node causes the Kubernetes node to become unstable and severely degrades service.
> >
> > We did not expect a hung NFS mount to block every other NFS mount, especially when the other mounts are unrelated and otherwise working properly.
> >
> > Can this behavior be changed?
>
> Hi Michael,
>
> I'm not sure if the design can be changed. But I would like to know
> what's a legitimate reason for a machine to have a listening but not
> responsive port 2049 (I'm sorry I don't particularly care for the
> explanation of "because this is how things currently work in
> containers, Kubernetes"). Seems like the problem should be fixed
> there. There is no issue if a mount goes to an IP that has nothing
> listening on port 2049.
>
> Again I have no comments on the design change: or rather my comment
> was I don't see a way. If you have 2 parallel clients initializing and
> the goal is to have at most one client if both are the same, then I
> don't see a way besides serializing it as it's done now.
>
> >
> > Thanks, Mike
> >
> >
> > ________________________________
> > From: Steve Dickson <[email protected]>
> > Sent: Thursday, May 20, 2021 11:42 AM
> > To: Olga Kornievskaia <[email protected]>; Michael Wakabayashi <[email protected]>
> > Cc: [email protected] <[email protected]>
> > Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine
> >
> > Hey.
> >
> > On 5/19/21 3:15 PM, Olga Kornievskaia wrote:
> > > On Sun, May 16, 2021 at 11:18 PM Michael Wakabayashi
> > > <[email protected]> wrote:
> > >>
> > >> Hi,
> > >>
> > >> We're seeing what looks like an NFSv4 issue.
> > >>
> > >> Mounting an NFS server that is down (ping to this NFS server's IP address does not respond) will block _all_ other NFS mount attempts even if the NFS servers are available and working properly (these subsequent mounts hang).
> > >>
> > >> If I kill the NFS mount process that's trying to mount the dead NFS server, the NFS mounts that were blocked will immediately unblock and mount successfully, which suggests the first mount command is blocking the other mount commands.
> > >>
> > >>
> > >> I verified this behavior using a newly built mount.nfs command from the recent nfs-utils 2.5.3 package installed on a recent version of Ubuntu Cloud Image 21.04:
> > >> * https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsourceforge.net%2Fprojects%2Fnfs%2Ffiles%2Fnfs-utils%2F2.5.3%2F&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C254806799e3f45388def08d91c8f9c45%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637572225410414697%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=qWsYLeSLC0k89%2FHJGqhMlBnEvGR%2Bdqxve4n56bww%2Bnk%3D&amp;reserved=0
> > >> * https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcloud-images.ubuntu.com%2Freleases%2Fhirsute%2Frelease-20210513%2Fubuntu-21.04-server-cloudimg-amd64.ova&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C254806799e3f45388def08d91c8f9c45%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637572225410414697%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=5sHt34ZBtsa7MjRP0RbymhbjOn%2FT5X5JUWvIQV93PUU%3D&amp;reserved=0
> > >>
> > >>
> > >> The reason this looks like it is specific to NFSv4 is from the following output showing "vers=4.2":
> > >>> $ strace /sbin/mount.nfs <unreachable-IP-address>:/path /tmp/mnt
> > >>> [ ... cut ... ]
> > >>> mount("<unreadhable-IP-address>:/path", "/tmp/mnt", "nfs", 0, "vers=4.2,addr=<unreachable-IP-address>,clien"...^C^Z
> > >>
> > >> Also, if I try the same mount.nfs commands but specifying NFSv3, the mount to the dead NFS server hangs, but the mounts to the operational NFS servers do not block and mount successfully; this bug doesn't happen when using NFSv3.
> > >>
> > >>
> > >> We reported this issue under util-linux here:
> > >> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fkarelzak%2Futil-linux%2Fissues%2F1309&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C254806799e3f45388def08d91c8f9c45%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637572225410414697%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=UrD%2FaBX2S4Qq7CrgltIc9lEzA8oEQQn0srMXtrq%2B6CE%3D&amp;reserved=0
> > >> [mounting nfs server which is down blocks all other nfs mounts on same machine #1309]
> > >>
> > >> I also found an older bug on this mailing list that had similar symptoms (but could not tell if it was the same problem or not):
> > >> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.kernel.org%2Fproject%2Flinux-nfs%2Fpatch%2F87vaori26c.fsf%40notabene.neil.brown.name%2F&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C254806799e3f45388def08d91c8f9c45%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637572225410414697%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=lBAE9wZbykGQ8VPH6YsAtuaIpMpDcAtqsxVNDV%2BaNTk%3D&amp;reserved=0
> > >> [[PATCH/RFC] NFSv4: don't let hanging mounts block other mounts]
> > >>
> > >> Thanks, Mike
> > >
> > > Hi Mike,
> > >
> > > This is not a helpful reply but I was curious if I could reproduce
> > > your issue but was not successful. I'm able to initiate a mount to an
> > > unreachable-IP-address which hangs and then do another mount to an
> > > existing server without issues. Ubuntu 21.04 seems to be 5.11 based so
> > > I tried upstream 5.11 and I tried the latest upstream nfs-utils
> > > (instead of what my distro has which was an older version).
> > >
> > > To debug, perhaps get an output of the nfs4 and sunrpc tracepoints.
> > > Or also get output from dmesg after doing “echo t >
> > > /proc/sysrq-trigger” to see where the mounts are hanging.
> > >
> > It looks like Mike is correct... The first process (mount 1.1.1.1:/mnt) is
> > hung in trying the connection:
> >
> > PID: 3394 TASK: ffff9da8c42734c0 CPU: 0 COMMAND: "mount.nfs"
> > #0 [ffffb44780f638c8] __schedule at ffffffff82d7959d
> > #1 [ffffb44780f63950] schedule at ffffffff82d79f2b
> > #2 [ffffb44780f63968] rpc_wait_bit_killable at ffffffffc05265ce [sunrpc]
> > #3 [ffffb44780f63980] __wait_on_bit at ffffffff82d7a4ba
> > #4 [ffffb44780f639b8] out_of_line_wait_on_bit at ffffffff82d7a5a6
> > #5 [ffffb44780f63a00] __rpc_execute at ffffffffc052fc8a [sunrpc]
> > #6 [ffffb44780f63a48] rpc_execute at ffffffffc05305a2 [sunrpc]
> > #7 [ffffb44780f63a68] rpc_run_task at ffffffffc05164e4 [sunrpc]
> > #8 [ffffb44780f63aa8] rpc_call_sync at ffffffffc0516573 [sunrpc]
> > #9 [ffffb44780f63b00] rpc_create_xprt at ffffffffc051672e [sunrpc]
> > #10 [ffffb44780f63b40] rpc_create at ffffffffc0516881 [sunrpc]
> > #11 [ffffb44780f63be8] nfs_create_rpc_client at ffffffffc0972319 [nfs]
> > #12 [ffffb44780f63c80] nfs4_init_client at ffffffffc0a17882 [nfsv4]
> > #13 [ffffb44780f63d70] nfs4_set_client at ffffffffc0a16ef8 [nfsv4]
> > #14 [ffffb44780f63de8] nfs4_create_server at ffffffffc0a188d8 [nfsv4]
> > #15 [ffffb44780f63e60] nfs4_try_get_tree at ffffffffc0a0bf69 [nfsv4]
> > #16 [ffffb44780f63e80] vfs_get_tree at ffffffff823b6068
> > #17 [ffffb44780f63ea0] path_mount at ffffffff823e3d8f
> > #18 [ffffb44780f63ef8] __x64_sys_mount at ffffffff823e45a3
> > #19 [ffffb44780f63f38] do_syscall_64 at ffffffff82d6aa50
> > #20 [ffffb44780f63f50] entry_SYSCALL_64_after_hwframe at ffffffff82e0007c
> >
> > The second mount is hung up in nfs_match_client()/nfs_wait_client_init_complete
> > waiting for the first process to compile
> > nfs_match_client:
> >
> > /* If a client is still initializing then we need to wait */
> > if (clp->cl_cons_state > NFS_CS_READY) {
> > refcount_inc(&clp->cl_count);
> > spin_unlock(&nn->nfs_client_lock);
> > error = nfs_wait_client_init_complete(clp);
> > nfs_put_client(clp);
> > spin_lock(&nn->nfs_client_lock);
> > if (error < 0)
> > return ERR_PTR(error);
> > goto again;
> > }
> >
> > This code came in with commit c156618e15101 which fixed
> > a deadlock in nfs client initialization.
> >
> > steved.
> >

2021-05-21 22:08:46

by Trond Myklebust

[permalink] [raw]
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine

On Fri, 2021-05-21 at 17:06 -0400, Olga Kornievskaia wrote:
> Hi Michael,
>
> On Fri, May 21, 2021 at 4:31 PM Michael Wakabayashi
> <[email protected]> wrote:
> >
> > Hi Olga,
> >
> > > But I would like to know  what's a legitimate reason for a
> > > machine to
> > > have a listening but not responsive port 2049
> >
> > The NFS server in this case was decommissioned and taken offline.
>
> You say that it's taken offline. If it's offline there shouldn't be
> anything listening on port 2049. I was only able to reproduce the
> problem when a client is able to send a SYN to the server and not
> getting a reply back. If the server is offline, there will always be
> a
> reply back (RST or something of the sorts). Client tries a bit but it
> never gets stuck in the rpc_execute() state because it would get a
> reply from the TCP layer. Your stack is where  there is no TCP reply
> from the server.
>
> > The user's automated tests was not updated and still trying to
> > mount this offline NFS server.
> >
> > We refer to this decommissioned server as being unreachable.
> > Maybe it's a difference in terminology, but for us if the IP
> > address does not
> > respond to ping (as in this case), we refer to it as being 
> > unreachable.
> > Other tools use this same terminology. For example "fping":
> > $ fping 2.2.2.2
> > 2.2.2.2 is unreachable
> >
> > We can't really prevent users from making mistakes.
> > * Users will continue to accidentally mount decommissioned servers.
> > * Users will continue to mount the wrong IP addresses in their
> > tests and elsewhere.
> > And when these situation occur, it will block valid NFS mounts.
> >
> > Should I be prevented from mounting NFS shares because
> > someone else mistyped the NFS server name in their mount command?
> >
> > From a user perspective, it's not clear why a mount of a
> > decommissioned(and therefore down) NFS server is blocking
> > mounts of every other valid NFS server?
> > Shouldn't these valid NFS servers be allowed to mount?

There are valid protocol reasons why the NFSv4 client has to check
whether or not the new mount is really talking to the same server but
over a different IP addresses. When a server is down, then that blocks
the ability to perform that kind of probe, and so the client tries to
wait for the server to come back up (until it times out).
That's what I understand Olga to be claiming is happening here, and is
why we can't fix the problem.

Instead of turning this server completely off, why can't you simply set
up a referral to the new server on that IP address? That's a much more
user-friendly way of dealing with the problem.

> >
> > Thanks, Mike
> >
> >
> >
> >
> >
> > From: Olga Kornievskaia <[email protected]>
> > Sent: Friday, May 21, 2021 12:35 PM
> > To: Michael Wakabayashi <[email protected]>
> > Cc: Steve Dickson <[email protected]>; [email protected] <
> > [email protected]>
> > Subject: Re: NFSv4: Mounting NFS server which is down, blocks all
> > other NFS mounts on same machine
> >
> > On Fri, May 21, 2021 at 3:09 PM Michael Wakabayashi
> > <[email protected]> wrote:
> > >
> > > > This code came in with commit c156618e15101 which fixed a
> > > > deadlock in nfs client initialization.
> > >
> > > > My conclusion is: an unresponsive server will block other
> > > > mounts but only until timeout is reached.
> > >
> > > Hi Steve and Olga,
> > >
> > > We run multiple Kubernetes clusters.
> > > These clusters are composed of hundreds of Kubernetes nodes.
> > > Any of these nodes can NFS mount on behalf of the containers
> > > running on these nodes.
> > > We've seen several times in the past few months an NFS mount
> > > hang, and then several hundred up to several thousand NFS mounts
> > > blocked by this hung NFS mount processes (we have many "testing"
> > > workloads that access NFS).
> > > Having several hundred NFS mounts blocked on a node causes the
> > > Kubernetes node to become unstable and severely degrades service.
> > >
> > > We did not expect a hung NFS mount to block every other NFS
> > > mount, especially when the other mounts are unrelated and
> > > otherwise working properly.
> > >
> > > Can this behavior be changed?
> >
> > Hi Michael,
> >
> > I'm not sure if the design can be changed. But I would like to know
> > what's a legitimate reason for a machine to have a listening but
> > not
> > responsive port 2049 (I'm sorry I don't particularly care for the
> > explanation of "because this is how things currently work in
> > containers, Kubernetes"). Seems like the problem should be fixed
> > there. There is no issue if a mount goes to an IP that has nothing
> > listening on port 2049.
> >
> > Again I have no comments on the design change: or rather my comment
> > was I don't see a way. If you have 2 parallel clients initializing
> > and
> > the goal is to have at most one client if both are the same, then I
> > don't see a way besides serializing it as it's done now.
> >
> > >
> > > Thanks, Mike
> > >
> > >
> > > ________________________________
> > > From: Steve Dickson <[email protected]>
> > > Sent: Thursday, May 20, 2021 11:42 AM
> > > To: Olga Kornievskaia <[email protected]>; Michael Wakabayashi <
> > > [email protected]>
> > > Cc: [email protected] <[email protected]>
> > > Subject: Re: NFSv4: Mounting NFS server which is down, blocks all
> > > other NFS mounts on same machine
> > >
> > > Hey.
> > >
> > > On 5/19/21 3:15 PM, Olga Kornievskaia wrote:
> > > > On Sun, May 16, 2021 at 11:18 PM Michael Wakabayashi
> > > > <[email protected]> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > We're seeing what looks like an NFSv4 issue.
> > > > >
> > > > > Mounting an NFS server that is down (ping to this NFS
> > > > > server's IP address does not respond) will block _all_ other
> > > > > NFS mount attempts even if the NFS servers are available and
> > > > > working properly (these subsequent mounts hang).
> > > > >
> > > > > If I kill the NFS mount process that's trying to mount the
> > > > > dead NFS server, the NFS mounts that were blocked will
> > > > > immediately unblock and mount successfully, which suggests
> > > > > the first mount command is blocking the other mount commands.
> > > > >
> > > > >
> > > > > I verified this behavior using a newly built mount.nfs
> > > > > command from the recent nfs-utils 2.5.3 package installed on
> > > > > a recent version of Ubuntu Cloud Image 21.04:
> > > > > *
> > > > > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsourceforge.net%2Fprojects%2Fnfs%2Ffiles%2Fnfs-utils%2F2.5.3%2F&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C254806799e3f45388def08d91c8f9c45%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637572225410414697%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=qWsYLeSLC0k89%2FHJGqhMlBnEvGR%2Bdqxve4n56bww%2Bnk%3D&amp;reserved=0
> > > > > *
> > > > > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcloud-images.ubuntu.com%2Freleases%2Fhirsute%2Frelease-20210513%2Fubuntu-21.04-server-cloudimg-amd64.ova&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C254806799e3f45388def08d91c8f9c45%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637572225410414697%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=5sHt34ZBtsa7MjRP0RbymhbjOn%2FT5X5JUWvIQV93PUU%3D&amp;reserved=0
> > > > >
> > > > >
> > > > > The reason this looks like it is specific to NFSv4 is from
> > > > > the following output showing "vers=4.2":
> > > > > > $ strace /sbin/mount.nfs <unreachable-IP-address>:/path
> > > > > > /tmp/mnt
> > > > > > [ ... cut ... ]
> > > > > > mount("<unreadhable-IP-address>:/path", "/tmp/mnt", "nfs",
> > > > > > 0, "vers=4.2,addr=<unreachable-IP-address>,clien"...^C^Z
> > > > >
> > > > > Also, if I try the same mount.nfs commands but specifying
> > > > > NFSv3, the mount to the dead NFS server hangs, but the mounts
> > > > > to the operational NFS servers do not block and mount
> > > > > successfully; this bug doesn't happen when using NFSv3.
> > > > >
> > > > >
> > > > > We reported this issue under util-linux here:
> > > > > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fkarelzak%2Futil-linux%2Fissues%2F1309&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C254806799e3f45388def08d91c8f9c45%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637572225410414697%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=UrD%2FaBX2S4Qq7CrgltIc9lEzA8oEQQn0srMXtrq%2B6CE%3D&amp;reserved=0
> > > > > [mounting nfs server which is down blocks all other nfs
> > > > > mounts on same machine #1309]
> > > > >
> > > > > I also found an older bug on this mailing list that had
> > > > > similar symptoms (but could not tell if it was the same
> > > > > problem or not):
> > > > > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.kernel.org%2Fproject%2Flinux-nfs%2Fpatch%2F87vaori26c.fsf%40notabene.neil.brown.name%2F&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C254806799e3f45388def08d91c8f9c45%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637572225410414697%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=lBAE9wZbykGQ8VPH6YsAtuaIpMpDcAtqsxVNDV%2BaNTk%3D&amp;reserved=0
> > > > > [[PATCH/RFC] NFSv4: don't let hanging mounts block other
> > > > > mounts]
> > > > >
> > > > > Thanks, Mike
> > > >
> > > > Hi Mike,
> > > >
> > > > This is not a helpful reply but I was curious if I could
> > > > reproduce
> > > > your issue but was not successful. I'm able to initiate a mount
> > > > to an
> > > > unreachable-IP-address which hangs and then do another mount to
> > > > an
> > > > existing server without issues. Ubuntu 21.04 seems to be 5.11
> > > > based so
> > > > I tried upstream 5.11 and I tried the latest upstream nfs-utils
> > > > (instead of what my distro has which was an older version).
> > > >
> > > > To debug, perhaps get an output of the nfs4 and sunrpc
> > > > tracepoints.
> > > > Or also get output from dmesg after doing “echo t >
> > > > /proc/sysrq-trigger” to see where the mounts are hanging.
> > > >
> > > It looks like Mike is correct... The first process (mount
> > > 1.1.1.1:/mnt) is
> > > hung in trying the connection:
> > >
> > > PID: 3394   TASK: ffff9da8c42734c0  CPU: 0   COMMAND: "mount.nfs"
> > >  #0 [ffffb44780f638c8] __schedule at ffffffff82d7959d
> > >  #1 [ffffb44780f63950] schedule at ffffffff82d79f2b
> > >  #2 [ffffb44780f63968] rpc_wait_bit_killable at ffffffffc05265ce
> > > [sunrpc]
> > >  #3 [ffffb44780f63980] __wait_on_bit at ffffffff82d7a4ba
> > >  #4 [ffffb44780f639b8] out_of_line_wait_on_bit at
> > > ffffffff82d7a5a6
> > >  #5 [ffffb44780f63a00] __rpc_execute at ffffffffc052fc8a [sunrpc]
> > >  #6 [ffffb44780f63a48] rpc_execute at ffffffffc05305a2 [sunrpc]
> > >  #7 [ffffb44780f63a68] rpc_run_task at ffffffffc05164e4 [sunrpc]
> > >  #8 [ffffb44780f63aa8] rpc_call_sync at ffffffffc0516573 [sunrpc]
> > >  #9 [ffffb44780f63b00] rpc_create_xprt at ffffffffc051672e
> > > [sunrpc]
> > > #10 [ffffb44780f63b40] rpc_create at ffffffffc0516881 [sunrpc]
> > > #11 [ffffb44780f63be8] nfs_create_rpc_client at ffffffffc0972319
> > > [nfs]
> > > #12 [ffffb44780f63c80] nfs4_init_client at ffffffffc0a17882
> > > [nfsv4]
> > > #13 [ffffb44780f63d70] nfs4_set_client at ffffffffc0a16ef8
> > > [nfsv4]
> > > #14 [ffffb44780f63de8] nfs4_create_server at ffffffffc0a188d8
> > > [nfsv4]
> > > #15 [ffffb44780f63e60] nfs4_try_get_tree at ffffffffc0a0bf69
> > > [nfsv4]
> > > #16 [ffffb44780f63e80] vfs_get_tree at ffffffff823b6068
> > > #17 [ffffb44780f63ea0] path_mount at ffffffff823e3d8f
> > > #18 [ffffb44780f63ef8] __x64_sys_mount at ffffffff823e45a3
> > > #19 [ffffb44780f63f38] do_syscall_64 at ffffffff82d6aa50
> > > #20 [ffffb44780f63f50] entry_SYSCALL_64_after_hwframe at
> > > ffffffff82e0007c
> > >
> > > The second mount is hung up in
> > > nfs_match_client()/nfs_wait_client_init_complete
> > > waiting for the first process to compile
> > > nfs_match_client:
> > >
> > >        /* If a client is still initializing then we need to wait
> > > */
> > >         if (clp->cl_cons_state > NFS_CS_READY) {
> > >             refcount_inc(&clp->cl_count);
> > >             spin_unlock(&nn->nfs_client_lock);
> > >             error = nfs_wait_client_init_complete(clp);
> > >             nfs_put_client(clp);
> > >             spin_lock(&nn->nfs_client_lock);
> > >             if (error < 0)
> > >                 return ERR_PTR(error);
> > >             goto again;
> > >         }
> > >
> > > This code came in with commit c156618e15101 which fixed
> > > a deadlock in nfs client initialization.
> > >
> > > steved.
> > >

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2021-05-21 22:39:13

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine

On Fri, May 21, 2021 at 5:06 PM Olga Kornievskaia <[email protected]> wrote:
>
> Hi Michael,
>
> On Fri, May 21, 2021 at 4:31 PM Michael Wakabayashi
> <[email protected]> wrote:
> >
> > Hi Olga,
> >
> > > But I would like to know what's a legitimate reason for a machine to
> > > have a listening but not responsive port 2049
> >
> > The NFS server in this case was decommissioned and taken offline.
>
> You say that it's taken offline. If it's offline there shouldn't be
> anything listening on port 2049. I was only able to reproduce the
> problem when a client is able to send a SYN to the server and not
> getting a reply back. If the server is offline, there will always be a
> reply back (RST or something of the sorts). Client tries a bit but it
> never gets stuck in the rpc_execute() state because it would get a
> reply from the TCP layer. Your stack is where there is no TCP reply
> from the server.
>

More on my explanation. I'm attaching 2 sets of tracepoints. 1 case is
"unreachable" (there is no machine with such IP 192.168.1.108 in my
case -- this is your example of somebody providing a bad name or
address. It will not prevent any other mounts from proceeding) and 2nd
case is "unresponsible" (192.168.1.68, something is listening on 2049
but not responding... it produced the stack trace you have provided.
It also demonstrates that mounts will succeed but only after things
timeout.

In the "unreachable" case, TCP is able to figure out its unreachable
pretty fast and timeout. The line to look for in the "unreachable.log"
file:
<...>-67711 [001] .... 341123.253898: rpc_stats_latency:
task:[email protected] xid=0xb8a00f1d nfsv4 NULL backlog=0 rtt=0 execute=3082006

In case of the un-responsible server, TCP takes a lot longer to timeout:
<...>-67806 [001] .... 341558.088633: rpc_stats_latency:
task:[email protected] xid=0x0f8fd007 nfsv4 NULL backlog=0 rtt=0
execute=181069428


There is a better way of using tracepoint but the quick (my preferred
way) way I turn on trace points is (as root) and then examining them:
cd /sys/kernel/debug/tracing
echo 1 > events/sunrpc/enable
echo 1 > events/nfs4/enable
cat trace_pipe > /tmp/trace_pipe.output

ctrl-c after test is done. Examine the output file.

I suggest that you get tracepoints from your system and see if it
behaves more like the "unresponsive" case than "unreachable" case by
comparing to the tracepoints I provided. The "unreachable" case: TCP
layer is failing us with -113 which is "host unreachable" so we give
up and go back to the NFS layer. The "unresponsive" case: TCP layer is
failing with -107 "not connected" so the RPC layer will retry several
times.

> > The user's automated tests was not updated and still trying to
> > mount this offline NFS server.
> >
> > We refer to this decommissioned server as being unreachable.
> > Maybe it's a difference in terminology, but for us if the IP address does not
> > respond to ping (as in this case), we refer to it as being unreachable.
> > Other tools use this same terminology. For example "fping":
> > $ fping 2.2.2.2
> > 2.2.2.2 is unreachable
> >
> > We can't really prevent users from making mistakes.
> > * Users will continue to accidentally mount decommissioned servers.
> > * Users will continue to mount the wrong IP addresses in their tests and elsewhere.
> > And when these situation occur, it will block valid NFS mounts.
> >
> > Should I be prevented from mounting NFS shares because
> > someone else mistyped the NFS server name in their mount command?
> >
> > From a user perspective, it's not clear why a mount of a
> > decommissioned(and therefore down) NFS server is blocking
> > mounts of every other valid NFS server?
> > Shouldn't these valid NFS servers be allowed to mount?
> >
> > Thanks, Mike
> >
> >
> >
> >
> >
> > From: Olga Kornievskaia <[email protected]>
> > Sent: Friday, May 21, 2021 12:35 PM
> > To: Michael Wakabayashi <[email protected]>
> > Cc: Steve Dickson <[email protected]>; [email protected] <[email protected]>
> > Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine
> >
> > On Fri, May 21, 2021 at 3:09 PM Michael Wakabayashi
> > <[email protected]> wrote:
> > >
> > > > This code came in with commit c156618e15101 which fixed a deadlock in nfs client initialization.
> > >
> > > > My conclusion is: an unresponsive server will block other mounts but only until timeout is reached.
> > >
> > > Hi Steve and Olga,
> > >
> > > We run multiple Kubernetes clusters.
> > > These clusters are composed of hundreds of Kubernetes nodes.
> > > Any of these nodes can NFS mount on behalf of the containers running on these nodes.
> > > We've seen several times in the past few months an NFS mount hang, and then several hundred up to several thousand NFS mounts blocked by this hung NFS mount processes (we have many "testing" workloads that access NFS).
> > > Having several hundred NFS mounts blocked on a node causes the Kubernetes node to become unstable and severely degrades service.
> > >
> > > We did not expect a hung NFS mount to block every other NFS mount, especially when the other mounts are unrelated and otherwise working properly.
> > >
> > > Can this behavior be changed?
> >
> > Hi Michael,
> >
> > I'm not sure if the design can be changed. But I would like to know
> > what's a legitimate reason for a machine to have a listening but not
> > responsive port 2049 (I'm sorry I don't particularly care for the
> > explanation of "because this is how things currently work in
> > containers, Kubernetes"). Seems like the problem should be fixed
> > there. There is no issue if a mount goes to an IP that has nothing
> > listening on port 2049.
> >
> > Again I have no comments on the design change: or rather my comment
> > was I don't see a way. If you have 2 parallel clients initializing and
> > the goal is to have at most one client if both are the same, then I
> > don't see a way besides serializing it as it's done now.
> >
> > >
> > > Thanks, Mike
> > >
> > >
> > > ________________________________
> > > From: Steve Dickson <[email protected]>
> > > Sent: Thursday, May 20, 2021 11:42 AM
> > > To: Olga Kornievskaia <[email protected]>; Michael Wakabayashi <[email protected]>
> > > Cc: [email protected] <[email protected]>
> > > Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine
> > >
> > > Hey.
> > >
> > > On 5/19/21 3:15 PM, Olga Kornievskaia wrote:
> > > > On Sun, May 16, 2021 at 11:18 PM Michael Wakabayashi
> > > > <[email protected]> wrote:
> > > >>
> > > >> Hi,
> > > >>
> > > >> We're seeing what looks like an NFSv4 issue.
> > > >>
> > > >> Mounting an NFS server that is down (ping to this NFS server's IP address does not respond) will block _all_ other NFS mount attempts even if the NFS servers are available and working properly (these subsequent mounts hang).
> > > >>
> > > >> If I kill the NFS mount process that's trying to mount the dead NFS server, the NFS mounts that were blocked will immediately unblock and mount successfully, which suggests the first mount command is blocking the other mount commands.
> > > >>
> > > >>
> > > >> I verified this behavior using a newly built mount.nfs command from the recent nfs-utils 2.5.3 package installed on a recent version of Ubuntu Cloud Image 21.04:
> > > >> * https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsourceforge.net%2Fprojects%2Fnfs%2Ffiles%2Fnfs-utils%2F2.5.3%2F&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C254806799e3f45388def08d91c8f9c45%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637572225410414697%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=qWsYLeSLC0k89%2FHJGqhMlBnEvGR%2Bdqxve4n56bww%2Bnk%3D&amp;reserved=0
> > > >> * https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcloud-images.ubuntu.com%2Freleases%2Fhirsute%2Frelease-20210513%2Fubuntu-21.04-server-cloudimg-amd64.ova&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C254806799e3f45388def08d91c8f9c45%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637572225410414697%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=5sHt34ZBtsa7MjRP0RbymhbjOn%2FT5X5JUWvIQV93PUU%3D&amp;reserved=0
> > > >>
> > > >>
> > > >> The reason this looks like it is specific to NFSv4 is from the following output showing "vers=4.2":
> > > >>> $ strace /sbin/mount.nfs <unreachable-IP-address>:/path /tmp/mnt
> > > >>> [ ... cut ... ]
> > > >>> mount("<unreadhable-IP-address>:/path", "/tmp/mnt", "nfs", 0, "vers=4.2,addr=<unreachable-IP-address>,clien"...^C^Z
> > > >>
> > > >> Also, if I try the same mount.nfs commands but specifying NFSv3, the mount to the dead NFS server hangs, but the mounts to the operational NFS servers do not block and mount successfully; this bug doesn't happen when using NFSv3.
> > > >>
> > > >>
> > > >> We reported this issue under util-linux here:
> > > >> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fkarelzak%2Futil-linux%2Fissues%2F1309&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C254806799e3f45388def08d91c8f9c45%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637572225410414697%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=UrD%2FaBX2S4Qq7CrgltIc9lEzA8oEQQn0srMXtrq%2B6CE%3D&amp;reserved=0
> > > >> [mounting nfs server which is down blocks all other nfs mounts on same machine #1309]
> > > >>
> > > >> I also found an older bug on this mailing list that had similar symptoms (but could not tell if it was the same problem or not):
> > > >> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.kernel.org%2Fproject%2Flinux-nfs%2Fpatch%2F87vaori26c.fsf%40notabene.neil.brown.name%2F&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C254806799e3f45388def08d91c8f9c45%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637572225410414697%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=lBAE9wZbykGQ8VPH6YsAtuaIpMpDcAtqsxVNDV%2BaNTk%3D&amp;reserved=0
> > > >> [[PATCH/RFC] NFSv4: don't let hanging mounts block other mounts]
> > > >>
> > > >> Thanks, Mike
> > > >
> > > > Hi Mike,
> > > >
> > > > This is not a helpful reply but I was curious if I could reproduce
> > > > your issue but was not successful. I'm able to initiate a mount to an
> > > > unreachable-IP-address which hangs and then do another mount to an
> > > > existing server without issues. Ubuntu 21.04 seems to be 5.11 based so
> > > > I tried upstream 5.11 and I tried the latest upstream nfs-utils
> > > > (instead of what my distro has which was an older version).
> > > >
> > > > To debug, perhaps get an output of the nfs4 and sunrpc tracepoints.
> > > > Or also get output from dmesg after doing “echo t >
> > > > /proc/sysrq-trigger” to see where the mounts are hanging.
> > > >
> > > It looks like Mike is correct... The first process (mount 1.1.1.1:/mnt) is
> > > hung in trying the connection:
> > >
> > > PID: 3394 TASK: ffff9da8c42734c0 CPU: 0 COMMAND: "mount.nfs"
> > > #0 [ffffb44780f638c8] __schedule at ffffffff82d7959d
> > > #1 [ffffb44780f63950] schedule at ffffffff82d79f2b
> > > #2 [ffffb44780f63968] rpc_wait_bit_killable at ffffffffc05265ce [sunrpc]
> > > #3 [ffffb44780f63980] __wait_on_bit at ffffffff82d7a4ba
> > > #4 [ffffb44780f639b8] out_of_line_wait_on_bit at ffffffff82d7a5a6
> > > #5 [ffffb44780f63a00] __rpc_execute at ffffffffc052fc8a [sunrpc]
> > > #6 [ffffb44780f63a48] rpc_execute at ffffffffc05305a2 [sunrpc]
> > > #7 [ffffb44780f63a68] rpc_run_task at ffffffffc05164e4 [sunrpc]
> > > #8 [ffffb44780f63aa8] rpc_call_sync at ffffffffc0516573 [sunrpc]
> > > #9 [ffffb44780f63b00] rpc_create_xprt at ffffffffc051672e [sunrpc]
> > > #10 [ffffb44780f63b40] rpc_create at ffffffffc0516881 [sunrpc]
> > > #11 [ffffb44780f63be8] nfs_create_rpc_client at ffffffffc0972319 [nfs]
> > > #12 [ffffb44780f63c80] nfs4_init_client at ffffffffc0a17882 [nfsv4]
> > > #13 [ffffb44780f63d70] nfs4_set_client at ffffffffc0a16ef8 [nfsv4]
> > > #14 [ffffb44780f63de8] nfs4_create_server at ffffffffc0a188d8 [nfsv4]
> > > #15 [ffffb44780f63e60] nfs4_try_get_tree at ffffffffc0a0bf69 [nfsv4]
> > > #16 [ffffb44780f63e80] vfs_get_tree at ffffffff823b6068
> > > #17 [ffffb44780f63ea0] path_mount at ffffffff823e3d8f
> > > #18 [ffffb44780f63ef8] __x64_sys_mount at ffffffff823e45a3
> > > #19 [ffffb44780f63f38] do_syscall_64 at ffffffff82d6aa50
> > > #20 [ffffb44780f63f50] entry_SYSCALL_64_after_hwframe at ffffffff82e0007c
> > >
> > > The second mount is hung up in nfs_match_client()/nfs_wait_client_init_complete
> > > waiting for the first process to compile
> > > nfs_match_client:
> > >
> > > /* If a client is still initializing then we need to wait */
> > > if (clp->cl_cons_state > NFS_CS_READY) {
> > > refcount_inc(&clp->cl_count);
> > > spin_unlock(&nn->nfs_client_lock);
> > > error = nfs_wait_client_init_complete(clp);
> > > nfs_put_client(clp);
> > > spin_lock(&nn->nfs_client_lock);
> > > if (error < 0)
> > > return ERR_PTR(error);
> > > goto again;
> > > }
> > >
> > > This code came in with commit c156618e15101 which fixed
> > > a deadlock in nfs client initialization.
> > >
> > > steved.
> > >


Attachments:
unreachable.log.gz (7.94 kB)
unresponsive.log.gz (6.46 kB)
Download all attachments

2021-05-21 22:43:34

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine

On Fri, May 21, 2021 at 6:08 PM Trond Myklebust <[email protected]> wrote:
>
> On Fri, 2021-05-21 at 17:06 -0400, Olga Kornievskaia wrote:
> > Hi Michael,
> >
> > On Fri, May 21, 2021 at 4:31 PM Michael Wakabayashi
> > <[email protected]> wrote:
> > >
> > > Hi Olga,
> > >
> > > > But I would like to know what's a legitimate reason for a
> > > > machine to
> > > > have a listening but not responsive port 2049
> > >
> > > The NFS server in this case was decommissioned and taken offline.
> >
> > You say that it's taken offline. If it's offline there shouldn't be
> > anything listening on port 2049. I was only able to reproduce the
> > problem when a client is able to send a SYN to the server and not
> > getting a reply back. If the server is offline, there will always be
> > a
> > reply back (RST or something of the sorts). Client tries a bit but it
> > never gets stuck in the rpc_execute() state because it would get a
> > reply from the TCP layer. Your stack is where there is no TCP reply
> > from the server.
> >
> > > The user's automated tests was not updated and still trying to
> > > mount this offline NFS server.
> > >
> > > We refer to this decommissioned server as being unreachable.
> > > Maybe it's a difference in terminology, but for us if the IP
> > > address does not
> > > respond to ping (as in this case), we refer to it as being
> > > unreachable.
> > > Other tools use this same terminology. For example "fping":
> > > $ fping 2.2.2.2
> > > 2.2.2.2 is unreachable
> > >
> > > We can't really prevent users from making mistakes.
> > > * Users will continue to accidentally mount decommissioned servers.
> > > * Users will continue to mount the wrong IP addresses in their
> > > tests and elsewhere.
> > > And when these situation occur, it will block valid NFS mounts.
> > >
> > > Should I be prevented from mounting NFS shares because
> > > someone else mistyped the NFS server name in their mount command?
> > >
> > > From a user perspective, it's not clear why a mount of a
> > > decommissioned(and therefore down) NFS server is blocking
> > > mounts of every other valid NFS server?
> > > Shouldn't these valid NFS servers be allowed to mount?
>
> There are valid protocol reasons why the NFSv4 client has to check
> whether or not the new mount is really talking to the same server but
> over a different IP addresses. When a server is down, then that blocks
> the ability to perform that kind of probe, and so the client tries to
> wait for the server to come back up (until it times out).
> That's what I understand Olga to be claiming is happening here, and is
> why we can't fix the problem.

I'm actually claiming their infrastructure is broken. He says the
server is down. If that's the case, the TCP layer will time out fast
and it will not visibly block other mounts. However, if the server is
unresponsive, that's what the provided stack shows, then the TCP
timeout is much larger. I'm saying the server should truly be
unreachable and not unresponsive.

> Instead of turning this server completely off, why can't you simply set
> up a referral to the new server on that IP address? That's a much more
> user-friendly way of dealing with the problem.
>
> > >
> > > Thanks, Mike
> > >
> > >
> > >
> > >
> > >
> > > From: Olga Kornievskaia <[email protected]>
> > > Sent: Friday, May 21, 2021 12:35 PM
> > > To: Michael Wakabayashi <[email protected]>
> > > Cc: Steve Dickson <[email protected]>; [email protected] <
> > > [email protected]>
> > > Subject: Re: NFSv4: Mounting NFS server which is down, blocks all
> > > other NFS mounts on same machine
> > >
> > > On Fri, May 21, 2021 at 3:09 PM Michael Wakabayashi
> > > <[email protected]> wrote:
> > > >
> > > > > This code came in with commit c156618e15101 which fixed a
> > > > > deadlock in nfs client initialization.
> > > >
> > > > > My conclusion is: an unresponsive server will block other
> > > > > mounts but only until timeout is reached.
> > > >
> > > > Hi Steve and Olga,
> > > >
> > > > We run multiple Kubernetes clusters.
> > > > These clusters are composed of hundreds of Kubernetes nodes.
> > > > Any of these nodes can NFS mount on behalf of the containers
> > > > running on these nodes.
> > > > We've seen several times in the past few months an NFS mount
> > > > hang, and then several hundred up to several thousand NFS mounts
> > > > blocked by this hung NFS mount processes (we have many "testing"
> > > > workloads that access NFS).
> > > > Having several hundred NFS mounts blocked on a node causes the
> > > > Kubernetes node to become unstable and severely degrades service.
> > > >
> > > > We did not expect a hung NFS mount to block every other NFS
> > > > mount, especially when the other mounts are unrelated and
> > > > otherwise working properly.
> > > >
> > > > Can this behavior be changed?
> > >
> > > Hi Michael,
> > >
> > > I'm not sure if the design can be changed. But I would like to know
> > > what's a legitimate reason for a machine to have a listening but
> > > not
> > > responsive port 2049 (I'm sorry I don't particularly care for the
> > > explanation of "because this is how things currently work in
> > > containers, Kubernetes"). Seems like the problem should be fixed
> > > there. There is no issue if a mount goes to an IP that has nothing
> > > listening on port 2049.
> > >
> > > Again I have no comments on the design change: or rather my comment
> > > was I don't see a way. If you have 2 parallel clients initializing
> > > and
> > > the goal is to have at most one client if both are the same, then I
> > > don't see a way besides serializing it as it's done now.
> > >
> > > >
> > > > Thanks, Mike
> > > >
> > > >
> > > > ________________________________
> > > > From: Steve Dickson <[email protected]>
> > > > Sent: Thursday, May 20, 2021 11:42 AM
> > > > To: Olga Kornievskaia <[email protected]>; Michael Wakabayashi <
> > > > [email protected]>
> > > > Cc: [email protected] <[email protected]>
> > > > Subject: Re: NFSv4: Mounting NFS server which is down, blocks all
> > > > other NFS mounts on same machine
> > > >
> > > > Hey.
> > > >
> > > > On 5/19/21 3:15 PM, Olga Kornievskaia wrote:
> > > > > On Sun, May 16, 2021 at 11:18 PM Michael Wakabayashi
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > We're seeing what looks like an NFSv4 issue.
> > > > > >
> > > > > > Mounting an NFS server that is down (ping to this NFS
> > > > > > server's IP address does not respond) will block _all_ other
> > > > > > NFS mount attempts even if the NFS servers are available and
> > > > > > working properly (these subsequent mounts hang).
> > > > > >
> > > > > > If I kill the NFS mount process that's trying to mount the
> > > > > > dead NFS server, the NFS mounts that were blocked will
> > > > > > immediately unblock and mount successfully, which suggests
> > > > > > the first mount command is blocking the other mount commands.
> > > > > >
> > > > > >
> > > > > > I verified this behavior using a newly built mount.nfs
> > > > > > command from the recent nfs-utils 2.5.3 package installed on
> > > > > > a recent version of Ubuntu Cloud Image 21.04:
> > > > > > *
> > > > > > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsourceforge.net%2Fprojects%2Fnfs%2Ffiles%2Fnfs-utils%2F2.5.3%2F&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C254806799e3f45388def08d91c8f9c45%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637572225410414697%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=qWsYLeSLC0k89%2FHJGqhMlBnEvGR%2Bdqxve4n56bww%2Bnk%3D&amp;reserved=0
> > > > > > *
> > > > > > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcloud-images.ubuntu.com%2Freleases%2Fhirsute%2Frelease-20210513%2Fubuntu-21.04-server-cloudimg-amd64.ova&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C254806799e3f45388def08d91c8f9c45%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637572225410414697%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=5sHt34ZBtsa7MjRP0RbymhbjOn%2FT5X5JUWvIQV93PUU%3D&amp;reserved=0
> > > > > >
> > > > > >
> > > > > > The reason this looks like it is specific to NFSv4 is from
> > > > > > the following output showing "vers=4.2":
> > > > > > > $ strace /sbin/mount.nfs <unreachable-IP-address>:/path
> > > > > > > /tmp/mnt
> > > > > > > [ ... cut ... ]
> > > > > > > mount("<unreadhable-IP-address>:/path", "/tmp/mnt", "nfs",
> > > > > > > 0, "vers=4.2,addr=<unreachable-IP-address>,clien"...^C^Z
> > > > > >
> > > > > > Also, if I try the same mount.nfs commands but specifying
> > > > > > NFSv3, the mount to the dead NFS server hangs, but the mounts
> > > > > > to the operational NFS servers do not block and mount
> > > > > > successfully; this bug doesn't happen when using NFSv3.
> > > > > >
> > > > > >
> > > > > > We reported this issue under util-linux here:
> > > > > > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fkarelzak%2Futil-linux%2Fissues%2F1309&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C254806799e3f45388def08d91c8f9c45%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637572225410414697%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=UrD%2FaBX2S4Qq7CrgltIc9lEzA8oEQQn0srMXtrq%2B6CE%3D&amp;reserved=0
> > > > > > [mounting nfs server which is down blocks all other nfs
> > > > > > mounts on same machine #1309]
> > > > > >
> > > > > > I also found an older bug on this mailing list that had
> > > > > > similar symptoms (but could not tell if it was the same
> > > > > > problem or not):
> > > > > > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.kernel.org%2Fproject%2Flinux-nfs%2Fpatch%2F87vaori26c.fsf%40notabene.neil.brown.name%2F&amp;data=04%7C01%7Cmwakabayashi%40vmware.com%7C254806799e3f45388def08d91c8f9c45%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637572225410414697%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=lBAE9wZbykGQ8VPH6YsAtuaIpMpDcAtqsxVNDV%2BaNTk%3D&amp;reserved=0
> > > > > > [[PATCH/RFC] NFSv4: don't let hanging mounts block other
> > > > > > mounts]
> > > > > >
> > > > > > Thanks, Mike
> > > > >
> > > > > Hi Mike,
> > > > >
> > > > > This is not a helpful reply but I was curious if I could
> > > > > reproduce
> > > > > your issue but was not successful. I'm able to initiate a mount
> > > > > to an
> > > > > unreachable-IP-address which hangs and then do another mount to
> > > > > an
> > > > > existing server without issues. Ubuntu 21.04 seems to be 5.11
> > > > > based so
> > > > > I tried upstream 5.11 and I tried the latest upstream nfs-utils
> > > > > (instead of what my distro has which was an older version).
> > > > >
> > > > > To debug, perhaps get an output of the nfs4 and sunrpc
> > > > > tracepoints.
> > > > > Or also get output from dmesg after doing “echo t >
> > > > > /proc/sysrq-trigger” to see where the mounts are hanging.
> > > > >
> > > > It looks like Mike is correct... The first process (mount
> > > > 1.1.1.1:/mnt) is
> > > > hung in trying the connection:
> > > >
> > > > PID: 3394 TASK: ffff9da8c42734c0 CPU: 0 COMMAND: "mount.nfs"
> > > > #0 [ffffb44780f638c8] __schedule at ffffffff82d7959d
> > > > #1 [ffffb44780f63950] schedule at ffffffff82d79f2b
> > > > #2 [ffffb44780f63968] rpc_wait_bit_killable at ffffffffc05265ce
> > > > [sunrpc]
> > > > #3 [ffffb44780f63980] __wait_on_bit at ffffffff82d7a4ba
> > > > #4 [ffffb44780f639b8] out_of_line_wait_on_bit at
> > > > ffffffff82d7a5a6
> > > > #5 [ffffb44780f63a00] __rpc_execute at ffffffffc052fc8a [sunrpc]
> > > > #6 [ffffb44780f63a48] rpc_execute at ffffffffc05305a2 [sunrpc]
> > > > #7 [ffffb44780f63a68] rpc_run_task at ffffffffc05164e4 [sunrpc]
> > > > #8 [ffffb44780f63aa8] rpc_call_sync at ffffffffc0516573 [sunrpc]
> > > > #9 [ffffb44780f63b00] rpc_create_xprt at ffffffffc051672e
> > > > [sunrpc]
> > > > #10 [ffffb44780f63b40] rpc_create at ffffffffc0516881 [sunrpc]
> > > > #11 [ffffb44780f63be8] nfs_create_rpc_client at ffffffffc0972319
> > > > [nfs]
> > > > #12 [ffffb44780f63c80] nfs4_init_client at ffffffffc0a17882
> > > > [nfsv4]
> > > > #13 [ffffb44780f63d70] nfs4_set_client at ffffffffc0a16ef8
> > > > [nfsv4]
> > > > #14 [ffffb44780f63de8] nfs4_create_server at ffffffffc0a188d8
> > > > [nfsv4]
> > > > #15 [ffffb44780f63e60] nfs4_try_get_tree at ffffffffc0a0bf69
> > > > [nfsv4]
> > > > #16 [ffffb44780f63e80] vfs_get_tree at ffffffff823b6068
> > > > #17 [ffffb44780f63ea0] path_mount at ffffffff823e3d8f
> > > > #18 [ffffb44780f63ef8] __x64_sys_mount at ffffffff823e45a3
> > > > #19 [ffffb44780f63f38] do_syscall_64 at ffffffff82d6aa50
> > > > #20 [ffffb44780f63f50] entry_SYSCALL_64_after_hwframe at
> > > > ffffffff82e0007c
> > > >
> > > > The second mount is hung up in
> > > > nfs_match_client()/nfs_wait_client_init_complete
> > > > waiting for the first process to compile
> > > > nfs_match_client:
> > > >
> > > > /* If a client is still initializing then we need to wait
> > > > */
> > > > if (clp->cl_cons_state > NFS_CS_READY) {
> > > > refcount_inc(&clp->cl_count);
> > > > spin_unlock(&nn->nfs_client_lock);
> > > > error = nfs_wait_client_init_complete(clp);
> > > > nfs_put_client(clp);
> > > > spin_lock(&nn->nfs_client_lock);
> > > > if (error < 0)
> > > > return ERR_PTR(error);
> > > > goto again;
> > > > }
> > > >
> > > > This code came in with commit c156618e15101 which fixed
> > > > a deadlock in nfs client initialization.
> > > >
> > > > steved.
> > > >
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> [email protected]
>
>

2021-06-08 09:17:34

by Michael Wakabayashi

[permalink] [raw]
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine

Hi Olga,

> > You say that it's taken offline. If it's offline there shouldn't be
> > anything listening on port 2049. I was only able to reproduce the
> > problem when a client is able to send a SYN to the server and not
> > getting a reply back. If the server is offline, there will always be a
> > reply back (RST or something of the sorts). Client tries a bit but it
> > never gets stuck in the rpc_execute() state because it would get a
> > reply from the TCP layer. Your stack is where ?there is no TCP reply
> > from the server.

> I'm actually claiming their infrastructure is broken. He says the
> server is down. If that's the case, the TCP layer will time out fast
> and it will not visibly block other mounts. However, if the server is
> unresponsive, that's what the provided stack shows, then the TCP
> timeout is much larger. I'm saying the server should truly be
> unreachable and not unresponsive.

I don't think the infrastructure is broken.
The behavior we're seeing seems to be correct (explained below).

> If the server is offline, there will always be a reply back
> (RST or something of the sorts).

> He says the server is down. If that's the case, the TCP layer will
> time out fast

I believe these two statements are incorrect. Let me explain.

If the server is offline (for example physically unplugged from power)
then this unplugged server cannot reply. It's a brick.
CPU, memory, disk and NIC all have zero power. ?There's no operating
system running, the dead server won't see the TCP request packet
and cannot reply so the following cannot be true:
> If the server is offline, there will always be a reply back


Here's what's happening at the TCP layer:
I took a TCP packet capture (see attached nfsv4.pcap file) and can see the
NFS client(10.162.132.231) attempting a 3-way TCP handshake with the
powered-off/offline server(2.2.2.2). The client sends a TCP SYN to
the NFS server. ?But the NFS server is powered off, so the NFS client
times out waiting for the TCP SYN-ACK reply. On timeout, the NFS client
will retransmit the TCP SYN packet, and eventually time out again
waiting for the SYN-ACK reply. This process repeats itself until TCP
retransmits are exhausted. Eventually the NFS client mount command
gives up (after 3 minutes) and exits.

During this 3 minute period every other NFS mount command on the host where
the NFS client is running is blocked; this effectively caused a
denial of service attack since no other user was able to NFS mount anything,
including perfectly valid NFS mounts.
To make matters worse, after the mount command exited, the workload would
retry the powered off mount command again extending the duration of the
inadvertent denial of service.

> He says the server is down. If that's the case, the TCP layer will
> time out fast

As described above, the 3-way TCP handshake timeout is relatively slow and
the mount command takes 3 minutes to exit.

I believe you're thinking of the case when the NFS server is powered-on,
but has no Linux process listening on NFS port 2049. In this case
the NFS server --will-- reply quickly (within milliseconds) with a
TCP RST/Reset packet and the NFS client will quickly exit the mount process
with an error code.

> There are valid protocol reasons why the NFSv4 client has to check
> whether or not the new mount is really talking to the same server but
> over a different IP addresses.

Hi Trond,

I ran the following script:
? ? #!/bin/sh -x
? ? mount -o vers=4 -v -v -v 2.2.2.2:/fake_path /tmp/mnt.dead &
echo PID_OF_DEAD_BG_MOUNT=$!
? ? sleep 5 # give time for the first mount to execute in the background
? ? mount -o vers=4 -v -v -v 10.188.76.67:/git /tmp/mnt.alive
on Ubuntu 21.04, MacOS 11.1 and FreeBSD 11.4.

The undesirable blocking behavior only appeared on Ubuntu.
MacOs and FreeBSD executed the script fine meaning the 10.188.76.67:/git
NFS share immediately and successfully mounted without blocking.

On Ubuntu, a user runs "mount <unreachable-ip-address>:<fake-path>"
which blocks mounts for every other user on the system, this is undesirable.
Our virtual machines basically crashed because we had several hundred
to several thousand blocked mount processes preventing workloads from
making progress.

We'd prefer the mount behavior of the MacOS or FreeBSD implementations,
even if it's less secure, since at least it's not taking down
our servers with an unintentional DoS attack.

Is there any chance of looking at the FreeBSD mount implementation and seeing
if it is correct, and if so, have the Linux mount command emulate this behavior?

Thanks, Mike

p.s.
I've attached the following 4 files which were generated on Ubuntu 21.04:
1. mount_hang.sh: script running mount test
2. mount_hang.sh.log: output of mount_hang.sh
3. trace_pipe.txt: output of: cat /sys/kernel/debug/tracing/trace_pipe > trace_pipe.txt
with events/sunrpc/enable set to 1
events/nfs4/enable set to 1
4. nfsv4.pcap: output of "tcpdump -v -v -v -v -s 0 -w /tmp/nfsv4.pcap port 2049"

The test procedure was:
- run mount_hang.sh on the Ubunut 21.04 VM
- after seeing the second mount command execute
- wait about 5 seconds
- press control-C to exit the mount_hang.sh script.


Attachments:
mount_hang.sh (170.00 B)
mount_hang.sh
mount_hang.sh.log (634.00 B)
mount_hang.sh.log
nfsv4.pcap (474.00 B)
nfsv4.pcap
trace_pipe.txt (3.43 kB)
trace_pipe.txt
Download all attachments

2021-06-08 16:14:10

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine

On Tue, Jun 8, 2021 at 5:16 AM Michael Wakabayashi
<[email protected]> wrote:
>
> Hi Olga,
>
> > > You say that it's taken offline. If it's offline there shouldn't be
> > > anything listening on port 2049. I was only able to reproduce the
> > > problem when a client is able to send a SYN to the server and not
> > > getting a reply back. If the server is offline, there will always be a
> > > reply back (RST or something of the sorts). Client tries a bit but it
> > > never gets stuck in the rpc_execute() state because it would get a
> > > reply from the TCP layer. Your stack is where there is no TCP reply
> > > from the server.
>
> > I'm actually claiming their infrastructure is broken. He says the
> > server is down. If that's the case, the TCP layer will time out fast
> > and it will not visibly block other mounts. However, if the server is
> > unresponsive, that's what the provided stack shows, then the TCP
> > timeout is much larger. I'm saying the server should truly be
> > unreachable and not unresponsive.
>
> I don't think the infrastructure is broken.
> The behavior we're seeing seems to be correct (explained below).
>
> > If the server is offline, there will always be a reply back
> > (RST or something of the sorts).
>
> > He says the server is down. If that's the case, the TCP layer will
> > time out fast
>
> I believe these two statements are incorrect. Let me explain.
>
> If the server is offline (for example physically unplugged from power)
> then this unplugged server cannot reply. It's a brick.
> CPU, memory, disk and NIC all have zero power. There's no operating
> system running, the dead server won't see the TCP request packet
> and cannot reply so the following cannot be true:
> > If the server is offline, there will always be a reply back
>
>
> Here's what's happening at the TCP layer:
> I took a TCP packet capture (see attached nfsv4.pcap file) and can see the
> NFS client(10.162.132.231) attempting a 3-way TCP handshake with the

Sending a SYN will only happen during the time that client had a valid
ARP entry in its cache from a previous connection. As soon as the ARP
cache is invalidated/updated and there is no value address to resolve
the 'bad' mount would be time out faster (that's when all the other
mounts would be unblocked but the hanging mount would take longer to
retry a number of times).

> powered-off/offline server(2.2.2.2). The client sends a TCP SYN to
> the NFS server. But the NFS server is powered off, so the NFS client
> times out waiting for the TCP SYN-ACK reply. On timeout, the NFS client
> will retransmit the TCP SYN packet, and eventually time out again
> waiting for the SYN-ACK reply. This process repeats itself until TCP
> retransmits are exhausted. Eventually the NFS client mount command
> gives up (after 3 minutes) and exits.
>
> During this 3 minute period every other NFS mount command on the host where
> the NFS client is running is blocked; this effectively caused a
> denial of service attack since no other user was able to NFS mount anything,
> including perfectly valid NFS mounts.
> To make matters worse, after the mount command exited, the workload would
> retry the powered off mount command again extending the duration of the
> inadvertent denial of service.
>
> > He says the server is down. If that's the case, the TCP layer will
> > time out fast
>
> As described above, the 3-way TCP handshake timeout is relatively slow and
> the mount command takes 3 minutes to exit.
>
> I believe you're thinking of the case when the NFS server is powered-on,
> but has no Linux process listening on NFS port 2049. In this case
> the NFS server --will-- reply quickly (within milliseconds) with a
> TCP RST/Reset packet and the NFS client will quickly exit the mount process
> with an error code.
>
> > There are valid protocol reasons why the NFSv4 client has to check
> > whether or not the new mount is really talking to the same server but
> > over a different IP addresses.
>
> Hi Trond,
>
> I ran the following script:
> #!/bin/sh -x
> mount -o vers=4 -v -v -v 2.2.2.2:/fake_path /tmp/mnt.dead &
> echo PID_OF_DEAD_BG_MOUNT=$!
> sleep 5 # give time for the first mount to execute in the background
> mount -o vers=4 -v -v -v 10.188.76.67:/git /tmp/mnt.alive
> on Ubuntu 21.04, MacOS 11.1 and FreeBSD 11.4.
>
> The undesirable blocking behavior only appeared on Ubuntu.
> MacOs and FreeBSD executed the script fine meaning the 10.188.76.67:/git
> NFS share immediately and successfully mounted without blocking.
>
> On Ubuntu, a user runs "mount <unreachable-ip-address>:<fake-path>"
> which blocks mounts for every other user on the system, this is undesirable.
> Our virtual machines basically crashed because we had several hundred
> to several thousand blocked mount processes preventing workloads from
> making progress.
>
> We'd prefer the mount behavior of the MacOS or FreeBSD implementations,
> even if it's less secure, since at least it's not taking down
> our servers with an unintentional DoS attack.
>
> Is there any chance of looking at the FreeBSD mount implementation and seeing
> if it is correct, and if so, have the Linux mount command emulate this behavior?
>
> Thanks, Mike
>
> p.s.
> I've attached the following 4 files which were generated on Ubuntu 21.04:
> 1. mount_hang.sh: script running mount test
> 2. mount_hang.sh.log: output of mount_hang.sh
> 3. trace_pipe.txt: output of: cat /sys/kernel/debug/tracing/trace_pipe > trace_pipe.txt
> with events/sunrpc/enable set to 1
> events/nfs4/enable set to 1
> 4. nfsv4.pcap: output of "tcpdump -v -v -v -v -s 0 -w /tmp/nfsv4.pcap port 2049"
>
> The test procedure was:
> - run mount_hang.sh on the Ubunut 21.04 VM
> - after seeing the second mount command execute
> - wait about 5 seconds
> - press control-C to exit the mount_hang.sh script.

2021-06-09 13:00:39

by Michael Wakabayashi

[permalink] [raw]
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine

Hi Olga,

There seems to be a discrepancy between what you're seeing and what we're seeing.

So we were wondering if you can you please run these commands in your Linux environment and paste the output of the mount command below?
$ sudo mkdir -p /tmp/mnt.dead
$ time sudo mount -o vers=4 -vvv 2.2.2.2:/fake_path /tmp/mnt.dead

We'd like the mount command to specifically use "2.2.2.2:/fake_path" since we know it is unreachable and outside your subnet.
We're hoping by mounting "2.2.2.2:/fake_path" you'll be able to reproduce the same behavior that we're seeing.

Also, if possible, a packet trace would be helpful:
$ sudo tcpdump -s 0 -w /tmp/nfsv4.pcap port 2049

On my Ubuntu VirtualMachine, I see this output:
[email protected]:~$ time sudo mount -o vers=4 -vvv 2.2.2.2:/fake_path /tmp/mnt.dead
mount.nfs: timeout set for Wed Jun 9 05:12:15 2021
mount.nfs: trying text-based options 'vers=4,addr=2.2.2.2,clientaddr=10.162.132.231'
mount.nfs: mount(2): Connection timed out
mount.nfs: Connection timed out
real 3m1.257s
user 0m0.006s
sys 0m0.007s

Thanks, Mike

2021-06-09 13:20:31

by Alex Romanenko

[permalink] [raw]
Subject: RE: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine

> Sending a SYN will only happen during the time that client had a valid
> ARP entry in its cache from a previous connection. As soon as the ARP
> cache is invalidated/updated and there is no value address to resolve
> the 'bad' mount would be time out faster (that's when all the other
> mounts would be unblocked but the hanging mount would take longer to
> retry a number of times).

In addition to what Michael Wakabayashi already said, I would like to point out
that if the NFS client is not in the same subnet as the NFS server, then the client
would never have an ARP entry for the NFS server. Instead, the client would have
the ARP entry for the default gateway of its subnet where it would happily send a SYN.
This is exactly why we are asking you to re-create the issue with the exact command
that Michael Wakabayashi has provided, as we believe that fake IP of 2.2.2.2 would be
outside of your NFS client's subnet range.

Thanks.
-Alex

2021-06-09 15:28:19

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine

Hi Mike,

Indeed we do have discrepancy in what we are seeing because I'm not
able to see what you see. This is the case of an unreachable host (in
my traces
mount.nfs-3400 [001] .... 1662.882880: rpc_connect_status:
task:[email protected] status=-113

you provided a short tracepoint file (where the return value was -115
and that ENOTCONN reflecting that a SYN was sent). But here's my run.
Trying to capture the timing of both mounts. I start the "unreachable"
mount first and then the mount that should succeed which does without
much wait. The unreachable case times out very fast and allows for
other mounts to proceed without issues.

[[email protected] ~]$ date
Wed Jun 9 09:33:48 EDT 2021
[[email protected] ~]$ time sudo mount -o vers=4.2,sec=sys
2.2.2.2:/fake_path /scratch
mount.nfs: No route to host

real 2m12.464s
user 0m0.014s
sys 0m0.108s
[[email protected] ~]$ date
Wed Jun 9 09:36:18 EDT 2021

[[email protected] ~]$ date
Wed Jun 9 09:33:51 EDT 2021
[[email protected] ~]$ time sudo mount -o vers=4.2 192.168.1.110:/ /mnt

real 0m0.126s
user 0m0.017s
sys 0m0.067s
[[email protected] ~]$ date
Wed Jun 9 09:33:56 EDT 2021

There are no SYNs to 2.2.2.2 in my traces. So something different
about network infrastructures where in your case for some reason you
are sending something to 2.2.2.2 and the only reason I can think of is
that you have something in the ARP cache and I don't. And Alex yes I
do have an ARP entry for my default gateway but no there is no SYN
sent to the server from a different subnet.

On Wed, Jun 9, 2021 at 1:31 AM Michael Wakabayashi
<[email protected]> wrote:
>
> Hi Olga,
>
> There seems to be a discrepancy between what you're seeing and what we're seeing.
>
> So we were wondering if you can you please run these commands in your Linux environment and paste the output of the mount command below?
> $ sudo mkdir -p /tmp/mnt.dead
> $ time sudo mount -o vers=4 -vvv 2.2.2.2:/fake_path /tmp/mnt.dead
>
> We'd like the mount command to specifically use "2.2.2.2:/fake_path" since we know it is unreachable and outside your subnet.
> We're hoping by mounting "2.2.2.2:/fake_path" you'll be able to reproduce the same behavior that we're seeing.
>
> Also, if possible, a packet trace would be helpful:
> $ sudo tcpdump -s 0 -w /tmp/nfsv4.pcap port 2049
>
> On my Ubuntu VirtualMachine, I see this output:
> [email protected]:~$ time sudo mount -o vers=4 -vvv 2.2.2.2:/fake_path /tmp/mnt.dead
> mount.nfs: timeout set for Wed Jun 9 05:12:15 2021
> mount.nfs: trying text-based options 'vers=4,addr=2.2.2.2,clientaddr=10.162.132.231'
> mount.nfs: mount(2): Connection timed out
> mount.nfs: Connection timed out
> real 3m1.257s
> user 0m0.006s
> sys 0m0.007s
>
> Thanks, Mike

2021-06-09 15:42:31

by Benjamin Coddington

[permalink] [raw]
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine

On 9 Jun 2021, at 1:31, Michael Wakabayashi wrote:

> Hi Olga,
>
> There seems to be a discrepancy between what you're seeing and what
> we're seeing.
>
> So we were wondering if you can you please run these commands in your
> Linux environment and paste the output of the mount command below?
> $ sudo mkdir -p /tmp/mnt.dead
> $ time sudo mount -o vers=4 -vvv 2.2.2.2:/fake_path /tmp/mnt.dead
>
> We'd like the mount command to specifically use "2.2.2.2:/fake_path"
> since we know it is unreachable and outside your subnet.
> We're hoping by mounting "2.2.2.2:/fake_path" you'll be able to
> reproduce the same behavior that we're seeing.
>
> Also, if possible, a packet trace would be helpful:
> $ sudo tcpdump -s 0 -w /tmp/nfsv4.pcap port 2049
>
> On my Ubuntu VirtualMachine, I see this output:
> [email protected]:~$ time sudo mount -o vers=4 -vvv
> 2.2.2.2:/fake_path /tmp/mnt.dead
> mount.nfs: timeout set for Wed Jun 9 05:12:15 2021
> mount.nfs: trying text-based options
> 'vers=4,addr=2.2.2.2,clientaddr=10.162.132.231'
> mount.nfs: mount(2): Connection timed out
> mount.nfs: Connection timed out
> real 3m1.257s
> user 0m0.006s
> sys 0m0.007s
>
> Thanks, Mike

It looks to me like you and Olga are seeing the same thing, a wait
through SYN retries scaling up from initial RTO for the number of
tcp_syn_retries.

It's not disputed that mounts waiting on the transport layer will block
other mounts.

It might be able to be changed: there's this torch:
https://lore.kernel.org/linux-nfs/[email protected]/

..or there may be another way we don't have to wait ..

.. or tune tcp_syn_retries.. or RTO.. or something else (eBPF?).

I think we're all strapped for time and problems like this usually get
fixed by the folks feeling the most pain from them.

Ben

2021-06-09 15:44:18

by Trond Myklebust

[permalink] [raw]
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine

On Wed, 2021-06-09 at 10:31 -0400, Benjamin Coddington wrote:
>
> It's not disputed that mounts waiting on the transport layer will block
> other mounts.
>
> It might be able to be changed:  there's this torch:
> https://lore.kernel.org/linux-nfs/[email protected]/
>

No.

> ..or there may be another way we don't have to wait ..
>

OK. So let's look at how we can solve the problem of the initial
connection to the server timing out and causing hangs in
nfs41_walk_client_list(), and leave aside any other corner case
problems (such as the server going down while we're mounting).

How about something like the following (compile tested only) patch?

8<--------------------------------------------------------
From 30833f78b5b4c7780c91f705b0867ef5492a9eed Mon Sep 17 00:00:00 2001
From: Trond Myklebust <[email protected]>
Date: Wed, 9 Jun 2021 10:04:46 -0400
Subject: [PATCH] NFSv4: Initialise connection to the server in
nfs4_alloc_client()

Set up the connection to the NFSv4 server in nfs4_alloc_client(), before
we've added the struct nfs_client to the net-namespace's nfs_client_list
so that a downed server won't cause other mounts to hang in the trunking
detection code.

Reported-by: Michael Wakabayashi <[email protected]>
Fixes: 5c6e5b60aae4 ("NFS: Fix an Oops in the pNFS files and flexfiles connection setup to the DS")
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/nfs4client.c | 82 +++++++++++++++++++++++----------------------
1 file changed, 42 insertions(+), 40 deletions(-)

diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 42719384e25f..28431acd1230 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -197,8 +197,11 @@ void nfs40_shutdown_client(struct nfs_client *clp)

struct nfs_client *nfs4_alloc_client(const struct nfs_client_initdata *cl_init)
{
- int err;
+ char buf[INET6_ADDRSTRLEN + 1];
+ const char *ip_addr = cl_init->ip_addr;
struct nfs_client *clp = nfs_alloc_client(cl_init);
+ int err;
+
if (IS_ERR(clp))
return clp;

@@ -222,6 +225,44 @@ struct nfs_client *nfs4_alloc_client(const struct nfs_client_initdata *cl_init)
init_waitqueue_head(&clp->cl_lock_waitq);
#endif
INIT_LIST_HEAD(&clp->pending_cb_stateids);
+
+ if (cl_init->minorversion != 0)
+ __set_bit(NFS_CS_INFINITE_SLOTS, &clp->cl_flags);
+ __set_bit(NFS_CS_DISCRTRY, &clp->cl_flags);
+ __set_bit(NFS_CS_NO_RETRANS_TIMEOUT, &clp->cl_flags);
+
+ /*
+ * Set up the connection to the server before we add add to the
+ * global list.
+ */
+ err = nfs_create_rpc_client(clp, cl_init, RPC_AUTH_GSS_KRB5I);
+ if (err == -EINVAL)
+ err = nfs_create_rpc_client(clp, cl_init, RPC_AUTH_UNIX);
+ if (err < 0)
+ goto error;
+
+ /* If no clientaddr= option was specified, find a usable cb address */
+ if (ip_addr == NULL) {
+ struct sockaddr_storage cb_addr;
+ struct sockaddr *sap = (struct sockaddr *)&cb_addr;
+
+ err = rpc_localaddr(clp->cl_rpcclient, sap, sizeof(cb_addr));
+ if (err < 0)
+ goto error;
+ err = rpc_ntop(sap, buf, sizeof(buf));
+ if (err < 0)
+ goto error;
+ ip_addr = (const char *)buf;
+ }
+ strlcpy(clp->cl_ipaddr, ip_addr, sizeof(clp->cl_ipaddr));
+
+ err = nfs_idmap_new(clp);
+ if (err < 0) {
+ dprintk("%s: failed to create idmapper. Error = %d\n",
+ __func__, err);
+ goto error;
+ }
+ __set_bit(NFS_CS_IDMAP, &clp->cl_res_state);
return clp;

error:
@@ -372,8 +413,6 @@ static int nfs4_init_client_minor_version(struct nfs_client *clp)
struct nfs_client *nfs4_init_client(struct nfs_client *clp,
const struct nfs_client_initdata *cl_init)
{
- char buf[INET6_ADDRSTRLEN + 1];
- const char *ip_addr = cl_init->ip_addr;
struct nfs_client *old;
int error;

@@ -381,43 +420,6 @@ struct nfs_client *nfs4_init_client(struct nfs_client *clp,
/* the client is initialised already */
return clp;

- /* Check NFS protocol revision and initialize RPC op vector */
- clp->rpc_ops = &nfs_v4_clientops;
-
- if (clp->cl_minorversion != 0)
- __set_bit(NFS_CS_INFINITE_SLOTS, &clp->cl_flags);
- __set_bit(NFS_CS_DISCRTRY, &clp->cl_flags);
- __set_bit(NFS_CS_NO_RETRANS_TIMEOUT, &clp->cl_flags);
-
- error = nfs_create_rpc_client(clp, cl_init, RPC_AUTH_GSS_KRB5I);
- if (error == -EINVAL)
- error = nfs_create_rpc_client(clp, cl_init, RPC_AUTH_UNIX);
- if (error < 0)
- goto error;
-
- /* If no clientaddr= option was specified, find a usable cb address */
- if (ip_addr == NULL) {
- struct sockaddr_storage cb_addr;
- struct sockaddr *sap = (struct sockaddr *)&cb_addr;
-
- error = rpc_localaddr(clp->cl_rpcclient, sap, sizeof(cb_addr));
- if (error < 0)
- goto error;
- error = rpc_ntop(sap, buf, sizeof(buf));
- if (error < 0)
- goto error;
- ip_addr = (const char *)buf;
- }
- strlcpy(clp->cl_ipaddr, ip_addr, sizeof(clp->cl_ipaddr));
-
- error = nfs_idmap_new(clp);
- if (error < 0) {
- dprintk("%s: failed to create idmapper. Error = %d\n",
- __func__, error);
- goto error;
- }
- __set_bit(NFS_CS_IDMAP, &clp->cl_res_state);
-
error = nfs4_init_client_minor_version(clp);
if (error < 0)
goto error;
--
2.31.1


--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2021-06-09 15:44:38

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine

On Wed, Jun 9, 2021 at 10:31 AM Benjamin Coddington <[email protected]> wrote:
>
> On 9 Jun 2021, at 1:31, Michael Wakabayashi wrote:
>
> > Hi Olga,
> >
> > There seems to be a discrepancy between what you're seeing and what
> > we're seeing.
> >
> > So we were wondering if you can you please run these commands in your
> > Linux environment and paste the output of the mount command below?
> > $ sudo mkdir -p /tmp/mnt.dead
> > $ time sudo mount -o vers=4 -vvv 2.2.2.2:/fake_path /tmp/mnt.dead
> >
> > We'd like the mount command to specifically use "2.2.2.2:/fake_path"
> > since we know it is unreachable and outside your subnet.
> > We're hoping by mounting "2.2.2.2:/fake_path" you'll be able to
> > reproduce the same behavior that we're seeing.
> >
> > Also, if possible, a packet trace would be helpful:
> > $ sudo tcpdump -s 0 -w /tmp/nfsv4.pcap port 2049
> >
> > On my Ubuntu VirtualMachine, I see this output:
> > [email protected]:~$ time sudo mount -o vers=4 -vvv
> > 2.2.2.2:/fake_path /tmp/mnt.dead
> > mount.nfs: timeout set for Wed Jun 9 05:12:15 2021
> > mount.nfs: trying text-based options
> > 'vers=4,addr=2.2.2.2,clientaddr=10.162.132.231'
> > mount.nfs: mount(2): Connection timed out
> > mount.nfs: Connection timed out
> > real 3m1.257s
> > user 0m0.006s
> > sys 0m0.007s
> >
> > Thanks, Mike
>
> It looks to me like you and Olga are seeing the same thing, a wait
> through SYN retries scaling up from initial RTO for the number of
> tcp_syn_retries.

Ben, I disagree. Mike and I are seeing different things. Mike is
seeing SYNs being sent. I argue that SYNs should not be sent. I agree
if SYNs are sent then that would cause a problem

> It's not disputed that mounts waiting on the transport layer will block
> other mounts.
>
> It might be able to be changed: there's this torch:
> https://lore.kernel.org/linux-nfs/[email protected]/

We already discussed that this is not a solution as the NFS layer has
to serialize the client creation attempts.

> ..or there may be another way we don't have to wait ..
>
> .. or tune tcp_syn_retries.. or RTO.. or something else (eBPF?).
>
> I think we're all strapped for time and problems like this usually get
> fixed by the folks feeling the most pain from them.

I think we are still not understanding what network setup that is
happening that leads to a client sending a SYN (which is incorrect) to
what is supposed to be an unreachable server instead of timing out
fast (because there shouldn't be an ARP entry).

Mike, can you show your arp cache info (arp -n) during your run?

>
> Ben
>

2021-06-09 15:55:54

by Benjamin Coddington

[permalink] [raw]
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine

On 9 Jun 2021, at 10:41, Trond Myklebust wrote:

> On Wed, 2021-06-09 at 10:31 -0400, Benjamin Coddington wrote:
>>
>> It's not disputed that mounts waiting on the transport layer will block
>> other mounts.
>>
>> It might be able to be changed:  there's this torch:
>> https://lore.kernel.org/linux-nfs/[email protected]/
>>
>
> No.
>
>> ..or there may be another way we don't have to wait ..
>>
>
> OK. So let's look at how we can solve the problem of the initial
> connection to the server timing out and causing hangs in
> nfs41_walk_client_list(), and leave aside any other corner case
> problems (such as the server going down while we're mounting).
>
> How about something like the following (compile tested only) patch?

It works as intended for this case, but I don't have my head wrapped around
the implications of the change yet.

Ben

2021-06-09 16:17:28

by Trond Myklebust

[permalink] [raw]
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine

On Wed, 2021-06-09 at 11:00 -0400, Benjamin Coddington wrote:
> On 9 Jun 2021, at 10:41, Trond Myklebust wrote:
>
> > On Wed, 2021-06-09 at 10:31 -0400, Benjamin Coddington wrote:
> > >
> > > It's not disputed that mounts waiting on the transport layer will
> > > block
> > > other mounts.
> > >
> > > It might be able to be changed:  there's this torch:
> > > https://lore.kernel.org/linux-nfs/[email protected]/
> > >
> >
> > No.
> >
> > > ..or there may be another way we don't have to wait ..
> > >
> >
> > OK. So let's look at how we can solve the problem of the initial
> > connection to the server timing out and causing hangs in
> > nfs41_walk_client_list(), and leave aside any other corner case
> > problems (such as the server going down while we're mounting).
> >
> > How about something like the following (compile tested only) patch?
>
> It works as intended for this case, but I don't have my head wrapped
> around
> the implications of the change yet.
>

The main implications are that if you have 100 mounts all going to the
same server that is down, then you'll get 100 connection attempts. If
the server is up, but has wonky rpc.gssd service, then you'll get 100
attempts to set up krb5i...

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2021-06-09 18:11:24

by Michael Wakabayashi

[permalink] [raw]
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine

> I think we are still not understanding what network setup that is
> happening that leads to a client sending a SYN (which is incorrect) to
> what is supposed to be an unreachable server instead of timing out
> fast (because there shouldn't be an ARP entry).
>
> Mike, can you show your arp cache info (arp -n) during your run?

Here's my run:
(Note: ​10.162.159.253 is the gateway for the default route.)

[email protected]:/tmp/dbg# date; arp -n; date; cat mount_hang.sh; date; ./mount_hang.sh; date; arp -n

Wed Jun 9 17:00:07 UTC 2021
Address HWtype HWaddress Flags Mask Iface
10.162.159.253 ether 00:00:0c:9f:fa:3c C ens192

Wed Jun 9 17:00:07 UTC 2021
#!/bin/sh -x
mount -o vers=4 -v -v -v 2.2.2.2:/fake_path /tmp/mnt.dead &
echo PID_OF_DEAD_BG_MOUNT=$!
sleep 5
mount -o vers=4 -v -v -v 10.188.76.67:/git /tmp/mnt.alive

Wed Jun 9 17:00:07 UTC 2021
+ echo+ mount -o vers=4 -v -v -v 2.2.2.2:/fake_path /tmp/mnt.dead
PID_OF_DEAD_BG_MOUNT=767703
PID_OF_DEAD_BG_MOUNT=767703
+ sleep 5
mount.nfs: timeout set for Wed Jun 9 17:02:07 2021
mount.nfs: trying text-based options 'vers=4,addr=2.2.2.2,clientaddr=10.162.132.231'
+ mount -o vers=4 -v -v -v 10.188.76.67:/git /tmp/mnt.alive
mount.nfs: timeout set for Wed Jun 9 17:02:12 2021
mount.nfs: trying text-based options 'vers=4,addr=10.188.76.67,clientaddr=10.162.132.231'
mount.nfs: mount(2): Connection timed out
mount.nfs: Connection timed out
mount.nfs: mount(2): Connection timed out
mount.nfs: Connection timed out
mount.nfs: mount(2): Connection timed out
mount.nfs: Connection timed out

Wed Jun 9 17:07:40 UTC 2021
Address HWtype HWaddress Flags Mask Iface
10.162.159.253 ether 00:00:0c:9f:fa:3c C ens192
10.162.159.252 ether 00:de:fb:a6:56:7c C ens192
[email protected]:/tmp/dbg#

2021-06-09 20:22:07

by Alex Romanenko

[permalink] [raw]
Subject: RE: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine

> There are no SYNs to 2.2.2.2 in my traces. So something different
> about network infrastructures where in your case for some reason you
> are sending something to 2.2.2.2 and the only reason I can think of is
> that you have something in the ARP cache and I don't. And Alex yes I
> do have an ARP entry for my default gateway but no there is no SYN
> sent to the server from a different subnet.


Hi Olga,
Thanks for working with us on this issue.
I think we might have figured out the difference. 2.2.2.2 is inaccessible
public internet address. Does your network have a route to the internet?
That said, for the SYN to be sent the client's network needs to have a route
to the server's network. So, perhaps you can try a test with an IP that you
know your client network has a route for, but is down/unreachable.

P.S. We do see that Trond put out a possible patch for the issue:
https://marc.info/?l=linux-nfs&m=162326359818467&w=2

2021-06-11 05:27:54

by Michael Wakabayashi

[permalink] [raw]
Subject: Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine

Hi everyone,

I took the patch created by Trond:
https://marc.info/?l=linux-nfs&m=162326359818467&w=2
[Subject: [PATCH] NFSv4: Initialise connection to the server in nfs4_alloc_client()]
Then rebuilt the NFSv4 kernel module and installed it on my Ubuntu virtual machine.

The valid NFS mounts are no longer being blocked!

We've had several?fairly significant outages for almost a year now
(when we migrated to NFSv4) due to this issue and this patch will
eliminate these types of incidents.

We greatly appreciate all the work that was done (especially knowing
everyone has a busy schedule).
Thanks so much to everyone that helped!

Mike