Subject: about nfs_wait_client_init_complete/concurrent mounts

Hi All,

Sorry for the wide distribution, but I think this might be an
interesting topic.

There is an issue with concurrent mount.nfs commands (NFSv4.1) to
different NFS servers: if one or more of the servers are unreachable
(e.g. a firewall blocks the NFS server), the other mounts may wait in
one of the callers of nfs_wait_client_init_complete
(nfs_match_client, nfs4_match_client,
nfs41_discover_server_trunking...), since we need to check whether an
existing client can be reused. Typically we wait in nfs_match_client
until the client is marked NFS_CS_READY or the wait times out.

This happens even if the mount requests are for different servers, IP
addresses, NFS versions, etc. Depending on timing, a mount to a
reachable server might wait for another, broken mount to time out.
I tested on kernels 4.18 and 5.8.
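
To illustrate the ordering I mean, here is a simplified user-space
model in Python. It is only a sketch of my reading of nfs_match_client,
not the kernel code: the Client class, the state values, the function
name and the addresses are all made up for illustration, and the real
code restarts the list walk after each wait instead of continuing.

```python
from dataclasses import dataclass

# Illustrative state values, NOT the kernel's actual constants.
READY = 0      # client finished initialising
INITING = 1    # client still initialising (e.g. server unreachable)


@dataclass
class Client:
    server: str   # hypothetical server address
    version: str  # NFS version string
    state: int    # READY, INITING, or a negative error code


def clients_waited_on(clients, server, version):
    """Return the servers of the clients a new mount would wait on
    before it can decide whether an existing client can be reused."""
    waited = []
    for clp in clients:
        if clp.state < 0:
            continue  # failed clients are never candidates
        if clp.state > READY:
            # The readiness check comes before any comparison of
            # server or version, so the new mount waits here even
            # though clp may belong to a completely different server.
            waited.append(clp.server)
            continue
        if clp.server == server and clp.version == version:
            break  # reusable client found
    return waited


existing = [
    Client("192.0.2.1", "4.1", INITING),  # broken mount in progress
    Client("192.0.2.2", "4.1", READY),
]

# A mount to a third, reachable server still waits on the broken one.
print(clients_waited_on(existing, "192.0.2.3", "4.1"))
```

In this model the mount to 192.0.2.3 has nothing in common with the
client for 192.0.2.1, yet it is the one that ends up waiting.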

A more impactful variant of this: a rather large autofs map contains
some broken mounts that get triggered from time to time. As a result,
other mounts (e.g. users mounting their remote home directories) are
affected, which causes user-visible delays.

Of course, once the broken mounts are removed the issue goes away, but
I was wondering whether there is a way for the NFS client to rule out
some clients as reuse candidates a priori, without requiring them to be
NFS_CS_READY.

While testing a local patch that skips some candidates based on
client-side information, I could see that we still have a problem with
trunking. As far as I can see, with trunking we need a dialog with the
NFS server to get session information in order to decide whether we can
discard a client or reuse it, so we need to wait for that to happen or
to time out. That is my understanding of the code; I may be wrong, of
course.

If my premises are true, it *looks* to me pretty hard to optimize the
"concurrent mounts with broken servers" scenario with trunking "in the
game" without some major and careful changes.

What are your thoughts on the problem described above? Is there any way
to optimize this scenario, where some mount operations might wait for
other broken mounts that have nothing to do with them? Is there perhaps
some ongoing work on this topic? (I did not find any.)

rgds
roberto