2017-12-21 14:45:19

by Mkrtchyan, Tigran

Subject: blocked client due to sleep with lock



Hi all,


When a pNFS client gets the multipath list for a DS via GETDEVICEINFO,
it tries to probe all of the provided IP addresses. While doing so, the
client holds nfs_clid_init_mutex, which is defined in nfs4state.c, and calls
the nfs41_discover_server_trunking function. The discovery code waits until
the connection is initialized by looping around wait_event_killable, called
from nfs_wait_client_init_complete. However, if one of the DS interfaces is
not reachable by the client, nfs_clid_init_mutex stays locked for quite some
time, and the client can't initialize any other DS if a parallel request was
issued in the meantime.
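
To make the blocking pattern concrete, here is a minimal userspace sketch
using POSIX threads. It is not the kernel code: init_mutex is only a
stand-in for nfs_clid_init_mutex, the sleep stands in for waiting on an
unreachable DS interface, and all function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for one global init lock (cf. nfs_clid_init_mutex). */
static pthread_mutex_t init_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Demo-only gate: tells the caller that the slow thread owns init_mutex. */
static pthread_mutex_t gate_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t gate_cond = PTHREAD_COND_INITIALIZER;
static int slow_holds_lock;

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* "Unreachable DS": takes the global lock, then sleeps while holding it. */
static void *slow_ds_init(void *arg)
{
    long wait_ms = *(long *)arg;

    pthread_mutex_lock(&init_mutex);

    pthread_mutex_lock(&gate_mutex);
    slow_holds_lock = 1;
    pthread_cond_signal(&gate_cond);
    pthread_mutex_unlock(&gate_mutex);

    sleep_ms(wait_ms);              /* sleep with the lock held */
    pthread_mutex_unlock(&init_mutex);
    return NULL;
}

/*
 * Starts a slow init, then tries to initialize a "reachable DS" whose own
 * work is instant. Returns how many milliseconds passed before the second
 * init could take the lock, or -1 if the thread could not be created.
 */
long blocked_ms_behind_slow_init(long slow_ms)
{
    struct timespec t0, t1;
    pthread_t slow;

    slow_holds_lock = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (pthread_create(&slow, NULL, slow_ds_init, &slow_ms) != 0)
        return -1;

    /* Wait until the slow thread really owns init_mutex. */
    pthread_mutex_lock(&gate_mutex);
    while (!slow_holds_lock)
        pthread_cond_wait(&gate_cond, &gate_mutex);
    pthread_mutex_unlock(&gate_mutex);

    /* The reachable DS needs the same lock, so it waits out the sleep. */
    pthread_mutex_lock(&init_mutex);
    pthread_mutex_unlock(&init_mutex);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    pthread_join(slow, NULL);
    return (t1.tv_sec - t0.tv_sec) * 1000L +
           (t1.tv_nsec - t0.tv_nsec) / 1000000L;
}
```

Calling blocked_ms_behind_slow_init(200) should report roughly the full
200 ms, even though the second initialization has no work of its own.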

Now, the bonus issue. If a DS provides an IPv6 address but the client has
only a link-local address, the client will still try to use it, and the first
access to the DS will take some time (400 sec). Moreover, the same situation
is observed even if the client has IPv6 only on the loopback interface.
Unfortunately, it looks like both cases are allowed by IPv6. But with 1200
DSes, accessing the data is quite painful.
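
For illustration, the distinction between "has some IPv6 address" and "has
an IPv6 address beyond link-local or loopback scope" can be sketched in
userspace C with the standard address-class macros. This is only a sketch
of the idea, not what the NFS client does; the function names and the
sample addresses are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>

/*
 * Returns 1 if `text` parses as an IPv6 address that is neither
 * unspecified, loopback, nor link-local; returns 0 otherwise.
 */
int ipv6_is_routable(const char *text)
{
    struct in6_addr a;

    if (inet_pton(AF_INET6, text, &a) != 1)
        return 0;                   /* not a valid IPv6 literal */
    if (IN6_IS_ADDR_UNSPECIFIED(&a) ||
        IN6_IS_ADDR_LOOPBACK(&a) ||
        IN6_IS_ADDR_LINKLOCAL(&a))
        return 0;
    return 1;
}

/*
 * Returns 1 if at least one of the client's local addresses (given here
 * as text, purely for the example) has wider than link-local scope.
 */
int client_can_try_ipv6(const char *const *local_addrs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (ipv6_is_routable(local_addrs[i]))
            return 1;
    return 0;
}
```

With only "::1" and "fe80::1" configured, client_can_try_ipv6 returns 0,
which matches the two cases described above.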

Regards,
Tigran.