We have a number of specialized workstations running CentOS 7.4
(kernel 3.10.0-693.11.1.el7.x86_64) that are both NFSv3 clients and
servers, i.e. they can mount each other's exports via autofs.
However, occasionally the mount process hangs.
Each workstation has 4 exports. 2 of the exports are subdirectories of
the root disk, and these are exported with the 'sync' option. The other
2 exports are the mount points of separate file systems, exported
with the 'async' option. All are XFS file systems.
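To illustrate the layout described above, an /etc/exports along these
lines would match it. This is only a sketch: the paths, the client
specification and every option other than sync/async are placeholders,
not copied from the real workstations.

```
# Hypothetical /etc/exports (paths and client spec are placeholders)
# Subdirectories of the root disk, exported 'sync'
/export/proj1   *(rw,sync)
/export/proj2   *(rw,sync)
# Mount points of separate XFS file systems, exported 'async'
/data1          *(rw,async)
/data2          *(rw,async)
```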
The mount hang only happens with the exports from the root
disk. Once a mount hangs on one of these exports, a mount of the
other export from the root disk also hangs, but mounting the other 2
exports is fine.
The problem can occur on any of the workstations mounting any of the
other workstations' exports.
We can temporarily fix the problem by running 'exportfs -f' on the server.
Running tcpdump on the client (or server) while attempting a mount
in this state shows that the client sends an FSINFO call but doesn't
get a reply. It then retransmits the FSINFO call and sends another
FSINFO call about 18 seconds later, again with no replies, which I'm
guessing is significant?
Unfortunately, it is not straightforward to upgrade the kernel on
these workstations, so it is difficult to test whether a newer kernel
would fix the issue ...
Is anyone able to suggest any other debugging steps we can take to
find out what the issue might be?
Thanks
James Pearson