From: Trond Myklebust <trond.myklebust@fys.uio.no>
Subject: Re: "Server not responding" after periods of client
	inactivity
Date: Sat, 30 Jul 2005 10:15:43 -0400
Message-ID: <1122732943.8248.13.camel@lade.trondhjem.org>
References: <20050714212514.GA23867@fox>  <20050730131031.GA1668@fox>
Mime-Version: 1.0
Content-Type: text/plain
Cc: nfs@lists.sourceforge.net
To: Haakon Riiser <haakon.riiser@fys.uio.no>
In-Reply-To: <20050730131031.GA1668@fox>
Sender: nfs-admin@lists.sourceforge.net
Errors-To: nfs-admin@lists.sourceforge.net

On Sat, 30.07.2005 at 15:10 (+0200), Haakon Riiser wrote:
> I have now tried this with another NFS client (Fedora Core 4)
> connected to the NFS server simultaneously with the client used
> in the problem report below.  The same problem happens on
> the new client as well, even when the other client has just
> experienced the hang.
> 
> That is, after the stalled NFS operation completes on client A,
> it is still possible to observe the same problem on client B.
> It looks to me like this is caused by the server disconnecting idle
> clients, and when a new request suddenly occurs, it takes a while
> before the server manages to resurrect the connection.  I tried
> adding a cron job that does 'ls -l /NFS-MOUNT-POINT >/dev/null'
> every few minutes, but surprisingly, that didn't keep the
> connection alive.  Is this because of client side caching?  Is there
> anything I can do?  These hangs are getting very annoying; it
> would be great if I could somehow change the idle timeout.

If the server drops the connection, then the client will wait 15 seconds
before retrying. The reason for this is that the client has to assume
that the server is disconnecting due to congestion.
Note that if congestion really is an issue, your Linux server should
normally send an error message to the effect of "too many open TCP
sockets, consider increasing the number of nfsd threads" to your syslog.
You can fiddle with RPC_REESTABLISH_TIMEOUT if you want to change the 15
second delay, but I wouldn't recommend this unless you are sure you know
what you are doing. (FYI, the fixed timeout is soon due to be
replaced with an exponential backoff-based timeout.)
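
To illustrate the difference between the two retry schemes, here is a
rough Python sketch. It is not the kernel code. The 15 second value is
the fixed delay described above; the starting delay and the cap used
for the backoff are hypothetical values picked purely for illustration.

```python
FIXED_DELAY = 15  # seconds; the fixed reconnect delay described above


def fixed_delays(attempts):
    """Delay before each reconnect attempt under the fixed scheme."""
    return [FIXED_DELAY] * attempts


def backoff_delays(attempts, initial=3, cap=60):
    """Delay before each attempt, doubling from `initial` up to `cap`.

    `initial` and `cap` are hypothetical values, not kernel constants.
    """
    delays = []
    delay = initial
    for _ in range(attempts):
        delays.append(delay)
        delay = min(delay * 2, cap)
    return delays


print(fixed_delays(4))    # [15, 15, 15, 15]
print(backoff_delays(6))  # [3, 6, 12, 24, 48, 60]
```

With a backoff, the first retry can happen quickly, while repeated
failures (which are more likely to indicate real congestion) are spaced
further and further apart.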

Otherwise, you should note that the client too will attempt to drop the
connection after 5 minutes of inactivity on the socket. That should
normally not lead to a 15 second wait, though.
If the client fails to disconnect the idle connection, then the server
will do so after 6 minutes (i.e. ~ 1 minute after the client timeout
should have occurred).
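
The interplay of the timeouts can be summarised in a small Python
model. This is a simplified sketch, not the actual client or server
logic: the function names and the `client_disconnects` flag are
hypothetical, and the model assumes that a client-initiated close costs
no extra wait while a server-initiated close costs the full 15 seconds.

```python
CLIENT_IDLE_TIMEOUT = 5 * 60  # seconds; client drops an idle connection
SERVER_IDLE_TIMEOUT = 6 * 60  # seconds; server drops it if the client did not
REESTABLISH_DELAY = 15        # seconds; wait after a server-side drop


def who_closes(idle_seconds, client_disconnects=True):
    """Return which side has closed the connection after `idle_seconds`.

    `client_disconnects` is a hypothetical flag modelling whether the
    client's own idle disconnect actually works.
    """
    if client_disconnects and idle_seconds >= CLIENT_IDLE_TIMEOUT:
        return "client"
    if idle_seconds >= SERVER_IDLE_TIMEOUT:
        return "server"
    return None


def expected_wait(idle_seconds, client_disconnects=True):
    """Simplified extra delay (seconds) before the next request proceeds."""
    if who_closes(idle_seconds, client_disconnects) == "server":
        return REESTABLISH_DELAY
    return 0


print(expected_wait(7 * 60))                            # 0
print(expected_wait(7 * 60, client_disconnects=False))  # 15
```

In this model the 15 second stall only shows up when the client's own
idle disconnect does not happen and the server closes the connection
first.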

Cheers,
  Trond

