From: Haakon Riiser <haakon.riiser@fys.uio.no>
Subject: "Server not responding" after periods of client inactivity
Date: Thu, 14 Jul 2005 23:25:14 +0200
Message-ID: <20050714212514.GA23867@fox>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
To: nfs@lists.sourceforge.net
Sender: nfs-admin@lists.sourceforge.net
Errors-To: nfs-admin@lists.sourceforge.net

(Trying to post this for the fourth time -- nothing appeared on the
list the first three times.)

I have noticed that after periods of inactivity on the client
machine, the first NFS operation will always hang for around
15 seconds.  My first guess was that the server had powered down
its disk drives, and that it was the spin-up time that caused the
delay, but when I had an ssh session open on the server at the
same time as the client started complaining about not getting a
reply, I saw that this was /not/ the case -- there is nothing
on the server that would explain why the client is stalling.
No load, and no delay when working directly on the server.

I ran tcpdump on the server side while the client was hanging,
and this is what I found:

Source  Secs  Packet
------  ----  ------
client  0.00  V3 ACCESS Call, FH:0x02120000
client  0.10  [Retransmission of #1] V3 ACCESS Call, FH:0x02120000
client  0.31  [Retransmission of #1] V3 ACCESS Call, FH:0x02120000
client  0.71  [Retransmission of #1] V3 ACCESS Call, FH:0x02120000
client  1.53  [Retransmission of #1] V3 ACCESS Call, FH:0x02120000
client  3.16  [Retransmission of #1] V3 ACCESS Call, FH:0x02120000
client  6.42  [Retransmission of #1] V3 ACCESS Call, FH:0x02120000
client  7.12  [Retransmission of #1] V3 ACCESS Call, FH:0x02120000
client  8.52  [Retransmission of #1] V3 ACCESS Call, FH:0x02120000
client 11.32  [Retransmission of #1] V3 ACCESS Call, FH:0x02120000
server 15.30  V3 ACCESS Reply
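
To make the retransmission pattern easier to see, here is a small
Python sketch that computes the gap between each packet and the one
before it.  The timestamps are copied from the table above; the
names `times` and `gaps` are only for illustration.

```python
# Timestamps (seconds) copied from the tcpdump table above.
# The last entry is the server's reply; the rest are client packets.
times = [0.00, 0.10, 0.31, 0.71, 1.53, 3.16, 6.42, 7.12, 8.52, 11.32, 15.30]


def gaps(ts):
    """Return the interval between each packet and the one before it."""
    return [round(b - a, 2) for a, b in zip(ts, ts[1:])]


intervals = gaps(times)
for t, g in zip(times[1:], intervals):
    print(f"{t:6.2f}  +{g:.2f}")
```

The gaps come out as 0.10, 0.21, 0.40, 0.82, 1.63, 3.26, then 0.70,
1.40, 2.80, and finally 3.98 before the reply.  That looks like a
retransmit interval that roughly doubles each time and restarts once
at 0.70, though I am only reading that off the numbers.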

After this, there are no more delays for this shared file system;
any file operation on the same file system will succeed instantly.
However, the first operation on any /other/ NFS file system also
mounted on the client will still hang just like the above example.
I.e., after an idle period, the hang happens exactly once for each
mount point.
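
One way to reproduce the once-per-mount-point behaviour is to time
the first access to each mount after the client has been idle.  The
following is only a rough sketch: the mount paths are hypothetical
placeholders, and I am assuming that a plain os.access() call is
enough to trigger the first NFS operation on a mount.

```python
import os
import time


def time_first_access(paths):
    """Time one os.access() call per path and return {path: seconds}."""
    results = {}
    for path in paths:
        start = time.monotonic()
        os.access(path, os.R_OK)  # permission check on the path itself
        results[path] = time.monotonic() - start
    return results


if __name__ == "__main__":
    # Hypothetical mount points; substitute the real ones.
    mounts = ["/mnt/nfs1", "/mnt/nfs2"]
    # Two passes: if the hang really happens once per mount point,
    # only the first pass should show a long delay.
    for attempt in (1, 2):
        for path, secs in time_first_access(mounts).items():
            print(f"pass {attempt}: {path}: {secs:.2f} s")
```

Run it after leaving the client idle for a while; the second pass
serves as a baseline for how fast the same call is once the mount
has woken up.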

I have tried setting rsize=1024,wsize=1024, and I have tried both
TCP and UDP mounts, but nothing has helped so far.  Any ideas?  tcpdump
clearly shows that all the requests arrive at the server, so why
does the server wait 15 seconds before it replies?

NFS server:
  Pentium III 650 MHz, 256 MB RAM
  Fedora Core 3 (fully updated)
  nfs-utils 1.0.6-52
  kernel 2.6.11-1.35_FC3

NFS client:
  Athlon XP2500+, 1 GB RAM
  Slackware 10.1
  nfs-utils 1.0.7
  kernel 2.6.11.11

-- 
 Haakon

