2008-04-14 17:27:01

by Bruce Fields

Subject: Re: [NFS] How to set-up a Linux NFS server to handle massive number of requests

On Sat, Apr 12, 2008 at 08:45:12AM +0200, Carsten Aulbert wrote:
> Hi,
> J. Bruce Fields wrote:
>>> In the standard set-up many connections get into the box (tcp
>>> connection status SYN_RECV) but those fall over after some time and
>>> stay in CLOSE_WAIT state until I restart the nfs-kernel-server.
>>> Typically that looks like (netstat -an):
>> That's interesting! But I'm not sure how to figure this out.
>> Is it possible to get a network trace that shows what's going on?
> In principle yes, but
> (1) it's huge. I only get this when doing this with 500-1000 clients
> starting at about the same time
> (2) It seems that I don't get a full trace, i.e. the sessions seem to be
> incomplete - sometimes I only see a single packet with FIN set. I tried
> doing this both with wireshark running locally and with ntap's capturing
> device.

Yeah, that's not surprising. You'd probably want to dedicate a machine
to doing the capture, and then I'm not sure what kind of hardware you'd
need for a given network to get everything. Probably it's not worth it.

>> What happens on the clients?
> In the logs (/var/log/daemon.log) I only see that the mount request
> fails in different ways.
> Apr 9 12:07:55 n0078 automount[26838]: >> mount: RPC: Timed out
> Apr 9 12:07:55 n0078 automount[26838]: mount(nfs): nfs: mount failure
> d14:/data on /atlas/data/d14
> Apr 9 12:07:55 n0078 automount[26838]: failed to mount /atlas/data/d14
> Apr 9 12:18:56 n0078 automount[27977]: >> mount: RPC: Remote system
> error - Connection timed out
> Apr 9 12:18:56 n0078 automount[27977]: mount(nfs): nfs: mount failure
> d14:/data on /atlas/data/d14
> I have not yet run tshark in the background on many nodes to see if I
> can capture the client's view. Would that be beneficial?

Couldn't hurt.

Hauling out TCP/IP Illustrated and refreshing my memory of the TCP state
transition diagram.... So if the server has a lot of connections stuck
in CLOSE_WAIT, that means it got FINs from the clients (perhaps after
they timed out), but never shut down its side of the connection. Sounds
like a bug in some server-side RPC code. (Hm. But all those SYN_RECVs
are the server waiting for a client to ACK its SYN-ACK. Why are there so
many of those?)
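
To make the CLOSE_WAIT point concrete, here is a minimal sketch in Python on loopback. It is only an illustration of the TCP behaviour, not mountd's actual code, and the function name serve_one_connection is made up for the example. The client closes first (sending a FIN), and the server's socket then sits in CLOSE_WAIT until the server calls close() itself:

```python
import socket

def serve_one_connection(listener):
    """Accept one client, notice its FIN, then close our side."""
    conn, _addr = listener.accept()
    with conn:
        # recv() returning b"" means the peer has sent its FIN. From this
        # point the kernel keeps our socket in CLOSE_WAIT until we close it.
        data = conn.recv(1024)
        saw_fin = (data == b"")
    # Leaving the `with` block closed conn, which sends our own FIN and lets
    # the connection leave CLOSE_WAIT. A server that never reaches this
    # close() accumulates CLOSE_WAIT sockets.
    return saw_fin

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))       # port 0: let the kernel pick a free port
listener.listen(1)
port = listener.getsockname()[1]

client = socket.create_connection(("127.0.0.1", port))
client.close()                        # client gives up first and sends FIN

peer_closed = serve_one_connection(listener)
listener.close()
print("peer closed first:", peer_closed)
```

If the close on the server side were skipped, netstat would keep showing the connection in CLOSE_WAIT for as long as the process held the socket open.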

Those connections are actually to port 687, which I assume is mountd
(what does rpcinfo -p say?). (And probably if you just killed and
restarted mountd, instead of doing a complete
"/etc/init.d/nfs-kernel-server restart", that'd also clear those out.)
In fact, in the example you gave only three out of about 27 connections
(the only ESTABLISHED connections) were to port 2049 (nfsd itself).
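
A quick way to see that breakdown without counting by eye is to tally the netstat output by local port and state. The sketch below is a rough helper, assuming the usual Linux "netstat -an" column layout (proto, recv-q, send-q, local address, foreign address, state); the function name and the sample lines are made up for illustration and are not from the original report:

```python
from collections import Counter

def tally_by_port_and_state(netstat_text):
    """Count TCP connections per (local port, state) in `netstat -an` text."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto recv-q send-q local foreign state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, state = fields[3], fields[5]
        port = local.rsplit(":", 1)[-1]   # text after the last ':' is the port
        counts[(port, state)] += 1
    return counts

# Hypothetical sample data, only to exercise the function.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address      Foreign Address    State
tcp        0      0 10.0.0.14:687      10.0.1.78:51234    CLOSE_WAIT
tcp        0      0 10.0.0.14:687      10.0.1.79:40412    SYN_RECV
tcp        0      0 10.0.0.14:687      10.0.1.80:39001    CLOSE_WAIT
tcp        0      0 10.0.0.14:2049     10.0.1.78:800      ESTABLISHED
tcp        0      0 0.0.0.0:2049       0.0.0.0:*          LISTEN
"""

counts = tally_by_port_and_state(SAMPLE)
for (port, state), n in sorted(counts.items()):
    print(f"port {port:>5}  {state:<12} {n}")
```

On a live server the same function could be fed the text of "netstat -an" captured to a file; comparing the port numbers against "rpcinfo -p" then shows which daemon owns the stuck connections.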

So it looks like it's mountd that's not keeping up (and that's leaving
connections sitting around too long), and the mountd processes are
probably what we should be debugging.

>> What kernel version are you using?
> 2.6.24 on Debian Etch
> Right now, it seems that running 196 nfsd plus 64 threads for mountd
> solves the problem for the time being. Although it would be nice to
> understand these "magic" numbers ;)

Yes, definitely. I'm surprised the number of nfsd threads matters much
at all, actually, if mountd is the bottleneck.

--b.

