From: Didier CONTIS <didier@ece.gatech.edu>
Subject: Pb of optimization for a Cluster under Gigabit
Date: Tue, 06 Apr 2004 23:22:44 -0400
Sender: nfs-admin@lists.sourceforge.net
Message-ID: <40737404.7000401@ece.gatech.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
To: nfs@lists.sourceforge.net
Errors-To: nfs-admin@lists.sourceforge.net


We have a cluster of ~60 Dell PowerEdge 1750s (dual CPU)
running Red Hat 9 (fully patched), connected via Gigabit
Ethernet to a stack of Catalyst 3750s.

The cluster has a dedicated NFS server also connected
via Gigabit:

Dell PowerEdge 2650 running Red Hat AS 2.1, fully patched.
The unit has a RAID 1 array for the OS and is connected
via dual Fibre Channel to an EMC CLARiiON SAN. We are
running PowerPath. The server also has 1 GB of memory.

Its load is always 2 or higher, and we see flaky
performance when copying files from one NFS partition
to another from a client.

All the filesystems are exported with sync and mounted
on the clients (via autofs) with:
rw,sync,hard,intr,rsize=8192,wsize=8192

Copying a 40MB file from an NFS partition to the local
client filesystem is fast (about 5 seconds):

[didier@xfront2 ~]$ time cp jeffay.txt /tmp
0.010u 0.190s 0:05.19 3.8%      0+0k 0+0io 115pf+0w

Copying the same file from one NFS partition to another
via the same client takes more than a couple of minutes.

We are running 96 nfsd threads on the file server, with the queue tune-up hack.

The th line in /proc/net/rpc/nfsd
[...]
th 96 0 171.110 29.200 5.100 0.000 0.000 0.000 0.000 0.000 0.000 0.000

looks good.
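
For anyone who wants to check the same thing on their own server, here is a
minimal Python sketch that splits the th line into its parts. It assumes the
usual layout on kernels of this vintage: thread count, then the number of
times all threads were busy at once, then a ten-bucket histogram of seconds
spent with a growing share of the threads busy. The function name
parse_th_line is made up for this example.

```python
def parse_th_line(line):
    """Split an nfsd 'th' line into thread count, all-busy count and histogram."""
    parts = line.split()
    if not parts or parts[0] != "th":
        raise ValueError("not a th line: %r" % line)
    return {
        "threads": int(parts[1]),
        # Assumed meaning: how often every nfsd thread was in use at once.
        "all_busy_count": int(parts[2]),
        # Assumed meaning: seconds spent in each tenth of thread-pool usage.
        "busy_seconds": [float(x) for x in parts[3:]],
    }


# The line quoted above, copied verbatim.
TH_LINE = ("th 96 0 171.110 29.200 5.100 0.000 0.000 0.000 "
           "0.000 0.000 0.000 0.000")

stats = parse_th_line(TH_LINE)
print("threads:", stats["threads"])
print("times all threads were busy:", stats["all_busy_count"])
print("seconds in the busiest bucket:", stats["busy_seconds"][-1])
```

Under that reading, a zero all-busy count and empty upper buckets mean the
clients are not waiting for a free nfsd thread.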

It seems the file server is spending too much time on IP fragment
handling. These counters are after 18 hours of uptime:

[didier@xnfs1 ~]$ cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails
FragOKs FragFails FragCreates
Ip: 2 64 34249037 0 0 0 0 217 23273862 31176320 24384 0 0 16472823
5502518 0 0 0 10378060
[...]
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens
AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts
Tcp: 0 0 0 0 2532 0 0 0 1 54706 76945 15 0 12
Udp: InDatagrams NoPorts InErrors OutDatagrams
Udp: 23221263 75 217 23165842
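
The header and value lines above are hard to line up by eye once the mail
client has wrapped them. A minimal Python sketch that pairs each counter name
with its value follows. The function name parse_proc_net_snmp and the SAMPLE
string are made up for this example; SAMPLE holds the Ip and Udp lines from
above with the wrapping undone, and on a live system the text would come from
/proc/net/snmp instead.

```python
def parse_proc_net_snmp(text):
    """Pair each 'Proto: names...' header line with the value line after it."""
    tables = {}
    pending = None  # (protocol, field names) still waiting for its values
    for line in text.splitlines():
        if ":" not in line:
            continue  # skips blank lines and elisions such as "[...]"
        proto, _, rest = line.partition(":")
        fields = rest.split()
        if pending is not None and pending[0] == proto:
            tables[proto] = dict(zip(pending[1], (int(v) for v in fields)))
            pending = None
        else:
            pending = (proto, fields)
    return tables


SAMPLE = "\n".join([
    "Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors "
    "ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests "
    "OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails "
    "FragOKs FragFails FragCreates",
    "Ip: 2 64 34249037 0 0 0 0 217 23273862 31176320 24384 0 0 16472823 "
    "5502518 0 0 0 10378060",
    "Udp: InDatagrams NoPorts InErrors OutDatagrams",
    "Udp: 23221263 75 217 23165842",
])

tables = parse_proc_net_snmp(SAMPLE)
ip = tables["Ip"]
ratio = ip["ReasmReqds"] / ip["ReasmOKs"]
print("fragments received (ReasmReqds):", ip["ReasmReqds"])
print("datagrams reassembled (ReasmOKs):", ip["ReasmOKs"])
print("reassembly failures (ReasmFails):", ip["ReasmFails"])
print("fragments per reassembled datagram: %.2f" % ratio)
```

On these numbers the server received about three fragments per reassembled
datagram, with no reassembly failures, and the 217 Udp InErrors match the
217 Ip InDiscards.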

Would anyone have any suggestions or recommendations? Should
I switch rsize/wsize to 1024?
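
To put rough numbers on the rsize/wsize question, here is a small Python
sketch that estimates how many IP fragments one NFS read or write turns into.
It rests on assumptions not stated above: NFS running over UDP, IPv4 without
options, a standard 1500-byte Ethernet MTU, and a guessed 128 bytes of
RPC/NFS header per request (RPC_OVERHEAD is a placeholder, not a measured
value).

```python
import math

IP_HEADER = 20      # bytes, IPv4 header without options
UDP_HEADER = 8      # bytes
RPC_OVERHEAD = 128  # hypothetical allowance for RPC/NFS headers, not measured


def fragments_per_datagram(payload, mtu=1500):
    """Number of IPv4 fragments needed to carry one UDP datagram."""
    # Data in every fragment except the last must be a multiple of 8 bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil((UDP_HEADER + payload) / per_fragment)


for size in (8192, 1024):
    count = fragments_per_datagram(size + RPC_OVERHEAD)
    print("rsize/wsize=%d -> %d fragment(s) per full-size transfer" % (size, count))
```

Under these assumptions an 8192-byte transfer needs six fragments and a
1024-byte transfer fits in a single packet, at the cost of eight times as
many RPC calls for the same amount of data.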

Thanks - Didier

