2004-04-07 03:23:42

by Didier CONTIS

[permalink] [raw]
Subject: Pb of optimization for a Cluster under Gigabit



We have a cluster of ~60 Dell PowerEdge 1750s (dual CPU)
running Red Hat 9.0 (fully patched), connected via Gigabit
to a stack of Catalyst 3750s.

The cluster has a dedicated NFS server, also connected
via Gigabit:

a Dell PowerEdge 2650 running AS 2.1, fully patched.
The unit has a RAID 1 array for the OS and is connected
via dual Fibre Channel to an EMC Clariion SAN. We are
running PowerPath. The server also has 1 GB of memory.

Its load is always 2 or higher, and we see flaky
performance when copying files from one NFS partition
to another from a client:

All the filesystems are exported with sync and mounted
on the clients (via autofs) with:
rw,sync,hard,intr,rsize=8192,wsize=8192

The time for copying a 40 MB file from an NFS partition to the local
client filesystem is good:

[didier@xfront2 ~]$ time cp jeffay.txt /tmp
0.010u 0.190s 0:05.19 3.8% 0+0k 0+0io 115pf+0w

Copying the same file from one NFS partition to another
via the same client takes more than a couple of minutes.

We are running 96 nfsd threads on the file server, with the queue tune-up hack.

The thread statistics under /proc/net/rpc/nfsd look good:
[...]
th 96 0 171.110 29.200 5.100 0.000 0.000 0.000 0.000 0.000 0.000 0.000
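The th line from /proc/net/rpc/nfsd can be checked mechanically. Here is a hypothetical awk sketch (field layout per the NFS HOWTO: thread count, a count of times all threads were needed at once, then ten histogram buckets of seconds spent at 0-10% up through 90-100% thread utilization), fed the quoted line so it runs anywhere:

```shell
# Parse the th line quoted above: $2 = thread count, $3 = times all
# threads were in use at once, $4..$13 = utilization histogram (seconds).
th_line="th 96 0 171.110 29.200 5.100 0.000 0.000 0.000 0.000 0.000 0.000 0.000"
echo "$th_line" | awk '{
    high = $11 + $12 + $13   # seconds spent at 70-100% thread utilization
    printf "threads=%s all-busy=%s high-util-seconds=%.3f\n", $2, $3, high
    if ($3 == 0 && high == 0)
        print "thread pool looks adequately sized"
}'
```

With all the time concentrated in the lowest buckets and an all-busy count of zero, 96 threads is clearly not the bottleneck here.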

It seems the file server is spending too much time doing IP fragmentation work
(counters below taken after ~18 hours of uptime):

[didier@xnfs1 ~]$ cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors
ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests
OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails
FragOKs FragFails FragCreates
Ip: 2 64 34249037 0 0 0 0 217 23273862 31176320 24384 0 0 16472823
5502518 0 0 0 10378060
[...]
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens
AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts
Tcp: 0 0 0 0 2532 0 0 0 1 54706 76945 15 0 12
Udp: InDatagrams NoPorts InErrors OutDatagrams
Udp: 23221263 75 217 23165842
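A quick hypothetical awk pass over that Ip: line (counter positions taken from the header line above; the line is transcribed here so the sketch runs standalone) puts the fragmentation load in perspective:

```shell
# Ip: counters, in header order: InReceives is the 3rd counter, ReasmReqds
# the 14th, ReasmFails the 16th, FragFails the 18th (awk field numbers are
# offset by 1 because of the leading "Ip:" token).
ip_line="Ip: 2 64 34249037 0 0 0 0 217 23273862 31176320 24384 0 0 16472823 5502518 0 0 0 10378060"
echo "$ip_line" | awk '{
    printf "fragments needing reassembly: %.0f%% of received packets\n", 100 * $15 / $4
    printf "ReasmFails=%d FragFails=%d\n", $17, $19
}'
# roughly 48% of inbound packets are fragments, yet reassembly and
# fragmentation failure counters are both zero
```

So fragmentation is heavy (expected with 8k NFS-over-UDP on a 1500-byte MTU), but nothing is actually being lost to it.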

Would anyone have any suggestions or recommendations? Should
I switch rsize/wsize down to 1024?

Thanks - Didier




-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2004-04-08 02:17:43

by Didier CONTIS

[permalink] [raw]
Subject: Re: Pb of optimization for a Cluster under Gigabit


> that may be completely normal.
> i don't think the load average is a good indication of
> how hard your server is working. is your application
> throughput reasonable? any response time problems?

I had some timeouts until I rebooted the server
(before sending my e-mail to the mailing list), plus high load.

Currently I just have a high load (I updated the RH AS 2.1 kernel to the
latest rev). I thought it was odd considering
the hardware, and in comparison with the load I had on the older Linux file
servers of other clusters.

I did look at section 5 of the HOWTO before e-mailing the list
(even though I confused myself by using the sync option on both the server
and client side). I just did not want to keep increasing
the number of nfsd threads or the rsize/wsize before asking.

To answer someone else, the options as shown by cat /proc/mounts
are:

rw,v3,rsize=8192,wsize=8192,hard,intr,udp,lock,addr=xnfs1...

Something weird I noticed: I switched the file server exports
from sync to async. Of course I got better performance in terms of
response time on the client side (up to 50 s saved on some linking
operations during compilation).
In addition, the load on the file server was divided
by 2 (I need to do better and longer monitoring, with Ganglia for
example). Can such a load decrease be expected? I am surprised.

Two side questions:

1) Is the kernel of Red Hat AS 2.1 broken NFS-wise, and should I speed up
the upgrade of the file server to AS 3.0? Please note that using RH
on the file server was imposed by Dell + EMC for the PowerPath
install (redundant pathing).

2) Does anyone on the list have experience running an NFS server
connected to an EMC Clariion SAN with their PowerPath software?

Thanks - Didier.



2004-04-07 04:40:16

by Lever, Charles

[permalink] [raw]
Subject: RE: Pb of optimization for a Cluster under Gigabit

hi didier-

do you really need to use the "sync" mount option on
the clients? the "sync" export option on the server
should be enough for most applications.

IP fragmentation is normal for any UDP-based protocol,
and your stats don't show any reassembly failures or
timeouts. btw you can get this information in slightly
friendlier form with "netstat -s".




2004-04-07 05:29:04

by Greg Banks

[permalink] [raw]
Subject: Re: Pb of optimization for a Cluster under Gigabit

Didier CONTIS wrote:
>
> All the filesystem are exported with sync and mounted
> on the client (via autofs) with:
> rw,sync,hard,intr,rsize=8192,wsize=8192
> [...]
> Would anyone have any suggestions or recommendations ? [...]

Try removing "sync" from your mount options.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.



2004-04-07 11:40:57

by Bogdan Costescu

[permalink] [raw]
Subject: Re: Pb of optimization for a Cluster under Gigabit

On Tue, 6 Apr 2004, Didier CONTIS wrote:

> Would anyone have any suggestions or recommendations ? Should
> I switch rsize / wsize to 1024 ?

If the network cards and switch allow it, you could try first using
Jumbo (9k) Ethernet frames; in such a setup, the 8k NFS packets would
most likely travel unfragmented between server and clients.
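For reference, a hypothetical sketch of what enabling jumbo frames would look like on the Linux side. The interface name is an assumption, and every device on the path (NICs, switch ports) must support and be configured for the larger MTU:

```shell
# Hypothetical example: raise the MTU to 9000 on the NFS-facing interface.
# "eth1" is an assumption; the switch ports must allow jumbo frames too.
ip link set dev eth1 mtu 9000        # iproute2 syntax
# equivalent with the net-tools of that era:
# ifconfig eth1 mtu 9000
```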

--
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: [email protected]




2004-04-07 20:39:17

by Didier CONTIS

[permalink] [raw]
Subject: RE: Pb of optimization for a Cluster under Gigabit



>If the network cards and switch allow it, you could try first using
>Jumbo (9k) Ethernet frames; in such a setup, the 8k NFS packets would
>most likely travel unfragmented between server and clients.

While about 90% of the hardware supports it, I have some legacy hardware
(nodes and a switch) still running at 100 Mbit, connected to the stack of 3750s.

Thanks - Didier.




2004-04-07 20:46:58

by Didier CONTIS

[permalink] [raw]
Subject: RE: Pb of optimization for a Cluster under Gigabit



>-----Original Message-----
>From: Lever, Charles [mailto:[email protected]]
>Sent: Wednesday, April 07, 2004 12:40 AM
>To: Didier CONTIS
>Cc: [email protected]
>Subject: RE: [NFS] Pb of optimization for a Cluster under Gigabit
>
>hi didier-
>
>do you really need to use the "sync" mount option on
>the clients? the "sync" export option on the server
>should be enough for most applications.

That definitely helped. Now the NFS options passed
via automount to the client are:

rw,nfsvers=3,udp,hard,intr,rsize=8192,wsize=8192

However, the load of the NFS server still goes up to 3.5.

What else could I check?

Thanks - Didier.

The output of nfsstat looks like this (after 2 days of uptime):

[didier@xnfs1 ~]$ /usr/sbin/nfsstat
Server rpc stats:
calls badcalls badauth badclnt xdrcall
58142076 0 0 0 0
Server nfs v2:
null getattr setattr root lookup readlink
1 100% 0 0% 0 0% 0 0% 0 0% 0 0%
read wrcache write create remove rename
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
link symlink mkdir rmdir readdir fsstat
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%

Server nfs v3:
null getattr setattr lookup access readlink
1 0% 15158205 26% 53353 0% 996812 1% 42526 0% 1067 0%
read write create mkdir symlink mknod
7941851 13% 33614209 57% 139897 0% 5118 0% 865 0% 0 0%
remove rmdir rename link readdir readdirplus
92778 0% 651 0% 1080 0% 394 0% 15436 0% 0 0%
fsstat fsinfo pathconf commit
1106 0% 1106 0% 0 0% 75620 0%
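The v3 operation mix above is notably write-heavy. A small awk sketch (counts transcribed from the nfsstat output, total from the server RPC calls line) shows the three dominant ops' share of all calls:

```shell
# Shares of total RPC calls for the three dominant NFSv3 operations
# (counts copied from the nfsstat output above):
awk 'BEGIN {
    total = 58142076            # Server rpc "calls"
    printf "write:   %4.1f%%\n", 100 * 33614209 / total
    printf "getattr: %4.1f%%\n", 100 * 15158205 / total
    printf "read:    %4.1f%%\n", 100 * 7941851 / total
}'
# write comes out around 58% -- a write-dominated workload is exactly
# where a "sync" export hurts the most
```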

>IP fragmentation is normal for any UDP-based protocol,
>and your stats don't show any reassembly failures or
>timeouts. btw you can get this information in slightly
>friendlier form with "netstat -s".

I got the following:

[didier@xnfs1 ~]$ netstat -s | more
Ip:
79283261 total packets received
0 forwarded
19816 incoming packets discarded
58326116 incoming packets delivered
87711576 requests sent out
148509 outgoing packets dropped
30853764 reassemblies required
9927976 packets reassembled ok
37294547 fragments created

[......]

Udp:
58212451 packets received
190 packets to unknown port received.
19816 packet receive errors
57985833 packets sent

[didier@xnfs1 ~]$ netstat -in
Kernel Interface table
Iface   MTU    Met  RX-OK     RX-ERR RX-DRP RX-OVR  TX-OK     TX-ERR TX-DRP TX-OVR Flg
eth0    1500   0    109165    0      0      0       144631    0      0      0      BMRU
eth1    1500   0    79305078  0      0      0       87676856  0      0      0      BMRU
lo      16436  0    299       0      0      0       299       0      0      0      LRU
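A hypothetical back-of-the-envelope check on the netstat -s reassembly counters above: "reassemblies required" counts incoming fragments, while "packets reassembled ok" counts the datagrams rebuilt from them, so their ratio approximates fragments per fragmented datagram:

```shell
# Values transcribed from the netstat -s output above; an 8k NFS-over-UDP
# request splits into ~6 fragments at MTU 1500, but small RPCs arrive
# unfragmented, so a blended average near 3 is plausible.
awk 'BEGIN {
    required = 30853764         # "reassemblies required" (fragments)
    ok       = 9927976          # "packets reassembled ok" (datagrams)
    printf "avg fragments per reassembled datagram: %.1f\n", required / ok
}'
# prints an average of about 3.1 fragments per reassembled datagram
```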





2004-04-07 20:58:30

by Lever, Charles

[permalink] [raw]
Subject: RE: Pb of optimization for a Cluster under Gigabit

> >do you really need to use the "sync" mount option on
> >the clients? the "sync" export option on the server
> >should be enough for most applications.
>
> That definitely helped. Now the NFS options passed
> via automount to the client are:
>
> rw,nfsvers=3,udp,hard,intr,rsize=8192,wsize=8192
>
> However, the load of the nfs server still goes up to 3.5

that may be completely normal.

i don't think the load average is a good indication of
how hard your server is working. is your application
throughput reasonable? any response time problems?

you should take a walk through the NFS HOWTO, as it has
some good server performance tips.

http://nfs.sourceforge.net/



2004-04-07 21:36:07

by Chris Worley

[permalink] [raw]
Subject: RE: Pb of optimization for a Cluster under Gigabit

On Wed, 2004-04-07 at 14:58, Lever, Charles wrote:
> >
> > That definitely helped. Now the NFS options passed
> > via automount to the client are:
> >
> > rw,nfsvers=3,udp,hard,intr,rsize=8192,wsize=8192
> >
> > However, the load of the nfs server still goes up to 3.5

Is that option string coming from /proc/mounts, or is it actually what
you're using? If it's coming from /proc/mounts, then not all options are
echoed... and one thing that will definitely drive the load up on the
server is using the "noac" option on the clients.


