2005-05-11 05:31:12

by Jeff Block

Subject: NFS Performance issues


We seem to be having some major performance problems on our Red Hat
Enterprise Linux 3 boxes. Some of our machines have RAIDs attached, some
have internal SCSI drives, and some have internal IDE drives. The one
thing all the boxes have in common is that their Solaris counterparts are
putting them to shame in the NFS performance battle.

Here's some of the info and what we've already tried.
/etc/exports is simple:
/export/data @all-hosts(rw,sync)

The automounter is used, so the mounts get the default options, which look
like this:
server:/export/data /data/mountpoint nfs
rw,v3,rsize=8192,wsize=8192,hard,udp,lock,addr=server 0 0

We can't change the rsize and wsize on these mounts because the precompiled
Red Hat kernel maxes out at 8K for NFSv3. We could of course compile our own
kernel, but doing that for more than a handful of machines can be a big
headache.
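
For reference, the equivalent of those defaults spelled out in the autofs
indirect map looks roughly like this (the map name and paths are
placeholders, not our real ones); this is also where rsize/wsize would be
raised if we had a kernel that allowed more than 8K:

# /etc/auto.master
/data    /etc/auto.data

# /etc/auto.data
mountpoint    -rw,hard,nfsvers=3,udp,rsize=8192,wsize=8192    server:/export/data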

We've tried moving the journaling from RAID devices onto another internal
disk. This helped a little, but not much.
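
For reference (assuming ext3), relocating the journal looks roughly like
this; device names are placeholders, the filesystem has to be unmounted,
and the journal device needs the same block size as the filesystem:

mke2fs -O journal_dev -b 4096 /dev/sdc1     # dedicated journal on the internal disk
tune2fs -O ^has_journal /dev/sdb1           # remove the existing internal journal
tune2fs -J device=/dev/sdc1 /dev/sdb1       # attach the external journal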

We have tried async, and that certainly does speed things up, but we are
definitely not comfortable with using async.
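
(For comparison, the async variant is just a one-word change to the export,
shown below; the reason we're uncomfortable with it is that the server then
acknowledges writes before they reach disk, so a server crash can silently
lose data.)

/export/data @all-hosts(rw,async)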

The big problem that we are having seems to have to do with copying a bunch
of data from one machine to another.

We have 683 MB of test data that represents the file sizes our users work
with. There are several small files in this set, so there are a lot of
writes and commits. Our users generally work with data sets in the
multiple-gigabyte range.

Test data - 683 MB
NFS testing (times are m:ss):
Client  | Server  | Storage                | NFS cp time | scp time
Solaris | Solaris | RAID                   | 1:32        | 1:59
Linux A | Solaris | RAID                   | 0:42        | 2:51
Linux A | Linux B | RAID5, journal on SCSI | 3:17        | 2:05
Linux A | Linux B | RAID5, journal on RAID | 5:07        | 1:45
Linux A | Linux B | SCSI                   | 3:17        | 1:52
Linux A | Linux B | IDE                    | 1:36        | 2:27

Other tests

Internal tests (local cp, no NFS):
Host/Storage      | Host/Storage       | cp time
Linux B int. SCSI | Linux B ext. RAID5 | 0:37
Sol int. SCSI     | Sol ext. RAID5     | 0:35

Network:
Host A  | Host B  | Throughput
Linux A | Linux B | 893 Mbit/sec

Probably hard to read, but the bottom line is this:
Copying the 683 MB from a Linux host to a Solaris RAID took 42 seconds.
Copying the same data from a Linux host to a Linux RAID took 5:07 or 3:17,
depending on where the journal is stored. My scp times from Linux to Linux
RAID are much quicker than my NFS copies, which seems pretty backwards to me.

Thanks in advance for the help on this.

Jeff Block
Programmer / Analyst
Radiology Research Computing
University of California, San Francisco






2005-05-12 01:22:06

by Chris Penney

Subject: Re: NFS Performance issues

> We have 683 MB of test data that represents the file sizes our users work
> with. There are several small files in this set, so there are a lot of
> writes and commits. Our users generally work with data sets in the
> multiple-gigabyte range.

This sounds similar to some of the CAE analysis work handled by the
NFS servers I maintain. Our Sun 480s w/ Veritas do a reasonable job,
but the Linux boxes we have blow their doors off.

We are using JFS file systems (which was a huge improvement for us)
and using the 2.6 device-mapper to stripe across four 1TB LUNs. We
have dual-CPU boxes w/ hyperthreading enabled and use 128 NFS threads.
All clients use a 32k r/wsize (which was also an improvement). We
don't use async for reliability reasons (I'm not sure it would matter
that much with our setup).
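
Roughly, the pieces of that setup look like the sketch below; the device
names, sizes, and 128k chunk size are placeholders rather than our exact
values:

# stripe across four 1TB LUNs with device-mapper, then put JFS on top
echo "0 8589934592 striped 4 256 /dev/sdb 0 /dev/sdc 0 /dev/sdd 0 /dev/sde 0" | dmsetup create data_stripe
mkfs.jfs /dev/mapper/data_stripe

# 128 nfsd threads (on Red Hat, set in /etc/sysconfig/nfs)
RPCNFSDCOUNT=128

# client mount with 32k transfer sizes
mount -t nfs -o rw,hard,intr,rsize=32768,wsize=32768 server:/export/data /data

(Most people would drive the striping through LVM2, e.g. "lvcreate -i 4
-I 128", rather than raw dmsetup, but it ends up in the same device-mapper
target either way.)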

I also use the following in sysctl.conf:

net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 65536 8388608
net.ipv4.tcp_mem = 8388608 8388608 8388608
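
(For what it's worth, those take effect without a reboot if you run
"sysctl -p" after editing the file.)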

I can't say those tuning options have been formally tested. Perhaps
someone with more understanding could comment on them.

Chris



2005-05-11 06:15:03

by NeilBrown

Subject: Re: NFS Performance issues

On Tuesday May 10, [email protected] wrote:
>
> We've tried moving the journaling from RAID devices onto another internal
> disk. This helped a little, but not much.
>

Are you using ext3?
Have you tried the "data=journal" mount option? It speeds up NFS
writes a lot.
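
For reference, that's just an extra ext3 mount option, e.g. an fstab entry
like this (device and mount point are placeholders):

/dev/sdb1   /export/data   ext3   defaults,data=journal   1 2

It helps NFS because the synchronous writes land sequentially in the
journal first and are written back to their final location later.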

NeilBrown



2005-05-11 17:25:13

by Dan Stromberg

Subject: Re: NFS Performance issues


You might try upgrading to RHEL 4, or another Linux distribution with a 2.6.x kernel.

If you're on a gigabit network, you might try turning on jumbo frames.
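
For example (the interface name is a placeholder, and the switch and both
NICs have to support it too):

ifconfig eth0 mtu 9000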

NFS is known to have a lot of "back and forth" relative to other
protocols. For bulk transfers, you're far better off with something
like ftp, ssh, or rsync - even on a system with pretty good NFS
performance... NFS is there for convenience more than for large data
transfers, IMO.

You might try firing up a sniffer against the NFS traffic, comparing
Linux->Linux, Linux->Solaris, Solaris->Linux, Solaris->Solaris. If one
pairing has a lot more retries than the others, then you know to look for
a network problem.
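
For example, something like this on the client (interface name is a
placeholder):

tcpdump -i eth0 -s 0 -w nfs.pcap port 2049   # capture the NFS traffic for later analysis
nfsstat -rc                                  # client RPC counters, including retransmits

The second command shows retransmission counts without needing a capture
at all.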

You might try UDP if you're using TCP now, or TCP if you're using UDP
now. Theoretically, TCP should be better for long-haul transfers (lots
of router hops), and UDP should be better for local transfers through a
small (or even nil) number of routers. But we may be surprised. :)

If you're getting lots of retries, a smaller blocksize may actually
speed things up. (But check for network problems first)

You might try benchmarking the same data -locally-, without any network
involved, to see to what extent your RAID situation is contributing to
the slowdown you're seeing.
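
Something along these lines, run directly on the server (paths are
placeholders):

time cp -a /export/data/testset /export/data/testset.copy     # same small-file mix, no network
time dd if=/dev/zero of=/export/data/ddtest bs=1M count=683   # rough streaming-write ceiling for the RAID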

FUSE might be an interesting thing to try... to ditch NFS. :) I've
never installed a FUSE-based filesystem, though, let alone benchmarked
one.

Some vendor or other is expected to release some sort of NFS proxy
(which I believe functions a bit like "NX", of NoMachine fame - i.e., it
includes protocol-specific smarts to cache suitable chunks of data on
either side of the transmission, and uses a hash table indexed by
cryptographic hashes to see whether something similar was already
transferred recently, in which case the data can simply be pulled from a
cache), which should reduce the "back and forthing" of NFS significantly
and hence give much better NFS performance. Unfortunately, the guy who
mentioned this was under an NDA, so I don't know the name of the
vendor. :(

HTH.


