From: "Chris Wornell" Subject: NFS Help! Terrible performance with sync, fast performance with async Date: Sun, 19 Nov 2006 04:57:17 -0500 Message-ID: <49BCFA109293624D9AE0EC6558C0D3B802A8B327@ATL1VEXC017.usdom004.tco.tc> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0189757274==" Return-path: Received: from sc8-sf-mx2-b.sourceforge.net ([10.3.1.92] helo=mail.sourceforge.net) by sc8-sf-list2-new.sourceforge.net with esmtp (Exim 4.43) id 1GljQT-000289-QO for nfs@lists.sourceforge.net; Sun, 19 Nov 2006 01:57:26 -0800 Received: from out002.iad.hostedmail.net ([209.225.56.24]) by mail.sourceforge.net with esmtp (Exim 4.44) id 1GljQT-0007gw-LT for nfs@lists.sourceforge.net; Sun, 19 Nov 2006 01:57:27 -0800 To: List-Id: "Discussion of NFS under Linux development, interoperability, and testing." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: nfs-bounces@lists.sourceforge.net Errors-To: nfs-bounces@lists.sourceforge.net This is a multi-part message in MIME format. --===============0189757274== Content-class: urn:content-classes:message Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C70BC1.1BE19EF7" This is a multi-part message in MIME format. ------_=_NextPart_001_01C70BC1.1BE19EF7 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable I've got a problem that I've spent quite a bit of time on, though I'm not an expert at NFS. In summary, operations that require meta-data changes (such as file/directory creations/deletions), perform extremely slow over sync, but over 10x faster using async. I have two systems, connected to a GigE switch using intel pro 1000 NICs (jumbo frames is currently not enabled on any of the points).=20 The NFS server is a dual-core opteron system with 1GB of RAM and 3x300 SAS disk RAID-5 on a Perc5/i controller with 256MB battery backed cache (write cache is enabled). The file system is ext3. I've configured nfsd to spawn 32 processes upon startup. I'm using defaults for export the nfs shares, no changes to rsize or wsize. The NFS client is a dual Xeon with 4GB of RAM and a single 7200rpm SATA disk. Both systems are running RHEL WS 3 Update 8 and kernel 2.4.21-47.0.1.ELsmp.=20 For testing, I'm using bonnie++. The following are some sample test results that sum up the problem: Test on NFS server directly (not NFS loopback) -Sequential File Creation: 2976 -Sequential File Deletion: N/A -Random File Creation: 3077 -Random File Deletion: 9922 NFS test with sync enabled -Sequential File Creation: 39 -Sequential File Deletion: 79 -Random File Creation: 39 -Random File Delection: 65 NFS test with async enabled -Sequential File Creation: 575 -Sequential File Deletion: 1718 -Random File Creation: 543 -Random File Deletion: 1228 Based on the local performance of the NFS server, it does not appear the IO setup is the culprit. My understanding of the sync operation is a commit happens which means the NFS server doesn't reply back until the change has actually been committed to stable storage. There is something happening behind the scenes though which is causing a huge delay before the NFS server replies back the commit was complete. This question is actually work related and I'm planning to put the NFS server into production, but I'd rather not use async, even with a UPS and dual PSU's on the server. With the newer nfs-utils, sync is the default option as well so it seems like sync should perform relatively well. 

I've got a problem that I've spent quite a bit of time on, though I'm not an expert at NFS. In summary, operations that require metadata changes (such as file/directory creation and deletion) perform extremely slowly over sync, but over 10x faster using async.

I have two systems connected to a GigE switch using Intel PRO/1000 NICs (jumbo frames are currently not enabled at any of the endpoints).

The NFS server is a dual-core Opteron system with 1GB of RAM and a 3x300GB SAS-disk RAID-5 on a PERC 5/i controller with 256MB of battery-backed cache (write cache is enabled). The file system is ext3. I've configured nfsd to spawn 32 processes at startup. I'm using the defaults for exporting the NFS shares, with no changes to rsize or wsize.
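
For reference, the export and mount look roughly like the following (the hostnames and paths are placeholders, and the mount options are just typical values rather than a literal copy of my configuration):

    # /etc/exports on the server -- sync vs. async is the option being compared
    /export/data    nfsclient.example.com(rw,sync)

    # mount on the client, leaving rsize/wsize at their defaults
    mount -t nfs nfsserver.example.com:/export/data /mnt/data -o rw,hard,intr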

The NFS client is a dual Xeon with 4GB of RAM and a single 7200rpm SATA disk. Both systems are running RHEL WS 3 Update 8 and kernel 2.4.21-47.0.1.ELsmp.

For testing, I'm using bonnie++. The following are some sample test results (files per second) that sum up the problem:

Test on NFS server directly (not NFS loopback)
-Sequential File Creation: 2976
-Sequential File Deletion: N/A
-Random File Creation: 3077
-Random File Deletion: 9922

NFS test with sync enabled
-Sequential File Creation: 39
-Sequential File Deletion: 79
-Random File Creation: 39
-Random File Deletion: 65

NFS test with async enabled
-Sequential File Creation: 575
-Sequential File Deletion: 1718
-Random File Creation: 543
-Random File Deletion: 1228

Based on the local performance of the NFS server, the I/O setup does not appear to be the culprit. My understanding of the sync option is that every change is committed, meaning the NFS server doesn't reply until the change has actually reached stable storage. Something behind the scenes, though, is causing a huge delay before the server reports the commit as complete.
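
To get a feel for what a per-operation commit costs even on local disk, I threw together a quick sketch (plain Python, nothing NFS-specific; the paths and file count are arbitrary). It creates files with and without an fsync of the file and its parent directory after each create, which is roughly the guarantee a sync export has to provide before replying to a CREATE:

    import os, time

    def create_files(directory, count, commit):
        # Create 'count' empty files; if 'commit' is set, fsync the file and
        # its parent directory after each create.
        if not os.path.isdir(directory):
            os.makedirs(directory)
        dir_fd = os.open(directory, os.O_RDONLY)
        start = time.time()
        for i in range(count):
            fd = os.open(os.path.join(directory, "f%d" % i), os.O_CREAT | os.O_WRONLY)
            if commit:
                os.fsync(fd)       # flush the new inode to stable storage
            os.close(fd)
            if commit:
                os.fsync(dir_fd)   # flush the new directory entry as well
        elapsed = time.time() - start
        os.close(dir_fd)
        return count / elapsed

    print("buffered:  %.0f creates/sec" % create_files("/tmp/bench-buffered", 500, False))
    print("committed: %.0f creates/sec" % create_files("/tmp/bench-committed", 500, True))

(It's only meant to illustrate the kind of work each metadata operation implies, not to replace the bonnie++ numbers above.)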

This question is actually work related: I'm planning to put the NFS server into production, but I'd rather not use async, even with a UPS and dual PSUs on the server. With the newer nfs-utils, sync is the default export option as well, so it seems like sync should perform relatively well.

Another question: I don't quite understand how data corruption happens if a power loss occurs on an NFS server exported with async. Even with sync, data transferred over the wire may be lost if the NFS server goes down before that data is committed. Can anyone go into more detail on how the corruption happens?
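
My rough mental model of where the two differ, which may be exactly the piece I'm missing, is something like this (a conceptual sketch in Python, not anything taken from nfsd; send_reply and its arguments are made up):

    import os

    def send_reply(client, status):
        # Placeholder for the RPC reply going back over the wire.
        pass

    def handle_write_sync(fd, data, client):
        # sync export: commit first, then acknowledge.
        os.write(fd, data)
        os.fsync(fd)              # data reaches stable storage...
        send_reply(client, "OK")  # ...before the client is told it is safe

    def handle_write_async(fd, data, client):
        # async export: acknowledge first, commit later.
        os.write(fd, data)
        send_reply(client, "OK")  # the client now discards its copy of the data
        # The write reaches disk some time later; lose power in that window and
        # data the client already believes is safe silently disappears.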

Thanks a bunch!

 

Thanks,

Chris Wornell
Network Administrator, Information Technology
Peerless Systems Corporation
http://www.peerless.com
office: 310.727.5723
fax: 310.727.5715
mailto:cwornell@peerless.com

 
