From: Trond Myklebust
Subject: Re: Read/Write NFS I/O performance degraded by FLUSH_STABLE page flushing
Date: Fri, 29 May 2009 12:46:35 -0400
Message-ID: <1243615595.7155.48.camel@heimdal.trondhjem.org>
References: <5ECD2205-4DC9-41F1-AC5C-ADFA984745D3@oracle.com> <49FA0CE8.9090706@redhat.com> <1241126587.15476.62.camel@heimdal.trondhjem.org>
Mime-Version: 1.0
Content-Type: text/plain
To: Brian R Cowan
Cc: Chuck Lever, linux-nfs@vger.kernel.org, linux-nfs-owner@vger.kernel.org, Peter Staubach

Look... This happens when you _flush_ the file to stable storage, if there is only a single write < wsize. It isn't the business of the NFS layer to decide when you flush the file; that's an application decision...

Trond

On Fri, 2009-05-29 at 11:55 -0400, Brian R Cowan wrote:
> I'd been working this issue with Red Hat and didn't need to go to the
> list... Well, now I do. You mention that "The main type of workload we're
> targeting with this patch is the app that opens a file, writes < 4k and
> then closes the file." Well, it appears that this issue also impacts
> flushing pages from filesystem caches.
>
> The reason this came up in my environment is that our product's build
> auditing gives the filesystem cache an interesting workout. When
> ClearCase audits a build, the build places data in a few places,
> including:
> 1) A build audit file that usually resides in /tmp. This audit file is
> essentially a log of EVERY file open/read/write/delete/rename/etc. that
> the programs called by the build script make in the ClearCase "view"
> you're building in. As a result, this file can get pretty large.
> 2) The build outputs themselves, which in this case are being written to a
> remote storage location on a Linux or Solaris server, and
> 3) A file called .cmake.state, which is a local cache written after the
> build script completes, containing what is essentially a "bill of
> materials" for the files created during builds in this "view."
>
> We believe that access to the build audit file is causing build output to
> get flushed out of the filesystem cache. These flushes happen *in 4k
> chunks,* which trips over this change since the cache pages appear to get
> flushed individually.
>
> One note: if the build outputs were going to a ClearCase view stored on an
> enterprise-level NAS device, this wouldn't be as much of an issue, because
> many of those devices return from a stable write request as soon as the
> data reaches their battery-backed disk cache. However, it really hurts
> writes to general-purpose OSes that follow Sun's lead in how they handle
> "stable" writes. The truly annoying part about this rather subtle change
> is that the NFS client effectively ignores the client mount options: we
> cannot use the "async" mount option to turn off this behavior.
>
> =================================================================
> Brian Cowan
> Advisory Software Engineer
> ClearCase Customer Advocacy Group (CAG)
> Rational Software
> IBM Software Group
> 81 Hartwell Ave
> Lexington, MA
>
> Phone: 1.781.372.3580
> Web: http://www.ibm.com/software/rational/support/
>
> Please be sure to update your PMR using ESR at
> http://www-306.ibm.com/software/support/probsub.html or cc all
> correspondence to sw_support@us.ibm.com to be sure your PMR is updated in
> case I am not available.
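[Editorial note: the behavior Brian describes, where each 4k page reclaimed from the cache becomes its own stable write, follows from the shape of the heuristic introduced by commit ab0a3dbe. The sketch below is a hypothetical simplification for illustration only, not the actual kernel code; the function and parameter names are made up, though NFS_FILE_SYNC/NFS_UNSTABLE mirror the protocol's stability levels.]

```c
#include <stdbool.h>
#include <stddef.h>

/* Stability levels from the NFSv3 WRITE request's stable_how field. */
enum stable_how { NFS_UNSTABLE, NFS_FILE_SYNC };

/* Hypothetical sketch of the optimization: when the entire dirty range
 * fits in a single WRITE RPC (<= wsize), send it as one stable
 * FILE_SYNC write and skip the separate COMMIT; otherwise send
 * UNSTABLE writes and COMMIT later. */
static enum stable_how choose_stable_how(size_t dirty_bytes, size_t wsize,
                                         bool flushing_whole_range)
{
    if (flushing_whole_range && dirty_bytes <= wsize)
        return NFS_FILE_SYNC;  /* one RPC, no COMMIT needed */
    return NFS_UNSTABLE;       /* batch now, COMMIT later */
}
```

Under a rule like this, a cache reclaim that pushes out one 4k page at a time satisfies the single-RPC condition on every flush, so each page goes out FILE_SYNC — the slow path on servers that implement stable writes strictly, which is what the build-audit workload above runs into.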
>
> From: Trond Myklebust
> To: Peter Staubach
> Cc: Chuck Lever, Brian R Cowan/Cupertino/IBM@IBMUS, linux-nfs@vger.kernel.org
> Date: 04/30/2009 05:23 PM
> Subject: Re: Read/Write NFS I/O performance degraded by FLUSH_STABLE page flushing
> Sent by: linux-nfs-owner@vger.kernel.org
>
> On Thu, 2009-04-30 at 16:41 -0400, Peter Staubach wrote:
> > Chuck Lever wrote:
> > > On Apr 30, 2009, at 4:12 PM, Brian R Cowan wrote:
> > >>
> > >> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=ab0a3dbedc51037f3d2e22ef67717a987b3d15e2
> > >
> > Actually, the "stable" part can be a killer. It depends upon
> > why and when nfs_flush_inode() is invoked.
> >
> > I did quite a bit of work on this aspect of RHEL-5 and discovered
> > that this particular code was leading to some serious slowdowns.
> > The server would end up doing a very slow FILE_SYNC write when
> > all that was really required was an UNSTABLE write at the time.
> >
> > Did anyone actually measure this optimization and if so, what
> > were the numbers?
>
> As usual, the optimisation is workload dependent. The main type of
> workload we're targeting with this patch is the app that opens a file,
> writes < 4k and then closes the file. For that case, it's a no-brainer
> that you don't need to split a single stable write into an unstable
> write plus a commit.
>
> So if the application isn't doing the above type of short write followed
> by close, then exactly what is causing a flush to disk in the first
> place? Ordinarily, the client will try to cache writes until the cows
> come home (or until the VM tells it to reclaim memory - whichever comes
> first)...
>
> Cheers
> Trond
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
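[Editorial note: the "open, write < 4k, close" workload Trond describes as the patch's target is easy to reproduce from user space. A minimal sketch, with a made-up function name and an illustrative path; run it against a file on an NFS mount and watch the wire with tcpdump/wireshark to see the single write go out stable:]

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Perform the "open, write < 4k, close" pattern the patch targets.
 * On an NFS mount, close() flushes the single dirty page; with the
 * FLUSH_STABLE change that flush should appear on the wire as one
 * FILE_SYNC WRITE with no separate COMMIT.
 * Returns bytes written, or -1 on error. */
static ssize_t small_stable_write(const char *path)
{
    char buf[512];
    memset(buf, 'x', sizeof(buf));

    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, buf, sizeof(buf));
    if (close(fd) < 0)
        return -1;
    return n;
}
```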