From: Trond Myklebust
Subject: Re: Read/Write NFS I/O performance degraded by FLUSH_STABLE page flushing
Date: Fri, 29 May 2009 14:43:35 -0400
Message-ID: <1243622615.7155.109.camel@heimdal.trondhjem.org>
References: <5ECD2205-4DC9-41F1-AC5C-ADFA984745D3@oracle.com>
	 <49FA0CE8.9090706@redhat.com>
	 <1241126587.15476.62.camel@heimdal.trondhjem.org>
	 <41044976-395B-4ED0-BBA1-153FD76BDA53@oracle.com>
	 <1243618968.7155.60.camel@heimdal.trondhjem.org>
	 <4A2020AA.6050906@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain
Cc: Brian R Cowan, Chuck Lever, linux-nfs@vger.kernel.org, linux-nfs-owner@vger.kernel.org
To: Peter Staubach
Return-path:
Received: from mail-out1.uio.no ([129.240.10.57]:39616 "EHLO mail-out1.uio.no" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752662AbZE2Snn (ORCPT ); Fri, 29 May 2009 14:43:43 -0400
In-Reply-To: <4A2020AA.6050906@redhat.com>
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

On Fri, 2009-05-29 at 13:51 -0400, Peter Staubach wrote:
> Trond Myklebust wrote:
> > On Fri, 2009-05-29 at 13:38 -0400, Brian R Cowan wrote:
> >
> >>> You may have a misunderstanding about what exactly "async" does. The
> >>> "sync" / "async" mount options control only whether the application
> >>> waits for the data to be flushed to permanent storage. On every file
> >>> system I know of, they have no effect on _how_ the data is moved
> >>> from the page cache to permanent storage.
> >>>
> >> The problem is that the client change seems to cause the application to
> >> stop until this stable write completes... What is interesting is that it's
> >> not always a write operation that the linker gets stuck on. Our best
> >> hypothesis -- from correlating times in strace and tcpdump traces -- is
> >> that the FILE_SYNC'ed write NFS RPCs are in fact triggered by *read()*
> >> system calls on the output file (which is opened for read/write).
> >> We THINK the read call triggers a FILE_SYNC write if the page is dirty,
> >> and that is why the read calls are taking so long. Seeing writes happen
> >> while the app is waiting for a read is odd, to say the least... (In my
> >> test, there is nothing else running on the virtual machines, so the only
> >> thing that could be triggering the filesystem activity is the build
> >> test...)
> >>
> >
> > Yes. If the page is dirty, but not up to date, then it needs to be
> > cleaned before you can overwrite its contents with the results of a
> > fresh read. That means flushing the data to disk, which in turn means
> > doing either a stable write or an unstable write+commit. The former is
> > more efficient than the latter, 'cos it accomplishes the exact same
> > work in a single RPC call.
>
> In the normal case, we aren't overwriting the contents with the
> results of a fresh read. We are going to simply return the
> current contents of the page. Given this, why is the normal
> data cache consistency mechanism, based on the attribute cache,
> not sufficient?

It is. You would need to look into why the page was not marked with the
PG_uptodate flag when it was being filled. We generally do try to do
that whenever possible.

Trond