Cc: linux-nfs@vger.kernel.org
Message-Id: <5ECD2205-4DC9-41F1-AC5C-ADFA984745D3@oracle.com>
From: Chuck Lever <chuck.lever@oracle.com>
To: Brian R Cowan <brcowan@us.ibm.com>
In-Reply-To: <OF3EBF546E.60A83A8F-ON852575A8.006EBB38-852575A8.006EFDE6@us.ibm.com>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
Subject: Re: Read/Write NFS I/O performance degraded by FLUSH_STABLE page flushing
Date: Thu, 30 Apr 2009 16:28:24 -0400
References: <OF3EBF546E.60A83A8F-ON852575A8.006EBB38-852575A8.006EFDE6@us.ibm.com>
Sender: linux-nfs-owner@vger.kernel.org
MIME-Version: 1.0


On Apr 30, 2009, at 4:12 PM, Brian R Cowan wrote:

> Hello all,
>
> This is my first post, so please be gentle... I have been working with
> a customer who is attempting to build their product in ClearCase
> dynamic views on Linux. When they went from Red Hat Enterprise Linux 4
> (Update 5) to Red Hat Enterprise Linux 5 (Update 2), their build
> performance degraded dramatically. When troubleshooting the issue, we
> noticed that links on RHEL 5 caused an incredible number of "STABLE"
> 4 KB NFS writes even though the storage we were writing to was
> EXPLICITLY mounted async. (This made RHEL 5 nearly 5x slower than
> RHEL 4.5 in this area...)
>
> On consultation with some internal resources, we found this change in
> the 2.6 kernel:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=ab0a3dbedc51037f3d2e22ef67717a987b3d15e2
>
> In here it looks like the NFS client is forcing sync writes any time a
> write of less than the NFS write size occurs. We tested this
> hypothesis by setting the write size to 2 KB. The "STABLE" writes went
> away and link times came back down out of the stratosphere. We built a
> modified kernel based on the RHEL 5.2 kernel (that ONLY backed out
> this change) and we got a 33% improvement in overall build speeds. In
> my case, I see almost identical build times between the two OSes when
> we use this modified kernel on RHEL 5.
>
> Now, why am I posing this to the list? I need to understand *why* that
> change was made. On the face of it, simply backing out that patch
> would be perfect. I'm paranoid. I want to make sure that this is the
> ONLY reason:
> "/* For single writes, FLUSH_STABLE is more efficient */"
>
> It seems more accurate to say that they *aren't* more efficient, but
> rather are "safer, but slower."

They are more efficient from the point of view that only a single RPC  
is needed for a complete write.  The WRITE and COMMIT are done in a  
single request.
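
To make the RPC count concrete, here is a minimal sketch in Python. It
is a toy model only, not the kernel's code path: ToyServer,
flush_stable and flush_unstable are hypothetical names invented for
this illustration, and the server is assumed to do nothing but count
requests and track which bytes are on stable storage.

```python
from dataclasses import dataclass


@dataclass
class ToyServer:
    """Hypothetical stand-in for an NFS server; it only counts RPCs."""
    rpcs: int = 0      # number of requests received
    on_disk: int = 0   # bytes known to be on stable storage
    cached: int = 0    # bytes accepted but not yet committed

    def write(self, nbytes: int, stable: bool) -> None:
        self.rpcs += 1
        if stable:
            self.on_disk += nbytes   # data is durable when the reply is sent
        else:
            self.cached += nbytes    # data may still be lost in a crash

    def commit(self) -> None:
        self.rpcs += 1
        self.on_disk += self.cached
        self.cached = 0


def flush_stable(server: ToyServer, nbytes: int) -> None:
    # One request: the write itself is stable.
    server.write(nbytes, stable=True)


def flush_unstable(server: ToyServer, nbytes: int) -> None:
    # Two requests: an unstable write, then a commit.
    server.write(nbytes, stable=False)
    server.commit()


a, b = ToyServer(), ToyServer()
flush_stable(a, 4096)
flush_unstable(b, 4096)
print(a.rpcs, b.rpcs)
```

Both servers end up with the same 4096 bytes on stable storage; the
only difference in this model is how many requests it took to get
there.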

I don't think the issue here is whether the write is stable; it is  
whether the NFS client has to block the application for it.  A stable  
write that is asynchronous to the application is faster than  
WRITE+COMMIT.

So it's not "stable" that is holding you up, it's "synchronous."   
Those are orthogonal concepts.
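
As a rough illustration of why the two are independent, the sketch
below treats "stable" and "synchronous" as separate knobs. Everything
in it is an assumption made for the example: the function names are
hypothetical, the 5 ms per-RPC cost is an arbitrary placeholder, and a
real server may take longer to answer a stable write than an unstable
one, which this model ignores.

```python
def rpc_count(stable: bool) -> int:
    """Requests needed to get the data onto stable storage."""
    return 1 if stable else 2   # stable WRITE vs. WRITE + COMMIT


def app_blocked_ms(stable: bool, synchronous: bool, rpc_ms: float = 5.0) -> float:
    """Time the application waits for the flush in this toy model.

    rpc_ms is an assumed, uniform cost per request.
    """
    if not synchronous:
        return 0.0              # flush proceeds in the background
    return rpc_count(stable) * rpc_ms


for stable in (True, False):
    for synchronous in (True, False):
        print(
            f"stable={stable!s:5} synchronous={synchronous!s:5} "
            f"rpcs={rpc_count(stable)} "
            f"blocked_ms={app_blocked_ms(stable, synchronous)}"
        )
```

In this model the application only waits when the flush is
synchronous, whichever kind of write is used; stability changes the
number of requests, not whether the application blocks.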

> I know that this is a 3+ year old update, but RHEL 4 is based on a 2.4
> kernel,

Nope, RHEL 4 is 2.6.9.  RHEL 3 is 2.4.20-ish.

> and SLES 9 is based on something in the same ballpark. And our
> customers see problems when they go to SLES 10/RHEL 5 from the prior
> major distro version.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com