From: Chuck Lever
To: Brian R Cowan
Cc: linux-nfs@vger.kernel.org
Subject: Re: Read/Write NFS I/O performance degraded by FLUSH_STABLE page flushing
Date: Thu, 30 Apr 2009 16:28:24 -0400
Message-Id: <5ECD2205-4DC9-41F1-AC5C-ADFA984745D3@oracle.com>

On Apr 30, 2009, at 4:12 PM, Brian R Cowan wrote:

> Hello all,
>
> This is my first post, so please be gentle.... I have been working with a
> customer who is attempting to build their product in ClearCase dynamic
> views on Linux. When they went from Red hat Enterprise Linux 4 (update 5)
> to Red Hat Enterprise Linux 5 (Update 2), their build performance degraded
> dramatically. When troubleshooting the issue, we noticed that links on
> RHEL 5 caused an incredible number of "STABLE" 4kb nfs writes even though
> the storage we were writing to was EXPLICITLY mounted async. (This made
> RHEL 5 nearly 5x slower than RHEL 4.5 in this area...)
>
> On consultation with some internal resources, we found this change in the
> 2.6 kernel:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=ab0a3dbedc51037f3d2e22ef67717a987b3d15e2
>
> In here it looks like the NFS client is forcing sync writes any time a
> write of less than the NFS write size occurs. We tested this hypothesis by
> setting the write size to 2KB. The "STABLE" writes went away and link
> times came back down out of the stratosphere. We built a modified kernel
> based on the RHEL 5.2 kernel (that ONLY backed out of this change) and we
> got a 33% improvement in overall build speeds.
> In my case, I see almost identical build times between the 2 OS's when
> we use this modified kernel on RHEL 5.
>
> Now, why am I posing this to the list? I need to understand *why* that
> change was made. On the face of it, simply backing out that patch would be
> perfect. I'm paranoid. I want to make sure that this is the ONLY reason:
> "/* For single writes, FLUSH_STABLE is more efficient */ "
>
> It seems more accurate to say that they *aren't* more efficient, but
> rather are "safer, but slower."

They are more efficient from the point of view that only a single RPC
is needed for a complete write. The WRITE and COMMIT are done in a
single request.

I don't think the issue here is whether the write is stable, but it is
whether the NFS client has to block the application for it. A stable
write that is asynchronous to the application is faster than
WRITE+COMMIT.

So it's not "stable" that is holding you up, it's "synchronous." Those
are orthogonal concepts.

> I know that this is a 3+ year old update, but RHEL 4 is based on a 2.4
> kernel,

Nope, RHEL 4 is 2.6.9. RHEL 3 is 2.4.20-ish.

> and SLES 9 is based on something in the same ballpark. And our
> customers see problems when they go to SLES 10/RHEL 5 from the prior major
> distro version.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
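
As a rough illustration of the RPC-count argument in the reply above, here is a toy Python model. It is a sketch under stated assumptions, not the kernel's logic: the function name rpcs_for_flush and the example sizes are hypothetical, the stable_how values are the ones NFSv3 defines for WRITE, and the model counts only RPCs on the wire. Whether the application has to block waiting for those RPCs is a separate question that the model deliberately leaves out.

```python
import math
from enum import IntEnum


class StableHow(IntEnum):
    """stable_how values carried in an NFSv3 WRITE request (RFC 1813)."""
    UNSTABLE = 0
    DATA_SYNC = 1
    FILE_SYNC = 2


def rpcs_for_flush(nbytes: int, wsize: int, stable: StableHow) -> int:
    """Count the RPCs needed to get nbytes of dirty data onto stable storage.

    Hypothetical helper for illustration only.
    - Each WRITE RPC carries at most wsize bytes.
    - UNSTABLE writes need one trailing COMMIT to make the data durable.
    - DATA_SYNC / FILE_SYNC writes are durable when the WRITE reply arrives,
      so no COMMIT is needed.
    """
    writes = math.ceil(nbytes / wsize)
    commits = 1 if stable == StableHow.UNSTABLE else 0
    return writes + commits


# A single 4 KB write that fits in one WRITE (wsize assumed to be 32 KB):
one_stable = rpcs_for_flush(4096, 32768, StableHow.FILE_SYNC)    # WRITE only
one_unstable = rpcs_for_flush(4096, 32768, StableHow.UNSTABLE)   # WRITE + COMMIT

# The same 4 KB with wsize lowered to 2 KB, sent unstable:
split_unstable = rpcs_for_flush(4096, 2048, StableHow.UNSTABLE)  # 2 WRITEs + COMMIT

print(one_stable, one_unstable, split_unstable)
```

In this model the stable single write always costs fewer RPCs than WRITE+COMMIT, which is the sense of "more efficient" in the code comment being discussed. The slowdown reported in the quoted message therefore has to come from something the model does not count, namely the application waiting on each of those requests.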