From: Olaf Kirch
Subject: Re: Strange delays on NFS server
Date: Wed, 11 Aug 2004 18:41:35 +0200
Sender: nfs-admin@lists.sourceforge.net
Message-ID: <20040811164135.GA11101@suse.de>
References: <4119FB15.7010205@stams.strath.ac.uk> <411A17F2.2060203@RedHat.com> <411A448D.3080205@stams.strath.ac.uk>
In-Reply-To: <411A448D.3080205@stams.strath.ac.uk>
To: Ian Thurlbeck
Cc: Steve Dickson, nfs@lists.sourceforge.net
List-Id: Discussion of NFS under Linux development, interoperability, and testing.

On Wed, Aug 11, 2004 at 05:08:45PM +0100, Ian Thurlbeck wrote:
> OK, I've been running "top -d 1 -i" and trying to see what comes up when
> the server freezes. I caught one instance where a delay coincided
> with about 15 nfsd + 1 kjournald process appearing in the top
> display. I'm simultaneously looking at a graphical network tool to try
> and see the traffic going to the server - anyone got a better suggestion?

This sounds exactly like the COMMIT stall problem for which I submitted
the early-writeout patch to this list about a week ago.

I've been thinking about this a little more. It may be that one reason
the problem is more pronounced in 2.6 than in 2.4 is the new io barrier
code. In 2.6 ext3 uses barriers by default; Suse's 2.6 has reiserfs
patches that add barriers (and enable them by default). We've had
reports of this problem on both file systems.
JFS does i/o barriers while XFS does not; and this also fits the
pattern of what Ian reports.

I dimly remember there's a kernel command line option to turn off
barriers at the block io level. Can you check whether that helps, Ian?

The more I think about this, the more I believe the early-writeout
patch is the right way to address this problem (short of turning off
barriers). When data hits the NFS server, it is supposed to go to disk
rather soon. This also covers most of the rewrite case, at least as
long as you have just one application writing to the file - all
rewriting happens in the client cache.

The crucial question is what a good heuristic is for choosing when to
initiate a write-out. Sequential writes to the end of file are easy
enough to detect. I have a somewhat updated version of my patch that
covers just this case, and exports a sysctl to let you tune how often
it initiates an early write-out.

Olaf
-- 
Olaf Kirch     |  The Hardware Gods hate me.
okir@suse.de   |
---------------+

_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
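
The sequential-write heuristic described in the message above can be
illustrated with a small user-space sketch. This is not the patch
itself, which is not shown in this thread; the class name, the
`on_write` method, and the `threshold` value are all hypothetical, with
`threshold` standing in for the sysctl tunable the message mentions. A
write counts as sequential here when it starts exactly at the current
end of file.

```python
from dataclasses import dataclass


@dataclass
class EarlyWriteout:
    """Hypothetical per-file state for an early write-out heuristic."""

    # Hypothetical tunable (bytes); plays the role of the sysctl.
    threshold: int = 256 * 1024
    # Bytes appended sequentially since the last early write-out.
    pending: int = 0

    def on_write(self, offset: int, length: int, file_size: int) -> bool:
        """Return True if this write should trigger an early write-out.

        file_size is the size of the file before this write is applied.
        """
        if offset != file_size:
            # Not an append at end of file: reset and leave the data
            # for the normal COMMIT path.
            self.pending = 0
            return False
        self.pending += length
        if self.pending >= self.threshold:
            # Enough sequential data has piled up; flush it now so a
            # later COMMIT has less to wait for.
            self.pending = 0
            return True
        return False
```

With a 64 KB threshold, two back-to-back 32 KB appends would trigger
one early write-out on the second append, while a rewrite in the middle
of the file would reset the counter.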