Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752248AbZLXCwe (ORCPT ); Wed, 23 Dec 2009 21:52:34 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751151AbZLXCwc (ORCPT ); Wed, 23 Dec 2009 21:52:32 -0500 Received: from mga03.intel.com ([143.182.124.21]:9243 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751032AbZLXCwc (ORCPT ); Wed, 23 Dec 2009 21:52:32 -0500 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.47,316,1257148800"; d="scan'208";a="226253217" Date: Thu, 24 Dec 2009 10:52:28 +0800 From: Wu Fengguang To: Trond Myklebust Cc: Jan Kara , Steve Rago , Peter Zijlstra , "linux-nfs@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "jens.axboe" , Peter Staubach , Arjan van de Ven , Ingo Molnar , "linux-fsdevel@vger.kernel.org" Subject: Re: [PATCH] improve the performance of large sequential write NFS workloads Message-ID: <20091224025228.GA12477@localhost> References: <1261015420.1947.54.camel@serenity> <1261037877.27920.36.camel@laptop> <20091219122033.GA11360@localhost> <1261232747.1947.194.camel@serenity> <20091222015907.GA6223@localhost> <1261578107.2606.11.camel@localhost> <20091223180551.GD3159@quack.suse.cz> <1261595574.6775.2.camel@localhost> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1261595574.6775.2.camel@localhost> User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2988 Lines: 67 Trond, On Thu, Dec 24, 2009 at 03:12:54AM +0800, Trond Myklebust wrote: > On Wed, 2009-12-23 at 19:05 +0100, Jan Kara wrote: > > On Wed 23-12-09 15:21:47, Trond Myklebust wrote: > > > @@ -474,6 +482,18 @@ writeback_single_inode(struct inode *inode, struct writeback_control *wbc) > > > } > > > > > > spin_lock(&inode_lock); > > > + /* > > > + * Special state for cleaning NFS unstable pages > > > + */ > > > + if (inode->i_state & I_UNSTABLE_PAGES) { > > > + int err; > > > + inode->i_state &= ~I_UNSTABLE_PAGES; > > > + spin_unlock(&inode_lock); > > > + err = commit_unstable_pages(inode, wait); > > > + if (ret == 0) > > > + ret = err; > > > + spin_lock(&inode_lock); > > > + } > > I don't quite understand this chunk: We've called writeback_single_inode > > because it had some dirty pages. Thus it has I_DIRTY_DATASYNC set and a few > > lines above your chunk, we've called nfs_write_inode which sent commit to > > the server. Now here you sometimes send the commit again? What's the > > purpose? > > We no longer set I_DIRTY_DATASYNC. We only set I_DIRTY_PAGES (and later > I_UNSTABLE_PAGES). > > The point is that we now do the commit only _after_ we've sent all the > dirty pages, and waited for writeback to complete, whereas previously we > did it in the wrong order. Sorry I still don't get it. The timing used to be: write 4MB ==> WRITE block 0 (ie. first 512KB) WRITE block 1 WRITE block 2 WRITE block 3 ack from server for WRITE block 0 => mark 0 as unstable (inode marked need-commit) WRITE block 4 ack from server for WRITE block 1 => mark 1 as unstable WRITE block 5 ack from server for WRITE block 2 => mark 2 as unstable WRITE block 6 ack from server for WRITE block 3 => mark 3 as unstable WRITE block 7 ack from server for WRITE block 4 => mark 4 as unstable ack from server for WRITE block 5 => mark 5 as unstable write_inode ==> COMMIT blocks 0-5 ack from server for WRITE block 6 => mark 6 as unstable (inode marked need-commit) ack from server for WRITE block 7 => mark 7 as unstable ack from server for COMMIT blocks 0-5 => mark 0-5 as clean write_inode ==> COMMIT blocks 6-7 ack from server for COMMIT blocks 6-7 => mark 6-7 as clean Note that the first COMMIT is submitted before receiving all ACKs for the previous writes, hence the second COMMIT is necessary. It seems that your patch does not improve the timing at all. Thanks, Fengguang -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/