Date: Thu, 25 Jan 2007 09:46:54 +1100
From: David Chinner
To: Nick Piggin
Cc: Peter Zijlstra, David Chinner, linux-kernel@vger.kernel.org, xfs@oss.sgi.com, akpm@osdl.org
Subject: Re: [PATCH 1/2]: Fix BUG in cancel_dirty_pages on XFS
Message-ID: <20070124224654.GN33919298@melbourne.sgi.com>
References: <20070123223702.GF33919298@melbourne.sgi.com> <1169640835.6189.14.camel@twins> <45B7627B.8050202@yahoo.com.au>
In-Reply-To: <45B7627B.8050202@yahoo.com.au>

On Thu, Jan 25, 2007 at 12:43:23AM +1100, Nick Piggin wrote:
> Peter Zijlstra wrote:
> >On Wed, 2007-01-24 at 09:37 +1100, David Chinner wrote:
> >
> >>With the recent changes to cancel_dirty_pages(), XFS will
> >>dump warnings in the syslog because it can truncate_inode_pages()
> >>on dirty mapped pages.
> >>
> >>I've determined that this is indeed correct behaviour for XFS
> >>as this can happen in the case of races on mmap()d files with
> >>direct I/O. In this case when we do a direct I/O read, we
> >>flush the dirty pages to disk, then truncate them out of the
> >>page cache. Unfortunately, between the flush and the truncate
> >>the mmap could dirty the page again. At this point we toss a
> >>dirty page that is mapped.
> >
> >
> >This sounds iffy, why not just leave the page in the pagecache if its
> >mapped anyway?
>
> And why not just leave it in the pagecache and be done with it?

Because what is in cache is then not coherent with what is on disk,
and a direct read is supposed to read the data that is present
in the file at the time it is issued.

> All you need is to do a writeout before a direct IO read, which is
> what generic dio code does.

No, that's not good enough - after writeout but before the
direct I/O read is issued a process can fault the page and dirty
it. If you do a direct read, followed by a buffered read you should
get the same data. The only way to guarantee this is to chuck out
any cached pages across the range of the direct I/O so they are
fetched again from disk on the next buffered I/O. i.e. coherent
at the time the direct I/O is issued.

> I guess you'll say that direct writes still need to remove pages,

Yup.

> but in that case you'll either have to live with some racyness
> (which is what the generic code does), or have a higher level
> synchronisation to prevent buffered + direct IO writes I suppose?

The XFS inode iolock - direct I/O writes take it shared, buffered
writes take it exclusive - so you can't do both at once. Buffered
reads take it shared, which is another reason why we need to purge
the cache on direct I/O writes - they can operate concurrently
(and coherently) with buffered reads.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
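
The iolock rules described above can be modelled in a few lines of
userspace C. This is only a sketch of the lock modes as stated in this
mail, not XFS code: the names io_kind, lock_mode, iolock_mode_for() and
can_run_concurrently() are hypothetical, and direct I/O reads are left
out because the mail does not say which mode they take.

```c
#include <stdbool.h>

/*
 * Userspace model of the iolock modes named in the mail.
 * All identifiers here are hypothetical; none of this is XFS code.
 */
enum io_kind {
    BUFFERED_READ,
    BUFFERED_WRITE,
    DIRECT_WRITE
};

enum lock_mode {
    LOCK_SHARED,
    LOCK_EXCL
};

/* Mode each kind of I/O takes on the inode iolock, per the mail. */
static enum lock_mode iolock_mode_for(enum io_kind kind)
{
    switch (kind) {
    case BUFFERED_WRITE:
        return LOCK_EXCL;       /* buffered writes take it exclusive */
    case BUFFERED_READ:         /* buffered reads take it shared */
    case DIRECT_WRITE:          /* direct I/O writes take it shared */
        return LOCK_SHARED;
    }
    return LOCK_EXCL;           /* unknown kind: assume the strictest mode */
}

/*
 * Two operations on the same inode can hold the lock at the same time
 * only if both take it shared; an exclusive holder excludes everyone.
 */
static bool can_run_concurrently(enum io_kind a, enum io_kind b)
{
    return iolock_mode_for(a) == LOCK_SHARED &&
           iolock_mode_for(b) == LOCK_SHARED;
}
```

In this model can_run_concurrently(DIRECT_WRITE, BUFFERED_WRITE) is
false, while can_run_concurrently(DIRECT_WRITE, BUFFERED_READ) is true.
The second case is the one that matters for the cache purge: a buffered
read may be running alongside the direct write, so stale cached pages
over the written range have to go.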