From: Eric Sandeen
Subject: Re: [PATCH RFC] Insure direct IO writes do not use the page cache
Date: Wed, 29 Jul 2009 12:18:09 -0500
Message-ID: <4A708451.4060908@redhat.com>
References: <6601abe90907281728h22be79fenc68a16b578e28a91@mail.gmail.com>
 <6601abe90907290910x7cf1122cwac689d1f106326d3@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: ext4 development
To: Curt Wohlgemuth
Return-path:
Received: from mx2.redhat.com ([66.187.237.31]:35196 "EHLO mx2.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1750776AbZG2RST (ORCPT ); Wed, 29 Jul 2009 13:18:19 -0400
In-Reply-To: <6601abe90907290910x7cf1122cwac689d1f106326d3@mail.gmail.com>
Sender: linux-ext4-owner@vger.kernel.org
List-ID:

Curt Wohlgemuth wrote:
> Although replying to self is somewhat bad etiquette...

nah :)

> I've found at least one issue with this patch: Although the semantics
> seem correct, since the late-converted-to-init extents are not merged
> with neighbors, you can easily end up with thousands of extents :-( .
> Each write to fallocate'd space results in its own initialized extent.
>
> I'm not sure how expensive it would be to merge the extents when they
> are converted to initialized after the DIO write goes through.
>
> Curt
>

hm I think I've seen other cases where things don't get merged as well
as I'd expect.

I haven't replied to the first mail yet because I have a lot of
remembering to do about xfs first, but I'm fairly certain that at least
your use of blockdev_direct_IO_own_locking() is not correct. See for
example all the comments around __blockdev_direct_IO about i_mutex, and
all the xfs_ilock/xfs_iolock calls in xfs_read/xfs_write. There is a
lot of locking for the fs to handle if you want to go that route.

Also, IIRC xfs does the conversion to written (vs. unwritten) extents in
an IO completion handler, just FWIW.

-Eric
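
To illustrate the merging concern quoted above (each write into fallocate'd space leaving behind its own initialized extent), here is a toy userspace sketch of coalescing neighbouring extents. Everything in it is an assumption made for illustration: struct toy_extent, toy_can_merge() and toy_merge_extents() are hypothetical names, not ext4 or xfs code, and the model ignores things a real filesystem would also have to check, such as physical block contiguity, the maximum extent length, and locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy extent: a run of logical blocks that is either
 * initialized (written) or uninitialized (preallocated).
 * This is NOT ext4's on-disk struct ext4_extent. */
struct toy_extent {
    unsigned long start;   /* first logical block */
    unsigned long len;     /* number of blocks in the run */
    bool initialized;      /* true once data has been written */
};

/* Two extents can be merged when b begins exactly where a ends and
 * both are in the same state. */
static bool toy_can_merge(const struct toy_extent *a,
                          const struct toy_extent *b)
{
    return a->initialized == b->initialized &&
           a->start + a->len == b->start;
}

/* Coalesce mergeable neighbours in place. The array must be sorted by
 * start block and non-overlapping. Returns the new extent count. */
static size_t toy_merge_extents(struct toy_extent *ext, size_t n)
{
    size_t out = 0;   /* index of the last extent kept so far */

    if (n == 0)
        return 0;

    for (size_t i = 1; i < n; i++) {
        /* Precondition: sorted and non-overlapping input. */
        assert(ext[i].start >= ext[out].start + ext[out].len);

        if (toy_can_merge(&ext[out], &ext[i]))
            ext[out].len += ext[i].len;   /* grow the kept extent */
        else
            ext[++out] = ext[i];          /* keep as a separate extent */
    }
    return out + 1;
}
```

In this model, three adjacent one-block initialized extents followed by an uninitialized run collapse to two extents after one linear pass. Whether doing such a pass at DIO completion time is cheap enough in a real filesystem is exactly the open question in the quoted mail; the sketch does not answer it.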