From: Dave Chinner
Subject: Re: Ext4 and xfs problems in dm-thin on allocation and discard
Date: Wed, 20 Jun 2012 06:39:38 +1000
Message-ID: <20120619203938.GM25389@dastard>
In-Reply-To: <20120619202130.GF22805@thunk.org>
References: <20120619031241.GA3884@redhat.com>
 <20120619131649.GA6811@redhat.com>
 <20120619133041.GB6811@redhat.com>
 <4FE0840F.2050704@shiftmail.org>
 <20120619144413.GA7225@redhat.com>
 <20120619184858.GA8841@redhat.com>
 <20120619200631.GL25389@dastard>
 <20120619202130.GF22805@thunk.org>
To: Ted Ts'o
Cc: Mike Snitzer, Spelic, Lukáš Czerner, device-mapper development,
 linux-ext4@vger.kernel.org, xfs@oss.sgi.com

On Tue, Jun 19, 2012 at 04:21:30PM -0400, Ted Ts'o wrote:
> On Wed, Jun 20, 2012 at 06:06:31AM +1000, Dave Chinner wrote:
> > > But in general xfs is issuing discards with much smaller extents than
> > > ext4 does, e.g.:
> >
> > That's normal when you use -o discard - XFS sends extremely
> > fine-grained discards as they have to be issued during the checkpoint
> > commit that frees the extent. Hence they can't be aggregated like is
> > done in ext4.
>
> Actually, ext4 is also sending the discards during (well, actually,
> after) the commit which frees the extent/inode. We do aggregate them
> while the commit is open, but once the transaction is committed, we
> send out the discards. I suspect the difference is in the granularity
> of the transactions between ext4 and xfs.

Exactly - XFS transactions are fine grained, checkpoints are coarse.
We don't merge extents freed in fine grained transactions inside
checkpoints. We probably could, but, well, it's complex to do in XFS
and merging adjacent requests is something the block layer is
supposed to do....

> > As it is, no-one really should be using -o discard - it is extremely
> > inefficient compared to a background fstrim run given that discards
> > are unqueued, blocking IOs. It's just a bad idea until the lower
> > layers get fixed to allow asynchronous, vectored discards and SATA
> > supports queued discards...
>
> What Dave said. :-) This is true for both ext4 and xfs.
>
> As a result, I can very easily see there being a distinction made
> between when we *do* want to pass the discards all the way down to the
> device, and when we only want the thinp layer to process them ---
> because for current devices, sending discards down to the physical
> device is very heavyweight.
>
> I'm not sure how we could do this without a nasty layering violation,
> but some way in which we could label fstrim discards versus "we've
> committed the unlink/truncate and so thinp can feel free to reuse
> these blocks" discards would be interesting to consider.

I think if we had better discard support from the block layer, it
wouldn't matter from a filesystem POV what discard support is
present in the block layer below it. I think it's better to get the
block layer interface fixed than to add new request types/labels to
filesystems to work around the current deficiencies.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
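
The merging of adjacent freed extents discussed in this message can be
illustrated with a minimal sketch. It is not XFS, ext4, or block-layer
code: the (start, length) tuple representation and the function name
merge_adjacent_extents are assumptions made purely for illustration. It
only shows why many small discards for neighbouring ranges could, in
principle, be collapsed into a few larger ones before being issued.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation of a freed extent: (start_block, length_in_blocks).
Extent = Tuple[int, int]


def merge_adjacent_extents(extents: Iterable[Extent]) -> List[Extent]:
    """Coalesce extents that touch or overlap into the fewest contiguous ranges."""
    merged: List[Extent] = []
    for start, length in sorted(extents):
        if length <= 0:
            continue  # ignore empty ranges
        if merged:
            prev_start, prev_len = merged[-1]
            prev_end = prev_start + prev_len
            if start <= prev_end:
                # This extent touches or overlaps the previous one: extend it.
                new_end = max(prev_end, start + length)
                merged[-1] = (prev_start, new_end - prev_start)
                continue
        merged.append((start, length))
    return merged


# Five small extents, as fine-grained transactions might free them.
freed = [(100, 8), (108, 8), (300, 4), (116, 16), (304, 4)]
print(merge_adjacent_extents(freed))  # [(100, 32), (300, 8)]
```

In this toy input, five small ranges collapse into two larger ones, so
two discard requests would cover the same blocks.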