Date: Fri, 4 Mar 2016 09:56:46 +1100
From: Dave Chinner
To: "Martin K. Petersen"
Cc: Christoph Hellwig, Linus Torvalds, "Darrick J. Wong", Jens Axboe,
    Andrew Morton, Linux API, Linux Kernel Mailing List,
    shane.seymour@hpe.com, Bruce Fields, linux-fsdevel, Jeff Layton
Subject: Re: [PATCH 2/2] block: create ioctl to discard-or-zeroout a range of blocks
Message-ID: <20160303225646.GT29057@dastard>
References: <20160302040932.16685.62789.stgit@birch.djwong.org>
    <20160302040947.16685.42926.stgit@birch.djwong.org>
    <20160302225601.GB21890@birch.djwong.org>
    <20160303180924.GA4116@infradead.org>

On Thu, Mar 03, 2016 at 01:54:54PM -0500, Martin K. Petersen wrote:
> >>>>> "Christoph" == Christoph Hellwig writes:
>
> Christoph> - FALLOC_FL_PUNCH_HOLE assures zeroes are returned, but
> Christoph>   space is deallocated as much as possible
> Christoph> - FALLOC_FL_ZERO_RANGE assures zeroes are returned, AND blocks
> Christoph>   are actually allocated
>
> That works for me. I think it would be great if we could have consistent
> interfaces for fs and block. The more commonality the merrier.
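The two semantics Christoph lists can be exercised from userspace with the existing fallocate(2) modes. The sketch below is illustrative only: the helper names try_mode() and range_is_zero() and the /tmp scratch file are assumptions, not anything from this thread. It checks only the "zeroes are returned" half of each guarantee, because whether punched space is really freed (or zeroed space really allocated) depends on the filesystem. try_mode() returns 0 when the filesystem does not implement a mode (tmpfs, for instance, only implements hole punching).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define BLK 4096

/* Returns 1 if every byte in [off, off + len) reads back as zero. */
static int range_is_zero(int fd, off_t off, size_t len)
{
    char buf[BLK];
    while (len > 0) {
        size_t n = len < sizeof(buf) ? len : sizeof(buf);
        ssize_t r = pread(fd, buf, n, off);
        if (r <= 0)
            return 0;
        for (ssize_t i = 0; i < r; i++)
            if (buf[i] != 0)
                return 0;
        off += r;
        len -= (size_t)r;
    }
    return 1;
}

/*
 * Fill a 3-block scratch file with 0xAA, apply one fallocate() mode to
 * the middle block, then check that the middle block reads as zeroes,
 * its neighbours are untouched and the file size is unchanged.
 * Returns 1 on success, 0 if the mode is unsupported, -1 on failure.
 */
static int try_mode(int mode)
{
    /* This demo only exercises the two zeroing modes. */
    assert(mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE));

    char path[] = "/tmp/falloc-demo-XXXXXX";
    char buf[3 * BLK];
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* storage is released when fd is closed */

    memset(buf, 0xAA, sizeof(buf));
    if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf)) {
        close(fd);
        return -1;
    }

    int ret;
    if (fallocate(fd, mode, BLK, BLK) != 0) {
        ret = (errno == EOPNOTSUPP || errno == ENOSYS) ? 0 : -1;
    } else {
        struct stat st;
        ret = (range_is_zero(fd, BLK, BLK) &&
               !range_is_zero(fd, BLK - 1, 1) &&     /* byte before */
               !range_is_zero(fd, 2 * BLK, 1) &&     /* byte after */
               fstat(fd, &st) == 0 &&
               st.st_size == (off_t)sizeof(buf)) ? 1 : -1;
    }
    close(fd);
    return ret;
}
```

Hole punching must be combined with FALLOC_FL_KEEP_SIZE, so a caller would use try_mode(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE) and try_mode(FALLOC_FL_ZERO_RANGE); on filesystems that support both, each returns 1.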
Absolutely in agreement here. It would be much nicer if filesystems
could just call bdev->ops->fallocate(PUNCH_HOLE, off, len) and
bdev->ops->fallocate(ZERO_RANGE, off, len) instead of all the weird
"technology specific" blkdev_issue_foo() functions we have grown over
time. Let the block device implement them as it sees fit; the higher
levels don't need to care about protocol/technology details.

---

FWIW, this reminds me of a "bigger picture" I think we should be
working towards. Does anyone remember this:

https://lwn.net/Articles/592091/ (Splitting filesystems in two)

That is, if we add fallocate support to punch holes, zero ranges and
*allocate blocks* to a block device, we're mostly at the point where
we can offload all the free space management that the filesystem
currently does to the underlying block device. There's really only a
small extension we'd need: the block allocation done by the block
device needs to be able to return the sector and length of the newly
allocated extent.

Indeed, this is something we talked about last year at LSFMM as a
solution to the SMR write ordering problem:

https://lwn.net/Articles/637035/

(near the end, the paragraph talking about a "new kind of write
command")

That "new kind of write command" would enable delayed allocation
algorithms to continue to work at the filesystem level on block
devices to which free space management has been completely
offloaded...

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
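To make the proposed interface a little more concrete, here is a toy, in-memory sketch. Everything in it is hypothetical: struct toy_bdev, struct toy_bdev_ops, the TOY_* modes and toy_fallocate() are invented for illustration and are not the kernel's block_device_operations or its FALLOC_FL_* flags. It shows a single ->fallocate() entry point handling hole punching, zeroing, and an "allocate" mode in which the device, not the filesystem, chooses the location and hands back the sector and length of the new extent.

```c
/* Hypothetical interface for illustration only; not a kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_SECTORS 64
#define SECTOR_SIZE 512

/* Hypothetical modes; deliberately not the real FALLOC_FL_* values. */
enum toy_falloc_mode { TOY_PUNCH_HOLE, TOY_ZERO_RANGE, TOY_ALLOCATE };

struct toy_extent {
    uint64_t sector;
    uint64_t nr_sectors;
};

struct toy_bdev;

struct toy_bdev_ops {
    int (*fallocate)(struct toy_bdev *bdev, enum toy_falloc_mode mode,
                     uint64_t sector, uint64_t nr_sectors,
                     struct toy_extent *allocated);
};

struct toy_bdev {
    const struct toy_bdev_ops *ops;
    bool mapped[TOY_SECTORS];                 /* backed by real storage? */
    unsigned char data[TOY_SECTORS][SECTOR_SIZE];
};

static int toy_fallocate(struct toy_bdev *b, enum toy_falloc_mode mode,
                         uint64_t sector, uint64_t nr,
                         struct toy_extent *out)
{
    if (nr == 0)
        return -1;

    if (mode == TOY_ALLOCATE) {
        /* The device picks the location (naive first fit); the
         * caller's sector argument is ignored. */
        assert(out != NULL);
        for (uint64_t start = 0; start + nr <= TOY_SECTORS; start++) {
            uint64_t i = 0;
            while (i < nr && !b->mapped[start + i])
                i++;
            if (i < nr)
                continue;
            for (i = 0; i < nr; i++) {
                b->mapped[start + i] = true;
                memset(b->data[start + i], 0, SECTOR_SIZE);
            }
            out->sector = start;      /* report where the extent landed */
            out->nr_sectors = nr;
            return 0;
        }
        return -1;                    /* no free run long enough */
    }

    if (sector > TOY_SECTORS || nr > TOY_SECTORS - sector)
        return -1;
    for (uint64_t i = sector; i < sector + nr; i++) {
        /* Both modes guarantee zeroes on read ... */
        memset(b->data[i], 0, SECTOR_SIZE);
        /* ... but punching releases the space, zeroing keeps it. */
        b->mapped[i] = (mode == TOY_ZERO_RANGE);
    }
    return 0;
}

static const struct toy_bdev_ops toy_ops = { .fallocate = toy_fallocate };
```

With this shape, a filesystem would call dev.ops->fallocate(&dev, TOY_ALLOCATE, 0, 8, &ext) and only record ext in its metadata; the free space search lives entirely behind the device's ops table. The first-fit scan is just the simplest stand-in; a real device (say, an SMR drive) would choose placement according to its own constraints.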