Date: Wed, 9 Mar 2016 14:20:31 -0800
From: Gregory Farnum
To: Dave Chinner
Cc: "Theodore Ts'o", "Martin K. Petersen", Christoph Hellwig,
 Linus Torvalds, "Darrick J. Wong", Jens Axboe, Andrew Morton,
 Linux API, Linux Kernel Mailing List, shane.seymour@hpe.com,
 Bruce Fields, linux-fsdevel, Jeff Layton
Subject: Re: [PATCH 2/2] block: create ioctl to discard-or-zeroout a range of blocks
In-Reply-To: <20160303231050.GU29057@dastard>
References: <20160302040932.16685.62789.stgit@birch.djwong.org>
 <20160302040947.16685.42926.stgit@birch.djwong.org>
 <20160302225601.GB21890@birch.djwong.org>
 <20160303180924.GA4116@infradead.org>
 <20160303223952.GE24012@thunk.org>
 <20160303231050.GU29057@dastard>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Mar 3, 2016 at 3:10 PM, Dave Chinner wrote:
> On Thu, Mar 03, 2016 at 05:39:52PM -0500, Theodore Ts'o wrote:
>> On Thu, Mar 03, 2016 at 01:54:54PM -0500, Martin K. Petersen wrote:
>> > >>>>> "Christoph" == Christoph Hellwig writes:
>> >
>> > Christoph> - FALLOC_FL_PUNCH_HOLE assures zeroes are returned, but
>> > Christoph> space is deallocated as much as possible -
>> > Christoph> FALLOC_FL_ZERO_RANGE assures zeroes are returned, AND blocks
>> > Christoph> are actually allocated
>> >
>> > That works for me. I think it would be great if we could have consistent
>> > interfaces for fs and block. The more commonality the merrier.
>>
>> So a question I have is do we want to add a "discard-as-a-hint" analog
>> for fallocate?
>
> Well defined, reliable behaviour only, please. If the device can't
> provide the required hardware offload, then it needs to use the
> generic, slow implementation of the functionality or report
> EOPNOTSUPP.
>
>> P.S. Speaking of things that are powerful and too dangerous for
>> application programmers, after the Linux FAST workshop, I was having
>> dinner with the Ceph developers and Ric Wheeler, and we were talking
>> about things they really needed. Turns out they also could use an
>> FALLOC_FL_NO_HIDE_STALE functionality.
>
> For better or for worse, Ceph is moving away from using filesystems
> for its back end object store, so the use of such a hack in Ceph
> has a very limited life.

Well, let's be clear: the reason Ceph is moving away from using local
filesystems is because we couldn't get the overheads of using them
down to what we considered an acceptable level. There are always going
to be some inefficiencies from it of course (since you have two
metadata streams) but the more issues get addressed, the fewer
userspace filesystems will feel or run up against the need to do their
own block device management. :)

If none of them get fixed the same scenario will just repeat itself —
a userspace filesystem rises, it tries to get features it needs into
the kernel, it eventually gives up and drops the kernel out of the
loop, and then the fact that nobody's using the kernel in this
scenario will be considered a reason not to make it work better.

I really am sensitive to the security concerns, just know that if it's
a permanent blocker you're essentially blocking out a growing category
of disk users (who run on an awfully large number of disks!).
-Greg

>
>> I told them I had an
>> out-of-tree patch that had that functionality, and even Ric Wheeler
>> started getting tempted.... :-)
>
> You can tempt all you want, but it does not change the basic fact
> that it is dangerous and compromises system security. As such, it
> does not belong in upstream kernels. Especially in this day and age
> where ensuring the fundamental integrity of our systems is more
> important than ever.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html