Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758242AbcCCSOU (ORCPT ); Thu, 3 Mar 2016 13:14:20 -0500
Received: from mail-io0-f194.google.com ([209.85.223.194]:35137 "EHLO mail-io0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756028AbcCCSOS (ORCPT ); Thu, 3 Mar 2016 13:14:18 -0500
MIME-Version: 1.0
In-Reply-To:
References: <20160302040932.16685.62789.stgit@birch.djwong.org> <20160302040947.16685.42926.stgit@birch.djwong.org> <20160302225601.GB21890@birch.djwong.org>
Date: Thu, 3 Mar 2016 10:14:17 -0800
X-Google-Sender-Auth: 23mlePmlq131_IZYZoGY04FSA1o
Message-ID:
Subject: Re: [PATCH 2/2] block: create ioctl to discard-or-zeroout a range of blocks
From: Linus Torvalds
To: "Martin K. Petersen"
Cc: "Darrick J. Wong" , Jens Axboe , Christoph Hellwig , Andrew Morton , Linux API , Linux Kernel Mailing List , shane.seymour@hpe.com, Bruce Fields , linux-fsdevel , Jeff Layton
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 1513
Lines: 42

On Thu, Mar 3, 2016 at 10:01 AM, Martin K. Petersen wrote:
>>>>>> "Linus" == Linus Torvalds writes:
>
> Linus> .. but the flag doesn't even set that. Even if you avoid TRIM,
> Linus> there is absolutely zero guarantees that WRITE_SAME would do
> Linus> "real storage blocks full of zeroes backing the LBAs they just
> Linus> wrote out".
>
> That's not entirely true. Writing the blocks may cause them to be
> allocated on the storage device (depending on which flags we feed it in
> WRITE SAME).

Ok, so now we're getting somewhere, with actual _reasons_ why somebody
would want to use one interface over another.

> The filesystems people wanted the following semantics:
>
>  - deallocate, don't care about contents for future reads (discard)
>  - deallocate, guarantee zeroes on future reads (zeroout)
>  - (re)allocate, guarantee zeroes on future reads (zeroout)
>
> Maybe we just need a better naming scheme...

Yes. And this does make me think that Christoph is right: this would
be so much better if the block layer just supported fallocate()
instead, which already has those operations.

Right now we have

        if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
                return -ENODEV;

so right now the vfs_fallocate() code explicitly disallows block
devices, but that would be easy to expand.

Would people be happy with that kind of patch instead? It would
certainly make all my objections go away..

              Linus
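
For reference, here is a rough userspace sketch of how the three
semantics listed above might map onto existing fallocate() modes. This
is an illustration under assumptions, not the proposed patch: the
enum and both function names are hypothetical, and the choice of
flags is one plausible reading of "already has those operations".
In particular, plain discard is mapped to a hole punch here, on the
assumption that a caller who does not care about the contents can
accept zeroes. As the check quoted above shows, this currently only
works on regular files, not on block devices.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc: expose fallocate() and the FALLOC_FL_* flags */
#endif
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/types.h>

/* Hypothetical names for the three semantics discussed in the thread. */
enum range_op {
	RANGE_DISCARD,      /* deallocate, contents don't matter */
	RANGE_DEALLOC_ZERO, /* deallocate, future reads return zeroes */
	RANGE_ALLOC_ZERO,   /* (re)allocate, future reads return zeroes */
};

/* Return the fallocate() mode for an operation, or -1 if it is unknown. */
int range_op_mode(enum range_op op)
{
	switch (op) {
	case RANGE_DISCARD:
		/* Assumption: zeroes are acceptable when contents don't matter. */
	case RANGE_DEALLOC_ZERO:
		/* PUNCH_HOLE must always be combined with KEEP_SIZE. */
		return FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE;
	case RANGE_ALLOC_ZERO:
		/* Zeroes the range and keeps (or creates) the allocation. */
		return FALLOC_FL_ZERO_RANGE;
	}
	return -1;
}

/* Apply an operation to [offset, offset + len); returns 0 or -1 with errno. */
int apply_range_op(int fd, enum range_op op, off_t offset, off_t len)
{
	int mode = range_op_mode(op);

	if (mode < 0) {
		errno = EINVAL;
		return -1;
	}
	return fallocate(fd, mode, offset, len);
}
```

A caller would do something like apply_range_op(fd, RANGE_ALLOC_ZERO,
0, 1 << 20) on an open regular file and check errno on failure; not
every filesystem supports every mode, so EOPNOTSUPP is a normal
outcome.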