Date: Sat, 9 Jan 2021 07:32:41 +1100
From: Dave Chinner
To: Andres Freund
Cc: linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org,
    linux-ext4@vger.kernel.org, linux-block@vger.kernel.org
Subject: Re: fallocate(FALLOC_FL_ZERO_RANGE_BUT_REALLY) to avoid unwritten extents?
Message-ID: <20210108203241.GI331610@dread.disaster.area>
References: <20201230062819.yinrrp6uwfegsqo3@alap3.anarazel.de>
    <20210106225201.GF331610@dread.disaster.area>
    <20210106234009.b6gbzl7bjm2evxj6@alap3.anarazel.de>
In-Reply-To: <20210106234009.b6gbzl7bjm2evxj6@alap3.anarazel.de>

On Wed, Jan 06, 2021 at 03:40:09PM -0800, Andres Freund wrote:
> Hi,
>
> On 2021-01-07 09:52:01 +1100, Dave Chinner wrote:
> > On Tue, Dec 29, 2020 at 10:28:19PM -0800, Andres Freund wrote:
> > > Which brings me to $subject:
> > >
> > > Would it make sense to add a variant of FALLOC_FL_ZERO_RANGE that
> > > doesn't convert extents into unwritten extents, but instead uses
> > > blkdev_issue_zeroout() if supported? Mostly interested in xfs/ext4
> > > myself, but ...
> >
> > We have explicit requests from users (think initialising large VM
> > images) that FALLOC_FL_ZERO_RANGE must never fall back to writing
> > zeroes manually.
>
> That behaviour makes a lot of sense for quite a few use cases - I wasn't
> trying to make it sound like it should not be available. Nor that
> FALLOC_FL_ZERO_RANGE should behave differently.
>
>
> > IOWs, while you might want FALLOC_FL_ZERO_RANGE to explicitly write
> > zeros, we have users who explicitly don't want it to do this.
>
> Right - which is why I was asking for a variant of FALLOC_FL_ZERO_RANGE
> (jokingly named FALLOC_FL_ZERO_RANGE_BUT_REALLY in the subject), rather
> than changing the behaviour.
>
>
> > Perhaps we should add want FALLOC_FL_CONVERT_RANGE, which tells the
> > filesystem to convert an unwritten range of zeros to a written range
> > by manually writing zeros. i.e. you do FALLOC_FL_ZERO_RANGE to zero
> > the range and fill holes using metadata manipulation, followed by
> > FALLOC_FL_WRITE_RANGE to then convert the "metadata zeros" to real
> > written zeros.
>
> Yep, something like that would do the trick. Perhaps
> FALLOC_FL_MATERIALIZE_RANGE?

[ FWIW, I really dislike the "RANGE" part of fallocate flag names.
It's redundant (fallocate always operates on a range!) and just
makes names unnecessarily longer. ]

I used "convert range" as the name explicitly because it has a
specific meaning for extent space manipulation, i.e. we "convert"
extents from one state to another. "write range" also has an
explicit meaning, in that it will convert extents from unwritten to
written data.

In comparison, "materialise" is something undefined, and could
easily be thought to take something ephemeral (such as a hole) and
turn it into something real (an allocated extent). We wouldn't want
this operation to allocate space, so I think "materialise" is just
too much magic to encode into an API for an explicit, well-defined
state change.

We also have people asking for ZERO_RANGE to just flip existing
extents from written to unwritten (rather than the punch/preallocate
we do now). This is also a "convert" operation, just in the other
direction (from data to zeros rather than from zeros to data).

The observation I'm making here is that these "convert" operations
will both make SEEK_HOLE/SEEK_DATA behave differently for the
underlying data. Preallocated space is considered a HOLE, written
zeros are considered DATA. So we do expose the ability to check that
a "convert" operation has actually changed the state of the
underlying extents in either direction...

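As a rough illustration of that check, here is a minimal sketch of
walking a file with SEEK_DATA/SEEK_HOLE from userspace. It uses only
the existing lseek() interface (via Python's os.lseek, chosen for
brevity); it does not call any CONVERT_* operation, since those are
only a proposal in this thread. The helper name data_ranges is
hypothetical. On a filesystem that does not track holes, the whole
file is typically reported as a single data range.

```python
import errno
import os


def data_ranges(fd):
    """Return the (start, end) byte ranges that lseek() reports as DATA.

    Illustrative helper (hypothetical name). Note that it moves the
    file offset of fd as a side effect.
    """
    size = os.fstat(fd).st_size
    ranges = []
    offset = 0
    while offset < size:
        try:
            # Find the next region reported as data at or after offset.
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as err:
            if err.errno == errno.ENXIO:
                # Nothing but a hole between offset and end of file.
                break
            raise
        # Find where that data region ends. There is always an implicit
        # hole at end of file, so this call succeeds for start < size.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        ranges.append((start, end))
        offset = end
    return ranges
```

Calling this before and after an operation that changes extent state
and comparing the two lists is one way to observe the change: going
by the description above, a preallocated range should be missing from
the list, and the same range should appear in it once it holds
written zeros.
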
CONVERT_TO_DATA/CONVERT_TO_ZERO as an operational pair whose
behaviour is visible and easily testable via SEEK_HOLE/SEEK_DATA
makes a lot more sense to me. Defining them to fail fast if
unwritten extents are not supported by the filesystem (i.e. they
should -never- physically write anything) would also allow
applications on such filesystems to fall back to ZERO_RANGE to
explicitly write zeros if CONVERT_TO_ZERO fails....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com