Date: Sat, 20 May 2023 09:07:46 +1000
From: Dave Chinner
To: Mike Snitzer
Cc: Christoph Hellwig, Sarthak Kukreti, dm-devel@redhat.com,
    linux-block@vger.kernel.org, linux-ext4@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    Jens Axboe, "Michael S. Tsirkin", Jason Wang, Stefan Hajnoczi,
    Alasdair Kergon, Brian Foster, Theodore Ts'o, Andreas Dilger,
    Bart Van Assche, "Darrick J. Wong"
Subject: Re: [PATCH v7 0/5] Introduce provisioning primitives
References: <20230518223326.18744-1-sarthakkukreti@chromium.org>

On Fri, May 19, 2023 at 10:41:31AM -0400, Mike Snitzer wrote:
> On Fri, May 19 2023 at 12:09P -0400,
> Christoph Hellwig wrote:
>
> > FYI, I really don't think this primitive is a good idea. In the
> > concept of non-overwritable storage (NAND, SMR drives) the entire
> > concept of a one-shoot 'provisioning' that will guarantee later writes
> > are always possible is simply bogus.
>
> Valid point for sure, such storage shouldn't advertise support (and
> will return -EOPNOTSUPP).
>
> But the primitive still has utility for other classes of storage.

Yet the thing people are wanting us filesystem developers to use
this with is thinly provisioned storage that has snapshot
capability. That, by definition, is non-overwritable storage. These
are the use cases people are asking filesystems to handle
gracefully, reporting errors when the sparse backing store runs out
of space.

e.g. journal writes after a snapshot is taken on a busy filesystem
are always an overwrite and this requires more space in the storage
device for the write to succeed. ENOSPC from the backing device for
journal IO is a -fatal error-. Hence if REQ_PROVISION doesn't
guarantee space for overwrites after snapshots, then it's not
actually useful for solving the real world use cases we actually
need device-level provisioning to solve.

It is not viable for filesystems to have to reprovision space for
in-place metadata overwrites after every snapshot - the filesystem
may not even know a snapshot has been taken! And it's not feasible
for filesystems to provision on demand before they modify metadata
because we don't know what metadata is going to need to be modified
before we start modifying metadata in transactions. If we get ENOSPC
from provisioning in the middle of a dirty transaction, it's all
over just the same as if we get ENOSPC during metadata writeback...

Hence what filesystems actually need is device provisioned space to
be -always over-writable- without ENOSPC occurring. Ideally, if we
provision a range of the block device, the block device *must*
guarantee all future writes to that LBA range succeed. That
guarantee needs to stand until we discard or unmap the LBA range,
and for however many writes we do to that LBA range.

e.g. If the device takes a snapshot, it needs to reprovision the
potential COW ranges that overlap with the provisioned LBA range at
snapshot time. e.g. by re-reserving the space from the backing pool
for the provisioned space so if a COW occurs there is space
guaranteed for it to succeed. If there isn't space in the backing
pool for the reprovisioning, then whatever operation triggers the
COW behaviour should fail with ENOSPC before doing anything else....

Software devices like dm-thin/snapshot should really only need to
keep a persistent map of the provisioned space and refresh space
reservations for used space within that map whenever something that
triggers COW behaviour occurs. i.e. a snapshot needs to reset the
provisioned ranges back to "all ranges are freshly provisioned"
before the snapshot is started. If that space is not available in
the backing pool, then the snapshot attempt gets ENOSPC....
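
To make the reservation scheme above concrete, here is a toy model
in Python. It is only a sketch under stated assumptions, not dm-thin
code: the ThinPool class and its provision/write/snapshot/discard
methods are hypothetical names invented for illustration, space is
counted in whole blocks, and there is a single volume whose
snapshots are never deleted.

```python
import errno


class ThinPool:
    """Toy thin pool backing one volume. Not dm-thin; all names are made up."""

    def __init__(self, total_blocks):
        self.free = total_blocks   # blocks neither allocated nor reserved
        self.reserved = 0          # blocks held back for COW of provisioned LBAs
        self.mapped = set()        # LBAs that have a backing block
        self.shared = set()        # mapped LBAs whose block a snapshot references
        self.provisioned = set()   # the persistent map of provisioned LBAs

    def _take(self, count):
        if count > self.free:
            raise OSError(errno.ENOSPC, "backing pool exhausted")
        self.free -= count

    def provision(self, lbas):
        """Back every LBA with a block up front; all or nothing."""
        lbas = set(lbas)
        new_blocks = len(lbas - self.mapped)
        # LBAs already shared with a snapshot also need a COW reservation.
        new_resv = len((lbas & self.shared) - self.provisioned)
        self._take(new_blocks + new_resv)
        self.reserved += new_resv
        self.mapped |= lbas
        self.provisioned |= lbas

    def write(self, lba):
        if lba not in self.mapped:
            self._take(1)            # first write to a sparse LBA: may ENOSPC
            self.mapped.add(lba)
        elif lba in self.shared:
            if lba in self.provisioned:
                self.reserved -= 1   # COW into the block reserved at snapshot
            else:
                self._take(1)        # unprovisioned COW: may ENOSPC
            self.shared.discard(lba)
        # Otherwise this is a plain overwrite in place; no new space needed.

    def snapshot(self):
        """Refresh reservations for the provisioned map, then share blocks."""
        need = len(self.provisioned - self.shared)
        self._take(need)             # raises ENOSPC before any state changes
        self.reserved += need
        self.shared |= self.mapped

    def discard(self, lbas):
        """Drop the guarantee and release whatever space can be released."""
        for lba in set(lbas) & self.mapped:
            if lba not in self.shared:
                self.free += 1       # block is no longer referenced by anyone
            elif lba in self.provisioned:
                self.reserved -= 1   # hand the unused COW reservation back
                self.free += 1
            self.mapped.discard(lba)
            self.shared.discard(lba)
            self.provisioned.discard(lba)


pool = ThinPool(total_blocks=8)
pool.provision(range(3))   # e.g. journal blocks, provisioned at mkfs time
pool.snapshot()            # re-reserves 3 blocks for COW, or raises ENOSPC
for lba in range(3):
    pool.write(lba)        # overwrites after the snapshot cannot fail
```

The point of the model is where the failure lands: a pool that
cannot refresh the reservations refuses the snapshot, rather than
accepting it and failing a later overwrite of a provisioned LBA.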

That means filesystems only need to provision space for journals and
fixed metadata at mkfs time, and they only need to issue a
REQ_PROVISION bio when they first allocate over-write in place
metadata. We already have online discard and/or fstrim for releasing
provisioned space via discards.

This will require some mods to filesystems like ext4 and XFS to
issue REQ_PROVISION and fail gracefully during metadata allocation.
However, doing so means that we can actually harden filesystems
against sparse block device ENOSPC errors by ensuring they will
never occur in critical filesystem structures....

-Dave.
-- 
Dave Chinner
david@fromorbit.com