Date: Fri, 26 May 2023 11:36:09 +1000
From: Dave Chinner
To: Sarthak Kukreti
Cc: Mike Snitzer, Joe Thornber, Jens Axboe, linux-block@vger.kernel.org,
    Theodore Ts'o, Stefan Hajnoczi, "Michael S. Tsirkin", "Darrick J. Wong",
    Brian Foster, Bart Van Assche, linux-kernel@vger.kernel.org,
    Christoph Hellwig, dm-devel@redhat.com, Andreas Dilger,
    linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org, Jason Wang,
    Alasdair Kergon
Subject: Re: [PATCH v7 0/5] Introduce provisioning primitives
X-Mailing-List: linux-ext4@vger.kernel.org

On Thu, May 25, 2023 at 03:47:21PM -0700, Sarthak Kukreti wrote:
> On Thu, May 25, 2023 at 9:00 AM Mike Snitzer wrote:
> > On Thu, May 25 2023 at 7:39P -0400,
> > Dave Chinner wrote:
> > > On Wed, May 24, 2023 at 04:02:49PM -0400, Mike Snitzer wrote:
> > > > On Tue, May 23 2023 at 8:40P -0400,
> > > > Dave Chinner wrote:
> > > > > It's worth noting that XFS already has a coarse-grained
> > > > > implementation of preferred regions for metadata storage. It will
> > > > > currently not use those metadata-preferred regions for user data
> > > > > unless all the remaining user data space is full. Hence I'm pretty
> > > > > sure that a pre-provisioning enhancement like this can be done
> > > > > entirely in-memory without requiring any new on-disk state to be
> > > > > added.
> > > > >
> > > > > Sure, if we crash and remount, then we might choose a different LBA
> > > > > region for pre-provisioning. But that's not really a huge deal as we
> > > > > could also run an internal background post-mount fstrim operation to
> > > > > remove any unused pre-provisioning that was left over from when the
> > > > > system went down.
> > > >
> > > > This would be the FITRIM with extension you mention below?
> > > > Which is a filesystem interface detail?
> > >
> > > No. We might reuse some of the internal infrastructure we use to
> > > implement FITRIM, but that's about it. It's just something kinda
> > > like FITRIM but with different constraints determined by the
> > > filesystem rather than the user...
> > >
> > > As it is, I'm not sure we'd even need it - a periodic userspace
> > > FITRIM would achieve the same result, so leaked provisioned spaces
> > > would get cleaned up eventually without the filesystem having to do
> > > anything specific...
> > >
> > > > So dm-thinp would _not_ need to have new
> > > > state that tracks "provisioned but unused" block?
> > >
> > > No idea - that's your domain. :)
> > >
> > > dm-snapshot, for certain, will need to track provisioned regions
> > > because it has to guarantee that overwrites to provisioned space in
> > > the origin device will always succeed. Hence it needs to know how
> > > much space breaking sharing in provisioned regions after a snapshot
> > > has been taken will be required...
> >
> > dm-thinp offers its own much more scalable snapshot support (doesn't
> > use old dm-snapshot N-way copyout target).
> >
> > dm-snapshot isn't going to be modified to support this level of
> > hardening (dm-snapshot is basically in "maintenance only" now).

Ah, of course. Sorry for the confusion, I was kinda using
dm-snapshot as shorthand for "dm-thinp + snapshots".

> > But I understand your meaning: what you said is 100% applicable to
> > dm-thinp's snapshot implementation and needs to be accounted for in
> > thinp's metadata (inherent 'provisioned' flag).

*nod*

> A bit orthogonal: would dm-thinp need to differentiate between
> user-triggered provision requests (e.g. from fallocate()) vs
> fs-triggered requests?

Why? How is the guarantee the block device has to provide to
provisioned areas different for user vs filesystem internal
provisioned space?

> I would lean towards user provisioned areas not
> getting dedup'd on snapshot creation,

Snapshotting is a clone operation, not a dedupe operation.

Yes, the end result of both is that you have a block shared between
multiple indexes that needs COW on the next overwrite, but the two
operations that get to that point are very different...

> but that would entail tracking
> the state of the original request and possibly a provision request
> flag (REQ_PROVISION_DEDUP_ON_SNAPSHOT) or an inverse flag
> (REQ_PROVISION_NODEDUP). Possibly too convoluted...

Let's not try to add everyone's favourite pony to this interface
before we've even got it off the ground.

It's the simple precision of the API, the lack of cross-layer
communication requirements and the ability to implement and optimise
the independent layers independently that makes this a very
appealing solution.

We need to start with getting the simple stuff working and prove the
concept. Then once we can observe the behaviour of a working system
we can start working on optimising individual layers for efficiency
and performance....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
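
Illustration of the snapshot accounting point in the thread above: if
overwrites to provisioned space must always succeed, then taking a
snapshot has to set aside enough space to break sharing on every
provisioned block later. The following is only a toy sketch under
stated assumptions, not dm-thinp code. The ThinPool class and all of
its method names are hypothetical, snapshots are treated as read-only,
and there is a single origin volume.

```python
import errno


class ThinPool:
    """Toy space accounting for one origin volume (hypothetical model)."""

    def __init__(self, total_blocks):
        self.free = total_blocks   # blocks neither allocated nor held back
        self.mapped = set()        # origin LBAs that have backing space
        self.provisioned = set()   # origin LBAs with an overwrite guarantee
        self.shared = set()        # origin LBAs shared with a snapshot
        self.reserved = set()      # origin LBAs with a block held back for COW

    def _take_free(self, count=1):
        # Fail the request up front if the pool cannot cover it.
        if count > self.free:
            raise OSError(errno.ENOSPC, "thin pool out of space")
        self.free -= count

    def provision(self, lba):
        """Allocate now so that a later overwrite cannot run out of space."""
        if lba not in self.mapped:
            self._take_free()
            self.mapped.add(lba)
        elif lba in self.shared and lba not in self.reserved:
            # Already shared with a snapshot: hold back a block for the COW.
            self._take_free()
            self.reserved.add(lba)
        self.provisioned.add(lba)

    def snapshot(self):
        """Share all mapped blocks; hold back one block per provisioned LBA."""
        need = self.provisioned - self.reserved
        self._take_free(len(need))  # refuse the snapshot rather than the write
        self.reserved |= need
        self.shared |= self.mapped

    def write(self, lba):
        if lba not in self.mapped:
            # Unprovisioned first write: may legitimately hit ENOSPC.
            self._take_free()
            self.mapped.add(lba)
        elif lba in self.shared:
            # Break sharing. Provisioned LBAs use the block held back earlier.
            if lba in self.reserved:
                self.reserved.discard(lba)
            else:
                self._take_free()
            self.shared.discard(lba)
        # Otherwise the origin owns the block exclusively: overwrite in place.
```

With a 4-block pool and two provisioned LBAs, snapshot() consumes the
remaining two blocks as a reservation. A write to a new, unprovisioned
LBA then fails with ENOSPC, while overwrites of both provisioned LBAs
still succeed. Note that the model makes no distinction between user
and filesystem provisioning, since the guarantee is the same.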