From: Sarthak Kukreti
Date: Thu, 25 May 2023 19:35:14 -0700
Subject: Re: [PATCH v7 0/5] Introduce provisioning primitives
To: Dave Chinner
Cc: Mike Snitzer, Joe Thornber, Jens Axboe, linux-block@vger.kernel.org,
    "Theodore Ts'o", Stefan Hajnoczi, "Michael S. Tsirkin",
    "Darrick J. Wong", Brian Foster, Bart Van Assche,
    linux-kernel@vger.kernel.org, Christoph Hellwig, dm-devel@redhat.com,
    Andreas Dilger, linux-fsdevel@vger.kernel.org,
    linux-ext4@vger.kernel.org, Jason Wang, Alasdair Kergon
X-Mailing-List: linux-ext4@vger.kernel.org

On Thu, May 25, 2023 at 6:36 PM Dave Chinner wrote:
>
> On Thu, May 25, 2023 at 03:47:21PM -0700, Sarthak Kukreti wrote:
> > On Thu, May 25, 2023 at 9:00 AM Mike Snitzer wrote:
> > > On Thu, May 25 2023 at 7:39P -0400,
> > > Dave Chinner wrote:
> > > > On Wed, May 24, 2023 at 04:02:49PM -0400, Mike Snitzer wrote:
> > > > > On Tue, May 23 2023 at 8:40P -0400,
> > > > > Dave Chinner wrote:
> > > > > > It's worth noting that XFS already has a coarse-grained
> > > > > > implementation of preferred regions for metadata storage. It will
> > > > > > currently not use those metadata-preferred regions for user data
> > > > > > unless all the remaining user data space is full. Hence I'm pretty
> > > > > > sure that a pre-provisioning enhancement like this can be done
> > > > > > entirely in-memory without requiring any new on-disk state to be
> > > > > > added.
> > > > > >
> > > > > > Sure, if we crash and remount, then we might choose a different LBA
> > > > > > region for pre-provisioning. But that's not really a huge deal as we
> > > > > > could also run an internal background post-mount fstrim operation to
> > > > > > remove any unused pre-provisioning that was left over from when the
> > > > > > system went down.
> > > > >
> > > > > This would be the FITRIM with extension you mention below? Which is a
> > > > > filesystem interface detail?
> > > >
> > > > No. We might reuse some of the internal infrastructure we use to
> > > > implement FITRIM, but that's about it. It's just something kinda
> > > > like FITRIM but with different constraints determined by the
> > > > filesystem rather than the user...
> > > >
> > > > As it is, I'm not sure we'd even need it - a periodic userspace
> > > > FITRIM would achieve the same result, so leaked provisioned spaces
> > > > would get cleaned up eventually without the filesystem having to do
> > > > anything specific...
> > > >
> > > > > So dm-thinp would _not_ need to have new
> > > > > state that tracks "provisioned but unused" block?
> > > >
> > > > No idea - that's your domain. :)
> > > >
> > > > dm-snapshot, for certain, will need to track provisioned regions
> > > > because it has to guarantee that overwrites to provisioned space in
> > > > the origin device will always succeed. Hence it needs to know how
> > > > much space breaking sharing in provisioned regions after a snapshot
> > > > has been taken will be required...
> > >
> > > dm-thinp offers its own much more scalable snapshot support (doesn't
> > > use old dm-snapshot N-way copyout target).
> > >
> > > dm-snapshot isn't going to be modified to support this level of
> > > hardening (dm-snapshot is basically in "maintenance only" now).
>
> Ah, of course. Sorry for the confusion, I was kinda using
> dm-snapshot as shorthand for "dm-thinp + snapshots".
>
> > > But I understand your meaning: what you said is 100% applicable to
> > > dm-thinp's snapshot implementation and needs to be accounted for in
> > > thinp's metadata (inherent 'provisioned' flag).
>
> *nod*
>
> > A bit orthogonal: would dm-thinp need to differentiate between
> > user-triggered provision requests (eg. from fallocate()) vs
> > fs-triggered requests?
>
> Why? How is the guarantee the block device has to provide to
> provisioned areas different for user vs filesystem internal
> provisioned space?
>

After thinking this through, I stand corrected. I was primarily
concerned with how this would balloon thin snapshot sizes if users
potentially provision a large chunk of the filesystem, but that's
putting the cart way before the horse.

Best
Sarthak

> > I would lean towards user provisioned areas not
> > getting dedup'd on snapshot creation,
>
> Snapshotting is a clone operation, not a dedupe operation.
>
> Yes, the end result of both is that you have a block shared between
> multiple indexes that needs COW on the next overwrite, but the two
> operations that get to that point are very different...
>
> > but that would entail tracking
> > the state of the original request and possibly a provision request
> > flag (REQ_PROVISION_DEDUP_ON_SNAPSHOT) or an inverse flag
> > (REQ_PROVISION_NODEDUP). Possibly too convoluted...
>
> Let's not try to add everyone's favourite pony to this interface
> before we've even got it off the ground.
>
> It's the simple precision of the API, the lack of cross-layer
> communication requirements and the ability to implement and optimise
> the independent layers independently that makes this a very
> appealing solution.
>
> We need to start with getting the simple stuff working and prove the
> concept. Then once we can observe the behaviour of a working system
> we can start working on optimising individual layers for efficiency
> and performance....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com