Date: Sat, 27 May 2023 09:45:02 +1000
From: Dave Chinner
To: Joe Thornber
Cc: Brian Foster, Mike Snitzer, Jens Axboe, Christoph Hellwig,
    Theodore Ts'o, Sarthak Kukreti, dm-devel@redhat.com,
    "Michael S. Tsirkin", "Darrick J. Wong", Jason Wang,
    Bart Van Assche, linux-kernel@vger.kernel.org,
    linux-block@vger.kernel.org, Joe Thornber, Andreas Dilger,
    Stefan Hajnoczi, linux-fsdevel@vger.kernel.org,
    linux-ext4@vger.kernel.org, Alasdair Kergon
Subject: Re: [PATCH v7 0/5] Introduce provisioning primitives
X-Mailing-List: linux-ext4@vger.kernel.org

On Fri, May 26, 2023 at 12:04:02PM +0100, Joe Thornber wrote:
> Here's my take:
>
> I don't see why the filesystem cares if thinp is doing a reservation or
> provisioning under the hood. All that matters is that a future write
> to that region will be honoured (barring device failure etc.).
>
> I agree that the reservation/force mapped status needs to be inherited
> by snapshots.
>
>
> One of the few strengths of thinp is the performance of taking a snapshot.
> Most snapshots created are never activated. Many other snapshots are
> only alive for a brief period, and used read-only. eg, blk-archive
> (https://github.com/jthornber/blk-archive) uses snapshots to do very
> fast incremental backups. As such I'm strongly against any scheme that
> requires provisioning as part of the snapshot operation.
>
> Hank and I are in the middle of the range tree work which requires a
> metadata change. So now is a convenient time to piggyback other
> metadata changes to support reservations.
>
>
> Given the above this is what I suggest:
>
> 1) We have an api (ioctl, bio flag, whatever) that lets you
> reserve/guarantee a region:
>
> int reserve_region(dev, sector_t begin, sector_t end);

A C-based interface is not sufficient because the layer that must do
provisioning is not guaranteed to be directly under the filesystem.
We must be able to propagate the request down to the layers that
need to provision storage, and that includes hardware devices.

e.g. dm-thin would have to issue REQ_PROVISION on the LBA ranges it
allocates in its backing device to guarantee that the provisioned
LBA range it allocates is also fully provisioned by the storage
below it....

> This api should be used minimally, eg, critical FS metadata only.

Keep in mind that "critical FS metadata" in this context is any
metadata which could cause the filesystem to hang or enter a global
error state if an unexpected ENOSPC error occurs during a metadata
write IO.

Which, in pretty much every journalling filesystem, equates to all
metadata in the filesystem. For a typical root filesystem, that
might be in the range of 1-200MB (depending on journal size).
For larger filesystems with lots of files in them, it will be in the
range of GBs of space.

Plan for having to support tens of GBs of provisioned space in
filesystems, not tens of MBs....

[snip]

> Now this is a lot of work. As well as the kernel changes we'll need to
> update the userland tools: thin_check, thin_ls, thin_metadata_unpack,
> thin_rmap, thin_delta, thin_metadata_pack, thin_repair, thin_trim,
> thin_dump, thin_metadata_size, thin_restore. Are we confident that we
> have buy in from the FS teams that this will be widely adopted? Are users
> asking for this? I really don't want to do 6 months of work for nothing.

I think there's 2-3 solid days of coding to fully implement
REQ_PROVISION support in XFS, including userspace tool support.
Maybe a couple of weeks more to flush the bugs out before it's
largely ready to go.
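
To illustrate the layering requirement raised earlier in this reply,
here is a minimal userspace sketch in C. It is not kernel code and is
not taken from the patch series: `struct layer`, `provision()` and all
field names are hypothetical, and each layer is modelled as nothing
more than a free-space counter plus an offset into the layer below.
The only point it tries to show is that a provisioning request has to
succeed at every level of the stack before any level can treat the
range as guaranteed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one layer in a storage stack (fs, dm-thin, disk...). */
struct layer {
    const char *name;
    struct layer *below;     /* NULL for the bottom-most device */
    uint64_t offset;         /* where this layer's sector 0 lands in the layer below */
    uint64_t free_sectors;   /* sectors this layer can still provision */
};

/*
 * Provision len sectors starting at start. The request is forwarded to the
 * layer below first, and this layer's accounting is only updated once every
 * lower layer has succeeded, so a failure anywhere leaves no layer changed.
 * Returns 0 on success or -ENOSPC.
 */
static int provision(struct layer *l, uint64_t start, uint64_t len)
{
    assert(l != NULL);

    if (l->free_sectors < len)
        return -ENOSPC;

    if (l->below) {
        /* Translate the range into the lower layer's address space. */
        int rc = provision(l->below, start + l->offset, len);
        if (rc)
            return rc;
    }

    l->free_sectors -= len;
    return 0;
}
```

In this model a shortage at the bottom device is reported to the top
caller at provisioning time, which is the property a single-layer
reserve call cannot give by itself.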

So if there's buy in from the block layer and DM people for
REQ_PROVISION as described, then I'll definitely have XFS support
ready for you to test whenever dm-thinp is ready to go.

I can't speak for other filesystems, but I suspect the only one we
care about is ext4. btrfs and f2fs don't need dm-thinp and there
aren't any other filesystems that are used in production on top of
dm-thinp, so I think only XFS and ext4 matter at this point in time.

I suspect that ext4 would be fairly easy to add support for as well.
ext4 has a lot more fixed-place metadata than XFS has, so much more
of its metadata is covered by mkfs-time provisioning. Limiting
dynamic metadata to specific fully provisioned block groups and
provisioning new block groups for metadata when they are near full
would be equivalent to how I plan to provision metadata space in
XFS. Hence the implementation for ext4 looks to be broadly similar
in scope and complexity to XFS....

-Dave.
--
Dave Chinner
david@fromorbit.com
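
The metadata strategy described above (keep dynamic metadata inside
fully provisioned block groups, and provision the next group once the
current one is nearly full) could be sketched roughly as follows. This
is an illustrative userspace model only, under several assumptions:
`struct fs_model`, `provision_group()`, `alloc_metadata_block()`, the
`near_full_pct` threshold and the `pool_free` counter standing in for
the thin pool are all hypothetical, and nothing here reflects the ext4
or XFS on-disk formats or the actual planned implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_GROUPS 8

/* Hypothetical in-memory view of a block group used for metadata. */
struct block_group {
    uint64_t nblocks;     /* blocks in the group */
    uint64_t used;        /* blocks already allocated to metadata */
    bool provisioned;     /* whole group is guaranteed by the storage below */
};

struct fs_model {
    struct block_group groups[MAX_GROUPS];
    int ngroups;
    int meta_group;          /* group currently receiving dynamic metadata */
    unsigned near_full_pct;  /* usage level that triggers provisioning ahead */
    uint64_t pool_free;      /* stand-in for free space in the thin pool */
};

/* Provision a whole group up front (stand-in for issuing REQ_PROVISION). */
static int provision_group(struct fs_model *fs, int g)
{
    if (g >= fs->ngroups)
        return -ENOSPC;
    struct block_group *bg = &fs->groups[g];
    if (bg->provisioned)
        return 0;
    if (fs->pool_free < bg->nblocks)
        return -ENOSPC;
    fs->pool_free -= bg->nblocks;
    bg->provisioned = true;
    return 0;
}

/* Allocate one metadata block; only ever hands out provisioned space. */
static int alloc_metadata_block(struct fs_model *fs)
{
    struct block_group *bg = &fs->groups[fs->meta_group];
    assert(bg->provisioned);

    if (bg->used == bg->nblocks) {
        /* Current group exhausted: move on only if the next is provisioned. */
        int next = fs->meta_group + 1;
        if (next >= fs->ngroups || !fs->groups[next].provisioned)
            return -ENOSPC;
        fs->meta_group = next;
        bg = &fs->groups[next];
    }
    bg->used++;

    /*
     * Near full: try to provision the next group ahead of need. A failure
     * is ignored here; it surfaces later as -ENOSPC from this allocator,
     * never as an error in the middle of a metadata write.
     */
    if (bg->used * 100 >= bg->nblocks * fs->near_full_pct)
        (void)provision_group(fs, fs->meta_group + 1);
    return 0;
}
```

The design choice being illustrated is that running out of space shows
up when metadata is allocated, where it can be handled as an ordinary
ENOSPC, rather than during write IO.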