Date: Fri, 2 Jun 2023 17:50:49 -0400
From: Mike Snitzer
To: Sarthak Kukreti
Cc: Jens Axboe, linux-block@vger.kernel.org, Joe Thornber,
    "Michael S. Tsirkin", Jason Wang, "Darrick J. Wong", Brian Foster,
    Bart Van Assche, Dave Chinner, linux-kernel@vger.kernel.org,
    Christoph Hellwig, dm-devel@redhat.com, Andreas Dilger,
    Stefan Hajnoczi, linux-fsdevel@vger.kernel.org, Theodore Ts'o,
    linux-ext4@vger.kernel.org, Joe Thornber, Alasdair Kergon
Subject: Re: [PATCH v7 0/5] Introduce provisioning primitives
X-Mailing-List: linux-ext4@vger.kernel.org

On Fri, Jun 02 2023 at 2:44P -0400,
Sarthak Kukreti wrote:

> On Tue, May 30, 2023 at 8:28 AM Mike Snitzer wrote:
> >
> > On Tue, May 30 2023 at 10:55P -0400,
> > Joe Thornber wrote:
> >
> > > On Tue, May 30, 2023 at 3:02 PM Mike Snitzer wrote:
> > >
> > > >
> > > > Also Joe, for your proposed dm-thinp design where you distinguish
> > > > between "provision" and "reserve": Would it make sense for
> > > > REQ_META (e.g. all XFS metadata) with REQ_PROVISION to be treated as
> > > > an LBA-specific hard request? Whereas REQ_PROVISION on its own provides
> > > > more freedom to just reserve the length of blocks? (e.g. for XFS
> > > > delalloc where LBA range is unknown, but dm-thinp can be asked to
> > > > reserve space to accommodate it).
> > > >
> > >
> > > My proposal only involves 'reserve'. Provisioning will be done as part of
> > > the usual io path.
> >
> > OK, I think we'd do well to pin down the top-level block interfaces in
> > question. Because this patchset's block interface patch (2/5) header
> > says:
> >
> > "This patch also adds the capability to call fallocate() in mode 0
> > on block devices, which will send REQ_OP_PROVISION to the block
> > device for the specified range,"
> >
> > So it wires up blkdev_fallocate() to call blkdev_issue_provision(). A
> > user of XFS could then use fallocate() for user data -- which would
> > cause thinp's reserve to _not_ be used for critical metadata.
> >
> > The only way to distinguish the caller (between on-behalf of user data
> > vs XFS metadata) would be REQ_META?
> >
> > So should dm-thinp have a REQ_META-based distinction? Or just treat
> > all REQ_OP_PROVISION the same?
> >
> I'm in favor of a REQ_META-based distinction. Does that imply that
> REQ_META also needs to be passed through the block/filesystem stack
> (e.g. REQ_OP_PROVISION + REQ_META on a loop device translates to a
> fallocate() to the underlying file)?

Unclear, I was thinking your REQ_UNSHARE (tied to fallocate) might be a
means to translate REQ_OP_PROVISION + REQ_META to fallocate and have it
perform the LBA-specific provisioning of Joe's design (referenced
below).
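
One way to picture the REQ_META-based distinction discussed above is the
small userspace C model below. It is only a sketch under stated
assumptions: the model_* types, the RESERVE_* classes and the
block-count accounting are hypothetical names invented for illustration,
and none of it is dm-thinp code or the kernel's definition of REQ_META
or REQ_OP_PROVISION. A provision request carrying the metadata flag is
classed as LBA-specific; one without it only reserves a length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the bio op and flag named in the thread.
 * These are NOT the kernel's definitions or values.
 */
enum model_op { MODEL_OP_WRITE, MODEL_OP_PROVISION };
#define MODEL_REQ_META (1u << 0)

/* The two reservation classes under discussion, plus "not applicable". */
enum reserve_class {
	RESERVE_NONE,    /* request is not a provision request */
	RESERVE_LBA,     /* LBA-specific hard request (metadata) */
	RESERVE_DYNAMIC, /* length-only reservation, no LBA range */
};

/* Minimal model of a request: operation, flags and a block range. */
struct model_bio {
	enum model_op op;
	unsigned int flags;
	uint64_t start_block;
	uint64_t nr_blocks;
};

/* Minimal model of a pool that only tracks length-only reservations. */
struct model_pool {
	uint64_t dynamic_reserved_blocks;
};

/* Decide which reservation class a request falls into. */
static enum reserve_class classify_provision(const struct model_bio *bio)
{
	if (bio->op != MODEL_OP_PROVISION)
		return RESERVE_NONE;
	return (bio->flags & MODEL_REQ_META) ? RESERVE_LBA : RESERVE_DYNAMIC;
}

/*
 * Apply the classification. Only the length-only class touches the
 * counter; the LBA-specific path is left as a stub in this model.
 */
static enum reserve_class handle_provision(struct model_pool *pool,
					   const struct model_bio *bio)
{
	enum reserve_class cls;

	assert(pool != NULL && bio != NULL);
	cls = classify_provision(bio);
	if (cls == RESERVE_DYNAMIC)
		pool->dynamic_reserved_blocks += bio->nr_blocks;
	return cls;
}
```

In this model, calling handle_provision() with a provision request that
has MODEL_REQ_META set returns RESERVE_LBA and leaves the counter alone,
while the same request without the flag returns RESERVE_DYNAMIC and adds
nr_blocks to the counter. Whether fallocate() callers should end up in
one class or the other is exactly the open question in this thread.
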
>
> I think that might have applications beyond just provisioning:
> currently, for stacked filesystems (e.g. filesystems residing in a file
> on top of another filesystem), even if the upper filesystem issues
> read/write requests with REQ_META | REQ_PRIO, these flags are lost in
> translation at the loop device layer. A flag like the above would
> allow the prioritization of stacked filesystem metadata requests.
>

Yes, it could prove useful.

> Bringing the discussion back to this series for a bit, I'm still
> waiting on feedback from the Block maintainers before sending out v8
> (which at the moment, only has a
> s/EXPORT_SYMBOL/EXPORT_SYMBOL_GPL/g). I believe from the conversation
> most of the above is follow up work, but please let me know if you'd
> prefer I add some of this to the current series!

I need a bit more time to work through various aspects of the broader
requirements and the resulting interfaces that fall out. Joe's design
is pretty compelling because it will properly handle snapshot thin
devices:
https://listman.redhat.com/archives/dm-devel/2023-May/054351.html

Here is my latest status:
- Focused on prototype for thinp block reservation (XFS metadata, XFS
  delalloc, fallocate)
- Decided the "dynamic" (non-LBA specific) reservation stuff (old
  prototype code) is best left independent from Joe's design. So 2
  classes of thinp reservation.
- Forward-ported the old prototype code that Brian Foster, Joe Thornber
  and I worked on years ago. It needs more careful review (and very
  likely will need fixes from Brian and myself). The XFS changes are
  pretty intrusive and likely up for serious debate (as to whether we
  even care to handle reservations for user data).
- REQ_OP_PROVISION bios with REQ_META will use Joe's design, otherwise
  data (XFS data and fallocate) will use "dynamic" reservation.
- "dynamic" name is due to the reservation being generic (non-LBA:
  not in terms of an LBA range).
  Also, in-core only; so the associated "dynamic_reserve_count"
  accounting is reset to 0 every activation.
- Fallocate may require stronger guarantees in the end (in which case
  we'll add a REQ_UNSHARE flag that is selectable from the fallocate
  interface)
- Will try to share common code, but just sorting out high-level
  interface(s) still...

I'll try to get a git tree together early next week. It will be the
forward ported "dynamic" prototype code and your latest v7 code with
some additional work to branch accordingly for each class of thinp
reservation. And I'll use your v7 code as a crude stub for Joe's
approach (branch taken if REQ_META set).

Lastly, here are some additional TODOs I've noted in code earlier in my
review process:

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 0d9301802609..43a6702f9efe 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -1964,6 +1964,26 @@ static void process_provision_bio(struct thin_c *tc, struct bio *bio)
 	struct dm_cell_key key;
 	struct dm_thin_lookup_result lookup_result;
 
+	/*
+	 * FIXME:
+	 * Joe's elegant reservation design is detailed here:
+	 * https://listman.redhat.com/archives/dm-devel/2023-May/054351.html
+	 * - this design, with associated thinp metadata updates,
+	 *   is how provision bios should be handled.
+	 *
+	 * FIXME: add thin-pool flag "ignore_provision"
+	 *
+	 * FIXME: needs provision_passdown support
+	 *        (needs thinp flag "no_provision_passdown")
+	 */
+
+	/*
+	 * FIXME: require REQ_META (or REQ_UNSHARE?) to allow deeper
+	 *        provisioning code that follows? (so that thinp
+	 *        block _is_ fully provisioned upon return)
+	 *        (or just remove all below code entirely?)
+	 */
+
 	/*
 	 * If cell is already occupied, then the block is already
 	 * being provisioned so we have nothing further to do here.