Date: Tue, 4 Apr 2023 16:32:24 -0700
From: Eric Biggers
To: Andrey Albershteyn
Cc: djwong@kernel.org, dchinner@redhat.com, hch@infradead.org,
	linux-xfs@vger.kernel.org, fsverity@lists.linux.dev,
	rpeterso@redhat.com, agruenba@redhat.com, xiang@kernel.org,
	chao@kernel.org, damien.lemoal@opensource.wdc.com, jth@kernel.org,
	linux-erofs@lists.ozlabs.org, linux-btrfs@vger.kernel.org,
	linux-ext4@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net,
	cluster-devel@redhat.com
Subject: Re: [PATCH v2 21/23] xfs: handle merkle tree block size != fs blocksize != PAGE_SIZE
Message-ID: <20230404233224.GE1893@sol.localdomain>
References: <20230404145319.2057051-1-aalbersh@redhat.com>
 <20230404145319.2057051-22-aalbersh@redhat.com>
In-Reply-To: <20230404145319.2057051-22-aalbersh@redhat.com>

Hi Andrey,

On Tue, Apr 04, 2023 at 04:53:17PM +0200, Andrey Albershteyn wrote:
> In case of different Merkle tree block size fs-verity expects
> ->read_merkle_tree_page() to return Merkle tree page filled with
> Merkle tree blocks. The XFS stores each merkle tree block under
> extended attribute. Those attributes are addressed by block offset
> into Merkle tree.
>
> This patch make ->read_merkle_tree_page() to fetch multiple merkle
> tree blocks based on size ratio. Also the reference to each xfs_buf
> is passed with page->private to ->drop_page().
>
> Signed-off-by: Andrey Albershteyn
> ---
>  fs/xfs/xfs_verity.c | 74 +++++++++++++++++++++++++++++++++++----------
>  fs/xfs/xfs_verity.h |  8 +++++
>  2 files changed, 66 insertions(+), 16 deletions(-)
>
> diff --git a/fs/xfs/xfs_verity.c b/fs/xfs/xfs_verity.c
> index a9874ff4efcd..ef0aff216f06 100644
> --- a/fs/xfs/xfs_verity.c
> +++ b/fs/xfs/xfs_verity.c
> @@ -134,6 +134,10 @@ xfs_read_merkle_tree_page(
> 	struct page *page = NULL;
> 	__be64 name = cpu_to_be64(index << PAGE_SHIFT);
> 	uint32_t bs = 1 << log_blocksize;
> +	int blocks_per_page =
> +		(1 << (PAGE_SHIFT - log_blocksize));
> +	int n = 0;
> +	int offset = 0;
> 	struct xfs_da_args args = {
> 		.dp = ip,
> 		.attr_filter = XFS_ATTR_VERITY,
> @@ -143,26 +147,59 @@ xfs_read_merkle_tree_page(
> 		.valuelen = bs,
> 	};
> 	int error = 0;
> +	bool is_checked = true;
> +	struct xfs_verity_buf_list *buf_list;
>
> 	page = alloc_page(GFP_KERNEL);
> 	if (!page)
> 		return ERR_PTR(-ENOMEM);
>
> -	error = xfs_attr_get(&args);
> -	if (error) {
> -		kmem_free(args.value);
> -		xfs_buf_rele(args.bp);
> +	buf_list = kzalloc(sizeof(struct xfs_verity_buf_list), GFP_KERNEL);
> +	if (!buf_list) {
> 		put_page(page);
> -		return ERR_PTR(-EFAULT);
> +		return ERR_PTR(-ENOMEM);
> 	}
>
> -	if (args.bp->b_flags & XBF_VERITY_CHECKED)
> +	/*
> +	 * Fill the page with Merkle tree blocks. The blcoks_per_page is higher
> +	 * than 1 when fs block size != PAGE_SIZE or Merkle tree block size !=
> +	 * PAGE SIZE
> +	 */
> +	for (n = 0; n < blocks_per_page; n++) {
> +		offset = bs * n;
> +		name = cpu_to_be64(((index << PAGE_SHIFT) + offset));
> +		args.name = (const uint8_t *)&name;
> +
> +		error = xfs_attr_get(&args);
> +		if (error) {
> +			kmem_free(args.value);
> +			/*
> +			 * No more Merkle tree blocks (e.g. this was the last
> +			 * block of the tree)
> +			 */
> +			if (error == -ENOATTR)
> +				break;
> +			xfs_buf_rele(args.bp);
> +			put_page(page);
> +			kmem_free(buf_list);
> +			return ERR_PTR(-EFAULT);
> +		}
> +
> +		buf_list->bufs[buf_list->buf_count++] = args.bp;
> +
> +		/* One of the buffers was dropped */
> +		if (!(args.bp->b_flags & XBF_VERITY_CHECKED))
> +			is_checked = false;
> +
> +		memcpy(page_address(page) + offset, args.value, args.valuelen);
> +		kmem_free(args.value);
> +		args.value = NULL;
> +	}

I was really hoping for a solution where the cached data can be used
directly, instead of allocating a temporary page and copying the cached
data into it every time the cache is accessed.

The problem with what you have now is that every time a single 32-byte
hash is accessed, a full page (potentially 64KB!) will be allocated and
filled.  That's not very efficient.  The need to allocate the temporary
page can also cause ENOMEM (which will get reported as EIO).

Did you consider alternatives that would work more efficiently?  I think
it would be worth designing something that works properly with how XFS
is planned to cache the Merkle tree, instead of designing a workaround.
->read_merkle_tree_page was not really designed for what you are doing
here.

How about replacing ->read_merkle_tree_page with a function that takes
in a Merkle tree block index (not a page index!)
and hands back a (page, offset) pair that identifies where the Merkle
tree block's data is located?  Or (folio, offset), I suppose.

With that, would it be possible to directly return the XFS cache?

- Eric
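
For illustration, a rough sketch of the kind of callback being suggested
above.  Everything here -- the struct, the callback name, and the
num_ra_blocks readahead argument -- is hypothetical, not part of the
existing fsverity API (which today only defines ->read_merkle_tree_page()):

	/*
	 * Sketch of a hypothetical replacement for ->read_merkle_tree_page()
	 * in struct fsverity_operations.  Assumes the usual kernel headers
	 * (<linux/fs.h>, <linux/mm_types.h>, <linux/types.h>).
	 */
	struct fsverity_blockref {
		struct folio *folio;	/* folio holding the Merkle tree block's data */
		size_t offset;		/* byte offset of the block within the folio */
	};

	/*
	 * Look up the Merkle tree block with the given block index (not page
	 * index), optionally starting readahead of the following blocks, and
	 * report where that block's data lives in the filesystem's own cache.
	 * The filesystem would take a reference on the backing folio; a
	 * matching drop callback (analogous to the ->drop_page() mentioned in
	 * the quoted patch) would release it once the hash has been checked.
	 */
	int (*read_merkle_tree_block)(struct inode *inode,
				      u64 block_index,
				      unsigned long num_ra_blocks,
				      struct fsverity_blockref *block);

The intent would be that XFS could return a reference to Merkle tree data
it already caches, rather than allocating a temporary page and copying
every block into it on each access.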