Date: Tue, 28 Nov 2023 08:29:21 -0800
From: "Darrick J. Wong"
To: Jiachen Zhang
Cc: Christoph Hellwig, Chandan Babu R, Dave Chinner, Allison Henderson,
    Zhang Tianci, Brian Foster, Ben Myers, linux-xfs@vger.kernel.org,
    linux-kernel@vger.kernel.org, xieyongji@bytedance.com, me@jcix.top
Subject: Re: [PATCH 2/2] xfs: update dir3 leaf block metadata after swap
Message-ID: <20231128162921.GU2766956@frogsfrogsfrogs>
References: <20231128053202.29007-1-zhangjiachen.jaycee@bytedance.com>
    <20231128053202.29007-3-zhangjiachen.jaycee@bytedance.com>
    <39b76473-fe00-0f1b-62e3-ae349a9f80d3@bytedance.com>
In-Reply-To: <39b76473-fe00-0f1b-62e3-ae349a9f80d3@bytedance.com>

On Tue, Nov 28, 2023 at 05:39:50PM +0800, Jiachen Zhang wrote:
> On 2023/11/28 16:39, Christoph Hellwig wrote:
> > On Tue, Nov 28, 2023 at 01:32:02PM +0800, Jiachen Zhang wrote:
> > > From: Zhang Tianci
> > >
> > > xfs_da3_swap_lastblock() copy the last block content to the dead block,
> > > but do not update the metadata in it. We need update some metadata
> > > for some kinds of type block, such as dir3 leafn block records its
> > > blkno, we shall update it to the dead block blkno.
> > > Otherwise, before write the xfs_buf to disk, the verify_write() will fail in
> > > blk_hdr->blkno != xfs_buf->b_bn, then xfs will be shutdown.
> >
> > Do you have a reproducer for this? It would be very helpful to add it
> > to xfstests.
>
> Hi Christoph,
>
> Thanks for the review!
>
> It's hard to reproduce the issue. Currently we can reproduce it with
> some kernel code changes. We forcibly reserve 0 t_blk_res for xfs_remove
> on kernel version 4.19:
>
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index f2d06e1e4906..c8f84b95a0ec 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -2551,13 +2551,8 @@ xfs_remove(
>          * insert tries to happen, instead trimming the LAST
>          * block from the directory.
>          */
> -       resblks = XFS_REMOVE_SPACE_RES(mp);
> -       error = xfs_trans_alloc(mp, &M_RES(mp)->tr_remove, resblks, 0, 0, &tp);
> -       if (error == -ENOSPC) {
> -               resblks = 0;
> -               error = xfs_trans_alloc(mp, &M_RES(mp)->tr_remove, 0, 0, 0,
> -                               &tp);
> -       }
> +       resblks = 0;
> +       error = xfs_trans_alloc(mp, &M_RES(mp)->tr_remove, 0, 0, 0, &tp);
>         if (error) {
>                 ASSERT(error != -ENOSPC);
>                 goto std_return;
>
>
> After insmod of the modified xfs.ko, run the following script; it
> reproduces the problem consistently on the final `umount mnt`:
>
> fallocate -l 1G xfs.img
> mkfs.xfs -f xfs.img
> mkdir -p mnt
> losetup /dev/loop0 xfs.img
> mount -t xfs /dev/loop0 mnt
> pushd mnt
> mkdir dir3
> prefix="a_"
> for j in $(seq 0 13); do
>     for i in $(seq 0 2800); do
>         touch dir3/${prefix}_${i}_${j}
>     done
>     for i in $(seq 0 2500); do
>         rm -f dir3/${prefix}_${i}_${j}
>         if [ "$i" == "2094" ] && [ "$j" == "13" ]; then
>             echo "should reproduce now, so break here!"
>             break;
>         fi
>     done
> done
> popd
> umount mnt
>
>
> We are still trying to make a reproducer without any kernel changes. Do
> you have any suggestions on this?

Add a debugging knob that calls xfs_da3_swap_lastblock without trying
bunmapi, modify the script to activate the knob, then that can be turned
into an fstest.

>
> >
> > >
> > > We will get this warning:
> > >
> > > XFS (dm-0): Metadata corruption detected at xfs_dir3_leaf_verify+0xa8/0xe0 [xfs], xfs_dir3_leafn block 0x178
> > > XFS (dm-0): Unmount and run xfs_repair
> > > XFS (dm-0): First 128 bytes of corrupted metadata buffer:
> > > 00000000e80f1917: 00 80 00 0b 00 80 00 07 3d ff 00 00 00 00 00 00  ........=.......
> > > 000000009604c005: 00 00 00 00 00 00 01 a0 00 00 00 00 00 00 00 00  ................
> > > 000000006b6fb2bf: e4 44 e3 97 b5 64 44 41 8b 84 60 0e 50 43 d9 bf  .D...dDA..`.PC..
> > > 00000000678978a2: 00 00 00 00 00 00 00 83 01 73 00 93 00 00 00 00  .........s......
> > > 00000000b28b247c: 99 29 1d 38 00 00 00 00 99 29 1d 40 00 00 00 00  .).8.....).@....
> > > 000000002b2a662c: 99 29 1d 48 00 00 00 00 99 49 11 00 00 00 00 00  .).H.....I......
> > > 00000000ea2ffbb8: 99 49 11 08 00 00 45 25 99 49 11 10 00 00 48 fe  .I....E%.I....H.
> > > 0000000069e86440: 99 49 11 18 00 00 4c 6b 99 49 11 20 00 00 4d 97  .I....Lk.I. ..M.
> > > XFS (dm-0): xfs_do_force_shutdown(0x8) called from line 1423 of file fs/xfs/xfs_buf.c. Return address = 00000000c0ff63c1
> > > XFS (dm-0): Corruption of in-memory data detected. Shutting down filesystem
> > > XFS (dm-0): Please umount the filesystem and rectify the problem(s)

Aha, that might explain the weird recovery failures that I've been seeing
every now and then with my parent pointer recovery stress test.

> > >
> > > From the log above, we know xfs_buf->b_bn is 0x178, but the block's hdr record
> > > its blkno is 0x1a0.
> > >
> > > Fixes: 24df33b45ecf ("xfs: add CRC checking to dir2 leaf blocks")
> > > Signed-off-by: Zhang Tianci
> > > ---
> > >  fs/xfs/libxfs/xfs_da_btree.c | 12 +++++++++++-
> > >  1 file changed, 11 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
> > > index e576560b46e9..35f70e4c6447 100644
> > > --- a/fs/xfs/libxfs/xfs_da_btree.c
> > > +++ b/fs/xfs/libxfs/xfs_da_btree.c
> > > @@ -2318,8 +2318,18 @@ xfs_da3_swap_lastblock(
> > >          * Copy the last block into the dead buffer and log it.
> > >          */
> > >         memcpy(dead_buf->b_addr, last_buf->b_addr, args->geo->blksize);
> > > -       xfs_trans_log_buf(tp, dead_buf, 0, args->geo->blksize - 1);
> > >         dead_info = dead_buf->b_addr;
> > > +       /*
> > > +        * Update the moved block's blkno if it's a dir3 leaf block
> > > +        */
> > > +       if (dead_info->magic == cpu_to_be16(XFS_DIR3_LEAF1_MAGIC) ||
> > > +           dead_info->magic == cpu_to_be16(XFS_DIR3_LEAFN_MAGIC) ||
> > > +           dead_info->magic == cpu_to_be16(XFS_ATTR3_LEAF_MAGIC)) {
> > > +               struct xfs_da3_blkinfo *dap = (struct xfs_da3_blkinfo *)dead_info;
> > > +
> > > +               dap->blkno = cpu_to_be64(dead_buf->b_bn);

                dap->blkno = cpu_to_be64(xfs_buf_daddr(dead_buf));

(IOWs, please send patches against latest upstream, not 4.19.)

((Code looks good to me too, compile errors and style notwithstanding.))

--D

> > > +       }
> > > +       xfs_trans_log_buf(tp, dead_buf, 0, args->geo->blksize - 1);
> >
> > The fix here looks correct to me, but also a little ugly and ad-hoc.
> >
> > At least we should be using container_of and not casts for getting from a
> > xfs_da_blkinfo to a xfs_da3_blkinfo (even if there is bad precedent
> > for the cast in existing code).
> >
>
> Thanks, we will optimize the code in the next version of the patchset.
>
> > But I think it would be useful to add a helper that stamps in the blkno
> > for a caller that only has an xfs_da_blkinfo but no xfs_da3_blkinfo
> > and use it in all the places that currently do this in an open coded
> > fashion, e.g. xfs_da3_root_join, xfs_da3_root_split, xfs_attr3_leaf_to_node.
> >
> > That should probably be done on top of the small backportable fix.
> >
>
> I think the idea of adding a helper is great, and we can do it after this
> fix patch is merged.
>
>
> Thanks,
> Jiachen
>
>
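
A rough sketch of the kind of helper Christoph describes, written as plain
userspace C so it can be compiled outside the kernel. Everything in it is a
stand-in: struct da_blkinfo and struct da3_blkinfo are cut-down, hypothetical
mirrors of xfs_da_blkinfo and xfs_da3_blkinfo, da3_stamp_blkno() is an invented
name, and only the leafn magic (0x3dff, the "3d ff" visible in the dump above)
is checked, whereas the patch also tests the leaf1 and attr3 leaf magics. It
is an illustration of the container_of approach, not the upstream code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for the kernel's container_of() (no type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Leafn magic as seen in the hexdump ("3d ff"); name is hypothetical. */
#define DEMO_DIR3_LEAFN_MAGIC 0x3dff

/* Cut-down, hypothetical mirror of xfs_da_blkinfo (fields big-endian). */
struct da_blkinfo {
	uint32_t forw;
	uint32_t back;
	uint16_t magic;
	uint16_t pad;
};

/* Cut-down, hypothetical mirror of xfs_da3_blkinfo. */
struct da3_blkinfo {
	struct da_blkinfo hdr;
	uint32_t crc;
	uint64_t blkno;		/* big-endian disk address of this block */
};

/* Portable replacements for cpu_to_be16()/cpu_to_be64(). */
static uint16_t to_be16(uint16_t v)
{
	uint8_t b[2] = { (uint8_t)(v >> 8), (uint8_t)v };
	uint16_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

static uint64_t to_be64(uint64_t v)
{
	uint8_t b[8];
	uint64_t out;
	int i;

	for (i = 0; i < 8; i++)
		b[i] = (uint8_t)(v >> (56 - 8 * i));
	memcpy(&out, b, sizeof(out));
	return out;
}

/*
 * Stamp the block's own disk address into its v5 header.
 * Returns 1 if blkno was updated, 0 if the magic has no such header.
 */
static int da3_stamp_blkno(struct da_blkinfo *info, uint64_t daddr)
{
	struct da3_blkinfo *info3;

	assert(info != NULL);
	if (info->magic != to_be16(DEMO_DIR3_LEAFN_MAGIC))
		return 0;

	/* Walk from the embedded header to the enclosing v5 header. */
	info3 = container_of(info, struct da3_blkinfo, hdr);
	info3->blkno = to_be64(daddr);
	return 1;
}
```

With the numbers from the log, calling da3_stamp_blkno(&blk.hdr, 0x178) on the
copied block would replace the stale 0x1a0 carried over from the last block,
which is the mismatch the write verifier trips on.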