Hi,
Recently, our use-case ran into 2 bugs in case doing xfs_remove when the
disk space is in-pressure, which may cause xfs shutdown and kernel crash
in the xfs log recovery procedure. Here are 2 patches to fix the problem.
The 1st patch fixes an uninitialized variable issue.
The 2nd patch ensures the blkno in the xfs_buf is updated when doing
xfs_da3_swap_lastblock().
Thanks,
Jiachen
Jiachen Zhang (1):
xfs: ensure tmp_logflags is initialized in xfs_bmap_del_extent_real
Zhang Tianci (1):
xfs: update dir3 leaf block metadata after swap
fs/xfs/libxfs/xfs_bmap.c | 6 ++++--
fs/xfs/libxfs/xfs_da_btree.c | 12 +++++++++++-
2 files changed, 15 insertions(+), 3 deletions(-)
--
2.20.1
In the case of returning -ENOSPC, ensure tmp_logflags is initialized by 0.
Otherwise the caller __xfs_bunmapi will set uninitialized illegal
tmp_logflags value into xfs log, which might cause unpredictable error
in the log recovery procedure.
Fixes: 1b24b633aafe ("xfs: move some more code into xfs_bmap_del_extent_real")
Signed-off-by: Jiachen Zhang <[email protected]>
---
fs/xfs/libxfs/xfs_bmap.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index be62acffad6c..7cb395a38507 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -5048,8 +5048,10 @@ xfs_bmap_del_extent_real(
if (tp->t_blk_res == 0 &&
ifp->if_format == XFS_DINODE_FMT_EXTENTS &&
ifp->if_nextents >= XFS_IFORK_MAXEXT(ip, whichfork) &&
- del->br_startoff > got.br_startoff && del_endoff < got_endoff)
- return -ENOSPC;
+ del->br_startoff > got.br_startoff && del_endoff < got_endoff) {
+ error = -ENOSPC;
+ goto done;
+ }
flags = XFS_ILOG_CORE;
if (whichfork == XFS_DATA_FORK && XFS_IS_REALTIME_INODE(ip)) {
--
2.20.1
From: Zhang Tianci <[email protected]>
xfs_da3_swap_lastblock() copy the last block content to the dead block,
but do not update the metadata in it. We need update some metadata
for some kinds of type block, such as dir3 leafn block records its
blkno, we shall update it to the dead block blkno. Otherwise,
before write the xfs_buf to disk, the verify_write() will fail in
blk_hdr->blkno != xfs_buf->b_bn, then xfs will be shutdown.
We will get this warning:
XFS (dm-0): Metadata corruption detected at xfs_dir3_leaf_verify+0xa8/0xe0 [xfs], xfs_dir3_leafn block 0x178
XFS (dm-0): Unmount and run xfs_repair
XFS (dm-0): First 128 bytes of corrupted metadata buffer:
00000000e80f1917: 00 80 00 0b 00 80 00 07 3d ff 00 00 00 00 00 00 ........=.......
000000009604c005: 00 00 00 00 00 00 01 a0 00 00 00 00 00 00 00 00 ................
000000006b6fb2bf: e4 44 e3 97 b5 64 44 41 8b 84 60 0e 50 43 d9 bf .D...dDA..`.PC..
00000000678978a2: 00 00 00 00 00 00 00 83 01 73 00 93 00 00 00 00 .........s......
00000000b28b247c: 99 29 1d 38 00 00 00 00 99 29 1d 40 00 00 00 00 .).8.....).@....
000000002b2a662c: 99 29 1d 48 00 00 00 00 99 49 11 00 00 00 00 00 .).H.....I......
00000000ea2ffbb8: 99 49 11 08 00 00 45 25 99 49 11 10 00 00 48 fe .I....E%.I....H.
0000000069e86440: 99 49 11 18 00 00 4c 6b 99 49 11 20 00 00 4d 97 .I....Lk.I. ..M.
XFS (dm-0): xfs_do_force_shutdown(0x8) called from line 1423 of file fs/xfs/xfs_buf.c. Return address = 00000000c0ff63c1
XFS (dm-0): Corruption of in-memory data detected. Shutting down filesystem
XFS (dm-0): Please umount the filesystem and rectify the problem(s)
From the log above, we know xfs_buf->b_no is 0x178, but the block's hdr record
its blkno is 0x1a0.
Fixes: 24df33b45ecf ("xfs: add CRC checking to dir2 leaf blocks")
Signed-off-by: Zhang Tianci <[email protected]>
---
fs/xfs/libxfs/xfs_da_btree.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
index e576560b46e9..35f70e4c6447 100644
--- a/fs/xfs/libxfs/xfs_da_btree.c
+++ b/fs/xfs/libxfs/xfs_da_btree.c
@@ -2318,8 +2318,18 @@ xfs_da3_swap_lastblock(
* Copy the last block into the dead buffer and log it.
*/
memcpy(dead_buf->b_addr, last_buf->b_addr, args->geo->blksize);
- xfs_trans_log_buf(tp, dead_buf, 0, args->geo->blksize - 1);
dead_info = dead_buf->b_addr;
+ /*
+ * Update the moved block's blkno if it's a dir3 leaf block
+ */
+ if (dead_info->magic == cpu_to_be16(XFS_DIR3_LEAF1_MAGIC) ||
+ dead_info->magic == cpu_to_be16(XFS_DIR3_LEAFN_MAGIC) ||
+ dead_info->magic == cpu_to_be16(XFS_ATTR3_LEAF_MAGIC)) {
+ struct xfs_da3_blkinfo *dap = (struct xfs_da3_blkinfo *)dead_info;
+
+ dap->blkno = cpu_to_be64(dead_buf->b_bn);
+ }
+ xfs_trans_log_buf(tp, dead_buf, 0, args->geo->blksize - 1);
/*
* Get values from the moved block.
*/
--
2.20.1
On Tue, Nov 28, 2023 at 01:32:01PM +0800, Jiachen Zhang wrote:
> In the case of returning -ENOSPC, ensure tmp_logflags is initialized by 0.
> Otherwise the caller __xfs_bunmapi will set uninitialized illegal
> tmp_logflags value into xfs log, which might cause unpredictable error
> in the log recovery procedure.
This looks good:
Reviewed-by: Christoph Hellwig <[email protected]>
But I wonder if removing the local flags variable and always directly
assigning to *logflagsp might be more robust in the long run.
On Tue, Nov 28, 2023 at 01:32:02PM +0800, Jiachen Zhang wrote:
> From: Zhang Tianci <[email protected]>
>
> xfs_da3_swap_lastblock() copy the last block content to the dead block,
> but do not update the metadata in it. We need update some metadata
> for some kinds of type block, such as dir3 leafn block records its
> blkno, we shall update it to the dead block blkno. Otherwise,
> before write the xfs_buf to disk, the verify_write() will fail in
> blk_hdr->blkno != xfs_buf->b_bn, then xfs will be shutdown.
Do you have a reproducer for this? It would be very helpful to add it
to xfstests.
>
> We will get this warning:
>
> XFS (dm-0): Metadata corruption detected at xfs_dir3_leaf_verify+0xa8/0xe0 [xfs], xfs_dir3_leafn block 0x178
> XFS (dm-0): Unmount and run xfs_repair
> XFS (dm-0): First 128 bytes of corrupted metadata buffer:
> 00000000e80f1917: 00 80 00 0b 00 80 00 07 3d ff 00 00 00 00 00 00 ........=.......
> 000000009604c005: 00 00 00 00 00 00 01 a0 00 00 00 00 00 00 00 00 ................
> 000000006b6fb2bf: e4 44 e3 97 b5 64 44 41 8b 84 60 0e 50 43 d9 bf .D...dDA..`.PC..
> 00000000678978a2: 00 00 00 00 00 00 00 83 01 73 00 93 00 00 00 00 .........s......
> 00000000b28b247c: 99 29 1d 38 00 00 00 00 99 29 1d 40 00 00 00 00 .).8.....).@....
> 000000002b2a662c: 99 29 1d 48 00 00 00 00 99 49 11 00 00 00 00 00 .).H.....I......
> 00000000ea2ffbb8: 99 49 11 08 00 00 45 25 99 49 11 10 00 00 48 fe .I....E%.I....H.
> 0000000069e86440: 99 49 11 18 00 00 4c 6b 99 49 11 20 00 00 4d 97 .I....Lk.I. ..M.
> XFS (dm-0): xfs_do_force_shutdown(0x8) called from line 1423 of file fs/xfs/xfs_buf.c. Return address = 00000000c0ff63c1
> XFS (dm-0): Corruption of in-memory data detected. Shutting down filesystem
> XFS (dm-0): Please umount the filesystem and rectify the problem(s)
>
> >From the log above, we know xfs_buf->b_no is 0x178, but the block's hdr record
> its blkno is 0x1a0.
>
> Fixes: 24df33b45ecf ("xfs: add CRC checking to dir2 leaf blocks")
> Signed-off-by: Zhang Tianci <[email protected]>
> ---
> fs/xfs/libxfs/xfs_da_btree.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
> index e576560b46e9..35f70e4c6447 100644
> --- a/fs/xfs/libxfs/xfs_da_btree.c
> +++ b/fs/xfs/libxfs/xfs_da_btree.c
> @@ -2318,8 +2318,18 @@ xfs_da3_swap_lastblock(
> * Copy the last block into the dead buffer and log it.
> */
> memcpy(dead_buf->b_addr, last_buf->b_addr, args->geo->blksize);
> - xfs_trans_log_buf(tp, dead_buf, 0, args->geo->blksize - 1);
> dead_info = dead_buf->b_addr;
> + /*
> + * Update the moved block's blkno if it's a dir3 leaf block
> + */
> + if (dead_info->magic == cpu_to_be16(XFS_DIR3_LEAF1_MAGIC) ||
> + dead_info->magic == cpu_to_be16(XFS_DIR3_LEAFN_MAGIC) ||
> + dead_info->magic == cpu_to_be16(XFS_ATTR3_LEAF_MAGIC)) {
> + struct xfs_da3_blkinfo *dap = (struct xfs_da3_blkinfo *)dead_info;
> +
> + dap->blkno = cpu_to_be64(dead_buf->b_bn);
> + }
> + xfs_trans_log_buf(tp, dead_buf, 0, args->geo->blksize - 1);
The fix here looks correct to me, but also a little ugly and ad-hoc.
At last we should be using container_of and not casts for getting from a
xfs_da_blkinfo to a xfs_da3_blkinfo (even if there is bad precedence
for the cast in existing code).
But I think it would be useful to add a helper that stamps in the blkno
in for a caller that only has as xfs_da_blkinfo but no xfs_da3_blkinfo
and use in all the places that do it currently in an open coded fashion
e.g. xfs_da3_root_join, xfs_da3_root_split, xfs_attr3_leaf_to_node.
That should probably be done on top of the small backportable fix.
On 2023/11/28 16:39, Christoph Hellwig wrote:
> On Tue, Nov 28, 2023 at 01:32:02PM +0800, Jiachen Zhang wrote:
>> From: Zhang Tianci <[email protected]>
>>
>> xfs_da3_swap_lastblock() copy the last block content to the dead block,
>> but do not update the metadata in it. We need update some metadata
>> for some kinds of type block, such as dir3 leafn block records its
>> blkno, we shall update it to the dead block blkno. Otherwise,
>> before write the xfs_buf to disk, the verify_write() will fail in
>> blk_hdr->blkno != xfs_buf->b_bn, then xfs will be shutdown.
>
> Do you have a reproducer for this? It would be very helpful to add it
> to xfstests.
Hi Christoph,
Thanks for the review!
It's hard to reproduce the issue. Currently we can reproduce it with
some kernel code changes. We forcely reserve 0 t_blk_res for xfs_remove
on kernel version 4.19:
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index f2d06e1e4906..c8f84b95a0ec 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2551,13 +2551,8 @@ xfs_remove(
* insert tries to happen, instead trimming the LAST
* block from the directory.
*/
- resblks = XFS_REMOVE_SPACE_RES(mp);
- error = xfs_trans_alloc(mp, &M_RES(mp)->tr_remove, resblks, 0,
0, &tp);
- if (error == -ENOSPC) {
- resblks = 0;
- error = xfs_trans_alloc(mp, &M_RES(mp)->tr_remove, 0, 0, 0,
- &tp);
- }
+ resblks = 0;
+ error = xfs_trans_alloc(mp, &M_RES(mp)->tr_remove, 0, 0, 0, &tp);
if (error) {
ASSERT(error != -ENOSPC);
goto std_return
After insmod the new modified xfs.ko, run the following scripts, and it
can reproduce the problem consistently on the final `umount mnt`:
fallocate -l 1G xfs.img
mkfs.xfs -f xfs.img
mkdir -p mnt
losetup /dev/loop0 xfs.img
mount -t xfs /dev/loop0 mnt
pushd mnt
mkdir dir3
prefix="a_"
for j in $(seq 0 13); do
for i in $(seq 0 2800); do
touch dir3/${prefix}_${i}_${j}
done
for i in $(seq 0 2500); do
rm -f dir3/${prefix}_${i}_${j}
if [ "$i" == "2094" ] && [ "$j" == "13" ]; then
echo "should reproduce now, so break here!"
break;
fi
done
done
popd
umount mnt
We are still trying to make a reproducer without any kernel changes. Do
you have any suggestions on this?
>
>>
>> We will get this warning:
>>
>> XFS (dm-0): Metadata corruption detected at xfs_dir3_leaf_verify+0xa8/0xe0 [xfs], xfs_dir3_leafn block 0x178
>> XFS (dm-0): Unmount and run xfs_repair
>> XFS (dm-0): First 128 bytes of corrupted metadata buffer:
>> 00000000e80f1917: 00 80 00 0b 00 80 00 07 3d ff 00 00 00 00 00 00 ........=.......
>> 000000009604c005: 00 00 00 00 00 00 01 a0 00 00 00 00 00 00 00 00 ................
>> 000000006b6fb2bf: e4 44 e3 97 b5 64 44 41 8b 84 60 0e 50 43 d9 bf .D...dDA..`.PC..
>> 00000000678978a2: 00 00 00 00 00 00 00 83 01 73 00 93 00 00 00 00 .........s......
>> 00000000b28b247c: 99 29 1d 38 00 00 00 00 99 29 1d 40 00 00 00 00 .).8.....).@....
>> 000000002b2a662c: 99 29 1d 48 00 00 00 00 99 49 11 00 00 00 00 00 .).H.....I......
>> 00000000ea2ffbb8: 99 49 11 08 00 00 45 25 99 49 11 10 00 00 48 fe .I....E%.I....H.
>> 0000000069e86440: 99 49 11 18 00 00 4c 6b 99 49 11 20 00 00 4d 97 .I....Lk.I. ..M.
>> XFS (dm-0): xfs_do_force_shutdown(0x8) called from line 1423 of file fs/xfs/xfs_buf.c. Return address = 00000000c0ff63c1
>> XFS (dm-0): Corruption of in-memory data detected. Shutting down filesystem
>> XFS (dm-0): Please umount the filesystem and rectify the problem(s)
>>
>> >From the log above, we know xfs_buf->b_no is 0x178, but the block's hdr record
>> its blkno is 0x1a0.
>>
>> Fixes: 24df33b45ecf ("xfs: add CRC checking to dir2 leaf blocks")
>> Signed-off-by: Zhang Tianci <[email protected]>
>> ---
>> fs/xfs/libxfs/xfs_da_btree.c | 12 +++++++++++-
>> 1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
>> index e576560b46e9..35f70e4c6447 100644
>> --- a/fs/xfs/libxfs/xfs_da_btree.c
>> +++ b/fs/xfs/libxfs/xfs_da_btree.c
>> @@ -2318,8 +2318,18 @@ xfs_da3_swap_lastblock(
>> * Copy the last block into the dead buffer and log it.
>> */
>> memcpy(dead_buf->b_addr, last_buf->b_addr, args->geo->blksize);
>> - xfs_trans_log_buf(tp, dead_buf, 0, args->geo->blksize - 1);
>> dead_info = dead_buf->b_addr;
>> + /*
>> + * Update the moved block's blkno if it's a dir3 leaf block
>> + */
>> + if (dead_info->magic == cpu_to_be16(XFS_DIR3_LEAF1_MAGIC) ||
>> + dead_info->magic == cpu_to_be16(XFS_DIR3_LEAFN_MAGIC) ||
>> + dead_info->magic == cpu_to_be16(XFS_ATTR3_LEAF_MAGIC)) {
>> + struct xfs_da3_blkinfo *dap = (struct xfs_da3_blkinfo *)dead_info;
>> +
>> + dap->blkno = cpu_to_be64(dead_buf->b_bn);
>> + }
>> + xfs_trans_log_buf(tp, dead_buf, 0, args->geo->blksize - 1);
>
> The fix here looks correct to me, but also a little ugly and ad-hoc.
>
> At last we should be using container_of and not casts for getting from a
> xfs_da_blkinfo to a xfs_da3_blkinfo (even if there is bad precedence
> for the cast in existing code).
>
Thanks, we will optimize the code in the next version of the patchset.
> But I think it would be useful to add a helper that stamps in the blkno
> in for a caller that only has as xfs_da_blkinfo but no xfs_da3_blkinfo
> and use in all the places that do it currently in an open coded fashion
> e.g. xfs_da3_root_join, xfs_da3_root_split, xfs_attr3_leaf_to_node.
>
> That should probably be done on top of the small backportable fix.
>
I think the idea to add helper is great, and we can do it after this
fixes patch is merged.
Thanks,
Jiachen
Hi Jiachen,
kernel test robot noticed the following build errors:
[auto build test ERROR on xfs-linux/for-next]
[also build test ERROR on linus/master v6.7-rc3 next-20231128]
[cannot apply to djwong-xfs/djwong-devel]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Jiachen-Zhang/xfs-ensure-tmp_logflags-is-initialized-in-xfs_bmap_del_extent_real/20231128-135955
base: https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
patch link: https://lore.kernel.org/r/20231128053202.29007-3-zhangjiachen.jaycee%40bytedance.com
patch subject: [PATCH 2/2] xfs: update dir3 leaf block metadata after swap
config: powerpc64-randconfig-r081-20231128 (https://download.01.org/0day-ci/archive/20231128/[email protected]/config)
compiler: powerpc64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231128/[email protected]/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
All errors (new ones prefixed by >>):
In file included from include/linux/byteorder/big_endian.h:5,
from arch/powerpc/include/uapi/asm/byteorder.h:14,
from arch/powerpc/include/asm/qspinlock_types.h:6,
from arch/powerpc/include/asm/spinlock_types.h:10,
from include/linux/spinlock_types_raw.h:7,
from include/linux/ratelimit_types.h:7,
from include/linux/printk.h:9,
from include/asm-generic/bug.h:22,
from arch/powerpc/include/asm/bug.h:116,
from include/linux/bug.h:5,
from include/linux/fortify-string.h:5,
from include/linux/string.h:295,
from include/linux/uuid.h:11,
from fs/xfs/xfs_linux.h:10,
from fs/xfs/xfs.h:22,
from fs/xfs/libxfs/xfs_da_btree.c:7:
fs/xfs/libxfs/xfs_da_btree.c: In function 'xfs_da3_swap_lastblock':
>> fs/xfs/libxfs/xfs_da_btree.c:2330:50: error: 'struct xfs_buf' has no member named 'b_bn'
2330 | dap->blkno = cpu_to_be64(dead_buf->b_bn);
| ^~
include/uapi/linux/byteorder/big_endian.h:38:51: note: in definition of macro '__cpu_to_be64'
38 | #define __cpu_to_be64(x) ((__force __be64)(__u64)(x))
| ^
fs/xfs/libxfs/xfs_da_btree.c:2330:30: note: in expansion of macro 'cpu_to_be64'
2330 | dap->blkno = cpu_to_be64(dead_buf->b_bn);
| ^~~~~~~~~~~
vim +2330 fs/xfs/libxfs/xfs_da_btree.c
2254
2255 /*
2256 * Ick. We need to always be able to remove a btree block, even
2257 * if there's no space reservation because the filesystem is full.
2258 * This is called if xfs_bunmapi on a btree block fails due to ENOSPC.
2259 * It swaps the target block with the last block in the file. The
2260 * last block in the file can always be removed since it can't cause
2261 * a bmap btree split to do that.
2262 */
2263 STATIC int
2264 xfs_da3_swap_lastblock(
2265 struct xfs_da_args *args,
2266 xfs_dablk_t *dead_blknop,
2267 struct xfs_buf **dead_bufp)
2268 {
2269 struct xfs_da_blkinfo *dead_info;
2270 struct xfs_da_blkinfo *sib_info;
2271 struct xfs_da_intnode *par_node;
2272 struct xfs_da_intnode *dead_node;
2273 struct xfs_dir2_leaf *dead_leaf2;
2274 struct xfs_da_node_entry *btree;
2275 struct xfs_da3_icnode_hdr par_hdr;
2276 struct xfs_inode *dp;
2277 struct xfs_trans *tp;
2278 struct xfs_mount *mp;
2279 struct xfs_buf *dead_buf;
2280 struct xfs_buf *last_buf;
2281 struct xfs_buf *sib_buf;
2282 struct xfs_buf *par_buf;
2283 xfs_dahash_t dead_hash;
2284 xfs_fileoff_t lastoff;
2285 xfs_dablk_t dead_blkno;
2286 xfs_dablk_t last_blkno;
2287 xfs_dablk_t sib_blkno;
2288 xfs_dablk_t par_blkno;
2289 int error;
2290 int w;
2291 int entno;
2292 int level;
2293 int dead_level;
2294
2295 trace_xfs_da_swap_lastblock(args);
2296
2297 dead_buf = *dead_bufp;
2298 dead_blkno = *dead_blknop;
2299 tp = args->trans;
2300 dp = args->dp;
2301 w = args->whichfork;
2302 ASSERT(w == XFS_DATA_FORK);
2303 mp = dp->i_mount;
2304 lastoff = args->geo->freeblk;
2305 error = xfs_bmap_last_before(tp, dp, &lastoff, w);
2306 if (error)
2307 return error;
2308 if (XFS_IS_CORRUPT(mp, lastoff == 0))
2309 return -EFSCORRUPTED;
2310 /*
2311 * Read the last block in the btree space.
2312 */
2313 last_blkno = (xfs_dablk_t)lastoff - args->geo->fsbcount;
2314 error = xfs_da3_node_read(tp, dp, last_blkno, &last_buf, w);
2315 if (error)
2316 return error;
2317 /*
2318 * Copy the last block into the dead buffer and log it.
2319 */
2320 memcpy(dead_buf->b_addr, last_buf->b_addr, args->geo->blksize);
2321 dead_info = dead_buf->b_addr;
2322 /*
2323 * Update the moved block's blkno if it's a dir3 leaf block
2324 */
2325 if (dead_info->magic == cpu_to_be16(XFS_DIR3_LEAF1_MAGIC) ||
2326 dead_info->magic == cpu_to_be16(XFS_DIR3_LEAFN_MAGIC) ||
2327 dead_info->magic == cpu_to_be16(XFS_ATTR3_LEAF_MAGIC)) {
2328 struct xfs_da3_blkinfo *dap = (struct xfs_da3_blkinfo *)dead_info;
2329
> 2330 dap->blkno = cpu_to_be64(dead_buf->b_bn);
2331 }
2332 xfs_trans_log_buf(tp, dead_buf, 0, args->geo->blksize - 1);
2333 /*
2334 * Get values from the moved block.
2335 */
2336 if (dead_info->magic == cpu_to_be16(XFS_DIR2_LEAFN_MAGIC) ||
2337 dead_info->magic == cpu_to_be16(XFS_DIR3_LEAFN_MAGIC)) {
2338 struct xfs_dir3_icleaf_hdr leafhdr;
2339 struct xfs_dir2_leaf_entry *ents;
2340
2341 dead_leaf2 = (xfs_dir2_leaf_t *)dead_info;
2342 xfs_dir2_leaf_hdr_from_disk(dp->i_mount, &leafhdr,
2343 dead_leaf2);
2344 ents = leafhdr.ents;
2345 dead_level = 0;
2346 dead_hash = be32_to_cpu(ents[leafhdr.count - 1].hashval);
2347 } else {
2348 struct xfs_da3_icnode_hdr deadhdr;
2349
2350 dead_node = (xfs_da_intnode_t *)dead_info;
2351 xfs_da3_node_hdr_from_disk(dp->i_mount, &deadhdr, dead_node);
2352 btree = deadhdr.btree;
2353 dead_level = deadhdr.level;
2354 dead_hash = be32_to_cpu(btree[deadhdr.count - 1].hashval);
2355 }
2356 sib_buf = par_buf = NULL;
2357 /*
2358 * If the moved block has a left sibling, fix up the pointers.
2359 */
2360 if ((sib_blkno = be32_to_cpu(dead_info->back))) {
2361 error = xfs_da3_node_read(tp, dp, sib_blkno, &sib_buf, w);
2362 if (error)
2363 goto done;
2364 sib_info = sib_buf->b_addr;
2365 if (XFS_IS_CORRUPT(mp,
2366 be32_to_cpu(sib_info->forw) != last_blkno ||
2367 sib_info->magic != dead_info->magic)) {
2368 error = -EFSCORRUPTED;
2369 goto done;
2370 }
2371 sib_info->forw = cpu_to_be32(dead_blkno);
2372 xfs_trans_log_buf(tp, sib_buf,
2373 XFS_DA_LOGRANGE(sib_info, &sib_info->forw,
2374 sizeof(sib_info->forw)));
2375 sib_buf = NULL;
2376 }
2377 /*
2378 * If the moved block has a right sibling, fix up the pointers.
2379 */
2380 if ((sib_blkno = be32_to_cpu(dead_info->forw))) {
2381 error = xfs_da3_node_read(tp, dp, sib_blkno, &sib_buf, w);
2382 if (error)
2383 goto done;
2384 sib_info = sib_buf->b_addr;
2385 if (XFS_IS_CORRUPT(mp,
2386 be32_to_cpu(sib_info->back) != last_blkno ||
2387 sib_info->magic != dead_info->magic)) {
2388 error = -EFSCORRUPTED;
2389 goto done;
2390 }
2391 sib_info->back = cpu_to_be32(dead_blkno);
2392 xfs_trans_log_buf(tp, sib_buf,
2393 XFS_DA_LOGRANGE(sib_info, &sib_info->back,
2394 sizeof(sib_info->back)));
2395 sib_buf = NULL;
2396 }
2397 par_blkno = args->geo->leafblk;
2398 level = -1;
2399 /*
2400 * Walk down the tree looking for the parent of the moved block.
2401 */
2402 for (;;) {
2403 error = xfs_da3_node_read(tp, dp, par_blkno, &par_buf, w);
2404 if (error)
2405 goto done;
2406 par_node = par_buf->b_addr;
2407 xfs_da3_node_hdr_from_disk(dp->i_mount, &par_hdr, par_node);
2408 if (XFS_IS_CORRUPT(mp,
2409 level >= 0 && level != par_hdr.level + 1)) {
2410 error = -EFSCORRUPTED;
2411 goto done;
2412 }
2413 level = par_hdr.level;
2414 btree = par_hdr.btree;
2415 for (entno = 0;
2416 entno < par_hdr.count &&
2417 be32_to_cpu(btree[entno].hashval) < dead_hash;
2418 entno++)
2419 continue;
2420 if (XFS_IS_CORRUPT(mp, entno == par_hdr.count)) {
2421 error = -EFSCORRUPTED;
2422 goto done;
2423 }
2424 par_blkno = be32_to_cpu(btree[entno].before);
2425 if (level == dead_level + 1)
2426 break;
2427 xfs_trans_brelse(tp, par_buf);
2428 par_buf = NULL;
2429 }
2430 /*
2431 * We're in the right parent block.
2432 * Look for the right entry.
2433 */
2434 for (;;) {
2435 for (;
2436 entno < par_hdr.count &&
2437 be32_to_cpu(btree[entno].before) != last_blkno;
2438 entno++)
2439 continue;
2440 if (entno < par_hdr.count)
2441 break;
2442 par_blkno = par_hdr.forw;
2443 xfs_trans_brelse(tp, par_buf);
2444 par_buf = NULL;
2445 if (XFS_IS_CORRUPT(mp, par_blkno == 0)) {
2446 error = -EFSCORRUPTED;
2447 goto done;
2448 }
2449 error = xfs_da3_node_read(tp, dp, par_blkno, &par_buf, w);
2450 if (error)
2451 goto done;
2452 par_node = par_buf->b_addr;
2453 xfs_da3_node_hdr_from_disk(dp->i_mount, &par_hdr, par_node);
2454 if (XFS_IS_CORRUPT(mp, par_hdr.level != level)) {
2455 error = -EFSCORRUPTED;
2456 goto done;
2457 }
2458 btree = par_hdr.btree;
2459 entno = 0;
2460 }
2461 /*
2462 * Update the parent entry pointing to the moved block.
2463 */
2464 btree[entno].before = cpu_to_be32(dead_blkno);
2465 xfs_trans_log_buf(tp, par_buf,
2466 XFS_DA_LOGRANGE(par_node, &btree[entno].before,
2467 sizeof(btree[entno].before)));
2468 *dead_blknop = last_blkno;
2469 *dead_bufp = last_buf;
2470 return 0;
2471 done:
2472 if (par_buf)
2473 xfs_trans_brelse(tp, par_buf);
2474 if (sib_buf)
2475 xfs_trans_brelse(tp, sib_buf);
2476 xfs_trans_brelse(tp, last_buf);
2477 return error;
2478 }
2479
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Hi Jiachen,
kernel test robot noticed the following build errors:
[auto build test ERROR on xfs-linux/for-next]
[also build test ERROR on linus/master v6.7-rc3 next-20231128]
[cannot apply to djwong-xfs/djwong-devel]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Jiachen-Zhang/xfs-ensure-tmp_logflags-is-initialized-in-xfs_bmap_del_extent_real/20231128-135955
base: https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
patch link: https://lore.kernel.org/r/20231128053202.29007-3-zhangjiachen.jaycee%40bytedance.com
patch subject: [PATCH 2/2] xfs: update dir3 leaf block metadata after swap
config: i386-randconfig-141-20231128 (https://download.01.org/0day-ci/archive/20231128/[email protected]/config)
compiler: clang version 16.0.4 (https://github.com/llvm/llvm-project.git ae42196bc493ffe877a7e3dff8be32035dea4d07)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231128/[email protected]/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
All errors (new ones prefixed by >>):
>> fs/xfs/libxfs/xfs_da_btree.c:2330:38: error: no member named 'b_bn' in 'struct xfs_buf'
dap->blkno = cpu_to_be64(dead_buf->b_bn);
~~~~~~~~ ^
include/linux/byteorder/generic.h:92:21: note: expanded from macro 'cpu_to_be64'
#define cpu_to_be64 __cpu_to_be64
^
include/uapi/linux/byteorder/little_endian.h:38:53: note: expanded from macro '__cpu_to_be64'
#define __cpu_to_be64(x) ((__force __be64)__swab64((x)))
^
include/uapi/linux/swab.h:128:54: note: expanded from macro '__swab64'
#define __swab64(x) (__u64)__builtin_bswap64((__u64)(x))
^
1 error generated.
vim +2330 fs/xfs/libxfs/xfs_da_btree.c
2254
2255 /*
2256 * Ick. We need to always be able to remove a btree block, even
2257 * if there's no space reservation because the filesystem is full.
2258 * This is called if xfs_bunmapi on a btree block fails due to ENOSPC.
2259 * It swaps the target block with the last block in the file. The
2260 * last block in the file can always be removed since it can't cause
2261 * a bmap btree split to do that.
2262 */
2263 STATIC int
2264 xfs_da3_swap_lastblock(
2265 struct xfs_da_args *args,
2266 xfs_dablk_t *dead_blknop,
2267 struct xfs_buf **dead_bufp)
2268 {
2269 struct xfs_da_blkinfo *dead_info;
2270 struct xfs_da_blkinfo *sib_info;
2271 struct xfs_da_intnode *par_node;
2272 struct xfs_da_intnode *dead_node;
2273 struct xfs_dir2_leaf *dead_leaf2;
2274 struct xfs_da_node_entry *btree;
2275 struct xfs_da3_icnode_hdr par_hdr;
2276 struct xfs_inode *dp;
2277 struct xfs_trans *tp;
2278 struct xfs_mount *mp;
2279 struct xfs_buf *dead_buf;
2280 struct xfs_buf *last_buf;
2281 struct xfs_buf *sib_buf;
2282 struct xfs_buf *par_buf;
2283 xfs_dahash_t dead_hash;
2284 xfs_fileoff_t lastoff;
2285 xfs_dablk_t dead_blkno;
2286 xfs_dablk_t last_blkno;
2287 xfs_dablk_t sib_blkno;
2288 xfs_dablk_t par_blkno;
2289 int error;
2290 int w;
2291 int entno;
2292 int level;
2293 int dead_level;
2294
2295 trace_xfs_da_swap_lastblock(args);
2296
2297 dead_buf = *dead_bufp;
2298 dead_blkno = *dead_blknop;
2299 tp = args->trans;
2300 dp = args->dp;
2301 w = args->whichfork;
2302 ASSERT(w == XFS_DATA_FORK);
2303 mp = dp->i_mount;
2304 lastoff = args->geo->freeblk;
2305 error = xfs_bmap_last_before(tp, dp, &lastoff, w);
2306 if (error)
2307 return error;
2308 if (XFS_IS_CORRUPT(mp, lastoff == 0))
2309 return -EFSCORRUPTED;
2310 /*
2311 * Read the last block in the btree space.
2312 */
2313 last_blkno = (xfs_dablk_t)lastoff - args->geo->fsbcount;
2314 error = xfs_da3_node_read(tp, dp, last_blkno, &last_buf, w);
2315 if (error)
2316 return error;
2317 /*
2318 * Copy the last block into the dead buffer and log it.
2319 */
2320 memcpy(dead_buf->b_addr, last_buf->b_addr, args->geo->blksize);
2321 dead_info = dead_buf->b_addr;
2322 /*
2323 * Update the moved block's blkno if it's a dir3 leaf block
2324 */
2325 if (dead_info->magic == cpu_to_be16(XFS_DIR3_LEAF1_MAGIC) ||
2326 dead_info->magic == cpu_to_be16(XFS_DIR3_LEAFN_MAGIC) ||
2327 dead_info->magic == cpu_to_be16(XFS_ATTR3_LEAF_MAGIC)) {
2328 struct xfs_da3_blkinfo *dap = (struct xfs_da3_blkinfo *)dead_info;
2329
> 2330 dap->blkno = cpu_to_be64(dead_buf->b_bn);
2331 }
2332 xfs_trans_log_buf(tp, dead_buf, 0, args->geo->blksize - 1);
2333 /*
2334 * Get values from the moved block.
2335 */
2336 if (dead_info->magic == cpu_to_be16(XFS_DIR2_LEAFN_MAGIC) ||
2337 dead_info->magic == cpu_to_be16(XFS_DIR3_LEAFN_MAGIC)) {
2338 struct xfs_dir3_icleaf_hdr leafhdr;
2339 struct xfs_dir2_leaf_entry *ents;
2340
2341 dead_leaf2 = (xfs_dir2_leaf_t *)dead_info;
2342 xfs_dir2_leaf_hdr_from_disk(dp->i_mount, &leafhdr,
2343 dead_leaf2);
2344 ents = leafhdr.ents;
2345 dead_level = 0;
2346 dead_hash = be32_to_cpu(ents[leafhdr.count - 1].hashval);
2347 } else {
2348 struct xfs_da3_icnode_hdr deadhdr;
2349
2350 dead_node = (xfs_da_intnode_t *)dead_info;
2351 xfs_da3_node_hdr_from_disk(dp->i_mount, &deadhdr, dead_node);
2352 btree = deadhdr.btree;
2353 dead_level = deadhdr.level;
2354 dead_hash = be32_to_cpu(btree[deadhdr.count - 1].hashval);
2355 }
2356 sib_buf = par_buf = NULL;
2357 /*
2358 * If the moved block has a left sibling, fix up the pointers.
2359 */
2360 if ((sib_blkno = be32_to_cpu(dead_info->back))) {
2361 error = xfs_da3_node_read(tp, dp, sib_blkno, &sib_buf, w);
2362 if (error)
2363 goto done;
2364 sib_info = sib_buf->b_addr;
2365 if (XFS_IS_CORRUPT(mp,
2366 be32_to_cpu(sib_info->forw) != last_blkno ||
2367 sib_info->magic != dead_info->magic)) {
2368 error = -EFSCORRUPTED;
2369 goto done;
2370 }
2371 sib_info->forw = cpu_to_be32(dead_blkno);
2372 xfs_trans_log_buf(tp, sib_buf,
2373 XFS_DA_LOGRANGE(sib_info, &sib_info->forw,
2374 sizeof(sib_info->forw)));
2375 sib_buf = NULL;
2376 }
2377 /*
2378 * If the moved block has a right sibling, fix up the pointers.
2379 */
2380 if ((sib_blkno = be32_to_cpu(dead_info->forw))) {
2381 error = xfs_da3_node_read(tp, dp, sib_blkno, &sib_buf, w);
2382 if (error)
2383 goto done;
2384 sib_info = sib_buf->b_addr;
2385 if (XFS_IS_CORRUPT(mp,
2386 be32_to_cpu(sib_info->back) != last_blkno ||
2387 sib_info->magic != dead_info->magic)) {
2388 error = -EFSCORRUPTED;
2389 goto done;
2390 }
2391 sib_info->back = cpu_to_be32(dead_blkno);
2392 xfs_trans_log_buf(tp, sib_buf,
2393 XFS_DA_LOGRANGE(sib_info, &sib_info->back,
2394 sizeof(sib_info->back)));
2395 sib_buf = NULL;
2396 }
2397 par_blkno = args->geo->leafblk;
2398 level = -1;
2399 /*
2400 * Walk down the tree looking for the parent of the moved block.
2401 */
2402 for (;;) {
2403 error = xfs_da3_node_read(tp, dp, par_blkno, &par_buf, w);
2404 if (error)
2405 goto done;
2406 par_node = par_buf->b_addr;
2407 xfs_da3_node_hdr_from_disk(dp->i_mount, &par_hdr, par_node);
2408 if (XFS_IS_CORRUPT(mp,
2409 level >= 0 && level != par_hdr.level + 1)) {
2410 error = -EFSCORRUPTED;
2411 goto done;
2412 }
2413 level = par_hdr.level;
2414 btree = par_hdr.btree;
2415 for (entno = 0;
2416 entno < par_hdr.count &&
2417 be32_to_cpu(btree[entno].hashval) < dead_hash;
2418 entno++)
2419 continue;
2420 if (XFS_IS_CORRUPT(mp, entno == par_hdr.count)) {
2421 error = -EFSCORRUPTED;
2422 goto done;
2423 }
2424 par_blkno = be32_to_cpu(btree[entno].before);
2425 if (level == dead_level + 1)
2426 break;
2427 xfs_trans_brelse(tp, par_buf);
2428 par_buf = NULL;
2429 }
2430 /*
2431 * We're in the right parent block.
2432 * Look for the right entry.
2433 */
2434 for (;;) {
2435 for (;
2436 entno < par_hdr.count &&
2437 be32_to_cpu(btree[entno].before) != last_blkno;
2438 entno++)
2439 continue;
2440 if (entno < par_hdr.count)
2441 break;
2442 par_blkno = par_hdr.forw;
2443 xfs_trans_brelse(tp, par_buf);
2444 par_buf = NULL;
2445 if (XFS_IS_CORRUPT(mp, par_blkno == 0)) {
2446 error = -EFSCORRUPTED;
2447 goto done;
2448 }
2449 error = xfs_da3_node_read(tp, dp, par_blkno, &par_buf, w);
2450 if (error)
2451 goto done;
2452 par_node = par_buf->b_addr;
2453 xfs_da3_node_hdr_from_disk(dp->i_mount, &par_hdr, par_node);
2454 if (XFS_IS_CORRUPT(mp, par_hdr.level != level)) {
2455 error = -EFSCORRUPTED;
2456 goto done;
2457 }
2458 btree = par_hdr.btree;
2459 entno = 0;
2460 }
2461 /*
2462 * Update the parent entry pointing to the moved block.
2463 */
2464 btree[entno].before = cpu_to_be32(dead_blkno);
2465 xfs_trans_log_buf(tp, par_buf,
2466 XFS_DA_LOGRANGE(par_node, &btree[entno].before,
2467 sizeof(btree[entno].before)));
2468 *dead_blknop = last_blkno;
2469 *dead_bufp = last_buf;
2470 return 0;
2471 done:
2472 if (par_buf)
2473 xfs_trans_brelse(tp, par_buf);
2474 if (sib_buf)
2475 xfs_trans_brelse(tp, sib_buf);
2476 xfs_trans_brelse(tp, last_buf);
2477 return error;
2478 }
2479
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On Tue, Nov 28, 2023 at 12:19:23AM -0800, Christoph Hellwig wrote:
> On Tue, Nov 28, 2023 at 01:32:01PM +0800, Jiachen Zhang wrote:
> > In the case of returning -ENOSPC, ensure tmp_logflags is initialized by 0.
> > Otherwise the caller __xfs_bunmapi will set uninitialized illegal
> > tmp_logflags value into xfs log, which might cause unpredictable error
> > in the log recovery procedure.
>
> This looks good:
>
> Reviewed-by: Christoph Hellwig <[email protected]>
>
> But I wonder if removing the local flags variable and always directly
> assigning to *logflagsp might be more robust in the long run.
Yes, I think it's better to eliminate opportunities for subtle logic
bombs by not open-coding variable aliasing. Perhaps this function
should set *logflagsp = 0 at the start of the function so that we don't
have to deal with uninitialized outparams, especially since the caller
uses it even on an error return.
--D
On Tue, Nov 28, 2023 at 05:39:50PM +0800, Jiachen Zhang wrote:
> On 2023/11/28 16:39, Christoph Hellwig wrote:
> > On Tue, Nov 28, 2023 at 01:32:02PM +0800, Jiachen Zhang wrote:
> > > From: Zhang Tianci <[email protected]>
> > >
> > > xfs_da3_swap_lastblock() copy the last block content to the dead block,
> > > but do not update the metadata in it. We need update some metadata
> > > for some kinds of type block, such as dir3 leafn block records its
> > > blkno, we shall update it to the dead block blkno. Otherwise,
> > > before write the xfs_buf to disk, the verify_write() will fail in
> > > blk_hdr->blkno != xfs_buf->b_bn, then xfs will be shutdown.
> >
> > Do you have a reproducer for this? It would be very helpful to add it
> > to xfstests.
>
> Hi Christoph,
>
> Thanks for the review!
>
> It's hard to reproduce the issue. Currently we can reproduce it with
> some kernel code changes. We forcely reserve 0 t_blk_res for xfs_remove
> on kernel version 4.19:
>
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index f2d06e1e4906..c8f84b95a0ec 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -2551,13 +2551,8 @@ xfs_remove(
> * insert tries to happen, instead trimming the LAST
> * block from the directory.
> */
> - resblks = XFS_REMOVE_SPACE_RES(mp);
> - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_remove, resblks, 0, 0,
> &tp);
> - if (error == -ENOSPC) {
> - resblks = 0;
> - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_remove, 0, 0, 0,
> - &tp);
> - }
> + resblks = 0;
> + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_remove, 0, 0, 0, &tp);
> if (error) {
> ASSERT(error != -ENOSPC);
> goto std_return
>
>
> After insmod the new modified xfs.ko, run the following scripts, and it
> can reproduce the problem consistently on the final `umount mnt`:
>
> fallocate -l 1G xfs.img
> mkfs.xfs -f xfs.img
> mkdir -p mnt
> losetup /dev/loop0 xfs.img
> mount -t xfs /dev/loop0 mnt
> pushd mnt
> mkdir dir3
> prefix="a_"
> for j in $(seq 0 13); do
> for i in $(seq 0 2800); do
> touch dir3/${prefix}_${i}_${j}
> done
> for i in $(seq 0 2500); do
> rm -f dir3/${prefix}_${i}_${j}
> if [ "$i" == "2094" ] && [ "$j" == "13" ]; then
> echo "should reproduce now, so break here!"
> break;
> fi
> done
> done
> popd
> umount mnt
>
>
> We are still trying to make a reproducer without any kernel changes. Do
> you have any suggestions on this?
Add a debugging knob that calls xfs_da3_swap_lastblock without trying
bunmapi, modify the script to activate the knob, then that can be turned
into an fstest.
>
> >
> > >
> > > We will get this warning:
> > >
> > > XFS (dm-0): Metadata corruption detected at xfs_dir3_leaf_verify+0xa8/0xe0 [xfs], xfs_dir3_leafn block 0x178
> > > XFS (dm-0): Unmount and run xfs_repair
> > > XFS (dm-0): First 128 bytes of corrupted metadata buffer:
> > > 00000000e80f1917: 00 80 00 0b 00 80 00 07 3d ff 00 00 00 00 00 00 ........=.......
> > > 000000009604c005: 00 00 00 00 00 00 01 a0 00 00 00 00 00 00 00 00 ................
> > > 000000006b6fb2bf: e4 44 e3 97 b5 64 44 41 8b 84 60 0e 50 43 d9 bf .D...dDA..`.PC..
> > > 00000000678978a2: 00 00 00 00 00 00 00 83 01 73 00 93 00 00 00 00 .........s......
> > > 00000000b28b247c: 99 29 1d 38 00 00 00 00 99 29 1d 40 00 00 00 00 .).8.....).@....
> > > 000000002b2a662c: 99 29 1d 48 00 00 00 00 99 49 11 00 00 00 00 00 .).H.....I......
> > > 00000000ea2ffbb8: 99 49 11 08 00 00 45 25 99 49 11 10 00 00 48 fe .I....E%.I....H.
> > > 0000000069e86440: 99 49 11 18 00 00 4c 6b 99 49 11 20 00 00 4d 97 .I....Lk.I. ..M.
> > > XFS (dm-0): xfs_do_force_shutdown(0x8) called from line 1423 of file fs/xfs/xfs_buf.c. Return address = 00000000c0ff63c1
> > > XFS (dm-0): Corruption of in-memory data detected. Shutting down filesystem
> > > XFS (dm-0): Please umount the filesystem and rectify the problem(s)
Aha, that might explain the weird recovery failures that I've been
seeing every now and then with my parent pointer recovery stress test.
> > >
> > > >From the log above, we know xfs_buf->b_no is 0x178, but the block's hdr record
> > > its blkno is 0x1a0.
> > >
> > > Fixes: 24df33b45ecf ("xfs: add CRC checking to dir2 leaf blocks")
> > > Signed-off-by: Zhang Tianci <[email protected]>
> > > ---
> > > fs/xfs/libxfs/xfs_da_btree.c | 12 +++++++++++-
> > > 1 file changed, 11 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
> > > index e576560b46e9..35f70e4c6447 100644
> > > --- a/fs/xfs/libxfs/xfs_da_btree.c
> > > +++ b/fs/xfs/libxfs/xfs_da_btree.c
> > > @@ -2318,8 +2318,18 @@ xfs_da3_swap_lastblock(
> > > * Copy the last block into the dead buffer and log it.
> > > */
> > > memcpy(dead_buf->b_addr, last_buf->b_addr, args->geo->blksize);
> > > - xfs_trans_log_buf(tp, dead_buf, 0, args->geo->blksize - 1);
> > > dead_info = dead_buf->b_addr;
> > > + /*
> > > + * Update the moved block's blkno if it's a dir3 leaf block
> > > + */
> > > + if (dead_info->magic == cpu_to_be16(XFS_DIR3_LEAF1_MAGIC) ||
> > > + dead_info->magic == cpu_to_be16(XFS_DIR3_LEAFN_MAGIC) ||
> > > + dead_info->magic == cpu_to_be16(XFS_ATTR3_LEAF_MAGIC)) {
> > > + struct xfs_da3_blkinfo *dap = (struct xfs_da3_blkinfo *)dead_info;
> > > +
> > > + dap->blkno = cpu_to_be64(dead_buf->b_bn);
dap->blkno = cpu_to_be64(xfs_buf_daddr(dead_buf));
(IOWs, please send patches against latest upstream, not 4.19.)
((Code looks good to me too, compile errors and style notwithstanding.))
--D
> > > + }
> > > + xfs_trans_log_buf(tp, dead_buf, 0, args->geo->blksize - 1);
> >
> > The fix here looks correct to me, but also a little ugly and ad-hoc.
> >
> > At last we should be using container_of and not casts for getting from a
> > xfs_da_blkinfo to a xfs_da3_blkinfo (even if there is bad precedence
> > for the cast in existing code).
> >
>
> Thanks, we will optimize the code in the next version of the patchset.
>
> > But I think it would be useful to add a helper that stamps in the blkno
> > in for a caller that only has as xfs_da_blkinfo but no xfs_da3_blkinfo
> > and use in all the places that do it currently in an open coded fashion
> > e.g. xfs_da3_root_join, xfs_da3_root_split, xfs_attr3_leaf_to_node.
> >
> > That should probably be done on top of the small backportable fix.
> >
>
> I think the idea to add helper is great, and we can do it after this
> fixes patch is merged.
>
>
> Thanks,
> Jiachen
>
>
On Tue, Nov 28, 2023 at 01:32:02PM +0800, Jiachen Zhang wrote:
> From: Zhang Tianci <[email protected]>
>
> xfs_da3_swap_lastblock() copy the last block content to the dead block,
> but do not update the metadata in it. We need update some metadata
> for some kinds of type block, such as dir3 leafn block records its
> blkno, we shall update it to the dead block blkno. Otherwise,
> before write the xfs_buf to disk, the verify_write() will fail in
> blk_hdr->blkno != xfs_buf->b_bn, then xfs will be shutdown.
>
> We will get this warning:
>
> XFS (dm-0): Metadata corruption detected at xfs_dir3_leaf_verify+0xa8/0xe0 [xfs], xfs_dir3_leafn block 0x178
> XFS (dm-0): Unmount and run xfs_repair
> XFS (dm-0): First 128 bytes of corrupted metadata buffer:
> 00000000e80f1917: 00 80 00 0b 00 80 00 07 3d ff 00 00 00 00 00 00 ........=.......
> 000000009604c005: 00 00 00 00 00 00 01 a0 00 00 00 00 00 00 00 00 ................
> 000000006b6fb2bf: e4 44 e3 97 b5 64 44 41 8b 84 60 0e 50 43 d9 bf .D...dDA..`.PC..
> 00000000678978a2: 00 00 00 00 00 00 00 83 01 73 00 93 00 00 00 00 .........s......
> 00000000b28b247c: 99 29 1d 38 00 00 00 00 99 29 1d 40 00 00 00 00 .).8.....).@....
> 000000002b2a662c: 99 29 1d 48 00 00 00 00 99 49 11 00 00 00 00 00 .).H.....I......
> 00000000ea2ffbb8: 99 49 11 08 00 00 45 25 99 49 11 10 00 00 48 fe .I....E%.I....H.
> 0000000069e86440: 99 49 11 18 00 00 4c 6b 99 49 11 20 00 00 4d 97 .I....Lk.I. ..M.
> XFS (dm-0): xfs_do_force_shutdown(0x8) called from line 1423 of file fs/xfs/xfs_buf.c. Return address = 00000000c0ff63c1
> XFS (dm-0): Corruption of in-memory data detected. Shutting down filesystem
> XFS (dm-0): Please umount the filesystem and rectify the problem(s)
>
> From the log above, we know xfs_buf->b_no is 0x178, but the block's hdr record
> its blkno is 0x1a0.
>
> Fixes: 24df33b45ecf ("xfs: add CRC checking to dir2 leaf blocks")
> Signed-off-by: Zhang Tianci <[email protected]>
> ---
> fs/xfs/libxfs/xfs_da_btree.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
> index e576560b46e9..35f70e4c6447 100644
> --- a/fs/xfs/libxfs/xfs_da_btree.c
> +++ b/fs/xfs/libxfs/xfs_da_btree.c
> @@ -2318,8 +2318,18 @@ xfs_da3_swap_lastblock(
> * Copy the last block into the dead buffer and log it.
> */
> memcpy(dead_buf->b_addr, last_buf->b_addr, args->geo->blksize);
> - xfs_trans_log_buf(tp, dead_buf, 0, args->geo->blksize - 1);
> dead_info = dead_buf->b_addr;
> + /*
> + * Update the moved block's blkno if it's a dir3 leaf block
> + */
> + if (dead_info->magic == cpu_to_be16(XFS_DIR3_LEAF1_MAGIC) ||
> + dead_info->magic == cpu_to_be16(XFS_DIR3_LEAFN_MAGIC) ||
> + dead_info->magic == cpu_to_be16(XFS_ATTR3_LEAF_MAGIC)) {
a.k.a.
if (xfs_has_crc(mp)) {
i.e. this is not specific to the buffer type being processed, it's
specific to v4 vs v5 on-disk format. Hence it's a fs-feature check,
not a block magic number check.
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Wed, Nov 29, 2023 at 10:15:52AM +1100, Dave Chinner wrote:
> > + /*
> > + * Update the moved block's blkno if it's a dir3 leaf block
> > + */
> > + if (dead_info->magic == cpu_to_be16(XFS_DIR3_LEAF1_MAGIC) ||
> > + dead_info->magic == cpu_to_be16(XFS_DIR3_LEAFN_MAGIC) ||
> > + dead_info->magic == cpu_to_be16(XFS_ATTR3_LEAF_MAGIC)) {
>
> a.k.a.
>
> if (xfs_has_crc(mp)) {
>
> i.e. this is not specific to the buffer type being processed, it's
> specific to v4 vs v5 on-disk format. Hence it's a fs-feature check,
> not a block magic number check.
We have these magic based checks in quite a few places right now,
so I'm not surprised that Jiachen picked it up from there..
On Wed, Nov 29, 2023 at 10:15:52AM +1100, Dave Chinner wrote:
> On Tue, Nov 28, 2023 at 01:32:02PM +0800, Jiachen Zhang wrote:
> > From: Zhang Tianci <[email protected]>
> >
> > xfs_da3_swap_lastblock() copy the last block content to the dead block,
> > but do not update the metadata in it. We need update some metadata
> > for some kinds of type block, such as dir3 leafn block records its
> > blkno, we shall update it to the dead block blkno. Otherwise,
> > before write the xfs_buf to disk, the verify_write() will fail in
> > blk_hdr->blkno != xfs_buf->b_bn, then xfs will be shutdown.
> >
> > We will get this warning:
> >
> > XFS (dm-0): Metadata corruption detected at xfs_dir3_leaf_verify+0xa8/0xe0 [xfs], xfs_dir3_leafn block 0x178
> > XFS (dm-0): Unmount and run xfs_repair
> > XFS (dm-0): First 128 bytes of corrupted metadata buffer:
> > 00000000e80f1917: 00 80 00 0b 00 80 00 07 3d ff 00 00 00 00 00 00 ........=.......
> > 000000009604c005: 00 00 00 00 00 00 01 a0 00 00 00 00 00 00 00 00 ................
> > 000000006b6fb2bf: e4 44 e3 97 b5 64 44 41 8b 84 60 0e 50 43 d9 bf .D...dDA..`.PC..
> > 00000000678978a2: 00 00 00 00 00 00 00 83 01 73 00 93 00 00 00 00 .........s......
> > 00000000b28b247c: 99 29 1d 38 00 00 00 00 99 29 1d 40 00 00 00 00 .).8.....).@....
> > 000000002b2a662c: 99 29 1d 48 00 00 00 00 99 49 11 00 00 00 00 00 .).H.....I......
> > 00000000ea2ffbb8: 99 49 11 08 00 00 45 25 99 49 11 10 00 00 48 fe .I....E%.I....H.
> > 0000000069e86440: 99 49 11 18 00 00 4c 6b 99 49 11 20 00 00 4d 97 .I....Lk.I. ..M.
> > XFS (dm-0): xfs_do_force_shutdown(0x8) called from line 1423 of file fs/xfs/xfs_buf.c. Return address = 00000000c0ff63c1
> > XFS (dm-0): Corruption of in-memory data detected. Shutting down filesystem
> > XFS (dm-0): Please umount the filesystem and rectify the problem(s)
> >
> > From the log above, we know xfs_buf->b_no is 0x178, but the block's hdr record
> > its blkno is 0x1a0.
> >
> > Fixes: 24df33b45ecf ("xfs: add CRC checking to dir2 leaf blocks")
> > Signed-off-by: Zhang Tianci <[email protected]>
> > ---
> > fs/xfs/libxfs/xfs_da_btree.c | 12 +++++++++++-
> > 1 file changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
> > index e576560b46e9..35f70e4c6447 100644
> > --- a/fs/xfs/libxfs/xfs_da_btree.c
> > +++ b/fs/xfs/libxfs/xfs_da_btree.c
> > @@ -2318,8 +2318,18 @@ xfs_da3_swap_lastblock(
> > * Copy the last block into the dead buffer and log it.
> > */
> > memcpy(dead_buf->b_addr, last_buf->b_addr, args->geo->blksize);
> > - xfs_trans_log_buf(tp, dead_buf, 0, args->geo->blksize - 1);
> > dead_info = dead_buf->b_addr;
> > + /*
> > + * Update the moved block's blkno if it's a dir3 leaf block
> > + */
> > + if (dead_info->magic == cpu_to_be16(XFS_DIR3_LEAF1_MAGIC) ||
> > + dead_info->magic == cpu_to_be16(XFS_DIR3_LEAFN_MAGIC) ||
> > + dead_info->magic == cpu_to_be16(XFS_ATTR3_LEAF_MAGIC)) {
On second thought, does this code have to handle XFS_DA3_NODE_MAGIC as
well?
>
> a.k.a.
>
> if (xfs_has_crc(mp)) {
>
> i.e. this is not specific to the buffer type being processed, it's
> specific to v4 vs v5 on-disk format. Hence it's a fs-feature check,
> not a block magic number check.
in which case Dave's comment is correct, yes?
--D
> Cheers,
>
> Dave.
> --
> Dave Chinner
> [email protected]
>
On Wed, Nov 29, 2023 at 2:34 PM Darrick J. Wong <[email protected]> wrote:
>
> On Wed, Nov 29, 2023 at 10:15:52AM +1100, Dave Chinner wrote:
> > On Tue, Nov 28, 2023 at 01:32:02PM +0800, Jiachen Zhang wrote:
> > > From: Zhang Tianci <[email protected]>
> > >
> > > xfs_da3_swap_lastblock() copy the last block content to the dead block,
> > > but do not update the metadata in it. We need update some metadata
> > > for some kinds of type block, such as dir3 leafn block records its
> > > blkno, we shall update it to the dead block blkno. Otherwise,
> > > before write the xfs_buf to disk, the verify_write() will fail in
> > > blk_hdr->blkno != xfs_buf->b_bn, then xfs will be shutdown.
> > >
> > > We will get this warning:
> > >
> > > XFS (dm-0): Metadata corruption detected at xfs_dir3_leaf_verify+0xa8/0xe0 [xfs], xfs_dir3_leafn block 0x178
> > > XFS (dm-0): Unmount and run xfs_repair
> > > XFS (dm-0): First 128 bytes of corrupted metadata buffer:
> > > 00000000e80f1917: 00 80 00 0b 00 80 00 07 3d ff 00 00 00 00 00 00 ........=.......
> > > 000000009604c005: 00 00 00 00 00 00 01 a0 00 00 00 00 00 00 00 00 ................
> > > 000000006b6fb2bf: e4 44 e3 97 b5 64 44 41 8b 84 60 0e 50 43 d9 bf .D...dDA..`.PC..
> > > 00000000678978a2: 00 00 00 00 00 00 00 83 01 73 00 93 00 00 00 00 .........s......
> > > 00000000b28b247c: 99 29 1d 38 00 00 00 00 99 29 1d 40 00 00 00 00 .).8.....).@....
> > > 000000002b2a662c: 99 29 1d 48 00 00 00 00 99 49 11 00 00 00 00 00 .).H.....I......
> > > 00000000ea2ffbb8: 99 49 11 08 00 00 45 25 99 49 11 10 00 00 48 fe .I....E%.I....H.
> > > 0000000069e86440: 99 49 11 18 00 00 4c 6b 99 49 11 20 00 00 4d 97 .I....Lk.I. ..M.
> > > XFS (dm-0): xfs_do_force_shutdown(0x8) called from line 1423 of file fs/xfs/xfs_buf.c. Return address = 00000000c0ff63c1
> > > XFS (dm-0): Corruption of in-memory data detected. Shutting down filesystem
> > > XFS (dm-0): Please umount the filesystem and rectify the problem(s)
> > >
> > > From the log above, we know xfs_buf->b_no is 0x178, but the block's hdr record
> > > its blkno is 0x1a0.
> > >
> > > Fixes: 24df33b45ecf ("xfs: add CRC checking to dir2 leaf blocks")
> > > Signed-off-by: Zhang Tianci <[email protected]>
> > > ---
> > > fs/xfs/libxfs/xfs_da_btree.c | 12 +++++++++++-
> > > 1 file changed, 11 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
> > > index e576560b46e9..35f70e4c6447 100644
> > > --- a/fs/xfs/libxfs/xfs_da_btree.c
> > > +++ b/fs/xfs/libxfs/xfs_da_btree.c
> > > @@ -2318,8 +2318,18 @@ xfs_da3_swap_lastblock(
> > > * Copy the last block into the dead buffer and log it.
> > > */
> > > memcpy(dead_buf->b_addr, last_buf->b_addr, args->geo->blksize);
> > > - xfs_trans_log_buf(tp, dead_buf, 0, args->geo->blksize - 1);
> > > dead_info = dead_buf->b_addr;
> > > + /*
> > > + * Update the moved block's blkno if it's a dir3 leaf block
> > > + */
> > > + if (dead_info->magic == cpu_to_be16(XFS_DIR3_LEAF1_MAGIC) ||
> > > + dead_info->magic == cpu_to_be16(XFS_DIR3_LEAFN_MAGIC) ||
> > > + dead_info->magic == cpu_to_be16(XFS_ATTR3_LEAF_MAGIC)) {
>
> On second thought, does this code have to handle XFS_DA3_NODE_MAGIC as
> well?
I think the node block is not too possible the last block, but it's
harmless to add this check.
I would use Dave's suggestion to check xfs's crc-feature rather than
magic. I think it's equivalent
in this function.
We will send the v2 patchset soon.
Thanks.
>
> >
> > a.k.a.
> >
> > if (xfs_has_crc(mp)) {
> >
> > i.e. this is not specific to the buffer type being processed, it's
> > specific to v4 vs v5 on-disk format. Hence it's a fs-feature check,
> > not a block magic number check.
>
> in which case Dave's comment is correct, yes?
>
> --D
>
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > [email protected]
> >
On Tue, Nov 28, 2023 at 10:34:09PM -0800, Christoph Hellwig wrote:
> On Wed, Nov 29, 2023 at 10:15:52AM +1100, Dave Chinner wrote:
> > > + /*
> > > + * Update the moved block's blkno if it's a dir3 leaf block
> > > + */
> > > + if (dead_info->magic == cpu_to_be16(XFS_DIR3_LEAF1_MAGIC) ||
> > > + dead_info->magic == cpu_to_be16(XFS_DIR3_LEAFN_MAGIC) ||
> > > + dead_info->magic == cpu_to_be16(XFS_ATTR3_LEAF_MAGIC)) {
> >
> > a.k.a.
> >
> > if (xfs_has_crc(mp)) {
> >
> > i.e. this is not specific to the buffer type being processed, it's
> > specific to v4 vs v5 on-disk format. Hence it's a fs-feature check,
> > not a block magic number check.
>
> We have these magic based checks in quite a few places right now,
> so I'm not surprised that Jiachen picked it up from there..
Yes, but that doesn't mean the magic number check has been used
correctly here.
That is, we use the magic number check when the code has a
conditional on the type of buffer being processed (i.e. what block
type are we operating on? e.g. DANODE vs LEAFN as is checked a few
lines further down in this code). When the conditional
is "what on-disk format are we operating on?" such as when we are
decoding headers or running verifiers, we use xfs_has_crc() because
we can't trust magic numbers to be correct prior to validation.
Hence we use xfs_has_crc() to determine how to decode/encode/verify
the structure header, not the magic number in the block.
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Tue, Nov 28, 2023 at 10:34:33PM -0800, Darrick J. Wong wrote:
> On Wed, Nov 29, 2023 at 10:15:52AM +1100, Dave Chinner wrote:
> > On Tue, Nov 28, 2023 at 01:32:02PM +0800, Jiachen Zhang wrote:
> > > From: Zhang Tianci <[email protected]>
> > >
> > > xfs_da3_swap_lastblock() copy the last block content to the dead block,
> > > but do not update the metadata in it. We need update some metadata
> > > for some kinds of type block, such as dir3 leafn block records its
> > > blkno, we shall update it to the dead block blkno. Otherwise,
> > > before write the xfs_buf to disk, the verify_write() will fail in
> > > blk_hdr->blkno != xfs_buf->b_bn, then xfs will be shutdown.
> > >
> > > We will get this warning:
> > >
> > > XFS (dm-0): Metadata corruption detected at xfs_dir3_leaf_verify+0xa8/0xe0 [xfs], xfs_dir3_leafn block 0x178
> > > XFS (dm-0): Unmount and run xfs_repair
> > > XFS (dm-0): First 128 bytes of corrupted metadata buffer:
> > > 00000000e80f1917: 00 80 00 0b 00 80 00 07 3d ff 00 00 00 00 00 00 ........=.......
> > > 000000009604c005: 00 00 00 00 00 00 01 a0 00 00 00 00 00 00 00 00 ................
> > > 000000006b6fb2bf: e4 44 e3 97 b5 64 44 41 8b 84 60 0e 50 43 d9 bf .D...dDA..`.PC..
> > > 00000000678978a2: 00 00 00 00 00 00 00 83 01 73 00 93 00 00 00 00 .........s......
> > > 00000000b28b247c: 99 29 1d 38 00 00 00 00 99 29 1d 40 00 00 00 00 .).8.....).@....
> > > 000000002b2a662c: 99 29 1d 48 00 00 00 00 99 49 11 00 00 00 00 00 .).H.....I......
> > > 00000000ea2ffbb8: 99 49 11 08 00 00 45 25 99 49 11 10 00 00 48 fe .I....E%.I....H.
> > > 0000000069e86440: 99 49 11 18 00 00 4c 6b 99 49 11 20 00 00 4d 97 .I....Lk.I. ..M.
> > > XFS (dm-0): xfs_do_force_shutdown(0x8) called from line 1423 of file fs/xfs/xfs_buf.c. Return address = 00000000c0ff63c1
> > > XFS (dm-0): Corruption of in-memory data detected. Shutting down filesystem
> > > XFS (dm-0): Please umount the filesystem and rectify the problem(s)
> > >
> > > From the log above, we know xfs_buf->b_no is 0x178, but the block's hdr record
> > > its blkno is 0x1a0.
> > >
> > > Fixes: 24df33b45ecf ("xfs: add CRC checking to dir2 leaf blocks")
> > > Signed-off-by: Zhang Tianci <[email protected]>
> > > ---
> > > fs/xfs/libxfs/xfs_da_btree.c | 12 +++++++++++-
> > > 1 file changed, 11 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
> > > index e576560b46e9..35f70e4c6447 100644
> > > --- a/fs/xfs/libxfs/xfs_da_btree.c
> > > +++ b/fs/xfs/libxfs/xfs_da_btree.c
> > > @@ -2318,8 +2318,18 @@ xfs_da3_swap_lastblock(
> > > * Copy the last block into the dead buffer and log it.
> > > */
> > > memcpy(dead_buf->b_addr, last_buf->b_addr, args->geo->blksize);
> > > - xfs_trans_log_buf(tp, dead_buf, 0, args->geo->blksize - 1);
> > > dead_info = dead_buf->b_addr;
> > > + /*
> > > + * Update the moved block's blkno if it's a dir3 leaf block
> > > + */
> > > + if (dead_info->magic == cpu_to_be16(XFS_DIR3_LEAF1_MAGIC) ||
> > > + dead_info->magic == cpu_to_be16(XFS_DIR3_LEAFN_MAGIC) ||
> > > + dead_info->magic == cpu_to_be16(XFS_ATTR3_LEAF_MAGIC)) {
>
> On second thought, does this code have to handle XFS_DA3_NODE_MAGIC as
> well?
Yes, it does. The code to decode the block as a danode instead of
leaf1/leafn block is a few lines further down.
This code does not support ATTR_LEAF blocks, however.
xfs_da_shrink_inode() will only try to swap blocks on the data fork,
never on the attribute fork. That's largely a moot issue, though.
-Dave.
--
Dave Chinner
[email protected]