2022-10-26 20:05:26

by Jakob Unterwurzacher

Subject: ext4 online resize -> EXT4-fs error (device loop0) in ext4_update_backup_sb:174: Filesystem failed CRC

Hi,

it looks like I am hitting a similar issue as reported by Borislav Petkov
in April 2022 ( https://lore.kernel.org/lkml/[email protected]/ ).

I'm on kernel 6.0.5 and see this on arm64 as well as x86_64.
I have a 100% reproducer using a loop mount, here it is:

truncate -s 16g ext4.img
mkfs.ext4 ext4.img 500m
mkdir ext4.mnt
mount ext4.img ext4.mnt
resize2fs ext4.img

And these are the kernel messages it generates:

[ 33.774267] loop0: detected capacity change from 0 to 33554432
[ 33.796319] EXT4-fs (loop0): mounted filesystem with ordered data mode. Quota mode: none.
[ 33.796518] ext4 filesystem being mounted at /root/ext4.mnt supports timestamps until 2038 (0x7fffffff)
[ 33.799324] EXT4-fs (loop0): resizing filesystem from 512000 to 16777216 blocks
[ 33.933110] EXT4-fs (loop0): resized filesystem to 16777216
[ 33.965633] EXT4-fs (loop0): Invalid checksum for backup superblock 8193

[ 33.965675] EXT4-fs error (device loop0) in ext4_update_backup_sb:174: Filesystem failed CRC
[ 33.965884] EXT4-fs (loop0): Invalid checksum for backup superblock 24577

[ 33.965902] EXT4-fs error (device loop0) in ext4_update_backup_sb:174: Filesystem failed CRC
[ 33.966058] EXT4-fs (loop0): Invalid checksum for backup superblock 40961

[ 33.966075] EXT4-fs error (device loop0) in ext4_update_backup_sb:174: Filesystem failed CRC
[ 33.966225] EXT4-fs (loop0): Invalid checksum for backup superblock 57345

[ 33.966242] EXT4-fs error (device loop0) in ext4_update_backup_sb:174: Filesystem failed CRC
[ 33.966398] EXT4-fs (loop0): Invalid checksum for backup superblock 73729

[ 33.966415] EXT4-fs error (device loop0) in ext4_update_backup_sb:174: Filesystem failed CRC
[ 33.966557] EXT4-fs (loop0): Invalid checksum for backup superblock 204801

[ 33.966574] EXT4-fs error (device loop0) in ext4_update_backup_sb:174: Filesystem failed CRC
[ 33.966765] EXT4-fs (loop0): Invalid checksum for backup superblock 221185

[ 33.966784] EXT4-fs error (device loop0) in ext4_update_backup_sb:174: Filesystem failed CRC
[ 33.966946] EXT4-fs error (device loop0) in ext4_update_backup_sb:174: Filesystem failed CRC
[ 33.967074] EXT4-fs error (device loop0) in ext4_update_backup_sb:174: Filesystem failed CRC
[ 33.967237] EXT4-fs error (device loop0) in ext4_update_backup_sb:174: Filesystem failed CRC
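
The block numbers in these messages line up with the sparse_super backup layout, where groups 0 and 1 and powers of 3, 5 and 7 carry a superblock copy. Below is a minimal sketch of that arithmetic. It assumes 1 KiB blocks, 8192 blocks per group and a first data block of 1; these values are inferred from the 500m file system reporting 512000 blocks and are not stated in the thread. The helper names are hypothetical and are not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if n equals base^k for some k >= 0 (so 1 counts as a power). */
static bool is_power_of(uint32_t n, uint32_t base)
{
	while (n > 1 && n % base == 0)
		n /= base;
	return n == 1;
}

/*
 * sparse_super rule (assumed here): groups 0 and 1 and groups whose
 * number is a power of 3, 5 or 7 hold a copy of the superblock.
 */
static bool group_has_backup_sb(uint32_t group)
{
	if (group <= 1)
		return true;
	return is_power_of(group, 3) || is_power_of(group, 5) ||
	       is_power_of(group, 7);
}

/* First block of a group; the superblock copy sits at this block. */
static uint64_t group_first_block(uint32_t group, uint32_t blocks_per_group,
				  uint32_t first_data_block)
{
	return (uint64_t)group * blocks_per_group + first_data_block;
}
```

Under those assumptions, groups 1, 3, 5, 7, 9, 25 and 27 map to blocks 8193, 24577, 40961, 57345, 73729, 204801 and 221185, the same numbers the kernel printed above.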

e2fsck seems mostly happy. Should I be concerned?

e2fsck ext4.img

e2fsck 1.46.2 (28-Feb-2021)
ext4.img contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
ext4.img: 11/4161536 files (0.0% non-contiguous), 536410/16777216 blocks

Thank you,
Jakob


2022-10-28 04:16:05

by Theodore Ts'o

Subject: Re: ext4 online resize -> EXT4-fs error (device loop0) in ext4_update_backup_sb:174: Filesystem failed CRC

On Wed, Oct 26, 2022 at 07:49:56PM +0000, Unterwurzacher, Jakob wrote:
>
> it looks like I am hitting a similar issue as reported by Borislav Petkov
> in April 2022 ( https://lore.kernel.org/lkml/[email protected]/ ).
>
> I'm on kernel 6.0.5 and see this on arm64 as well as x86_64.
> I have a 100% reproducer using a loop mount, here it is:
>
> truncate -s 16g ext4.img
> mkfs.ext4 ext4.img 500m
> mkdir ext4.mnt
> mount ext4.img ext4.mnt
> resize2fs ext4.img

Thanks for the reproducer! The following patch should fix things.

- Ted

From 9a8c5b0d061554fedd7dbe894e63aa34d0bac7c4 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <[email protected]>
Date: Thu, 27 Oct 2022 16:04:36 -0400
Subject: [PATCH] ext4: update the backup superblock's at the end of the online
resize

When expanding a file system using online resize, various fields in
the superblock (e.g., s_blocks_count, s_inodes_count, etc.) change.
To update the backup superblocks, the online resize uses the function
update_backups() in fs/ext4/resize.c. This function was not updating
the checksum field in the backup superblocks. This wasn't a big deal
previously, because e2fsck didn't care about the checksum field in the
backup superblock. (And indeed, update_backups() goes all the way
back to the ext3 days, well before we had support for metadata
checksums.)

However, there is an alternate, more general way of updating
superblock fields, ext4_update_primary_sb() in fs/ext4/ioctl.c. This
function does check the checksum of the backup superblock, and if it
doesn't match will mark the file system as corrupted. That was
clearly not the intent, so avoid aborting the resize when a bad
superblock is found.

In addition, teach update_backups() to properly update the checksum in
the backup superblocks. We will eventually want to unify
update_backups() with the infrastructure in ext4_update_primary_sb(),
but that's for another day.

Note: The problem has been around for a while; it just didn't really
matter until ext4_update_primary_sb() was added by commit bbc605cdb1e1
("ext4: implement support for get/set fs label"). And it became
trivially easy to reproduce after commit 827891a38acc ("ext4: update
the s_overhead_clusters in the backup sb's when resizing") in v6.0.

Cc: [email protected] # 5.17+
Fixes: bbc605cdb1e1 ("ext4: implement support for get/set fs label")
Signed-off-by: Theodore Ts'o <[email protected]>
---
 fs/ext4/ioctl.c  | 3 +--
 fs/ext4/resize.c | 5 +++++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 4d49c5cfb690..790d5ffe8559 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -145,9 +145,8 @@ static int ext4_update_backup_sb(struct super_block *sb,
 	if (ext4_has_metadata_csum(sb) &&
 	    es->s_checksum != ext4_superblock_csum(sb, es)) {
 		ext4_msg(sb, KERN_ERR, "Invalid checksum for backup "
-		"superblock %llu\n", sb_block);
+		"superblock %llu", sb_block);
 		unlock_buffer(bh);
-		err = -EFSBADCRC;
 		goto out_bh;
 	}
 	func(es, arg);
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 6dfe9ccae0c5..46b87ffeb304 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1158,6 +1158,7 @@ static void update_backups(struct super_block *sb, sector_t blk_off, char *data,
 	while (group < sbi->s_groups_count) {
 		struct buffer_head *bh;
 		ext4_fsblk_t backup_block;
+		struct ext4_super_block *es;

 		/* Out of journal space, and can't get more - abort - so sad */
 		err = ext4_resize_ensure_credits_batch(handle, 1);
@@ -1186,6 +1187,10 @@ static void update_backups(struct super_block *sb, sector_t blk_off, char *data,
 		memcpy(bh->b_data, data, size);
 		if (rest)
 			memset(bh->b_data + size, 0, rest);
+		es = (struct ext4_super_block *) bh->b_data;
+		es->s_block_group_nr = cpu_to_le16(group);
+		if (ext4_has_metadata_csum(sb))
+			es->s_checksum = ext4_superblock_csum(sb, es);
 		set_buffer_uptodate(bh);
 		unlock_buffer(bh);
 		err = ext4_handle_dirty_metadata(handle, NULL, bh);
--
2.31.0
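
The resize.c hunk above stamps each backup copy with its group number and then recomputes the checksum. The same idea can be sketched in userspace. This is not the kernel code. It assumes the usual on-disk layout (a 1024-byte superblock with s_block_group_nr at byte 0x5A and s_checksum at byte 0x3FC) and that the checksum is CRC32C seeded with ~0, taken over everything before the checksum field, with no final inversion. All names are hypothetical, and the sketch ignores whether the metadata_csum feature is enabled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed on-disk offsets within the 1024-byte ext4 superblock. */
#define SB_SIZE		1024
#define SB_GROUP_NR_OFF	0x5A	/* s_block_group_nr, little-endian 16 bit */
#define SB_CHECKSUM_OFF	0x3FC	/* s_checksum, little-endian 32 bit */

/* Bitwise CRC32C (Castagnoli, reflected); seed from caller, no final XOR. */
static uint32_t crc32c_raw(uint32_t crc, const uint8_t *buf, size_t len)
{
	while (len--) {
		crc ^= *buf++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Checksum over every byte that precedes the checksum field. */
static uint32_t sb_csum(const uint8_t sb[SB_SIZE])
{
	return crc32c_raw(0xFFFFFFFFu, sb, SB_CHECKSUM_OFF);
}

/* Set the group number first, then recompute and store the checksum. */
static void refresh_backup_sb(uint8_t sb[SB_SIZE], uint16_t group)
{
	uint32_t csum;

	sb[SB_GROUP_NR_OFF] = group & 0xFF;
	sb[SB_GROUP_NR_OFF + 1] = group >> 8;
	csum = sb_csum(sb);
	for (int i = 0; i < 4; i++)
		sb[SB_CHECKSUM_OFF + i] = (csum >> (8 * i)) & 0xFF;
}

/* Compare the stored little-endian checksum with a fresh computation. */
static bool backup_sb_csum_ok(const uint8_t sb[SB_SIZE])
{
	uint32_t stored = 0;

	for (int i = 0; i < 4; i++)
		stored |= (uint32_t)sb[SB_CHECKSUM_OFF + i] << (8 * i);
	return stored == sb_csum(sb);
}
```

The ordering matters in the sketch just as in the patch: the group number lies inside the checksummed range, so the checksum has to be computed after it is written.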


2022-10-28 12:00:04

by Jakob Unterwurzacher

Subject: Re: ext4 online resize -> EXT4-fs error (device loop0) in ext4_update_backup_sb:174: Filesystem failed CRC

On 28.10.22 05:59, Theodore Ts'o wrote:
>
> Thanks for the reproducer! The following patch should fix things.
>
> - Ted
>
> From 9a8c5b0d061554fedd7dbe894e63aa34d0bac7c4 Mon Sep 17 00:00:00 2001
> From: Theodore Ts'o <[email protected]>
> Date: Thu, 27 Oct 2022 16:04:36 -0400
> Subject: [PATCH] ext4: update the backup superblock's at the end of the online
> resize

Hi Theodore,

I tested the patch on arm64 and it fixes the issue. Now the kernel
messages are just this:

[ 14.769997] EXT4-fs (mmcblk2p1): resizing filesystem from 139771 to 3888507 blocks
[ 15.020593] EXT4-fs (mmcblk2p1): resized filesystem to 3888507

fsck after the resize is happy too.

Thank you!
Jakob