2022-04-28 13:06:34

by Borislav Petkov

Subject: EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC

Hi,

the errors at the end of this mail come from one of my test boxes booted
with latest Linus:

8f4dd16603ce ("Merge branch 'akpm' (patches from Andrew)")

+ tip/master.

A second boot into the same kernel says:

[ 5.427329] EXT4-fs (sda5): warning: mounting fs with errors, running e2fsck is recommended
[ 5.435681] EXT4-fs (sda5): mounted filesystem with ordered data mode. Quota mode: disabled.
...

[ 316.621377] EXT4-fs (sda5): error count since last fsck: 14
[ 316.621645] EXT4-fs (sda5): initial error at time 1651146136: ext4_update_backup_sb:165
[ 316.621948] EXT4-fs (sda5): last error at time 1651146136: ext4_update_backup_sb:165


And it used to work fine with rc3:

EXT4-fs (sda5): mounted filesystem with ordered data mode. Quota mode: disabled.

so before I go and fsck the partition, I thought I should report it
first - maybe something new in ext4 land is not behaving as it should...

And since rc3 I see:

$ git log --oneline v5.18-rc3.. fs/ext4/
c00c5e1d157b Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
eb7054212eac ext4: update the cached overhead value in the superblock
85d825dbf489 ext4: force overhead calculation if the s_overhead_cluster makes no sense
10b01ee92df5 ext4: fix overhead calculation to account for the reserved gdt blocks
2da376228a24 ext4: limit length to bitmap_maxbytes - blocksize in punch_hole
c186f0887fe7 ext4: fix use-after-free in ext4_search_dir
b98535d09179 ext4: fix bug_on in start_this_handle during umount filesystem
a2b0b205d125 ext4: fix symlink file size not match to file content
ad5cd4f4ee4d ext4: fix fallocate to use file_modified to update permissions consistently

so there is something which just got applied...

[ 4.742960] device-mapper: ioctl: 4.46.0-ioctl (2022-02-22) initialised: dm-devel@redhat.com
[ 4.766518] loop: module loaded
[ 4.836287] EXT4-fs (sda5): mounted filesystem with ordered data mode. Quota mode: disabled.
[ 4.840733] EXT4-fs (sda5): Invalid checksum for backup superblock 32768

[ 4.843142] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC
[ 4.844802] EXT4-fs (sda5): Invalid checksum for backup superblock 98304

[ 4.847239] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC
[ 4.848942] EXT4-fs (sda5): Invalid checksum for backup superblock 163840

[ 4.851344] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC
[ 4.852919] EXT4-fs (sda5): Invalid checksum for backup superblock 229376

[ 4.855270] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC
[ 4.856910] EXT4-fs (sda5): Invalid checksum for backup superblock 294912

[ 4.859279] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC
[ 4.860946] EXT4-fs (sda5): Invalid checksum for backup superblock 819200

[ 4.863429] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC
[ 4.865182] EXT4-fs (sda5): Invalid checksum for backup superblock 884736

[ 4.867793] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC
[ 4.869583] EXT4-fs (sda5): Invalid checksum for backup superblock 1605632

[ 4.872285] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC
[ 4.874109] EXT4-fs (sda5): Invalid checksum for backup superblock 2654208

[ 4.877056] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC
[ 4.878751] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


2022-05-26 18:38:42

by Ritesh Harjani

Subject: Re: EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC

On 22/04/28 02:55PM, Borislav Petkov wrote:
> Hi,
>
> the errors at the end of this mail come from one of my test boxes booted
> with latest Linus:
>
> 8f4dd16603ce ("Merge branch 'akpm' (patches from Andrew)")
>
> + tip/master.
>
> A second boot into the same kernel says:
>
> [ 5.427329] EXT4-fs (sda5): warning: mounting fs with errors, running e2fsck is recommended
> [ 5.435681] EXT4-fs (sda5): mounted filesystem with ordered data mode. Quota mode: disabled.
> ...
>
> [ 316.621377] EXT4-fs (sda5): error count since last fsck: 14
> [ 316.621645] EXT4-fs (sda5): initial error at time 1651146136: ext4_update_backup_sb:165
> [ 316.621948] EXT4-fs (sda5): last error at time 1651146136: ext4_update_backup_sb:165

Could you please help us understand a little more about your setup? Is this
(sda5) somehow a backup image saved/restored using e2image?

>
>
> And it used to work fine with rc3:
>
> EXT4-fs (sda5): mounted filesystem with ordered data mode. Quota mode: disabled.
>
> so before I go and fsck the partition, I thought I should report it
> first - maybe something new in ext4 land is not behaving as it should...
>
> And since rc3 I see:
>
> $ git log --oneline v5.18-rc3.. fs/ext4/
> c00c5e1d157b Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
> eb7054212eac ext4: update the cached overhead value in the superblock
> 85d825dbf489 ext4: force overhead calculation if the s_overhead_cluster makes no sense
> 10b01ee92df5 ext4: fix overhead calculation to account for the reserved gdt blocks

^^^ looks like these patches might have triggered the check on the backup
superblocks when the on-disk s_overhead_clusters doesn't match the in-kernel
calculation.

> 2da376228a24 ext4: limit length to bitmap_maxbytes - blocksize in punch_hole
> c186f0887fe7 ext4: fix use-after-free in ext4_search_dir
> b98535d09179 ext4: fix bug_on in start_this_handle during umount filesystem
> a2b0b205d125 ext4: fix symlink file size not match to file content
> ad5cd4f4ee4d ext4: fix fallocate to use file_modified to update permissions consistently
>
> so there is something which just got applied...
>
> [ 4.742960] device-mapper: ioctl: 4.46.0-ioctl (2022-02-22) initialised: dm-devel@redhat.com
> [ 4.766518] loop: module loaded
> [ 4.836287] EXT4-fs (sda5): mounted filesystem with ordered data mode. Quota mode: disabled.
> [ 4.840733] EXT4-fs (sda5): Invalid checksum for backup superblock 32768
>
> [ 4.843142] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC
> [ 4.844802] EXT4-fs (sda5): Invalid checksum for backup superblock 98304
>
> [ 4.847239] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC
> [ 4.848942] EXT4-fs (sda5): Invalid checksum for backup superblock 163840
>
> [ 4.851344] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC
> [ 4.852919] EXT4-fs (sda5): Invalid checksum for backup superblock 229376
>
> [ 4.855270] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC
> [ 4.856910] EXT4-fs (sda5): Invalid checksum for backup superblock 294912
>
> [ 4.859279] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC
> [ 4.860946] EXT4-fs (sda5): Invalid checksum for backup superblock 819200
>
> [ 4.863429] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC
> [ 4.865182] EXT4-fs (sda5): Invalid checksum for backup superblock 884736
>
> [ 4.867793] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC
> [ 4.869583] EXT4-fs (sda5): Invalid checksum for backup superblock 1605632
>
> [ 4.872285] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC
> [ 4.874109] EXT4-fs (sda5): Invalid checksum for backup superblock 2654208
>
> [ 4.877056] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC
> [ 4.878751] EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC

All of the messages above come from ext4_update_backup_sb(), which is called
during mount via ext4_fill_super() -> ext4_update_overhead().
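
As a side note, the block numbers in the messages above line up with the usual
sparse_super backup layout. The sketch below is only an illustration under
assumptions not confirmed in this thread: 32768 blocks per group (the default
for 4 KiB blocks), first data block 0, and backups kept in group 1 and in
groups that are powers of 3, 5 and 7. The function names are made up.

```python
BLOCKS_PER_GROUP = 32768  # assumption: default for a 4 KiB block size


def has_backup_superblock(group):
    """sparse_super rule: groups 0 and 1, plus powers of 3, 5 and 7."""
    if group <= 1:
        return True
    for base in (3, 5, 7):
        n = base
        while n < group:
            n *= base
        if n == group:
            return True
    return False


def backup_superblock_blocks(num_groups):
    """First block of every group (other than group 0) holding a backup."""
    return [g * BLOCKS_PER_GROUP
            for g in range(1, num_groups)
            if has_backup_superblock(g)]


# 82 is simply the smallest group count that yields the nine blocks in the log;
# the real group count of sda5 is not known from this thread.
print(backup_superblock_blocks(82))
```

With these assumptions the list is 32768, 98304, 163840, 229376, 294912,
819200, 884736, 1605632 and 2654208, i.e. groups 1, 3, 5, 7, 9, 25, 27, 49
and 81, the same nine blocks the kernel complained about.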

Recently, a similar problem was also reported to me: a filesystem image was
saved using e2image on a v5.17 kernel (with e2fsck 1.45.5 (07-Jan-2020)).
After upgrading the kernel to v5.18, when this FS image (via e2image) was
mounted using a loop device (or restored to a block device), the above error
messages were observed.

My theory so far is that the s_overhead_clusters value saved on disk was not
correct (I guess the older e2fsprogs 1.45.5 might not store
s_overhead_clusters on disk during mkfs?).
After the kernel upgrade, the 3 patches mentioned above recalculate
sbi->s_overhead for a non-bigalloc filesystem during mount, and if it doesn't
match the on-disk es->s_overhead_clusters value, the kernel tries to update
all superblocks via ext4_update_overhead().
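
The decision described in this theory can be modelled roughly as follows. This
is a plain Python sketch, not kernel code: the function name and the sample
numbers are made up, and the read-only and zero-overhead guards are an
assumption about what ext4_update_overhead() checks, not something stated in
this thread.

```python
def overhead_update_needed(read_only, computed_overhead, on_disk_overhead):
    """Model of the mount-time decision: rewrite the superblocks or not.

    computed_overhead stands in for sbi->s_overhead, on_disk_overhead for the
    value cached in the on-disk superblock.
    """
    # Assumed guards: nothing is written on a read-only mount, and a computed
    # overhead of 0 is treated as "nothing to store".
    if read_only or computed_overhead == 0:
        return False
    # Only a mismatch triggers the update of the primary and backup copies.
    return computed_overhead != on_disk_overhead


# Hypothetical values: the tools that created the filesystem left the field at
# 0, while the new kernel computes a non-zero overhead.
print(overhead_update_needed(False, 1000, 0))     # True
print(overhead_update_needed(False, 1000, 1000))  # False
```

Under this model a filesystem whose cached value already matches never reaches
the backup superblock path, which would explain why the messages only show up
on some setups.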

Why the CRC checksum failure:
Before updating a backup superblock, the kernel verifies its checksum to make
sure that the backup copy is not corrupt.
And I guess e2image doesn't store the backup superblocks while saving the
image, so those blocks are all zeroed. Hence the superblock checksum problem
is reported in the case which I am seeing internally.
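
To see why an all-zero block cannot pass such a check, here is a small
self-contained sketch. It assumes the superblock checksum is a CRC32C over the
bytes preceding s_checksum (offset 0x3FC), seeded with all ones and without a
final inversion. That is my understanding of ext4_superblock_csum(), not
something stated in this thread, and the helper names are made up.

```python
import struct

CRC32C_POLY = 0x82F63B78  # reflected Castagnoli polynomial
CSUM_OFFSET = 0x3FC       # assumed offset of s_checksum in the 1024-byte sb


def crc32c_raw(data, seed=0xFFFFFFFF):
    """Bitwise CRC32C with no final inversion."""
    crc = seed
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32C_POLY if crc & 1 else 0)
    return crc


def superblock_csum(sb):
    """Checksum over everything before the stored checksum field."""
    return crc32c_raw(sb[:CSUM_OFFSET])


def checksum_ok(sb):
    """Compare the stored little-endian checksum with the computed one."""
    (stored,) = struct.unpack_from("<I", sb, CSUM_OFFSET)
    return stored == superblock_csum(sb)


# A block that was never written: stored checksum is 0, computed one is not.
zeroed = bytes(1024)
print(checksum_ok(zeroed))        # False

# Arbitrary content (not a real superblock) with a matching checksum stored.
valid = bytearray(1024)
valid[0:4] = b"demo"
struct.pack_into("<I", valid, CSUM_OFFSET, superblock_csum(bytes(valid)))
print(checksum_ok(bytes(valid)))  # True
```

With a non-zero seed the CRC of a run of zero bytes is never zero, so a zeroed
backup block always looks corrupt to this kind of check.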

So, putting down my thoughts here for discussion:

- First, is using e2image to save/restore a disk image considered a valid use
case? (Users could back up using the "-a" option, which also saves all the FS
data along with the critical metadata.)

- Given that we might use this way of updating backup superblock copies in the
kernel for other values in future, and that users could upgrade their kernels
but still use older e2fsprogs, does it make sense to provide an option in
e2image to save copies of the backup superblocks too?

- I haven't yet spent much time on a solution for the above problem, i.e. what
should we do for users who still take backups without this additional option
to save backup superblocks? In that case the kernel thinks that the backup
superblock is corrupt, since its checksum doesn't match.

-ritesh

2022-05-27 00:34:04

by Borislav Petkov

Subject: Re: EXT4-fs error (device sda5) in ext4_update_backup_sb:165: Filesystem failed CRC

On Thu, May 26, 2022 at 08:24:02PM +0530, Ritesh Harjani wrote:
> Could you please help us understand a little more about your setup? Is
> this (sda5) somehow a backup image saved/restored using e2image?

Nah, just a separate, normal partition.

> Recently, a similar problem was also reported to me: a filesystem image was
> saved using e2image on a v5.17 kernel (with e2fsck 1.45.5 (07-Jan-2020)).
> After upgrading the kernel to v5.18, when this FS image (via e2image) was
> mounted using a loop device (or restored to a block device), the above error
> messages were observed.

That could be the case. See, this box is simply a test laptop where I
boot kernels all the time and userspace doesn't get updated, pretty
much. So if old tools massage the superblock and new kernels then touch
it again, that could mean there's some discrepancy there... but what do
I know - I have no clue about fs.

In any case, 5.18-rc7+ didn't have those earlier messages but simply:

EXT4-fs (sda5): warning: mounting fs with errors, running e2fsck is recommended
EXT4-fs (sda5): mounted filesystem with ordered data mode. Quota mode: disabled.

but I haven't run any fsck yet. Since this is a test box, it doesn't
matter a whole lot, though.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette