2023-01-03 00:38:18

by Zsolt Murzsa

Subject: ext4 superblock checksum invalid after running resize2fs

Hi!

I've had the same issue twice in the last couple of days with the online expand function of resize2fs.
I have an md RAID 1 array with an LVM volume on it, formatted with ext4. I resized the volume (from 4T to 5T), then ran resize2fs, which completed without errors, and the file system grew.

A few hours later I had to hard-reset the machine because of some zombie processes, and after the restart the system could not mount the file system.
I checked the disks and ran some hardware checks but found nothing wrong, so I assumed the hard reset had caused the problem.

The error was "Superblock checksum does not match superblock". I tried several backup superblocks, e2fsck and testdisk, but nothing helped, even though dumpe2fs still displayed all the superblock data.
I started restoring from a backup.

In the meantime I found debugfs, which let me skip the checksum verification, so I could see all the folders and files and copy them to a separate disk.
I replaced the two drives, recreated the md RAID 1 array and the LVM volume, reformatted with ext4 and started copying the data back.

I ran out of space, so I expanded the LV and ran resize2fs again (from 3T to 5T). It succeeded again, and the mounted file system is now 5T.
Then I ran e2fsck:

"e2fsck -n /dev/vg1/data
e2fsck 1.46.5 (30-Dec-2021)
Warning! /dev/vg1/data is mounted.
ext2fs_open2: Superblock checksum does not match superblock
e2fsck: Superblock invalid, trying backup blocks...
e2fsck: Superblock checksum does not match superblock while trying to open /dev/vg1/data

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem. If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
or
e2fsck -b 32768 <device>"
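The two `-b` values in that hint are the default locations of the first backup superblock. As a rough sketch, assuming the common `sparse_super` layout (not `sparse_super2`), backups sit at the start of block group 1 and of every group that is a power of 3, 5 or 7. Group 1 starts at block 32768 with 4 KiB blocks (32768 blocks per group, first data block 0) and at block 8193 with 1 KiB blocks (8192 blocks per group, first data block 1). The function names below are hypothetical; `dumpe2fs` reports the actual "Backup superblock at" locations for a given file system.

```python
def sparse_super_groups(group_count):
    """Block groups holding a superblock copy under sparse_super (assumed layout):
    group 0 (primary), group 1, and every power of 3, 5 and 7."""
    groups = {0, 1}
    for base in (3, 5, 7):
        g = base
        while g < group_count:
            groups.add(g)
            g *= base
    return sorted(g for g in groups if g < group_count)


def backup_superblock_blocks(group_count, blocks_per_group, first_data_block):
    """Block numbers of the backup superblocks; group 0 holds the primary, so skip it."""
    return [
        first_data_block + g * blocks_per_group
        for g in sparse_super_groups(group_count)
        if g != 0
    ]


# A 5 TiB file system with 4 KiB blocks has 5 * 2**40 // 4096 // 32768 = 40960 groups.
print(backup_superblock_blocks(40960, 32768, 0)[:5])
# With 1 KiB blocks, group 1 starts at block 8193.
print(backup_superblock_blocks(16, 8192, 1)[:1])
```

With 4 KiB blocks this prints the familiar 32768, 98304, 163840, ... sequence. Note that every copy is a separate superblock with its own checksum, so backups written after the same faulty resize can fail verification too.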

I'm shocked it happened again.
I can currently read and write the files, but I suspect I will not be able to mount the file system again.
The first time I couldn't find a simple solution. Is it possible to fix the checksum somehow?
Using debugfs to copy everything to another drive and back again takes a lot of time.
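Roughly speaking, "Superblock checksum does not match superblock" means the `s_checksum` field at the end of the superblock no longer equals the CRC32C recomputed over the bytes before it. What follows is a minimal read-only sketch of that check. It assumes the on-disk layout described in the kernel's ext4 documentation (primary superblock at byte offset 1024, `s_magic` at 0x38, `s_feature_ro_compat` at 0x64 with metadata_csum as bit 0x400, `s_checksum` at 0x3FC). It also assumes the checksum is CRC32C seeded with 0xFFFFFFFF and taken without a final inversion, which is how the kernel's `ext4_superblock_csum()` computes it. The helper names are hypothetical, and the sketch never writes; rewriting the checksum is left to e2fsck.

```python
import struct

SB_OFFSET = 1024          # primary superblock starts 1024 bytes into the device
SB_SIZE = 1024
MAGIC_OFFSET = 0x38       # s_magic (__le16)
RO_COMPAT_OFFSET = 0x64   # s_feature_ro_compat (__le32)
CSUM_OFFSET = 0x3FC       # s_checksum (__le32), last field of the superblock
EXT4_MAGIC = 0xEF53
RO_COMPAT_METADATA_CSUM = 0x400


def crc32c_raw(crc, data):
    """Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78) without the final XOR."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc & 0xFFFFFFFF


def check_superblock(sb):
    """Return (stored, computed) checksums of a 1024-byte ext4 superblock."""
    if len(sb) != SB_SIZE:
        raise ValueError("expected a 1024-byte superblock")
    (magic,) = struct.unpack_from("<H", sb, MAGIC_OFFSET)
    if magic != EXT4_MAGIC:
        raise ValueError("bad magic, not an ext2/ext3/ext4 superblock")
    (ro_compat,) = struct.unpack_from("<I", sb, RO_COMPAT_OFFSET)
    if not ro_compat & RO_COMPAT_METADATA_CSUM:
        raise ValueError("metadata_csum is not enabled, so there is no checksum to verify")
    (stored,) = struct.unpack_from("<I", sb, CSUM_OFFSET)
    # The checksum covers every byte before s_checksum, seeded with ~0.
    computed = crc32c_raw(0xFFFFFFFF, sb[:CSUM_OFFSET])
    return stored, computed


def read_primary_superblock(path):
    """Read the primary superblock from a block device or image file (read-only)."""
    with open(path, "rb") as f:
        f.seek(SB_OFFSET)
        return f.read(SB_SIZE)
```

For example, `check_superblock(read_primary_superblock("/dev/vg1/data"))` run as root returns the stored and recomputed values, and a mismatch between them corresponds to the error e2fsck prints.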

My current kernel version: 5.19.17-1-pve.
I can attach the superblocks (from both the first and the second case) or any other information, if needed.

Best Regards,
Zsolt Murzsa


2023-01-03 02:41:36

by Baokun Li

Subject: Re: ext4 superblock checksum invalid after running resize2fs

On 2023/1/3 8:35, Zsolt Murzsa wrote:
> [...]

Hi Zsolt,

This mainline patch may fix your problem:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a408f33e895e455f16cf964cb5cd4979b658db7b

--
With Best Regards,
Baokun Li

2023-01-03 21:24:23

by Andreas Dilger

Subject: Re: ext4 superblock checksum invalid after running resize2fs

On Jan 2, 2023, at 5:35 PM, Zsolt Murzsa wrote:
> [...]
> "e2fsck -n /dev/vg1/data
> e2fsck 1.46.5 (30-Dec-2021)
> Warning! /dev/vg1/data is mounted.
> ext2fs_open2: Superblock checksum does not match superblock
> e2fsck: Superblock invalid, trying backup blocks...
> e2fsck: Superblock checksum does not match superblock while trying to open /dev/vg1/data
>
> The superblock could not be read or does not describe a valid ext2/ext3/ext4
> filesystem. If the device is valid and it really contains an ext2/ext3/ext4
> filesystem (and not swap or ufs or something else), then the superblock
> is corrupt, and you might try running e2fsck with an alternate superblock:
> e2fsck -b 8193 <device>
> or
> e2fsck -b 32768 <device>"

Did you try "e2fsck -fy" to fix the checksum?

Cheers, Andreas





