2009-01-05 13:53:52

by Christian Ohm

Subject: How to recover a damaged ext4 file system?


Hello,

Since ext4 had its development status removed in 2.6.28, and there seemed to be
no reports of serious problems, I decided to try it on a partition of
semi-important files. Well, after a hard system hang because of the (open
source Radeon) graphics driver, the file system is quite corrupted, and cannot
be mounted any more (that never happened with ext3). mount gives the following
error:

mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so

dmesg message:

EXT4-fs: ext4_check_descriptors: Block bitmap for group 0 not in group (block 727012683)!
EXT4-fs: group descriptors corrupted!

I have uploaded the output of fsck.ext4 -n at
http://www.filefactory.com/file/aff6f3g/n/fsck_ext4_bz2 which is over 6MB of
stuff like

---
e2fsck 1.41.3 (12-Oct-2008)
fsck.ext4: Group descriptors look bad... trying backup blocks...
Block bitmap for group 0 is not in group. (block 727012683)
Relocate? no

Inode bitmap for group 0 is not in group. (block 3406175899)
Relocate? no

Inode table for group 0 is not in group. (block 1236188664)
WARNING: SEVERE DATA LOSS POSSIBLE.
Relocate? no

Group descriptor 0 checksum is invalid. Fix? no

Block bitmap for group 1 is not in group. (block 2704710215)
Relocate? no

Inode bitmap for group 1 is not in group. (block 2166870417)
Relocate? no

Inode table for group 1 is not in group. (block 600148394)
WARNING: SEVERE DATA LOSS POSSIBLE.
Relocate? no

Group descriptor 1 checksum is invalid. Fix? no
---

and later

---
Group descriptor 7452 checksum is invalid. Fix? no

Error reading block 1236188664 (Invalid argument). Ignore error? no

data-1000 contains a file system with errors, check forced.
Error reading block 1236188664 (Invalid argument). Ignore error? no

fsck.ext4: Invalid argument while reading bad blocks inode
This doesn't bode well, but we'll try to go on...
Pass 1: Checking inodes, blocks, and sizes
Illegal block number passed to ext2fs_test_block_bitmap #1236188664 for in-use block map
Illegal block number passed to ext2fs_mark_block_bitmap #1236188664 for in-use block map
---
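
The block numbers in the transcript above can be checked against the size of the file system. The following is only a sketch: the 4 KiB block size and 32768 blocks per group are assumptions (common mkfs.ext4 defaults, not confirmed for this file system), and the partition size is the "Blocks" value that fdisk reports for the damaged partition later in this thread.

```python
# Assumptions, not confirmed by the original post: 4 KiB blocks and
# 32768 blocks per group.
BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = 32768
PARTITION_KIB = 976760032  # "Blocks" column of fdisk for the damaged partition

total_blocks = PARTITION_KIB * 1024 // BLOCK_SIZE
group_count = -(-total_blocks // BLOCKS_PER_GROUP)  # ceiling division

# Block numbers quoted by fsck for groups 0 and 1.
reported = {
    "block bitmap, group 0": 727012683,
    "inode bitmap, group 0": 3406175899,
    "inode table, group 0": 1236188664,
    "block bitmap, group 1": 2704710215,
    "inode bitmap, group 1": 2166870417,
    "inode table, group 1": 600148394,
}


def in_filesystem(block: int) -> bool:
    """True if the block number lies inside the file system at all."""
    return 0 <= block < total_blocks


if __name__ == "__main__":
    print("total blocks:", total_blocks, "groups:", group_count)
    for name, block in reported.items():
        verdict = "ok" if in_filesystem(block) else "beyond end of file system"
        print(f"{name}: {block} ({verdict})")
```

Under these assumptions the highest group index is 7452, which matches the last descriptor fsck complains about, and every reported block number lies beyond the end of the device. That is what one would expect if the descriptors were overwritten with unrelated data.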


Now as I said, the files are semi-important, meaning I could recover those I
still want with some time, but repairing the file system would be preferable.
Unfortunately I don't have enough space on another harddrive to just copy the
partition and experiment on that, so I haven't tried letting fsck repair the fs
yet, and since it says SEVERE DATA LOSS POSSIBLE I wouldn't like to try that
without copying first.

So my two main questions would be:

1. How can I recover the data on the file system? As I said, I don't need all
the files, but it would save some time. I created it with the mkfs.ext4 from
Debian unstable (1.41.3) with only largefile as extra option, and the default
mount options with kernel 2.6.28. The fs wasn't used for long, and I mostly
copied/created files, without deleting much.

2. Is this corruption a fault of ext4? I guess this is difficult to answer, but
I had ext3 survive any lockups without many problems. So far ext4 seems not
quite that robust, but perhaps another file system would have blown up as well
in this situation. Is there any information I can give you to help make ext4
more robust?

Best regards,
Christian Ohm

PS: I think my first post with the fsck output attached got rejected due to its
size, though I didn't receive a message about that.


2009-01-06 12:05:33

by Andreas Dilger

Subject: Re: How to recover a damaged ext4 file system?

On Jan 05, 2009 14:53 +0100, Christian Ohm wrote:
> no reports of serious problems, I decided to try it on a partition of
> semi-important files. Well, after a hard system hang because of the (open
> source Radeon) graphics driver, the file system is quite corrupted, and cannot
> be mounted any more (that never happened with ext3). mount gives the following
> error:
>
> mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
> missing codepage or helper program, or other error
> In some cases useful info is found in syslog - try
> dmesg | tail or so
>
> dmesg message:
>
> EXT4-fs: ext4_check_descriptors: Block bitmap for group 0 not in group (block 727012683)!
> EXT4-fs: group descriptors corrupted!

You should try to run e2fsck with the backup group descriptors, using
the -B and/or -b options (at a guess -B 4096 and -b 32768).
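
For reference, the candidate values for -b can be derived rather than guessed. With the sparse_super feature, backup superblocks live in group 1 and in groups whose number is a power of 3, 5 or 7. The sketch below assumes 32768 blocks per group and a first data block of 0 (both typical for 4 KiB blocks, but assumptions here); the function names are made up for illustration.

```python
def is_power_of(n: int, base: int) -> bool:
    """True if n == base**k for some k >= 0."""
    if n < 1:
        return False
    while n % base == 0:
        n //= base
    return n == 1


def backup_groups(group_count: int) -> list[int]:
    """Groups (other than group 0) that hold a backup superblock
    when the sparse_super feature is enabled."""
    return [g for g in range(1, group_count)
            if g == 1 or any(is_power_of(g, b) for b in (3, 5, 7))]


def backup_blocks(group_count: int, blocks_per_group: int = 32768) -> list[int]:
    """Block numbers usable with 'e2fsck -b', assuming first data block 0."""
    return [g * blocks_per_group for g in backup_groups(group_count)]


if __name__ == "__main__":
    # First few candidates for a large file system.
    print(backup_blocks(100))
```

The first two values produced are 32768 and 98304, so there are further backups to try if those two are damaged as well.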

> I have uploaded the output of fsck.ext4 -n at
> http://www.filefactory.com/file/aff6f3g/n/fsck_ext4_bz2 which is over 6MB of
> stuff like
>
> ---
> e2fsck 1.41.3 (12-Oct-2008)
> fsck.ext4: Group descriptors look bad... trying backup blocks...
> Block bitmap for group 0 is not in group. (block 727012683)
> Relocate? no
>
> Inode bitmap for group 0 is not in group. (block 3406175899)
> Relocate? no
>
> Inode table for group 0 is not in group. (block 1236188664)
> WARNING: SEVERE DATA LOSS POSSIBLE.
> Relocate? no
>
> Group descriptor 0 checksum is invalid. Fix? no
>
> Block bitmap for group 1 is not in group. (block 2704710215)
> Relocate? no
>
> Inode bitmap for group 1 is not in group. (block 2166870417)
> Relocate? no
>
> Inode table for group 1 is not in group. (block 600148394)
> WARNING: SEVERE DATA LOSS POSSIBLE.
> Relocate? no
>
> Group descriptor 1 checksum is invalid. Fix? no
> ---
>
> and later
>
> ---
> Group descriptor 7452 checksum is invalid. Fix? no
>
> Error reading block 1236188664 (Invalid argument). Ignore error? no
>
> data-1000 contains a file system with errors, check forced.
> Error reading block 1236188664 (Invalid argument). Ignore error? no
>
> fsck.ext4: Invalid argument while reading bad blocks inode
> This doesn't bode well, but we'll try to go on...
> Pass 1: Checking inodes, blocks, and sizes
> Illegal block number passed to ext2fs_test_block_bitmap #1236188664 for in-use block map
> Illegal block number passed to ext2fs_mark_block_bitmap #1236188664 for in-use block map
> ---
>
>
> Now as I said, the files are semi-important, meaning I could recover those I
> still want with some time, but repairing the file system would be preferable.
> Unfortunately I don't have enough space on another harddrive to just copy the
> partition and experiment on that, so I haven't tried letting fsck repair the fs
> yet, and since it says SEVERE DATA LOSS POSSIBLE I wouldn't like to try that
> without copying first.
>
> So my two main questions would be:
>
> 1. How can I recover the data on the file system? As I said, I don't need all
> the files, but it would save some time. I created it with the mkfs.ext4 from
> Debian unstable (1.41.3) with only largefile as extra option, and the default
> mount options with kernel 2.6.28. The fs wasn't used for long, and I mostly
> copied/created files, without deleting much.
>
> 2. Is this corruption a fault of ext4? I guess this is difficult to answer, but
> I had ext3 survive any lockups without much problems. So far ext4 seems not
> quite that robust, but perhaps another file system would have blown up as well
> in this situation. Is there any information I can give you to help make ext4
> more robust?
>
> Best regards,
> Christian Ohm
>
> PS: I think my first post with the fsck output attached got rejected due to its
> size, though I didn't receive a message about that.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2009-01-06 19:34:08

by Theodore Ts'o

Subject: Re: How to recover a damaged ext4 file system?

On Tue, Jan 06, 2009 at 05:05:27AM -0700, Andreas Dilger wrote:
>
> You should try to run e2fsck with the backup group descriptors, using
> the -B and/or -b options (at a guess -B 4096 and -b 32768).

That probably won't help, given that the fsck transcript already says
this:

> > fsck.ext4: Group descriptors look bad... trying backup blocks...

It looks like both the primary and the backup block group descriptors
are bad. I'm not sure how this happened; normally nothing touches the
backup block superblocks at all. Stupid question --- are you sure the
partition table is sane; that's always the first thing to check.

Can you upload someplace the output of

dumpe2fs /dev/XXX
dumpe2fs -o superblock=32768 /dev/XXX
dumpe2fs -o superblock=98304 /dev/XXX

That would be helpful to see what had happened.

> 2. Is this corruption a fault of ext4? I guess this is difficult to
> answer, but I had ext3 survive any lockups without much problems. So
> far ext4 seems not quite that robust, but perhaps another file
> system would have blown up as well in this situation. Is there any
> information I can give you to help make ext4 more robust?

I'm not sure what the hard system hang did, but it looks like it
splattered a lot of random crap all over the harddrive. I doubt ext4
did this, and I doubt ext3 would have done any better.... we need to
know a lot more about exactly what sort of damage was done to the
filesystem to say for certain, though.

- Ted



2009-01-07 21:42:18

by Christian Ohm

Subject: Re: How to recover a damaged ext4 file system?

On Tuesday, 6 January 2009 at 14:34, Theodore Tso wrote:
> On Tue, Jan 06, 2009 at 05:05:27AM -0700, Andreas Dilger wrote:
> >
> > You should try to run e2fsck with the backup group descriptors, using
> > the -B and/or -b options (at a guess -B 4096 and -b 32768).
>
> That probably won't help, given that the fsck transcript already says
> this:
>
> > > fsck.ext4: Group descriptors look bad... trying backup blocks...

Yes, I think I tried that without success.

> It looks like both the primary and the backup block group descriptors
> are bad. I'm not sure how this happened; normally nothing touches the
> backup block superblocks at all. Stupid question --- are you sure the
> partition table is sane; that's always the first thing to check.

I think so; I didn't explicitly look, but didn't notice anything strange.

> Can you upload someplace the output of
>
> dumpe2fs /dev/XXX
> dumpe2fs -o superblock=32768 /dev/XXX
> dumpe2fs -o superblock=98304 /dev/XXX
>
> That would be helpful to see what had happened.

I'll do that soon; I got another harddisk to copy the partition to, but neither
disk is connected right now.

> > 2. Is this corruption a fault of ext4? I guess this is difficult to
> > answer, but I had ext3 survive any lockups without much problems. So
> > far ext4 seems not quite that robust, but perhaps another file
> > system would have blown up as well in this situation. Is there any
> > information I can give you to help make ext4 more robust?
>
> I'm not sure what the hard system hang did, but it looks like it
> splattered a lot of random crap all over the harddrive. I doubt ext4
> did this, and I doubt ext3 would have done any better.... we need to
> know a lot more about exactly what sort of damage was done to the
> filesystem to say for certain, though.

I did one copy of the partition already (took three hours, so not something to
do often...), and ran fsck -y on that. The result was an endless fsck loop like
that described in
http://www.linuxquestions.org/questions/linux-hardware-18/corrupt-ext3-partition-need-to-recover-376366/.
Oh, and I still have to check whether dumpe2fs actually works; either it or
debugfs failed when I tried to run it on the original disk (I also ran dumpe2fs
on the copy while fsck was looping, and depending on the timing it did or did
not find a file system on the device). Anyway, I hope I can experiment some more
tomorrow.

Oh, and is there a human understandable description of the on-disk data format
to compare with a hexdump? A (admittedly very short) search didn't turn up
anything.
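
As a starting point for reading a hexdump: each classic (32-byte) group descriptor begins with three little-endian 32-bit block numbers (block bitmap, inode bitmap, inode table), followed by 16-bit counters, and ends with a 16-bit checksum. The sketch below decodes those fields. It assumes 32-byte descriptors and 4 KiB blocks, in which case the descriptor table starts in block 1; `parse_group_desc` and `GDT_OFFSET` are names chosen for illustration.

```python
import struct

GDT_OFFSET = 4096  # byte offset of the descriptor table, assuming 4 KiB blocks
DESC_SIZE = 32     # descriptor size without the 64bit feature


def parse_group_desc(table: bytes, index: int = 0) -> dict:
    """Decode descriptor number `index` from raw descriptor-table bytes."""
    base = index * DESC_SIZE
    block_bitmap, inode_bitmap, inode_table = struct.unpack_from("<3I", table, base)
    free_blocks, free_inodes, used_dirs, flags = struct.unpack_from(
        "<4H", table, base + 12)
    (checksum,) = struct.unpack_from("<H", table, base + 30)
    return {
        "block_bitmap": block_bitmap,
        "inode_bitmap": inode_bitmap,
        "inode_table": inode_table,
        "free_blocks": free_blocks,
        "free_inodes": free_inodes,
        "used_dirs": used_dirs,
        "flags": flags,
        "checksum": checksum,
    }
```

Reading `DESC_SIZE` bytes at `GDT_OFFSET` on the device and passing them to `parse_group_desc` gives the values for group 0, which can then be compared with what fsck prints.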

Best regards,
Christian Ohm


2009-01-08 10:11:52

by Andreas Dilger

Subject: Re: How to recover a damaged ext4 file system?

On Jan 07, 2009 22:42 +0100, Christian Ohm wrote:
> > Can you upload someplace the output of
> >
> > dumpe2fs /dev/XXX
> > dumpe2fs -o superblock=32768 /dev/XXX
> > dumpe2fs -o superblock=98304 /dev/XXX
> >
> > That would be helpful to see what had happened.
>
> I'll do that soon; I got another harddisk to copy the partition, but both
> disks aren't connected right now.

You could also compile and run the e2fsprogs "findsuper" tool (I've
attached it here, it isn't built by default). This will scan the specified
device and look for ext2/3/4 superblock signatures.

> > > 2. Is this corruption a fault of ext4? I guess this is difficult to
> > > answer, but I had ext3 survive any lockups without much problems. So
> > > far ext4 seems not quite that robust, but perhaps another file
> > > system would have blown up as well in this situation. Is there any
> > > information I can give you to help make ext4 more robust?
> >
> > I'm not sure what the hard system hang did, but it looks like it
> > splattered a lot of random crap all over the harddrive. I doubt ext4
> > did this, and I doubt ext3 would have done any better.... we need to
> > know a lot more about exactly what sort of damage was done to the
> > filesystem to say for certain, though.
>
> I did one copy of the partition already (took three hours, so not something to
> do often...), and ran fsck -y on that. The result was an endless fsck loop like
> that described in
> http://www.linuxquestions.org/questions/linux-hardware-18/corrupt-ext3-partition-need-to-recover-376366/.
> Oh, and I have to try if dumpe2fs actually works, either that or debugfs failed
> when I tried to run it on the original disk (I also ran dumpe2fs on the copy
> while fsck was doing its looping, and depending on the time it did or did not
> find a file system on the device). Anyway, I hope I can experiment some more
> tomorrow.
>
> Oh, and is there a human understandable description of the on-disk data format
> to compare with a hexdump? A (admittedly very short) search didn't turn up
> anything.
>
> Best regards,
> Christian Ohm

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


Attachments:
findsuper.c (7.91 kB)

2009-02-07 16:29:46

by Christian Ohm

Subject: Re: How to recover a damaged ext4 file system?

On Tuesday, 6 January 2009 at 14:34, Theodore Tso wrote:
> It looks like both the primary and the backup block group descriptors
> are bad. I'm not sure how this happened; normally nothing touches the
> backup block superblocks at all. Stupid question --- are you sure the
> partition table is sane; that's always the first thing to check.

I created a new partition on the second drive, and I hope I used exactly the
same options. The result of fdisk -l is the following:

corrupted drive:

Disk /dev/sde: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xaaaaaaaa

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1      121601   976760032   83  Linux

new partition on similar drive:

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xaaaaaaaa

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1      121601   976760001   83  Linux

The only difference is the number of blocks in the partition; since the start
and end are the same, I would have expected that to be equal as well.

> Can you upload someplace the output of
>
> dumpe2fs /dev/XXX
> dumpe2fs -o superblock=32768 /dev/XXX
> dumpe2fs -o superblock=98304 /dev/XXX
>
> That would be helpful to see what had happened.

Uploaded at http://www.filefactory.com/file/afg88b1/n/dumps_tar_bz2. dump-0 is
the output of the first command, dump-32768 the second, and the third was equal
to the second. The following two lines weren't redirected into the files (even
with 2>&1), and were the same for all three commands (well, at least for the
first line that's not really surprising).

dumpe2fs 1.41.3 (12-Oct-2008)
ext2fs_read_bb_inode: Invalid argument-

I couldn't yet compile the findsuper program (some missing headers), but since
dumpe2fs found some more or less valid data, it shouldn't be necessary, right?

I also tried the R-Linux recovery program described at
http://www.data-recovery-software.net/Linux_Recovery.shtml, but that didn't
really work (not surprising, since it's for ext3 only).

Best regards,
Christian Ohm

PS: Sorry for the late answer, I'll reply more quickly now.

2009-02-07 19:04:34

by Eric Sandeen

Subject: Re: How to recover a damaged ext4 file system?

Christian Ohm wrote:
> On Tuesday, 6 January 2009 at 14:34, Theodore Tso wrote:
>> It looks like both the primary and the backup block group descriptors
>> are bad. I'm not sure how this happened; normally nothing touches the
>> backup block superblocks at all. Stupid question --- are you sure the
>> partition table is sane; that's always the first thing to check.
>
> I created a new partition on the second drive, and I hope I used exactly the
> same options. The result of fdisk -l is the following:
>
> corrupted drive:
>
> Disk /dev/sde: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0xaaaaaaaa
>
> Device Boot Start End Blocks Id System
> /dev/sde1 1 121601 976760032 83 Linux
>
> new partition on similar drive:
>
> Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Disk identifier: 0xaaaaaaaa
>
> Device Boot Start End Blocks Id System
> /dev/sdb1 1 121601 976760001 83 Linux
>
> The only difference is the number of blocks of the partition, I guess since the
> start and end are the same this should be equal as well.

that's counting "cylinders" - try "fdisk -u" to be able to display (or
specify) geometry in sectors, which is not a unit open to interpretation...
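
To illustrate why the cylinder view hides differences: with the geometry fdisk prints above (255 heads, 63 sectors per track), one cylinder spans 16065 sectors, so quite different start sectors all display as cylinder 1. A small sketch follows; `cylinder_of` is a made-up helper, and the start sectors 1 and 63 are the ones reported later in this thread.

```python
HEADS = 255
SECTORS_PER_TRACK = 63
SECTORS_PER_CYLINDER = HEADS * SECTORS_PER_TRACK  # 16065, as in the fdisk output


def cylinder_of(sector: int) -> int:
    """Cylinder number (counted from 1, as fdisk displays it) containing
    the given absolute sector."""
    return sector // SECTORS_PER_CYLINDER + 1


if __name__ == "__main__":
    for start in (1, 63, 16064):
        print(f"start sector {start:>5} -> cylinder {cylinder_of(start)}")
    print("end sector 1953520064 -> cylinder", cylinder_of(1953520064))
```

All three start sectors fall into cylinder 1, and the end sector falls into cylinder 121601, so two partitions can show identical cylinder ranges while covering different sector ranges.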

-Eric


2009-02-12 21:39:09

by Christian Ohm

Subject: Re: How to recover a damaged ext4 file system?

On Saturday, 7 February 2009 at 13:04, Eric Sandeen wrote:
> that's counting "cylinders" - try "fdisk -u" to be able to display (or
> specify) geometry in sectors, which is not a unit open to interpretation...

Corrupted disk:

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk identifier: 0xaaaaaaaa

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1  1953520064   976760032   83  Linux

New partition:

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk identifier: 0xaaaaaaaa

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1              63  1953520064   976760001   83  Linux


Both disks show the exact same size in sectors (in the kernel messages as
well), so the new partition on the new drive should be exactly the same as the
one on the old drive. For some reason the new partition starts at sector 63,
while the old one starts at sector 1 - but that could be a difference in
creating the partitions (unless sector 1 is an invalid starting sector?).
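
The "Blocks" figures follow directly from the sector ranges, which can be checked with a little arithmetic. In the sketch below the 512-byte sector size comes from the fdisk output; the 4 KiB file system block size in the last step is an assumption, and the helper name is invented for the example.

```python
SECTOR_BYTES = 512  # from "Units = sectors of 1 * 512 = 512 bytes"


def partition_kib(start: int, end: int) -> int:
    """Size in KiB (fdisk's "Blocks" column, rounded down) of a partition
    covering sectors start..end inclusive."""
    sectors = end - start + 1
    return sectors * SECTOR_BYTES // 1024


old_kib = partition_kib(1, 1953520064)   # corrupted disk
new_kib = partition_kib(63, 1953520064)  # new partition

if __name__ == "__main__":
    print("old:", old_kib, "KiB  new:", new_kib, "KiB")
    print("difference:", old_kib - new_kib, "KiB")
    # Assuming 4 KiB file system blocks:
    print("4 KiB blocks that fit: old", old_kib // 4, " new", new_kib // 4)
```

The two results reproduce the fdisk values, 976760032 and 976760001. The new partition is 31 KiB (62 sectors) smaller, so under the 4 KiB assumption it holds 8 fewer file system blocks than the old one, and an image of the old partition would not fit into it completely.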

Best regards,
Christian Ohm