LinuxLists.cc - Fwd: strange e2fsck magic number behaviour

2013-09-12 16:39:34

Subject: Fwd: strange e2fsck magic number behaviour

I'm currently trying to recover an ext4 filesystem. Last night, during
a resize operation, the system (Ubuntu 12.04 LTS on my fix-stuff usb
stick) locked up hard and eventually crashed. Restarting,
unsurprisingly, gparted offered to check the volume. e2fsck, called
from within gparted, replayed the journal overnight and completed the
resize.

however, where I was expecting a volume with about 3.5GB of free
space, there was now a volume with 32GB free space, a bit more than
50% utilised. inevitably, trying to boot the linux that lives in there
dropped into grub rescue.

going back, I tried to e2fsck it. this reported large numbers of inode
issues and eventually reported clean. I could mount the volume, but
file metadata looked generally broken (lots of ?s). testdisk showed
the partitions were intact, although it claimed the drive was the
wrong size (incorrectly), and found lots of deleted files within my
ecryptfs home folder. It also found the backup superblocks for the
damaged volume.

the first couple I tried were corrupt, but the third was valid. e2fsck
-b [superblock] -y reports fixing a lot of inode things, checksums,
and then restarts. it then starts to report hunormous numbers of
multiply-claimed blocks.

and now comes the interesting bit - at some point, block 16777215
starts to appear more and more often in the inodes, often duplicated,
until it starts to print out the number 16777215 in a fast loop. in
fact, it looks like it hits some inode and keeps printing block
16777215 to the same very long line (it's generated 500MB of log)

I removed the first inode containing this block via debugfs, without
this helping.

It sticks out that 16777215 is a magic number (the maximum in a 48 bit
address space) and googling suggests that either ext4 or e2fsck has had a bug
involving it before.

2013-09-12 16:43:31

by Alexander Harrowell

Subject: Re: strange e2fsck magic number behaviour

To be clearer, I meant 24 bits.

2013-09-12 16:44:48

by Eric Sandeen

Subject: Re: Fwd: strange e2fsck magic number behaviour

On 9/12/13 11:39 AM, Alexander Harrowell wrote:
> I'm currently trying to recover an ext4 filesystem. Last night, during
> a resize operation,

From what size to what size? On what kernel?

> the system (Ubuntu 12.04 LTS on my fix-stuff usb
> stick) locked up hard and eventually crashed. Restarting,
> unsurprisingly, gparted offered to check the volume. e2fsck, called
> from within gparted, replayed the journal overnight and completed the
> resize.

hmmm... perhaps.

> however, where I was expecting a volume with about 3.5GB of free
> space, there was now a volume with 32GB free space, a bit more than
> 50% utilised. inevitably, trying to boot the linux that lives in there
> dropped into grub rescue.
>
> going back, I tried to e2fsck it. this reported large numbers of inode
> issues and eventually reported clean. I could mount the volume, but
> file metadata looked generally broken (lots of ?s). testdisk showed
> the partitions were intact, although it claimed the drive was the
> wrong size (incorrectly), and found lots of deleted files within my
> ecryptfs home folder. It also found the backup superblocks for the
> damaged volume.
>
> the first couple I tried were corrupt, but the third was valid. e2fsck
> -b [superblock] -y reports fixing a lot of inode things, checksums,
> and then restarts. it then starts to report hunormous numbers of
> multiply-claimed blocks.
>
> and now comes the interesting bit - at some point, block 16777215
> starts to appear more and more often in the inodes, often duplicated,
> until it starts to print out the number 16777215 in a fast loop. in
> fact, it looks like it hits some inode and keeps printing block
> 16777215 to the same very long line (it's generated 500MB of log)

= 111111111111111111111111 binary.

Guessing it's maybe a bitmap block?

Resize2fs has had a lot of trouble lately it seems. You may have just
been the unlucky recipient of a resize2fs bug...

-Eric

> I removed the first inode containing this block via debugfs, without
> this helping.
>
> It sticks out that 16777215 is a magic number (the maximum in a 48 bit
> address space) and I google that either ext4 or e2fsck has had a bug
> involving it before.

2013-09-12 17:35:36

by Theodore Ts'o

Subject: Re: Fwd: strange e2fsck magic number behaviour

On Thu, Sep 12, 2013 at 04:39:33PM +0000, Alexander Harrowell wrote:
> I'm currently trying to recover an ext4 filesystem. Last night, during
> a resize operation, the system (Ubuntu 12.04 LTS on my fix-stuff usb
> stick) locked up hard and eventually crashed. Restarting,
> unsurprisingly, gparted offered to check the volume. e2fsck, called
> from within gparted, replayed the journal overnight and completed the
> resize.

How big was this file system? And it sounds like you were doing an
on-line resize (that is, the file system was mounted at the time when
you did the resize)? There were some bugs there with file
systems with block numbers > 32-bits (i.e., greater than 16TB). But
for smaller file systems, online resize should have been fairly safe.
Certainly I'm not aware of any bugs that resulted in the system
locking up hard.
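
For reference, the 16TB figure follows from 32-bit block numbers only if
the block size is 4 KiB. That block size is an assumption here (it is the
usual ext4 default; the thread does not state the block size of this file
system). A minimal sketch of the arithmetic, with illustrative names:

```python
# Assumption: 4 KiB blocks. The thread does not give the actual block size.
BLOCK_SIZE = 4096
TIB = 1 << 40


def max_fs_bytes(block_number_bits, block_size=BLOCK_SIZE):
    """Largest file system addressable with block numbers of the given width."""
    return (1 << block_number_bits) * block_size


limit_32 = max_fs_bytes(32)
print(limit_32 // TIB, "TiB")  # 2**32 blocks * 2**12 bytes = 2**44 bytes
```

With a smaller block size the same 32-bit limit is reached at a
proportionally smaller file system.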

> and now comes the interesting bit - at some point, block 16777215
> starts to appear more and more often in the inodes, often duplicated,
> until it starts to print out the number 16777215 in a fast loop. in
> fact, it looks like it hits some inode and keeps printing block
> 16777215 to the same very long line (it's generated 500MB of log)

0xFFFFFF or 0x1000000 isn't a magic boundary as far as ext4 is
concerned. It appears that this is showing up as part of the multiply
claimed blocks error message? That usually happens because there was
garbage in an indirect block or in the extent tree.

What you might have remembered is that the maximum number of physical
blocks with ext4 is 48 bits, but what you are reporting is 24 bits,
which is something else quite different.
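
The difference between the two widths can be checked directly. This is
only a sketch of the arithmetic, not anything e2fsck itself does, and the
variable names are illustrative:

```python
# Block number reported repeatedly in the e2fsck output described above.
BLOCK = 16777215

# All-ones 24-bit value: 0xFFFFFF == 2**24 - 1.
is_24_bit_max = BLOCK == 0xFFFFFF == 2**24 - 1
bits_set = bin(BLOCK).count("1")

# The 48-bit maximum is a far larger number.
MAX_48 = 2**48 - 1

print(is_24_bit_max, bits_set, BLOCK.bit_length())
print(MAX_48, MAX_48 // BLOCK)
```

So 16777215 fits exactly in 24 bits with every bit set, while the largest
48-bit block number is roughly sixteen million times bigger.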

It would help to see a short excerpt of exactly what e2fsck reported,
so we could see whether it is being reported as a logical block number
or a physical block number. However, I suspect this is really much
more of a symptom rather than the cause.

Regards,

- Ted