2009-05-04 13:11:32

by Marcel Partap

Subject: fsck ate my ext4 home partition, help!?

Dear fs hackers,
Some days ago, all of a sudden, I was missing some two hundred pictures from my digicam. Yesterday I rebooted my computer (which, mysteriously enough, had hung to the point where even SysRq+B would not work). When X was just starting (i.e. the home partition was already mounted), I decided to stop the xdm service and unmount the filesystem to run a quick check over it. Unmounting succeeded, but fsck complained that /dev/sdd4 was still mounted. After confirming (lsof, mtab, empty mount point) that this was not actually the case, I ran fsck -p -v /dev/sdd4 and continued past the (supposedly false) still-mounted warning. Whereas the previous run of e2fsck with -n had shown a bunch of things to fix, this run instantly bailed out, complaining about a broken superblock and so on.

After that, fsck -n still showed a bunch of (the same?) errors to fix, but remounting the filesystem (already with a bad hunch, of course) revealed the havoc that had been done: ls -laR showed abundant I/O errors, file names AND attributes consisting of umlauts and question marks, and df suddenly reported the size of the fs as 64 ZETTABYTES! Doom. I remounted it read-only; the root directory looked kind of fine and some things were still accessible, but the home directory on it did not even show . and .. entries! Obviously this is quite bad, and after having dd'ed the partition to a backup image, I am still unsure how to approach recovering from this situation. Surely the data is still there, but how do I get at it? It's quite an old volume as well, so it is probably heavily fragmented...
As I am at university right now, I don't have access to the complete screen buffer log, but I can provide it to anyone who has an idea how to fix this. If someone can actually help me get it back to the state it was in before I invoked e2fsck, I'd be extremely thankful and would show my appreciation with a $50 PayPal donation. Please, someone help me unscrew this mess *g*
For the record, I am running kernel 2.6.30-rc3 with Gentoo's e2fsprogs-1.41.3, and I have not rebooted the system since the incident, so maybe some guerrilla forensics can work on my 8GB of RAM?
thx & regards, marcel..
--
New: GMX FreeDSL complete package with a DSL 6,000 flat rate + telephone line for only 17.95 euros/month!* http://dslspecial.gmx.de/freedsl-surfflat/?ac=OM.AD.PD003K11308T4569a


2009-05-05 05:33:08

by Christian Kujau

Subject: Re: fsck ate my ext4 home partition, help!?

On Mon, 4 May 2009, Marcel Partap wrote:
> check over it. Unmounting went successful, however fsck complained about
> /dev/sdd4 still being mounted. After confirming (lsof, mtab, empty mount

Have you checked /proc/mounts? mtab could be stale, lsof might not see
everything, and the "empty mount point" could be some other mount on top
of /home.
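Checking the kernel's own mount table takes one line. The function below is a minimal sketch, not part of any e2fsprogs tool; the name is_mounted and the optional second argument (an alternative mounts file, handy for testing) are hypothetical. It assumes the device appears in /proc/mounts under exactly the name you pass, so a filesystem mounted via a /dev/disk/by-uuid/ symlink or listed as /dev/root would be missed.

```shell
#!/bin/sh
# is_mounted DEVICE [MOUNTS_FILE]
# Hypothetical helper: print "mountpoint fstype" for every line of the
# kernel mount table whose first field is exactly DEVICE. Returns 0 if at
# least one such line exists, 1 otherwise.
is_mounted() {
  local dev mounts
  dev=$1
  mounts=${2:-/proc/mounts}
  # /proc/mounts fields: device, mount point, fs type, options, dump, pass.
  awk -v d="$dev" '$1 == d { print $2, $3; found = 1 } END { exit !found }' "$mounts"
}
```

For instance: is_mounted /dev/sdd4 && echo "still mounted, do not run e2fsck". Given the /proc/mounts line Marcel quotes later in this thread, it would have printed /home ext3 followed by the warning.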

> previous run of e2fsck with the -n was showing a bunch of stuff to fix,

Do you still have that e2fsck output?

> revealed the havoc that was done: ls -laR showed abundant I/O errors,

Again: error messages would be helpful.

> If someone can actually help me to get it back in the state it was
> before invoking e2fsk,

You did not take the dd image before the first e2fsck, hm? Now that you
have a backup: a few days ago a tool called "extundelete"[0] was
announced on ext3-users; maybe it can help you recover your fs.
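The safer order of operations is: image first, then experiment only on the copy. The sketch below (hypothetical function name backup_image) assumes the source reads without I/O errors, since plain dd stops at the first read error; for a failing disk a dedicated tool such as GNU ddrescue is the usual choice. It also assumes the destination has room for a full-size image.

```shell
#!/bin/sh
# backup_image SOURCE IMAGE
# Hypothetical helper: copy a block device (or any file) byte for byte into
# IMAGE with dd, then verify the copy with cmp. Returns non-zero if the
# image already exists, the copy fails, or the copy differs from SOURCE.
backup_image() {
  local src img
  src=$1
  img=$2
  # Refuse to overwrite an existing image by accident.
  if [ -e "$img" ]; then
    echo "backup_image: $img already exists" >&2
    return 1
  fi
  # bs=4M only affects speed; without conv=sync the image is not padded.
  dd if="$src" of="$img" bs=4M 2>/dev/null || return 1
  # Read both back and compare them byte for byte.
  cmp -s "$src" "$img"
}
```

Afterwards e2fsck can be pointed at the image file instead of the device (e2fsck accepts a regular file), e.g. e2fsck -fn /path/to/sdd4.img, where the path is only a placeholder.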

> For the record, i am running kernel 2.6.30 RC3 with gentoo's
> e2fsprogs-1.41.3.. and i have not rebooted the system since the incident

e2fsprogs-1.41.5 has been released recently, you might want to give it a
shot.

Christian.

[0] http://extundelete.sf.net/
--
Perfect Forward Secrecy is when Bruce Schneier whispers something in your ear.

2009-05-05 14:44:43

by Marcel Partap

Subject: Re: fsck ate my ext4 home partition, help!?


> Have you checked /proc/mounts? mtab could be stale, lsof not seeing
> everything and the "empty mountpoint" could be some other mount on top of
> /home.
Well, how did that happen? It does indeed still show up there:
/dev/sdd4 /home ext3 rw,noatime,nodiratime,errors=continue,user_xattr,data=writeback 0 0
Which, by the way, means this is actually an ext3 volume, as confirmed by fstab. How did I miss this? Fo0 bar!

>
> > previous run of e2fsck with the -n was showing a bunch of stuff to fix,
> Do you still have that e2fsck output?
> > revealed the havoc that was done: ls -laR showed abundant I/O errors,
> Again: error messages would be helpful.
Sending you the screen output buffer in a minute.

> You did not take the dd image before the first e2fsck, hm?
Well, backing up files before trashing them is sooo unadventurous, ain't it? e2fsck really needs the ability to write all its actions to an undo log file. By default. Hmpf.

> Now that you
> have a backup: a few days ago a tool called "extundelete"[0] has been
> announced on ext3-users, maybe that can be of help recovering your fs.
Going to let it loose on the partition and have a go; hopefully I don't have to free up another 300 GB for it to recover the files...


> e2fsprogs-1.41.5 has been released recently, you might want to give it a
> shot.
Hmm, I will have a look at the changelog and recompile. Thanks for the heads-up.

> Christian.
> Perfect Forward Secrecy is when Bruce Schneier whispers something in your
> ear.
Thanks so much for your time and effort in providing me with a new mission briefing. On to recovery and BEYOND!!!
marcel ;)
--
Psssst! Heard about the new GMX MultiMessenger yet? It works with everyone: http://www.gmx.net/de/go/multimessenger01

2009-05-05 15:06:29

by Marcel Partap

Subject: Re: fsck ate my ext4 home partition, help!?

OK, I now see more clearly the vast extent of my stupidity, and the gap between perception and reality:
- the partition is, and always has been, ext3 (sorry, linux-ext4, for the misdirected spam!)
- I ignored /proc/mounts
- it seems that out of laziness I did not actually check lsof and the mount point the way I should have (ouch! what I did was indeed as good as doing nothing: I was looking/grepping for the WRONG mount path, eew)
- the BOINC daemon was running and holding locks on its working directory, right up until I just shut it down. SHEESH! Sometimes it just takes a couple more iterations of reconsidering the actual situation.

So people, treat your filesystems with the respect and attention they deserve. And TRUST the warning messages spit out by highly adept filesystem tools MORE than your intuition. These tools *do* work as advertised.
And the point about backing up precious stuff -=[before]=- knocking out your FS structures has already been made, I guess.

regards marcel.

2009-05-05 16:52:19

by Andreas Dilger

Subject: Re: fsck ate my ext4 home partition, help!?

On May 05, 2009 16:44 +0200, Marcel Partap wrote:
> > You did not take the dd image before the first e2fsck, hm?
> Well backing up files before trashing them is sooo unadventurous aint it.
> e2fsck really needs the ability to write all actions to an undo log file.
> By default. Hmmpf.

There is a feature in newer e2fsprogs that does create an undo log for
e2fsck, but the performance isn't necessarily great. I don't know how
bad it gets, but maybe for the majority of people this would be an
acceptable alternative to the risk of major filesystem errors.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2009-05-05 20:49:19

by Christian Kujau

Subject: Re: fsck ate my ext4 home partition, help!?

On Tue, 5 May 2009, Marcel Partap wrote:
> Sending you the screen output buffer in a minute.

I took the liberty of attaching part of your log to this email.
From what I can see, fsck warns every time that the filesystem is mounted,
while in fact it is not (or so it seems; still, /proc/mounts would have
told you for sure). But you almost always ran e2fsck in interactive or
read-only mode; only twice did e2fsck attempt to alter the fs:

----------------------
localhost ~ # fsck -p -v /dev/sdd4
fsck 1.41.4 (27-Jan-2009)
/dev/sdd4 is mounted.
WARNING!!! Running e2fsck on a mounted filesystem may cause
SEVERE filesystem damage.
Do you really want to continue (y/n)? yes
/dev/sdd4: recovering journal
fsck.ext3: Bad magic number in super-block while trying to re-open /dev/sdd4
----------------------

And the 2nd time it reported:

----------------------
fsck.ext3: Bad magic number in super-block while trying to open /dev/sdd4
/dev/sdd4: The superblock could not be read or does not describe a correct ext2
----------------------

Now that you have a backup copy, I'd suggest getting that "but sdd4 is
mounted" error out of the way and trying e2fsck with a different
superblock. I find it a bit harsh for ext3 to bail out completely when
almost nothing has been altered by e2fsck. Then again, we still don't know
what caused the filesystem errors in the first place and how long they
have been there, waiting to be discovered by these weird e2fsck runs...
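Where the backup superblocks live is predictable. With the sparse_super feature (which the backup locations listed later in this thread indicate), a copy is kept in block group 1 and in every group whose number is a power of 3, 5 or 7, at block group * blocks_per_group + first_data_block (first_data_block is 1 for 1 KiB blocks, 0 otherwise). The function below is a hedged sketch of that rule with a hypothetical name, not an e2fsprogs command; it does not handle meta_bg or sparse_super2 layouts, and mkfs.ext3 -n or dumpe2fs remain the authoritative sources.

```shell
#!/bin/sh
# backup_superblocks BLOCKS_PER_GROUP GROUP_COUNT [FIRST_DATA_BLOCK]
# Hypothetical helper: print, one per line in ascending order, the block
# numbers of the backup superblocks of a sparse_super ext2/ext3 filesystem.
# Group 0 holds the primary superblock and is therefore not listed.
backup_superblocks() {
  local bpg groups first base p g
  bpg=$1
  groups=$2
  first=${3:-0}
  {
    # Group 1 always carries a backup when it exists.
    [ "$groups" -gt 1 ] && echo 1
    # So does every group whose number is a power of 3, 5 or 7.
    for base in 3 5 7; do
      p=$base
      while [ "$p" -lt "$groups" ]; do
        echo "$p"
        p=$((p * base))
      done
    done
  } | sort -n | while read -r g; do
    # Convert the group number into the absolute block number.
    echo $((g * bpg + first))
  done
}
```

With 4 KiB blocks (32768 blocks per group, first data block 0) and between 730 and 2187 groups, backup_superblocks 32768 730 reproduces exactly the fourteen locations from 32768 to 23887872 tried later in this thread. Pairing a location with the block size, e.g. e2fsck -n -b 32768 -B 4096, stops e2fsck from guessing the block size.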

Christian.
--
On Bruce Schneier's birthday, a person standing at the very center of
Stonehenge casts a shadow in the shape of Bruce Schneier's PGP public key
fingerprint.


Attachments:
sdc4mess_p.txt (21.31 kB)

2009-05-06 13:43:17

by Marcel Partap

Subject: Re: fsck ate my ext4 home partition, help!?

Hmm, yeah. The first attempt, running fsck.ext3 -yv /dev/sdd4, resulted in a lost+found frenzy: everything under the former directory /mnt/sdc4/homedirs/currenthomebase (which was mounted under /home) got relinked into lost+found with sequential numbers... Not bad, the data is there, but it is quite unusable, if you know what I mean..

> Now that you have a backup copy, I'd suggest to get that "but sdd4 is
> mounted" error out of the way and try to e2fsck with a different
> superblock.

Uhmm, well. So I again dd'ed the backup image onto the partition, ran mkfs.ext3 -nv /dev/sdd4 (-n only shows what mkfs would do, without creating a filesystem) to get a list of the FS's backup superblocks, and then tried to see whether any of them was in a better state than the original one by doing
    for blockpos in 32768 98304 163840 229376 294912 819200 884736 1605632 \
        2654208 4096000 7962624 11239424 20480000 23887872; do
      fsck.ext3 -vnb $blockpos /dev/sdd4 > fsck-$blockpos.log
    done
and then compared those output files. Unfortunately, they all show the same result, meaning there is no benefit in using them. A script I found at http://blog.windfluechter.net/index.php?/archives/307-Automatically-restore-files-from-lost+found-improved.html, which can move objects back into place from lost+found, needs a backup of all file names made BEFORE running into this situation, so it is not of much help at this point..
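The precaution such a script relies on is cheap to take. e2fsck names every orphan it reconnects #<inode number>, and linking an inode into lost+found does not change its number, so a map of inode numbers to paths recorded while the filesystem is healthy lets you look the old names back up. A minimal sketch with hypothetical function names, assuming GNU find (for -printf) and paths without tab or newline characters; the map is only as current as its last refresh (e.g. from cron).

```shell
#!/bin/sh
# save_inode_map ROOT MAPFILE
# Hypothetical helper: record "inode<TAB>path" for everything below ROOT,
# without crossing into other filesystems (-xdev). Requires GNU find.
save_inode_map() {
  find "$1" -xdev -printf '%i\t%p\n' > "$2"
}

# lookup_lost_found NAME MAPFILE
# Hypothetical helper: given a lost+found entry name such as "#12345",
# print the path(s) recorded for that inode (several if hard-linked).
lookup_lost_found() {
  local ino
  ino=${1#"#"}   # strip the leading '#'
  awk -F '\t' -v i="$ino" '$1 == i { print $2 }' "$2"
}
```

Usage: run save_inode_map /home /root/home-inodes.txt regularly (the map file location is an arbitrary example and should live on a different filesystem); after an fsck run that fills lost+found, lookup_lost_found '#12345' /root/home-inodes.txt shows where that entry used to be.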

Oh, and this extundelete tool: I couldn't quite put it to the test, because as soon as I let it loose on the partition, it quickly eats up all memory, causing the OOM killer to terminate it.

Force-mounting the partition _without_ repairing it just results in an empty mount point.

Isn't there an alternative way to reconstruct the directory structure? Surely it can't have been completely overwritten...??
regards marcel.

2009-05-07 04:01:21

by Christian Kujau

Subject: Re: fsck ate my ext4 home partition, help!?

On Wed, 6 May 2009, Marcel Partap wrote:
> Oh and this extundelete tool - i couldn't quite put it
> to the test because as soon as i let it loose on the partition - well it
> quickly eats up all memory causing the oom_killer to terminate it.

There are more "undelete" tools listed in the wiki:
http://ext4.wiki.kernel.org/index.php/Undeletion

...but the results may not be perfect, to say the least.

Christian.
--
Bruce Schneier doesn't keep secrets -- they keep themselves out of fear.