While helping a friend recover from a catastrophic "rm -rf" accident,
I discovered that deleted files have the inode number in their old
directory entries zeroed. This makes it impossible to match file
names with recovered files. I've verified this behavior on Mandrake
8.1 with Mandrake's stock 2.4.8 kernel. In my kernel sources and
in the stock 2.4.8 sources, the function ext2_delete_entry() in
fs/ext2/dir.c has this line:
    dir->inode = 0;
I've done some searching with Google for discussion of this feature.
I didn't find any, but I did find a patch that appears to have
introduced the above line and an "annotated" listing of dir.c
suggesting that the line is part of revision 1.2. The patch that
apparently introduced this line looks like it was part of a major
overhaul.
Now, I'm tempted to comment the line out in my kernel and see
what happens. But it does occur to me that hackers with more
experience than I may be zeroing the inode number for a reason and
may be depending on it elsewhere in the kernel. Or perhaps the
ext2 flavor of fsck will malfunction if deleted directory entries
have a non-zero inode?
Can anybody suggest a reason for the existence of the above line
in fs/ext2/dir.c? Or possibly suggest what would certainly break
if I removed it in my kernel?
Thanks!
Paul Allen
[email protected]
On Sat, 16 Mar 2002, Paul Allen wrote:
> While helping a friend recover from a catastrophic "rm -rf" accident,
> I discovered that deleted files have the inode number in their old
> directory entries zeroed. This makes it impossible to match file
> names with recovered files. I've verified this behavior on Mandrake
> 8.1 with Mandrake's stock 2.4.8 kernel. In my kernel sources and
> in the stock 2.4.8 sources, the function ext2_delete_entry() in
> fs/ext2/dir.c has this line:
>
> dir->inode = 0;
>
> I've done some searching with Google for discussion of this feature.
Try "A Fast Filesystem for UNIX(tm)", by McKusick et al.
On Sat, Mar 16, 2002 at 12:24:15AM -0800, Paul Allen wrote:
> While helping a friend recover from a catastrophic "rm -rf" accident,
> I discovered that deleted files have the inode number in their old
> directory entries zeroed. This makes it impossible to match file
> names with recovered files. I've verified this behavior on Mandrake
> 8.1 with Mandrake's stock 2.4.8 kernel. In my kernel sources and
> in the stock 2.4.8 sources, the function ext2_delete_entry() in
> fs/ext2/dir.c has this line:
>
> dir->inode = 0;
>
> Now, I'm tempted to comment the line out in my kernel and see
> what happens. But it does occur to me that hackers with more
> experience than I may be zeroing the inode number for a reason and
> may be depending on it elsewhere in the kernel. Or perhaps the
> ext2 flavor of fsck will malfunction if deleted directory entries
> have a non-zero inode?
Um.... the way directory entries are marked as deleted is by zeroing
out the inode number.
So if you take out that line, deleted files will appear not to be
deleted, the kernel will get confused, and you can be sure that fsck
will complain.
- Ted
On Sun, 17 Mar 2002 [email protected] wrote:
> On Sat, Mar 16, 2002 at 12:24:15AM -0800, Paul Allen wrote:
> > While helping a friend recover from a catastrophic "rm -rf" accident,
> > I discovered that deleted files have the inode number in their old
> > directory entries zeroed. This makes it impossible to match file
> > names with recovered files. I've verified this behavior on Mandrake
> > 8.1 with Mandrake's stock 2.4.8 kernel. In my kernel sources and
> > in the stock 2.4.8 sources, the function ext2_delete_entry() in
> > fs/ext2/dir.c has this line:
> >
> > dir->inode = 0;
> >
> > Now, I'm tempted to comment the line out in my kernel and see
> > what happens. But it does occur to me that hackers with more
> > experience than I may be zeroing the inode number for a reason and
> > may be depending on it elsewhere in the kernel. Or perhaps the
> > ext2 flavor of fsck will malfunction if deleted directory entries
> > have a non-zero inode?
>
> Um.... the way directory entries are marked as deleted is by zeroing
> out the inode number.
>
> So if you take out that line, deleted files will appear not to be
> deleted, the kernel will get confused, and you can be sure that fsck
> will complain.
Yes and no.
Procedurally, rm -rf must delete all children before deleting the parent,
but in the end result, it is sufficient to have marked the parent deleted,
without flushing the modified child directories back to disk.
Also, (for the benefit of our readers) in the case of ext2 directories,
dirents are in the form
[inode][reclen][namelen]["name"][inode][reclen][namelen]["name"]
where reclen is effectively a pointer to the next record. It should be
sufficient for the purposes of e2fsck and the kernel that records be
unlinked from the list by extending the previous record and the inode in
the entry be marked unused in the inode bitmap. So I see no reason to be
zeroing the contents of unreferenced disk space, as it needlessly hinders
future rescue attempts.
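
As an illustration of the record chain described above, here is a minimal userspace sketch in C. It is not the kernel's code: struct dirent_hdr, put_entry() and count_live_entries() are hypothetical names chosen for this example. The header follows the [inode][reclen][namelen] form (plus the one-byte file type field), and host byte order is assumed instead of the little-endian on-disk format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory copy of a dirent header (8 bytes, no padding). */
struct dirent_hdr {
    uint32_t inode;      /* 0 marks the slot as unused */
    uint16_t rec_len;    /* offset from this record to the next one */
    uint8_t  name_len;   /* bytes of name that follow the header */
    uint8_t  file_type;
};

/* Helper for building sample blocks: write one record at byte offset off. */
static void put_entry(unsigned char *block, size_t off, uint32_t inode,
                      uint16_t rec_len, const char *name)
{
    struct dirent_hdr h = { inode, rec_len, (uint8_t)strlen(name), 0 };

    assert(rec_len >= sizeof h + h.name_len);   /* record must hold its name */
    memcpy(block + off, &h, sizeof h);
    memcpy(block + off + sizeof h, name, h.name_len);
}

/* Follow rec_len through one block; return live entries, or -1 if corrupt. */
static int count_live_entries(const unsigned char *block, size_t block_size)
{
    size_t off = 0;
    int live = 0;

    while (off + sizeof(struct dirent_hdr) <= block_size) {
        struct dirent_hdr h;

        memcpy(&h, block + off, sizeof h);      /* avoids unaligned access */
        if (h.rec_len < sizeof h || off + h.rec_len > block_size)
            return -1;                          /* chain leaves the block */
        if (h.inode != 0)
            live++;
        off += h.rec_len;                       /* rec_len is the "next" link */
    }
    return live;
}
```

Once the previous record's rec_len is extended past an entry, the walk never lands on that entry again, whatever its inode field still contains.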
Paul, if you feel like hacking, I once wrote a Perl module that
understands (pre-sparse-superblock) Ext2 disk layouts:
http://waste.org/~oxymoron/E2fs.pm
In combination with other scripts, I've used it to recover gigabytes of
files, even in the presence of mangled directories and zeroed indirect
blocks (which hopefully are no longer senselessly zeroed by current
kernels).
--
"Love the dolphins," she advised him. "Write by W.A.S.T.E.."
On Sun, Mar 17, 2002 at 11:21:08AM -0600, Oliver Xymoron wrote:
> Also, (for the benefit of our readers) in the case of ext2 directories,
> dirents are in the form
> [inode][reclen][namelen]["name"][inode][reclen][namelen]["name"]
> where reclen is effectively a pointer to the next record. It should be
> sufficient for the purposes of e2fsck and the kernel that records be
> unlinked from the list by extending the previous record and the inode in
> the entry be marked unused in the inode bitmap. So I see no reason to be
> zeroing the contents of unreferenced disk space, as it needlessly hinders
> future rescue attempts.
Out of curiosity... how would you mark the first entry in a directory
as 'deleted' under your suggestion?
Also, I'm not certain, but I suspect that the reclen vs namelen
difference allows the ext2(/3) format to be extended while minimizing
breakage to existing code. One day another field might be added to the
inode and any assumptions regarding the size of a record length would
limit such extensions. (One such field is currently the 'file type',
although, the file type does not actually use up any additional bytes)
After all, if the record length was always the inode length + name
length + the name, I would personally vote for removing the reclen
altogether. :-)
mark (who likes "rm -fr" being very fast... the easiest way to not
remove things you don't want to remove is (1) keep backups,
and (2) don't use it as a habit. Additionally, using a shell
like /bin/zsh allows you to catch nasty typos involving
"rm -fr *")
--
[email protected]/[email protected]/[email protected] __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
http://mark.mielke.cc/
On Sun, 17 Mar 2002, Mark Mielke wrote:
> On Sun, Mar 17, 2002 at 11:21:08AM -0600, Oliver Xymoron wrote:
> > Also, (for the benefit of our readers) in the case of ext2 directories,
> > dirents are in the form
> > [inode][reclen][namelen]["name"][inode][reclen][namelen]["name"]
> > where reclen is effectively a pointer to the next record. It should be
> > sufficient for the purposes of e2fsck and the kernel that records be
> > unlinked from the list by extending the previous record and the inode in
> > the entry be marked unused in the inode bitmap. So I see no reason to be
> > zeroing the contents of unreferenced disk space, as it needlessly hinders
> > future rescue attempts.
>
> Out of curiosity... how would you mark the first entry in a directory
> as 'deleted' under your suggestion?
It's not a suggestion, the current code already does this:
    /*
     * ext2_delete_entry deletes a directory entry by merging it with the
     * previous entry. Page is up-to-date. Releases the page.
     */
    ...
        if (pde)
                pde->rec_len = cpu_to_le16(to-from);
As it happens, the first entry tends to be '.'.
> Also, I'm not certain, but I suspect that the reclen vs namelen
> difference allows the ext2(/3) format to be extended while minimizing
> breakage to existing code. One day another field might be added to the
> inode and any assumptions regarding the size of a record length would
> limit such extensions. (One such field is currently the 'file type',
> although, the file type does not actually use up any additional bytes)
Doesn't matter, reclen still makes it a linked list, and we'd still skip
over 'dead' entries, regardless of content.
On Sun, Mar 17, 2002 at 03:20:19PM -0600, Oliver Xymoron wrote:
> On Sun, 17 Mar 2002, Mark Mielke wrote:
> > Out of curiosity... how would you mark the first entry in a directory
> > as 'deleted' under your suggestion?
> As it happens, the first entry tends to be '.'.
If this was a guarantee, I would assume that the initial two entries
could be optimized as two inodes.
> > Also, I'm not certain, but I suspect that the reclen vs namelen
> > difference allows the ext2(/3) format to be extended while minimizing
> > breakage to existing code. One day another field might be added to the
> > inode and any assumptions regarding the size of a record length would
> > limit such extensions. (One such field is currently the 'file type',
> > although, the file type does not actually use up any additional bytes)
> Doesn't matter, reclen still makes it a linked list, and we'd still skip
> over 'dead' entries, regardless of content.
If the extra bytes (reclen - namelen) *were* extra bits of file system
information, there would be no safe way of ensuring that the allocation
of a new directory entry didn't 'accidentally' overwrite these bytes.
Exactly how big should you assume reclen *really* is, if reclen
sometimes means the length of the record, and other times means a next
pointer offset?
mark
OK, so I've been informed by none other than Alan Cox and
Ted Tso that the ext2 filesystem has to zero inode numbers
in deleted directory entries because it has to. And I've
been told to go read McKusick by the amazing Alexander Viro.
Wizards such as these have scant time to waste on questioners
such as myself, and I am grateful for the words they sent
my way.
Although I have been around Unix and kernel sources for
quite a long time, I am not a member of the elite Linux
kernel hacking community. Perhaps you can imagine the
trepidation with which I put forth the following fact:
Prior to 2.4.6, the inode number of a deleted file was only
zeroed if there was no previous entry with a reclen value
to be adjusted. If I'm reading the code right, when the
first entry in a directory block was deleted, its inode
number was zeroed. If any other entry in a directory block
was deleted, the reclen of the previous entry was adjusted
to point past the deleted entry and the inode number was
not zeroed.
With 2.4.6, the ext2_delete_entry() function moved from
fs/ext2/namei.c to fs/ext2/dir.c and its behavior changed.
Now, the inode number is always zeroed.
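
To make the difference concrete, here is a small userspace model of the two behaviors as I understand them from the description above. It is only a sketch and not the kernel source: struct dirent_hdr, put_entry(), delete_entry() and the zero_policy enum are hypothetical names, the block is a plain byte buffer, and host byte order is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory copy of a dirent header (8 bytes, no padding). */
struct dirent_hdr {
    uint32_t inode;      /* 0 marks the slot as unused */
    uint16_t rec_len;    /* offset from this record to the next one */
    uint8_t  name_len;
    uint8_t  file_type;
};

/* Hypothetical switch between the two behaviors discussed in the thread. */
enum zero_policy {
    ZERO_ONLY_WITHOUT_PREV,  /* pre-2.4.6, as described above */
    ZERO_ALWAYS              /* 2.4.6 and later, as described above */
};

/* Helper for building sample blocks: write one record at byte offset off. */
static void put_entry(unsigned char *block, size_t off, uint32_t inode,
                      uint16_t rec_len, const char *name)
{
    struct dirent_hdr h = { inode, rec_len, (uint8_t)strlen(name), 0 };

    assert(rec_len >= sizeof h + h.name_len);
    memcpy(block + off, &h, sizeof h);
    memcpy(block + off + sizeof h, name, h.name_len);
}

/*
 * Delete the record at off. prev_off is the offset of the preceding record
 * in the same block, or -1 if the record is the first one in the block.
 */
static void delete_entry(unsigned char *block, long prev_off, size_t off,
                         enum zero_policy policy)
{
    struct dirent_hdr cur, prev;

    memcpy(&cur, block + off, sizeof cur);
    if (prev_off >= 0) {
        /* Merge: the previous record now spans the deleted one. */
        memcpy(&prev, block + prev_off, sizeof prev);
        prev.rec_len = (uint16_t)(off + cur.rec_len - (size_t)prev_off);
        memcpy(block + prev_off, &prev, sizeof prev);
    }
    if (prev_off < 0 || policy == ZERO_ALWAYS) {
        /* No previous record to absorb it, or the newer behavior. */
        cur.inode = 0;
        memcpy(block + off, &cur, sizeof cur);
    }
}
```

Under ZERO_ONLY_WITHOUT_PREV, deleting any entry other than the first leaves its old inode number sitting in the slack of the previous record. Under ZERO_ALWAYS the name survives but the inode number does not.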
I've tested file deletion on a stock 2.2.18 kernel, and it
behaved the same as a pre-2.4.6 kernel would. A single
deleted file in the root of an ext2 filesystem on a floppy
retained its inode number. The ext2_delete_entry() function
in the 2.2.18 fs/ext2/namei.c looks very similar to the
2.4.0 version.
So, prior to the 2.4.6 kernel, it appears that most deleted
directory entries had non-zero inode numbers. The kernel,
fsck, and everybody else were perfectly happy. And someone
needing to resurrect filenames to go with deleted inodes
could usually find them. It would be possible for multiple
deleted directory entries to point to the same inode, but I
think this preferable to losing all filenames.
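
For the curious, this is roughly how such leftover entries can be dug out of a directory block. The sketch below is a heuristic and not a real recovery tool: find_stale(), min_rec_len(), put_entry() and struct dirent_hdr are hypothetical names, host byte order is assumed, and the plausibility test is deliberately crude. It relies on records being padded to a multiple of 4 bytes, which is what ext2's EXT2_DIR_REC_LEN macro computes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory copy of a dirent header (8 bytes, no padding). */
struct dirent_hdr {
    uint32_t inode;
    uint16_t rec_len;
    uint8_t  name_len;
    uint8_t  file_type;
};

/* Smallest record that can hold a name: 8-byte header + name, 4-aligned. */
static size_t min_rec_len(uint8_t name_len)
{
    return (8u + (size_t)name_len + 3u) & ~(size_t)3;
}

/* Helper for building sample blocks: write one record at byte offset off. */
static void put_entry(unsigned char *block, size_t off, uint32_t inode,
                      uint16_t rec_len, const char *name)
{
    struct dirent_hdr h = { inode, rec_len, (uint8_t)strlen(name), 0 };

    assert(rec_len >= sizeof h + h.name_len);
    memcpy(block + off, &h, sizeof h);
    memcpy(block + off + sizeof h, name, h.name_len);
}

/*
 * Scan the slack behind each chained record for headers that still look
 * like entries with a non-zero inode. Stores up to max inode numbers and
 * returns how many candidates were seen.
 */
static int find_stale(const unsigned char *block, size_t block_size,
                      uint32_t *inodes, int max)
{
    size_t off = 0;
    int found = 0;

    while (off + sizeof(struct dirent_hdr) <= block_size) {
        struct dirent_hdr h, d;
        size_t end, s;

        memcpy(&h, block + off, sizeof h);
        if (h.rec_len < sizeof h || off + h.rec_len > block_size)
            break;                              /* corrupt chain, give up */
        end = off + h.rec_len;
        s = off + min_rec_len(h.name_len);      /* slack starts here */
        while (s + sizeof d <= end) {
            memcpy(&d, block + s, sizeof d);
            if (d.inode != 0 && d.name_len != 0 &&
                d.rec_len % 4 == 0 &&
                d.rec_len >= min_rec_len(d.name_len) &&
                s + min_rec_len(d.name_len) <= end) {
                if (found < max)
                    inodes[found] = d.inode;
                found++;
                s += min_rec_len(d.name_len);   /* skip over this candidate */
            } else {
                s += 4;                         /* records are 4-aligned */
            }
        }
        off = end;
    }
    return found;
}
```

A candidate's name can then be read from the name_len bytes following its header. With the inode field zeroed, the same scan finds nothing to report, which is the problem described in this thread.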
Certainly, file undeletion is dicey on a multi-user system
or one with background activity other than the silly user's
errant "rm" command in the foreground. However, in the
case of a single-user system with little going on except
the foreground shell, retaining most of the inode numbers
on deleted files is arguably useful. In the event that
damaged my friend's filesystem, about 16,000 files were
deleted over a span of a few seconds. Most of these were
chaff: browser caches, metadata stores for KDE, Gnome,
and the like. Not more than a couple hundred of the 16,000
files were actually useful, but they were anonymous needles
in a haystack of data. It's been a month now, and we think
we've got most of the good data recovered.
In short, I liked the pre-2.4.6 behavior. I'm curious as
to the rationale for changing it.
Thanks!
Paul Allen
[email protected]
On Mar 18, 2002 16:50 -0800, Paul Allen wrote:
> Perhaps you can imagine the trepidation with which I put
> forth the following fact:
Yes, it is always tough when you dip your toes into new waters.
In this case I think you may have something. There is always
the chance that Al will still pipe in with "not doing that can
be exploited as a race condition by doing X, Y, and Z".
> With 2.4.6, the ext2_delete_entry() function moved from
> fs/ext2/namei.c to fs/ext2/dir.c and its behavior changed.
> Now, the inode number is always zeroed.
You could always just put an "else" in front of the zeroing, so
it looks like:
        if (pde)
                pde->rec_len = cpu_to_le16(to-from);
        else
                dir->inode = 0;
Let us know how it turns out (I think it will be OK).
Cheers, Andreas
--
Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto,
\ would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert