Hi,
If the mode of a directory gets corrupted, check_filetype() makes wrong decisions for all its sub-directories. For example, using debugfs we can corrupt the mode of a directory to 0140755 (i.e. a socket). e2fsck will set the filetype of all its subdirectories as 6 (filetype for socket). All the subdirectories would be moved to lost+found, and in second run of e2fsck their filetype would be set back to 2.
By the time we come to check_filetype(), we have already verified the "." and ".." entries, so we special case these dirents in check_filetype().
Please consider for review.
Signed-off-by: Andreas Dilger <adilger-KYPl3Ael/[email protected]>
Signed-off-by: Kalpak Shah <kalpak-KYPl3Ael/[email protected]>
Index: e2fsprogs-1.39/e2fsck/pass2.c
===================================================================
--- e2fsprogs-1.39.orig/e2fsck/pass2.c
+++ e2fsprogs-1.39/e2fsck/pass2.c
@@ -495,7 +495,9 @@ static _INLINE_ int check_filetype(e2fsc
return 1;
}
- if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode)) {
+ if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode) ||
+ ((dirent->name_len && 0xFF) <= 2 && dirent->name[0] == '.' &&
+ (dirent->name[1] == '.' || dirent->name[1] == '\0'))) {
should_be = EXT2_FT_DIR;
} else if (ext2fs_test_inode_bitmap(ctx->inode_reg_map,
dirent->inode)) {
Hi,
The other issue this brings up is maybe pass1 should be checking whether it is the inode mode that is corrupted (by trying to verify block[0] has "." and ".."in it) instead of truncating off those blocks. This would actually be a redundant check as pass2 would also make the same check.
We can have some more checks like if i_blocks!=0 then the file may not be special file, etc. Does e2fsck need to have more such checks to avoid making decisions just based on the mode?
Thanks,
Kalpak.
On Wed, 2007-02-21 at 14:45 +0530, Kalpak Shah wrote:
> Hi,
>
> If the mode of a directory gets corrupted, check_filetype() makes wrong decisions for all its sub-directories. For example, using debugfs we can corrupt the mode of a directory to 0140755 (i.e. a socket). e2fsck will set the filetype of all its subdirectories as 6 (filetype for socket). All the subdirectories would be moved to lost+found, and in second run of e2fsck their filetype would be set back to 2.
>
> By the time we come to check_filetype(), we have already verified the "." and ".." entries, so we special case these dirents in check_filetype().
>
> Please consider for review.
>
> Signed-off-by: Andreas Dilger <[email protected]>
> Signed-off-by: Kalpak Shah <[email protected]>
>
>
> Index: e2fsprogs-1.39/e2fsck/pass2.c
> ===================================================================
> --- e2fsprogs-1.39.orig/e2fsck/pass2.c
> +++ e2fsprogs-1.39/e2fsck/pass2.c
> @@ -495,7 +495,9 @@ static _INLINE_ int check_filetype(e2fsc
> return 1;
> }
>
> - if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode)) {
> + if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode) ||
> + ((dirent->name_len && 0xFF) <= 2 && dirent->name[0] == '.' &&
> + (dirent->name[1] == '.' || dirent->name[1] == '\0'))) {
> should_be = EXT2_FT_DIR;
> } else if (ext2fs_test_inode_bitmap(ctx->inode_reg_map,
> dirent->inode)) {
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Kalpak Shah wrote:
> Hi,
>
> If the mode of a directory gets corrupted, check_filetype() makes wrong decisions for all its sub-directories. For example, using debugfs we can corrupt the mode of a directory to 0140755 (i.e. a socket). e2fsck will set the filetype of all its subdirectories as 6 (filetype for socket). All the subdirectories would be moved to lost+found, and in second run of e2fsck their filetype would be set back to 2.
>
> By the time we come to check_filetype(), we have already verified the "." and ".." entries, so we special case these dirents in check_filetype().
>
> Please consider for review.
>
> Signed-off-by: Andreas Dilger <[email protected]>
> Signed-off-by: Kalpak Shah <[email protected]>
>
>
> Index: e2fsprogs-1.39/e2fsck/pass2.c
> ===================================================================
> --- e2fsprogs-1.39.orig/e2fsck/pass2.c
> +++ e2fsprogs-1.39/e2fsck/pass2.c
> @@ -495,7 +495,9 @@ static _INLINE_ int check_filetype(e2fsc
> return 1;
> }
>
> - if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode)) {
> + if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode) ||
> + ((dirent->name_len && 0xFF) <= 2 && dirent->name[0] == '.' &&
>
What is the "&& 0xFF" part for? At the very least, it should probably
be "& 0xFF". It doesn't seem like this mask should be needed at all
though, I think.
Thanx...
ps
> + (dirent->name[1] == '.' || dirent->name[1] == '\0'))) {
> should_be = EXT2_FT_DIR;
> } else if (ext2fs_test_inode_bitmap(ctx->inode_reg_map,
> dirent->inode)) {
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
Hi Peter,
Thanks for pointing out the typo. In the kernel, struct ext3_dir_entry_2 has a 8-bit name_len field followed by 8-bit field for filetype. Whereas in e2fsck, struct ext2_dir_entry has a 16-bit name_len field, the upper 8 bits of which store the filetype. Hence e2fsck masks the upper 8 bits while using name_len.
Here is the patch with the change.
Signed-off-by: Andreas Dilger <adilger-KYPl3Ael/[email protected]>
Signed-off-by: Kalpak Shah <kalpak-KYPl3Ael/[email protected]>
Index: e2fsprogs-1.39/e2fsck/pass2.c
===================================================================
--- e2fsprogs-1.39.orig/e2fsck/pass2.c
+++ e2fsprogs-1.39/e2fsck/pass2.c
@@ -495,7 +495,9 @@ static _INLINE_ int check_filetype(e2fsc
return 1;
}
- if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode)) {
+ if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode) ||
+ ((dirent->name_len & 0xFF) <= 2 && dirent->name[0] == '.' &&
+ (dirent->name[1] == '.' || dirent->name[1] == '\0'))) {
should_be = EXT2_FT_DIR;
} else if (ext2fs_test_inode_bitmap(ctx->inode_reg_map,
dirent->inode)) {
Thanks,
Kalpak Shah. <kalpak-KYPl3Ael/[email protected]>
On Wed, 2007-02-21 at 09:49 -0500, Peter Staubach wrote:
> Kalpak Shah wrote:
> > Hi,
> >
> > If the mode of a directory gets corrupted, check_filetype() makes wrong decisions for all its sub-directories. For example, using debugfs we can corrupt the mode of a directory to 0140755 (i.e. a socket). e2fsck will set the filetype of all its subdirectories as 6 (filetype for socket). All the subdirectories would be moved to lost+found, and in second run of e2fsck their filetype would be set back to 2.
> >
> > By the time we come to check_filetype(), we have already verified the "." and ".." entries, so we special case these dirents in check_filetype().
> >
> > Please consider for review.
> >
> > Signed-off-by: Andreas Dilger <adilger-KYPl3Ael/[email protected]>
> > Signed-off-by: Kalpak Shah <kalpak-KYPl3Ael/[email protected]>
> >
> >
> > Index: e2fsprogs-1.39/e2fsck/pass2.c
> > ===================================================================
> > --- e2fsprogs-1.39.orig/e2fsck/pass2.c
> > +++ e2fsprogs-1.39/e2fsck/pass2.c
> > @@ -495,7 +495,9 @@ static _INLINE_ int check_filetype(e2fsc
> > return 1;
> > }
> >
> > - if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode)) {
> > + if (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode) ||
> > + ((dirent->name_len && 0xFF) <= 2 && dirent->name[0] == '.' &&
> >
>
> What is the "&& 0xFF" part for? At the very least, it should probably
> be "& 0xFF". It doesn't seem like this mask should be needed at all
> though, I think.
>
> Thanx...
>
> ps
>
> > + (dirent->name[1] == '.' || dirent->name[1] == '\0'))) {
> > should_be = EXT2_FT_DIR;
> > } else if (ext2fs_test_inode_bitmap(ctx->inode_reg_map,
> > dirent->inode)) {
> >
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
On Wed, Feb 21, 2007 at 02:45:59PM +0530, Kalpak Shah wrote:
>
> If the mode of a directory gets corrupted, check_filetype() makes
> wrong decisions for all its sub-directories. For example, using
> debugfs we can corrupt the mode of a directory to 0140755 (i.e. a
> socket). e2fsck will set the filetype of all its subdirectories as 6
> (filetype for socket). All the subdirectories would be moved to
> lost+found, and in second run of e2fsck their filetype would be set
> back to 2.
Um, I'm not seeing this. Using stock e2fsprogs, given the following
test image, I'm not seeing the behavior you describe.
It does require a second e2fsck to fix things, which is bad; we need
to make sure the i_size is fixed in pass 1.
- Ted
e2fsck 1.40-WIP (14-Nov-2006)
Pass 1: Checking inodes, blocks, and sizes
Inode 12, i_blocks is 2, should be 0. Fix? yes
Pass 2: Checking directory structure
Entry 'dir' in / (2) has an incorrect filetype (was 2, should be 6).
Fix? yes
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Inode 2 ref count is 4, should be 3. Fix? yes
Inode 12 ref count is 2, should be 1. Fix? yes
Unattached inode 13
Connect to /lost+found? yes
Inode 13 ref count is 2, should be 1. Fix? yes
Unattached inode 14
Connect to /lost+found? yes
Inode 14 ref count is 2, should be 1. Fix? yes
Unattached inode 15
Connect to /lost+found? yes
Inode 15 ref count is 2, should be 1. Fix? yes
Pass 5: Checking group summary information
Block bitmap differences: -21
Fix? yes
Free blocks count wrong for group #0 (75, counted=76).
Fix? yes
Free blocks count wrong (75, counted=76).
Fix? yes
Directories count wrong for group #0 (3, counted=2).
Fix? yes
foo.img: ***** FILE SYSTEM WAS MODIFIED *****
foo.img: 15/16 files (0.0% non-contiguous), 24/100 blocks
<tytso@candygram> {/usr/projects/e2fsprogs/e2fsprogs/build/e2fsck}
730% e2fsck -fy foo.img
e2fsck 1.40-WIP (14-Nov-2006)
Pass 1: Checking inodes, blocks, and sizes
Special (device/socket/fifo) inode 12 has non-zero size. Fix? yes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
foo.img: ***** FILE SYSTEM WAS MODIFIED *****
foo.img: 15/16 files (0.0% non-contiguous), 24/100 blocks
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss-KYPl3Ael/[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
On Sat, Mar 31, 2007 at 02:16:24AM -0600, Andreas Dilger wrote:
> On Mar 30, 2007 20:44 -0400, Theodore Tso wrote:
> > On Wed, Feb 21, 2007 at 02:45:59PM +0530, Kalpak Shah wrote:
> > > If the mode of a directory gets corrupted, check_filetype() makes
> > > wrong decisions for all its sub-directories. For example, using
> > > debugfs we can corrupt the mode of a directory to 0140755 (i.e. a
> > > socket). e2fsck will set the filetype of all its subdirectories as 6
> > > (filetype for socket). All the subdirectories would be moved to
> > > lost+found, and in second run of e2fsck their filetype would be set
> > > back to 2.
Ah, OK, I misunderstand the original bug report. It wasn't that the
filetype of all of the subdirectories are set incorrectly; it was the
filetype of the .. entry in all of the subdirectories. And my test
case created files, not directories in the corrupted directory.
So the correct fix for this problem isn't to add a special case in
pass2's check_filetype() --- since the directory entry being pointed
to really isn't a directory --- but rather in pass3's
fix_dot_dot_proc(), which gets executed when we reparent the
subdirectories to lost+found:
diff -r 11d3e029aa83 e2fsck/pass3.c
--- a/e2fsck/pass3.c Thu Mar 29 00:32:23 2007 -0400
+++ b/e2fsck/pass3.c Sat Mar 31 07:22:40 2007 -0400
@@ -645,6 +645,7 @@ static int fix_dotdot_proc(struct ext2_d
fix_problem(fp->ctx, PR_3_ADJUST_INODE, &pctx);
}
dirent->inode = fp->parent;
+ dirent->name_len = (dirent->name_len & 0xFF) | EXT2_FT_DIR << 8;
fp->done++;
return DIRENT_ABORT | DIRENT_CHANGED;
The question though is whether we should be doing anything more to
notice that the real problem isn't the file type information is
incorrect, but rather that i_mode in the parent directory is
incorrect. Put another way, we can detect that perhaps i_mode is what
is inconsistent, given the following clues:
* The inode contains data blocks (although we have to be careful
since a block or char device will have a non-zero
i_block[0])
* The inode's i_block[0] is valid, isn't being used by any other
inode, and the contents of the data block pointed to
be i_block[0] appears to be a directory (i.e., has a
. and .. entry where '.' points to itself).
* One or more directory entries point to the inode a filetype
of EXT2_FT_DIR.
On the other hand, this is difficult to do right, because the first
clue has a significant chance of false positives in the case of block
and character devices; the second clue can only be completely verified
after pass 1 is completely finished; and the third clue can only be
verified while executing pass 2.
Furthermore, how often and how likely that we would suffer a
single-bit error in i_mode, where the rest of the inode (especially
i_blocks and i_block) won't be trashed, and the worst case if we get
this wrong is that the files end up in lost+found so we don't lose any
data anyway? (This would be less true in the 64-bit "inode in
directory" design, which is one of the reasons why while I like it a
lot, it is definitely a lot more fragile and will require some very
careful handling.)
And of course, if we're going to handle the case where a directory was
wrongly turned into a socket/device/fifo, we should also handle the
case where a regular file is wrongly turned into a socket/device/fifo,
although this gets more difficult since we can't do a test based on
'.' and '..' in the case of a regular file has its i_mode field corrupted.
What we can do in the directory case is to perform the '.' and '..'
check in pass1, although it would require a fair amount of special
case code, and hope that it isn't a false positive or that it happens
to be pointing at another (valid) directory --- although if '.' points
to itself that would seem unlikely.
dir->name_len = (dir->name_len & 0xFF) | EXT2_FT_DIR << 8;
And we can't really check the case where a directory gets turned into
a regular file without destroying e2fsck's performance, since that
would require reading the first block of every single file, and this
also significantly increases the chance of false positives. So it's a
lot of complexity for what seems to have always been an artificial
test case.
Let me see what I can do, but the controlling consideration is going
to be "first, do no harm", which means (a) it must not harm
performance, and (b) it must never cause e2fsck to stop and ask a
question on a valid filesystem.
- Ted
On Sat, Mar 31, 2007 at 08:35:09AM -0400, Theodore Tso wrote:
> And we can't really check the case where a directory gets turned into
> a regular file without destroying e2fsck's performance, since that
> would require reading the first block of every single file, and this
> also significantly increases the chance of false positives. So it's a
> lot of complexity for what seems to have always been an artificial
> test case.
OK, this is what I have done so far. Not checked in yet, but it looks
right to me. Note that this makes your testcase have a very booring
result. :-)
I'm going to let this one soak for a bit to make sure we don't pick up
any fase positives or negatives in the hueristics.
- Ted
diff -r 11d3e029aa83 e2fsck/message.c
--- a/e2fsck/message.c Thu Mar 29 00:32:23 2007 -0400
+++ b/e2fsck/message.c Sat Mar 31 10:36:55 2007 -0400
@@ -35,6 +35,7 @@
* %Id <inode> -> i_dir_acl
* %Iu <inode> -> i_uid
* %Ig <inode> -> i_gid
+ * %It <inode type>
* %j <ino2> inode number
* %m <com_err error message>
* %N <num>
@@ -310,6 +311,25 @@ static _INLINE_ void expand_inode_expres
printf("%d", (inode->i_gid |
(inode->osd2.linux2.l_i_gid_high << 16)));
break;
+ case 't':
+ if (LINUX_S_ISREG(inode->i_mode))
+ printf(_("regular file"));
+ else if (LINUX_S_ISDIR(inode->i_mode))
+ printf(_("directory"));
+ else if (LINUX_S_ISCHR(inode->i_mode))
+ printf(_("character device"));
+ else if (LINUX_S_ISBLK(inode->i_mode))
+ printf(_("block device"));
+ else if (LINUX_S_ISFIFO(inode->i_mode))
+ printf(_("named pipe"));
+ else if (LINUX_S_ISLNK(inode->i_mode))
+ printf(_("symbolic link"));
+ else if (LINUX_S_ISSOCK(inode->i_mode))
+ printf(_("socket"));
+ else
+ printf(_("unknown file type with mode 0%o"),
+ inode->i_mode);
+ break;
default:
no_inode:
printf("%%I%c", ch);
diff -r 11d3e029aa83 e2fsck/pass1.c
--- a/e2fsck/pass1.c Thu Mar 29 00:32:23 2007 -0400
+++ b/e2fsck/pass1.c Sat Mar 31 10:36:55 2007 -0400
@@ -133,11 +133,10 @@ int e2fsck_pass1_check_device_inode(ext2
int i;
/*
- * If i_blocks is non-zero, or the index flag is set, then
- * this is a bogus device/fifo/socket
+ * If the index flag is set, then this is a bogus
+ * device/fifo/socket
*/
- if ((ext2fs_inode_data_blocks(fs, inode) != 0) ||
- (inode->i_flags & EXT2_INDEX_FL))
+ if (inode->i_flags & EXT2_INDEX_FL)
return 0;
/*
@@ -370,6 +369,73 @@ static void check_inode_extra_space(e2fs
if (*eamagic == EXT2_EXT_ATTR_MAGIC) {
/* it seems inode has an extended attribute(s) in body */
check_ea_in_inode(ctx, pctx);
+ }
+}
+
+/*
+ * Check to see if the inode might really be a directory, despite i_mode
+ *
+ * This is a lot of complexity for something for which I'm not really
+ * convinced happens frequently in the wild. If for any reason this
+ * causes any problems, take this code out.
+ * [tytso:20070331.0827EDT]
+ */
+static void check_is_really_dir(e2fsck_t ctx, struct problem_context *pctx,
+ char *buf)
+{
+ struct ext2_inode *inode = pctx->inode;
+ int i, not_device = 0;
+ blk_t blk;
+ struct ext2_dir_entry *dirent;
+
+ if (LINUX_S_ISDIR(inode->i_mode) || LINUX_S_ISREG(inode->i_mode))
+ return;
+
+ for (i=0; i < EXT2_N_BLOCKS; i++) {
+ blk = inode->i_block[i];
+ if (!blk)
+ continue;
+ if (i >= 4)
+ not_device++;
+
+ if (blk < ctx->fs->super->s_first_data_block ||
+ blk >= ctx->fs->super->s_blocks_count ||
+ ext2fs_fast_test_block_bitmap(ctx->block_found_map, blk))
+ return; /* Invalid block, can't be dir */
+ }
+
+ if ((LINUX_S_ISCHR(inode->i_mode) || LINUX_S_ISBLK(inode->i_mode)) &&
+ (inode->i_links_count == 1) && !not_device)
+ return;
+
+ if (LINUX_S_ISLNK(inode->i_mode) && inode->i_links_count == 1)
+ return;
+
+ if (ext2fs_read_dir_block(ctx->fs, inode->i_block[0], buf))
+ return;
+
+ dirent = (struct ext2_dir_entry *) buf;
+ if (((dirent->name_len & 0xFF) != 1) ||
+ (dirent->name[0] != '.') ||
+ (dirent->inode != pctx->ino) ||
+ (dirent->rec_len < 12) ||
+ (dirent->rec_len % 4) ||
+ (dirent->rec_len >= ctx->fs->blocksize - 12))
+ return;
+
+ dirent = (struct ext2_dir_entry *) (buf + dirent->rec_len);
+ if (((dirent->name_len & 0xFF) != 2) ||
+ (dirent->name[0] != '.') ||
+ (dirent->name[1] != '.') ||
+ (dirent->rec_len < 12) ||
+ (dirent->rec_len % 4))
+ return;
+
+ if (fix_problem(ctx, PR_1_TREAT_AS_DIRECTORY, pctx)) {
+ inode->i_mode = (inode->i_mode & 07777) | LINUX_S_IFDIR;
+ e2fsck_write_inode_full(ctx, pctx->ino, inode,
+ EXT2_INODE_SIZE(ctx->fs->super),
+ "check_is_really_dir");
}
}
@@ -770,6 +836,7 @@ void e2fsck_pass1(e2fsck_t ctx)
}
check_inode_extra_space(ctx, &pctx);
+ check_is_really_dir(ctx, &pctx, block_buf);
if (LINUX_S_ISDIR(inode->i_mode)) {
ext2fs_mark_inode_bitmap(ctx->inode_dir_map, ino);
diff -r 11d3e029aa83 e2fsck/pass3.c
--- a/e2fsck/pass3.c Thu Mar 29 00:32:23 2007 -0400
+++ b/e2fsck/pass3.c Sat Mar 31 10:36:55 2007 -0400
@@ -645,6 +645,7 @@ static int fix_dotdot_proc(struct ext2_d
fix_problem(fp->ctx, PR_3_ADJUST_INODE, &pctx);
}
dirent->inode = fp->parent;
+ dirent->name_len = (dirent->name_len & 0xFF) | EXT2_FT_DIR << 8;
fp->done++;
return DIRENT_ABORT | DIRENT_CHANGED;
diff -r 11d3e029aa83 e2fsck/problem.c
--- a/e2fsck/problem.c Thu Mar 29 00:32:23 2007 -0400
+++ b/e2fsck/problem.c Sat Mar 31 10:36:55 2007 -0400
@@ -778,6 +778,11 @@ static struct e2fsck_problem problem_tab
{ PR_1_ATTR_HASH,
N_("@a in @i %i has a hash (%N) which is @n (must be 0)\n"),
PROMPT_CLEAR, PR_PREEN_OK },
+
+ /* inode appears to be a directory */
+ { PR_1_TREAT_AS_DIRECTORY,
+ N_("@i %i is a %It but it looks like it is really a directory\n"),
+ PROMPT_FIX, PR_PREEN_OK },
/* Pass 1b errors */
diff -r 11d3e029aa83 e2fsck/problem.h
--- a/e2fsck/problem.h Thu Mar 29 00:32:23 2007 -0400
+++ b/e2fsck/problem.h Sat Mar 31 10:36:55 2007 -0400
@@ -452,6 +452,9 @@ struct problem_context {
/* wrong EA hash value */
#define PR_1_ATTR_HASH 0x010054
+/* inode appears to be a directory */
+#define PR_1_TREAT_AS_DIRECTORY 0x010055
+
/*
* Pass 1b errors
*/
On Sat, 2007-03-31 at 10:39 -0400, Theodore Tso wrote:
> On Sat, Mar 31, 2007 at 08:35:09AM -0400, Theodore Tso wrote:
> > And we can't really check the case where a directory gets turned into
> > a regular file without destroying e2fsck's performance, since that
> > would require reading the first block of every single file, and this
> > also significantly increases the chance of false positives. So it's a
> > lot of complexity for what seems to have always been an artificial
> > test case.
Yes, you can't check whether each regular file is a directory or not.
The only case which is easy to check is whether the mode of a directory
is corrupted since directories will always have a "." and ".." entry.
Note that we will still lose all data of a regular file if its mode is
changed to lets say, a socket. But there is hardly anything we can do
about this since there is no special check that can be associated with a
regular file.
Since dir_index feature is going to be set by default in ext3/4, maybe
we can check for EXT2_INDEX_FL in check_is_really_dir() and if this flag
is set then check if such a file is a directory or not. So we would not
be reading the first block of each file and yet would be able to correct
most directory mode corruptions.
Thanks,
Kalpak.
On Mar 31, 2007 10:39 -0400, Theodore Tso wrote:
> I'm going to let this one soak for a bit to make sure we don't pick up
> any fase positives or negatives in the hueristics.
>
> @@ -133,11 +133,10 @@ int e2fsck_pass1_check_device_inode(ext2
> + * If the index flag is set, then this is a bogus
> + * device/fifo/socket
> */
> - if ((ext2fs_inode_data_blocks(fs, inode) != 0) ||
> - (inode->i_flags & EXT2_INDEX_FL))
> + if (inode->i_flags & EXT2_INDEX_FL)
> return 0;
There were ancient versions of the kernel that left EXT2_INDEX_FL set
on all files, instead of just directories... I'm not sure if those
were in actual released kernels, or just in patches.
> +static void check_is_really_dir(e2fsck_t ctx, struct problem_context *pctx,
> + char *buf)
> +{
> + if (ext2fs_read_dir_block(ctx->fs, inode->i_block[0], buf))
> + return;
Do we call check_blocks() on this inode shortly thereafter? If we do then
the overhead of reading the first block is offset by not reading it again
later. Otherwise, this could slow things down.
> + dirent = (struct ext2_dir_entry *) buf;
> + if (((dirent->name_len & 0xFF) != 1) ||
> + (dirent->name[0] != '.') ||
> + (dirent->inode != pctx->ino) ||
> + (dirent->rec_len < 12) ||
> + (dirent->rec_len % 4) ||
> + (dirent->rec_len >= ctx->fs->blocksize - 12))
> + return;
> +
> + dirent = (struct ext2_dir_entry *) (buf + dirent->rec_len);
> + if (((dirent->name_len & 0xFF) != 2) ||
> + (dirent->name[0] != '.') ||
> + (dirent->name[1] != '.') ||
> + (dirent->rec_len < 12) ||
> + (dirent->rec_len % 4))
> + return;
> +
> + if (fix_problem(ctx, PR_1_TREAT_AS_DIRECTORY, pctx)) {
> + inode->i_mode = (inode->i_mode & 07777) | LINUX_S_IFDIR;
> + e2fsck_write_inode_full(ctx, pctx->ino, inode,
> + EXT2_INODE_SIZE(ctx->fs->super),
> + "check_is_really_dir");
> }
The one worry I have here (though I don't think it is necessarily IN
the code you propose) is that someone could create a regular file which
looks like a directory and somehow get it linked into the filesystem
tree, giving them escalated access (e.g. device files owned by them,
suid executables, links to otherwise unreadable files, etc).
It would seem that this is only a danger if the mode on the file is
corrupted, which shouldn't really be doable by a regular user, but
it is definitely something to consider.
I take it that this code fixes the test image I previously posted?
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
On Tue, Apr 03, 2007 at 11:37:16AM -0600, Andreas Dilger wrote:
> On Mar 31, 2007 10:39 -0400, Theodore Tso wrote:
> > I'm going to let this one soak for a bit to make sure we don't pick up
> > any fase positives or negatives in the hueristics.
> >
> > @@ -133,11 +133,10 @@ int e2fsck_pass1_check_device_inode(ext2
> > + * If the index flag is set, then this is a bogus
> > + * device/fifo/socket
> > */
> > - if ((ext2fs_inode_data_blocks(fs, inode) != 0) ||
> > - (inode->i_flags & EXT2_INDEX_FL))
> > + if (inode->i_flags & EXT2_INDEX_FL)
> > return 0;
>
> There were ancient versions of the kernel that left EXT2_INDEX_FL set
> on all files, instead of just directories... I'm not sure if those
> were in actual released kernels, or just in patches.
Well, we've been running with this in e2fsprogs for quite some time
(August 2002, e2fsprogs 1.28), and no one has complained, so I think
we're safe...
> > +static void check_is_really_dir(e2fsck_t ctx, struct problem_context *pctx,
> > + char *buf)
> > +{
> > + if (ext2fs_read_dir_block(ctx->fs, inode->i_block[0], buf))
> > + return;
>
> Do we call check_blocks() on this inode shortly thereafter? If we do then
> the overhead of reading the first block is offset by not reading it again
> later. Otherwise, this could slow things down.
This is why we only do this on special devices; check_is_really_dir()
doesn't do anything on directory or regular files.
> The one worry I have here (though I don't think it is necessarily IN
> the code you propose) is that someone could create a regular file which
> looks like a directory and somehow get it linked into the filesystem
> tree, giving them escalated access (e.g. device files owned by them,
> suid executables, links to otherwise unreadable files, etc).
I thought of that, but I didn't worry about it because I'm not doing
this check on regular files. Come to think of it I should add a check
so we don't do this for long symlinks, for similar reasons. It would
only matter if the user can force filesystem check, which makes it a
long shot, but we should avoid that issue as well.
> I take it that this code fixes the test image I previously posted?
Yup. Take a look at what I checked in into Mercurial. I added two
separate test cases. One of them directly tests the filetype issue,
and the other tests a mutated directory into a char device case.
- Ted