2007-08-09 21:33:36

by Eric Sandeen

[permalink] [raw]
Subject: [PATCH] dir_index: error out instead of BUG on corrupt hash dir limit

A corrupt ondisk hash dir limit will trip an assert in dx_probe,
which calls BUG(). Instead, we can just issue the warning and
fail dx_probe like the other 3 tests just before it. Thanks
to aviro for suggesting this... Tested with a hand-crafted
corrupt ext3 image, issues:

EXT3-fs warning (device loop0): dx_probe: Corrupt limit in dir inode 14337

vs. previous:

Assertion failure in dx_probe() at fs/ext3/namei.c:383: "dx_get_limit(entries) == dx_root_limit(dir, root->info.info_length)"
------------[ cut here ]------------
kernel BUG at fs/ext3/namei.c:383!
...


Signed-off-by: Eric Sandeen <[email protected]>

Index: linux-2.6.22-rc4/fs/ext3/namei.c
===================================================================
--- linux-2.6.22-rc4.orig/fs/ext3/namei.c
+++ linux-2.6.22-rc4/fs/ext3/namei.c
@@ -379,8 +379,16 @@ dx_probe(struct dentry *dentry, struct i

entries = (struct dx_entry *) (((char *)&root->info) +
root->info.info_length);
- assert(dx_get_limit(entries) == dx_root_limit(dir,
- root->info.info_length));
+
+ if (dx_get_limit(entries) != dx_root_limit(dir,
+ root->info.info_length)) {
+ ext3_warning(dir->i_sb, __FUNCTION__,
+ "Corrupt limit in dir inode %ld\n", dir->i_ino);
+ brelse(bh);
+ *err = ERR_BAD_DX_DIR;
+ goto fail;
+ }
+
dxtrace (printk("Look up %x", hash));
while (1)
{
Index: linux-2.6.22-rc4/fs/ext4/namei.c
===================================================================
--- linux-2.6.22-rc4.orig/fs/ext4/namei.c
+++ linux-2.6.22-rc4/fs/ext4/namei.c
@@ -379,8 +379,16 @@ dx_probe(struct dentry *dentry, struct i

entries = (struct dx_entry *) (((char *)&root->info) +
root->info.info_length);
- assert(dx_get_limit(entries) == dx_root_limit(dir,
- root->info.info_length));
+
+ if (dx_get_limit(entries) != dx_root_limit(dir,
+ root->info.info_length)) {
+ ext4_warning(dir->i_sb, __FUNCTION__,
+ "Corrupt limit in dir inode %ld\n", dir->i_ino);
+ brelse(bh);
+ *err = ERR_BAD_DX_DIR;
+ goto fail;
+ }
+
dxtrace (printk("Look up %x", hash));
while (1)
{


2007-08-09 22:41:42

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] dir_index: error out instead of BUG on corrupt hash dir limit

Eric Sandeen wrote:
> A corrupt ondisk hash dir limit will trip an assert in dx_probe,
> which calls BUG(). Instead, we can just issue the warning and
> fail dx_probe like the other 3 tests just before it. Thanks
> to aviro for suggesting this...

BTW this type of corruption is apparently quite easily generated by
using the binary windows-only ext driver at http://www.fs-driver.org....

-Eric

2007-08-10 08:26:22

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] dir_index: error out instead of BUG on corrupt hash dir limit

On Aug 09, 2007 16:33 -0500, Eric Sandeen wrote:
> A corrupt ondisk hash dir limit will trip an assert in dx_probe,
> which calls BUG(). Instead, we can just issue the warning and
> fail dx_probe like the other 3 tests just before it. Thanks
> to aviro for suggesting this... Tested with a hand-crafted
> corrupt ext3 image, issues:
>
> EXT3-fs warning (device loop0): dx_probe: Corrupt limit in dir inode 14337
>
> vs. previous:
>
> Assertion failure in dx_probe() at fs/ext3/namei.c:383: "dx_get_limit(entries) == dx_root_limit(dir, root->info.info_length)"
> ------------[ cut here ]------------
> kernel BUG at fs/ext3/namei.c:383!

This has my blessing, you can add:

Acked-by: Andreas Dilger <[email protected]>

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

2007-08-10 08:29:02

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] dir_index: error out instead of BUG on corrupt hash dir limit

On Aug 09, 2007 17:41 -0500, Eric Sandeen wrote:
> Eric Sandeen wrote:
> > A corrupt ondisk hash dir limit will trip an assert in dx_probe,
> > which calls BUG(). Instead, we can just issue the warning and
> > fail dx_probe like the other 3 tests just before it. Thanks
> > to aviro for suggesting this...
>
> BTW this type of corruption is apparently quite easily generated by
> using the binary windows-only ext driver at http://www.fs-driver.org....

I'd like to see the actual corruption, to find out why the hash-type
check didn't find it. If it is because LDISKFS_DX_HASH_LEGACY hash
type is zero, I think we can disable that hash type, and people will
just have to run "e2fsck -fD" to reindex to a new type. This hasn't
been on for a long, long time.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

2007-08-10 16:08:05

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] dir_index: error out instead of BUG on corrupt hash dir limit

Andreas Dilger wrote:
> On Aug 09, 2007 17:41 -0500, Eric Sandeen wrote:
>> Eric Sandeen wrote:
>>> A corrupt ondisk hash dir limit will trip an assert in dx_probe,
>>> which calls BUG(). Instead, we can just issue the warning and
>>> fail dx_probe like the other 3 tests just before it. Thanks
>>> to aviro for suggesting this...
>> BTW this type of corruption is apparently quite easily generated by
>> using the binary windows-only ext driver at http://www.fs-driver.org....
>
> I'd like to see the actual corruption, to find out why the hash-type
> check didn't find it. If it is because LDISKFS_DX_HASH_LEGACY hash
> type is zero, I think we can disable that hash type, and people will
> just have to run "e2fsck -fD" to reindex to a new type. This hasn't
> been on for a long, long time.

So far I haven't been able to find it in any of the images provided.

We may have to dual-boot windows & run the crummy driver for a while to
track it down, if we care enough.

Which reminds me, what do you think of the wording in the ext3_warning I
added - is "corruption" appropriate? The other warnings aren't quite so
stark... hmm maybe we should add "have you been running a binary-only
driver for windows?" :)

-Eric

2007-08-10 18:19:00

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] dir_index: error out instead of BUG on corrupt hash dir limit

On Aug 10, 2007 11:08 -0500, Eric Sandeen wrote:
> Andreas Dilger wrote:
> > I'd like to see the actual corruption, to find out why the hash-type
> > check didn't find it. If it is because LDISKFS_DX_HASH_LEGACY hash
> > type is zero, I think we can disable that hash type, and people will
> > just have to run "e2fsck -fD" to reindex to a new type. This hasn't
> > been on for a long, long time.
>
> So far I haven't been able to find it in any of the images provided.
>
> We may have to dual-boot windows & run the crummy driver for a while to
> track it down, if we care enough.
>
> Which reminds me, what do you think of the wording in the ext3_warning I
> added - is "corruption" appropriate? The other warnings aren't quite so
> stark... hmm maybe we should add "have you been running a binary-only
> driver for windows?" :)

It would be interesting to check if mounting a dir_index filesystem on
linux with ext2 has the same problem. It _should_ have been that
if rec_len % 4 == 0 (i.e. any valid dirent) we would fail the hash_version
check, but we left in the DX_HASH_LEGACY (0) and that check is blown.
The unused_flags & 1 is only hit for a dirent with DT_FIFO (no good).
The remaining check is indirect_levels > 1, which should be hit for
any dirent with name_len > 1 (i.e. most, but not all).

So, I think you could reproduce this in linux by making an indexed directory
in ext3/4, mounting it as ext2, and then creating a 1-character filename
in the directory, or any length filename and then deleting it.

In addition to your extra check, I think we should remove DX_HASH_LEGACY
check, to catch this more easily. If (hash_version % 4 == 0) the warning
shouldn't even be printed.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

2007-08-10 18:39:52

by Eric Sandeen

[permalink] [raw]
Subject: Re: [PATCH] dir_index: error out instead of BUG on corrupt hash dir limit

Andreas Dilger wrote:
> On Aug 10, 2007 11:08 -0500, Eric Sandeen wrote:
>> Andreas Dilger wrote:
>>> I'd like to see the actual corruption, to find out why the hash-type
>>> check didn't find it. If it is because LDISKFS_DX_HASH_LEGACY hash
>>> type is zero, I think we can disable that hash type, and people will
>>> just have to run "e2fsck -fD" to reindex to a new type. This hasn't
>>> been on for a long, long time.
>> So far I haven't been able to find it in any of the images provided.
>>
>> We may have to dual-boot windows & run the crummy driver for a while to
>> track it down, if we care enough.
>>
>> Which reminds me, what do you think of the wording in the ext3_warning I
>> added - is "corruption" appropriate? The other warnings aren't quite so
>> stark... hmm maybe we should add "have you been running a binary-only
>> driver for windows?" :)
>
> It would be interesting to check if mounting a dir_index filesystem on
> linux with ext2 has the same problem. It _should_ have been that
> if rec_len % 4 == 0 (i.e. any valid dirent) we would fail the hash_version
> check, but we left in the DX_HASH_LEGACY (0) and that check is blown.

Hmmm...

> The unused_flags & 1 is only hit for a dirent with DT_FIFO (no good).
> The remaining check is indirect_levels > 1, which should be hit for
> any dirent with name_len > 1 (i.e. most, but not all).
>
> So, I think you could reproduce this in linux by making an indexed directory
> in ext3/4, mounting it as ext2, and then creating a 1-character filename
> in the directory, or any length filename and then deleting it.

With those quick tests I don't see any problems... you may be giving the
windows driver too much credit here. :)

-Eric

> In addition to your extra check, I think we should remove DX_HASH_LEGACY
> check, to catch this more easily. If (hash_version % 4 == 0) the warning
> shouldn't even be printed.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.
>

2007-08-10 18:50:58

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH] dir_index: error out instead of BUG on corrupt hash dir limit

On Aug 10, 2007 13:39 -0500, Eric Sandeen wrote:
> Andreas Dilger wrote:
> > It would be interesting to check if mounting a dir_index filesystem on
> > linux with ext2 has the same problem. It _should_ have been that
> > if rec_len % 4 == 0 (i.e. any valid dirent) we would fail the hash_version
> > check, but we left in the DX_HASH_LEGACY (0) and that check is blown.
>
> Hmmm...
>
> > The unused_flags & 1 is only hit for a dirent with DT_FIFO (no good).
> > The remaining check is indirect_levels > 1, which should be hit for
> > any dirent with name_len > 1 (i.e. most, but not all).
> >
> > So, I think you could reproduce this in linux by making an indexed directory
> > in ext3/4, mounting it as ext2, and then creating a 1-character filename
> > in the directory, or any length filename and then deleting it.
>
> With those quick tests I don't see any problems... you may be giving the
> windows driver too much credit here. :)

Ah, true - I forgot about the "EXT2_I(dir)->i_flags &= ~EXT2_BTREE_FL;"
line that has existed since time immemorial in ext2. Isn't foresight
a wonderful thing.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.