LinuxLists.cc - Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks

2008-08-07 20:33:45

Subject: Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks

On Thu, Aug 07, 2008 at 10:52:51AM -0700, Andrew Morton wrote:
> Yes, please do test 2.6.26.

Did that. I can reproduce the same crash on 2.6.26 and 2.6.26.2.

Sami

2008-08-07 20:32:11

by Sami Liedes

Subject: Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks

On Thu, Aug 07, 2008 at 11:07:17PM +0300, Sami Liedes wrote:
> On Thu, Aug 07, 2008 at 10:52:51AM -0700, Andrew Morton wrote:
> > Yes, please do test 2.6.26.
>
> Did that. I can reproduce the same crash on 2.6.26 and 2.6.26.2.

2.6.25.15 crashes too, so I might have been wrong about 2.6.25.4
working (unless something changed between those two versions).

Sami

2008-08-18 14:58:42

by Jan Kara

Subject: Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks

> On Thu, Aug 07, 2008 at 11:07:17PM +0300, Sami Liedes wrote:
> > On Thu, Aug 07, 2008 at 10:52:51AM -0700, Andrew Morton wrote:
> > > Yes, please do test 2.6.26.
> >
> > Did that. I can reproduce the same crash on 2.6.26 and 2.6.26.2.
>
> 2.6.25.15 crashes too, so I might have been wrong about 2.6.25.4
> working (unless something changed between those two versions).
I think this is the same problem Vegard reported in
http://marc.info/?l=linux-ext4&m=121637999611618&w=2.
The problem seems to be in ext2_valid_block_bitmap(), which does

bitmap_blk = le32_to_cpu(desc->bg_block_bitmap);
offset = bitmap_blk - group_first_block;
if (!ext2_test_bit(offset, bh->b_data))

(and similarly for the inode bitmap). When the group descriptor is
corrupted, this simply accesses memory beyond bh->b_data...
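The arithmetic is easy to reproduce outside the kernel. The following userspace sketch is hypothetical: the helper names and the 1 KiB block size are assumptions, not the kernel's code. It only mirrors `offset = bitmap_blk - group_first_block` and shows that a descriptor value below the group's first block yields a negative bit offset, while one far above it yields an offset past the end of the one-block bitmap buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 1 KiB blocks, so one bitmap block holds 1024 * 8 bits. */
#define BLOCK_SIZE 1024u
#define BITS_PER_BITMAP (BLOCK_SIZE * 8u)

/*
 * Mirrors the unchecked computation in ext2_valid_block_bitmap():
 * offset = bitmap_blk - group_first_block. Done in signed 64-bit here so
 * a corrupted (too small) bitmap_blk shows up as a negative offset.
 */
static int64_t bitmap_bit_offset(uint32_t bitmap_blk, uint32_t group_first_block)
{
    return (int64_t)bitmap_blk - (int64_t)group_first_block;
}

/* Hypothetical guard: true only if the bit lies inside the bitmap buffer. */
static bool bitmap_offset_in_range(int64_t offset)
{
    return offset >= 0 && offset < (int64_t)BITS_PER_BITMAP;
}
```

For a group whose first block is 1, a descriptor claiming the bitmap lives in block 0 gives offset -1, so a bit test without such a guard reads just before the buffer.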
The patch below should hopefully fix the issue. Can you test it
please?

Honza
--
Jan Kara <[email protected]>
SuSE CR Labs
---

>From 06953717138efe3ad535e78343beb7204ac0d274 Mon Sep 17 00:00:00 2001
From: Jan Kara <[email protected]>
Date: Mon, 18 Aug 2008 16:45:11 +0200
Subject: [PATCH] ext2: Check for corrupted group descriptor before using data in it

We have to check whether a group descriptor is corrupted in
read_block_bitmap(). Otherwise ext2_valid_block_bitmap() will try
to access bits outside of the bitmap and an Oops happens.

CC: Vegard Nossum <[email protected]>
CC: Sami Liedes <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
---
fs/ext2/balloc.c | 29 +++++++++++++++++++++++++++++
1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/fs/ext2/balloc.c b/fs/ext2/balloc.c
index 10bb02c..9104712 100644
--- a/fs/ext2/balloc.c
+++ b/fs/ext2/balloc.c
@@ -113,6 +113,17 @@ err_out:
 	return 0;
 }
 
+static int ext2_block_in_group(struct super_block *sb,
+			unsigned int block_group, ext2_fsblk_t block)
+{
+	if (block < ext2_group_first_block_no(sb, block_group))
+		return 0;
+	if (block >= ext2_group_first_block_no(sb, block_group) +
+				EXT2_BLOCKS_PER_GROUP(sb))
+		return 0;
+	return 1;
+}
+
 /*
  * Read the bitmap for a given block_group,and validate the
  * bits for block/inode/inode tables are set in the bitmaps
@@ -129,6 +140,24 @@ read_block_bitmap(struct super_block *sb, unsigned int block_group)
 	desc = ext2_get_group_desc(sb, block_group, NULL);
 	if (!desc)
 		return NULL;
+	if (!ext2_block_in_group(sb, block_group,
+				le32_to_cpu(desc->bg_block_bitmap)) ||
+	    !ext2_block_in_group(sb, block_group,
+				le32_to_cpu(desc->bg_inode_bitmap)) ||
+	    !ext2_block_in_group(sb, block_group,
+				le32_to_cpu(desc->bg_inode_table)) ||
+	    !ext2_block_in_group(sb, block_group,
+				le32_to_cpu(desc->bg_inode_table) +
+				EXT2_SB(sb)->s_itb_per_group - 1)) {
+		ext2_error(sb, __func__, "Corrupted group descriptor - "
+			"block_group = %u, block_bitmap = %u, "
+			"inode_bitmap = %u, inode_table = %u",
+			block_group,
+			le32_to_cpu(desc->bg_block_bitmap),
+			le32_to_cpu(desc->bg_inode_bitmap),
+			le32_to_cpu(desc->bg_inode_table));
+		return NULL;
+	}
 	bitmap_blk = le32_to_cpu(desc->bg_block_bitmap);
 	bh = sb_getblk(sb, bitmap_blk);
 	if (unlikely(!bh)) {
--
1.5.2.4
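A userspace model of the checks the patch adds can help when reasoning about edge cases. The struct and function names below are hypothetical simplifications; only the half-open interval logic is taken from `ext2_block_in_group()` and the `read_block_bitmap()` hunk above. Block numbers are widened to 64 bits so the end-of-inode-table sum cannot wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the superblock fields the patch uses. */
struct geom {
    uint32_t first_data_block;  /* s_first_data_block */
    uint32_t blocks_per_group;  /* EXT2_BLOCKS_PER_GROUP(sb) */
    uint32_t itb_per_group;     /* EXT2_SB(sb)->s_itb_per_group */
};

/* Hypothetical, CPU-endian copy of the three descriptor fields checked. */
struct desc {
    uint32_t block_bitmap, inode_bitmap, inode_table;
};

static uint64_t group_first_block(const struct geom *g, uint32_t group)
{
    return (uint64_t)g->first_data_block + (uint64_t)group * g->blocks_per_group;
}

/* Same [first, first + blocks_per_group) test as ext2_block_in_group(). */
static bool block_in_group(const struct geom *g, uint32_t group, uint64_t block)
{
    uint64_t first = group_first_block(g, group);
    return block >= first && block < first + g->blocks_per_group;
}

/* The four conditions the patch checks before trusting the descriptor. */
static bool desc_looks_sane(const struct geom *g, uint32_t group,
                            const struct desc *d)
{
    return block_in_group(g, group, d->block_bitmap) &&
           block_in_group(g, group, d->inode_bitmap) &&
           block_in_group(g, group, d->inode_table) &&
           block_in_group(g, group,
                          (uint64_t)d->inode_table + g->itb_per_group - 1);
}
```

For example, with first data block 1, 8192 blocks per group and 256 inode-table blocks per group (assumed values), group 0 spans blocks 1 to 8192. A descriptor whose block bitmap is 0 fails the first check, and an inode table starting at block 8000 fails the last one because it runs past block 8192.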

2008-08-18 16:51:46

by Aneesh Kumar K.V

Subject: Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks

On Mon, Aug 18, 2008 at 04:58:41PM +0200, Jan Kara wrote:
>
> From 06953717138efe3ad535e78343beb7204ac0d274 Mon Sep 17 00:00:00 2001
> From: Jan Kara <[email protected]>
> Date: Mon, 18 Aug 2008 16:45:11 +0200
> Subject: [PATCH] ext2: Check for corrupted group descriptor before using data in it
>
> We have to check whether a group descriptor isn't corrupted in
> read_block_bitmap(). Otherwise ext2_valid_block_bitmap() will try
> to access bits outside of bitmap and Oops happens.
>
> CC: Vegard Nossum <[email protected]>
> CC: Sami Liedes <[email protected]>
> Signed-off-by: Jan Kara <[email protected]>
> ---
> fs/ext2/balloc.c | 29 +++++++++++++++++++++++++++++
> 1 files changed, 29 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext2/balloc.c b/fs/ext2/balloc.c
> index 10bb02c..9104712 100644
> --- a/fs/ext2/balloc.c
> +++ b/fs/ext2/balloc.c
> @@ -113,6 +113,17 @@ err_out:
> return 0;
> }
>
> +static int ext2_block_in_group(struct super_block *sb,
> + unsigned int block_group, ext2_fsblk_t block)
> +{
> + if (block < ext2_group_first_block_no(sb, block_group))
> + return 0;
> + if (block >= ext2_group_first_block_no(sb, block_group) +
> + EXT2_BLOCKS_PER_GROUP(sb))
> + return 0;
> + return 1;
> +}
> +
> /*
> * Read the bitmap for a given block_group,and validate the
> * bits for block/inode/inode tables are set in the bitmaps
> @@ -129,6 +140,24 @@ read_block_bitmap(struct super_block *sb, unsigned int block_group)
> desc = ext2_get_group_desc(sb, block_group, NULL);
> if (!desc)
> return NULL;
> + if (!ext2_block_in_group(sb, block_group,
> + le32_to_cpu(desc->bg_block_bitmap)) ||
> + !ext2_block_in_group(sb, block_group,
> + le32_to_cpu(desc->bg_inode_bitmap)) ||
> + !ext2_block_in_group(sb, block_group,
> + le32_to_cpu(desc->bg_inode_table)) ||
> + !ext2_block_in_group(sb, block_group,
> + le32_to_cpu(desc->bg_inode_table) +
> + EXT2_SB(sb)->s_itb_per_group - 1)) {
> + ext2_error(sb, __func__, "Corrupted group descriptor - "
> + "block_group = %u, block_bitmap = %u, "
> + "inode_bitmap = %u, inode_table = %u",
> + block_group,
> + le32_to_cpu(desc->bg_block_bitmap),
> + le32_to_cpu(desc->bg_inode_bitmap),
> + le32_to_cpu(desc->bg_inode_table));
> + return NULL;
> + }
> bitmap_blk = le32_to_cpu(desc->bg_block_bitmap);
> bh = sb_getblk(sb, bitmap_blk);
> if (unlikely(!bh)) {

Do we need to do this validation every time we do a read_block_bitmap()?
I guess we should move the validation to where we read the descriptor
blocks from disk.

-aneesh

2008-08-19 03:24:20

by Andreas Dilger

Subject: Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks

On Aug 18, 2008 22:21 +0530, Aneesh Kumar wrote:
> > +static int ext2_block_in_group(struct super_block *sb,
> > + unsigned int block_group, ext2_fsblk_t block)
> > +{
> > + if (block < ext2_group_first_block_no(sb, block_group))
> > + return 0;
> > + if (block >= ext2_group_first_block_no(sb, block_group) +
> > + EXT2_BLOCKS_PER_GROUP(sb))
> > + return 0;
> > + return 1;
> > +}
> > +
> > /*
> > * Read the bitmap for a given block_group,and validate the
> > * bits for block/inode/inode tables are set in the bitmaps
> > @@ -129,6 +140,24 @@ read_block_bitmap(struct super_block *sb, unsigned int block_group)
> > desc = ext2_get_group_desc(sb, block_group, NULL);
> > if (!desc)
> > return NULL;
> > + if (!ext2_block_in_group(sb, block_group,
> > + le32_to_cpu(desc->bg_block_bitmap)) ||
> > + !ext2_block_in_group(sb, block_group,
> > + le32_to_cpu(desc->bg_inode_bitmap)) ||
> > + !ext2_block_in_group(sb, block_group,
> > + le32_to_cpu(desc->bg_inode_table)) ||
> > + !ext2_block_in_group(sb, block_group,
> > + le32_to_cpu(desc->bg_inode_table) +
> > + EXT2_SB(sb)->s_itb_per_group - 1)) {

Isn't equivalent checking done in ext2_check_descriptors()? It would make
sense to abstract out the "check one group and return error" code and use
it in both places.
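One way to read this suggestion, sketched in userspace C with hypothetical names (`check_one_group()`, `first_bad_group()`, simplified structs): a single per-group validator that reports which field is bad, called from a mount-time loop (the role `ext2_check_descriptors()` plays) and, if still wanted, from the per-group read path. This is not the kernel's code; types and field names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified types; not the kernel's structures. */
struct geom { uint32_t first_data_block, blocks_per_group, itb_per_group; };
struct desc { uint32_t block_bitmap, inode_bitmap, inode_table; };

/* Which descriptor field was out of range, so the caller can report it. */
enum desc_err { DESC_OK = 0, BAD_BLOCK_BITMAP, BAD_INODE_BITMAP, BAD_INODE_TABLE };

/* Shared "check one group" helper for both the mount path and read path. */
static enum desc_err check_one_group(const struct geom *g, uint32_t group,
                                     const struct desc *d)
{
    assert(g->blocks_per_group > 0);
    uint64_t first = (uint64_t)g->first_data_block +
                     (uint64_t)group * g->blocks_per_group;
    uint64_t last = first + g->blocks_per_group - 1;            /* inclusive */
    uint64_t table_end = (uint64_t)d->inode_table + g->itb_per_group - 1;

    if (d->block_bitmap < first || d->block_bitmap > last)
        return BAD_BLOCK_BITMAP;
    if (d->inode_bitmap < first || d->inode_bitmap > last)
        return BAD_INODE_BITMAP;
    if (d->inode_table < first || table_end > last)
        return BAD_INODE_TABLE;
    return DESC_OK;
}

/* Mount-time style caller: index of the first bad group, or -1 if all pass. */
static long first_bad_group(const struct geom *g, const struct desc *descs,
                            size_t ngroups)
{
    for (size_t i = 0; i < ngroups; i++)
        if (check_one_group(g, (uint32_t)i, &descs[i]) != DESC_OK)
            return (long)i;
    return -1;
}
```

Returning an error code rather than a boolean lets both callers print the same specific message without duplicating the range logic.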

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

2008-08-19 09:13:41

by Jan Kara

Subject: Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks

On Mon 18-08-08 21:24:10, Andreas Dilger wrote:
> On Aug 18, 2008 22:21 +0530, Aneesh Kumar wrote:
> > > +static int ext2_block_in_group(struct super_block *sb,
> > > + unsigned int block_group, ext2_fsblk_t block)
> > > +{
> > > + if (block < ext2_group_first_block_no(sb, block_group))
> > > + return 0;
> > > + if (block >= ext2_group_first_block_no(sb, block_group) +
> > > + EXT2_BLOCKS_PER_GROUP(sb))
> > > + return 0;
> > > + return 1;
> > > +}
> > > +
> > > /*
> > > * Read the bitmap for a given block_group,and validate the
> > > * bits for block/inode/inode tables are set in the bitmaps
> > > @@ -129,6 +140,24 @@ read_block_bitmap(struct super_block *sb, unsigned int block_group)
> > > desc = ext2_get_group_desc(sb, block_group, NULL);
> > > if (!desc)
> > > return NULL;
> > > + if (!ext2_block_in_group(sb, block_group,
> > > + le32_to_cpu(desc->bg_block_bitmap)) ||
> > > + !ext2_block_in_group(sb, block_group,
> > > + le32_to_cpu(desc->bg_inode_bitmap)) ||
> > > + !ext2_block_in_group(sb, block_group,
> > > + le32_to_cpu(desc->bg_inode_table)) ||
> > > + !ext2_block_in_group(sb, block_group,
> > > + le32_to_cpu(desc->bg_inode_table) +
> > > + EXT2_SB(sb)->s_itb_per_group - 1)) {
>
> Isn't equivalent checking done in ext2_check_descriptors()? It would make
> sense to abstract out the "check one group and return error" code and use
> it in both places.
Actually yes, it is. Good point. Sami, is it the case that you have
mounted the filesystem, then intentionally corrupted it and after that
the kernel oopsed (as opposed to first corrupting the filesystem image and
mounting it after that)? That would explain how corrupted values could get
to read_block_bitmap() even though ext2_check_descriptors() checked them.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2008-08-19 11:03:15

by Sami Liedes

Subject: Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks

On Tue, Aug 19, 2008 at 11:13:39AM +0200, Jan Kara wrote:
> > Isn't equivalent checking done in ext2_check_descriptors()? It would make
> > sense to abstract out the "check one group and return error" code and use
> > it in both places.
> Actually yes, it is. Good point. Sami, is it the case that you have
> mounted the filesystem, then intentionally corrupted it and after that
> the kernel oopsed (as opposed to first corrupting the filesystem image and
> mounting it after that)? That would explain how corrupted values could get
> to read_block_bitmap() even though ext2_check_descriptors() checked them.

No, that's not what I do. I corrupt the fs before mounting it, then
mount it, perform normal filesystem operations on it and unmount it.

Here's the most current script I use (zzuf is the fuzzer):

------------------------------------------------------------
#!/bin/bash

if [ "`hostname`" != "fstest" ]; then
	echo "This is a dangerous script."
	echo "Set your hostname to \`fstest' if you want to use it."
	exit 1
fi

umount /dev/hdb
umount /dev/hdc
/etc/init.d/sysklogd stop
/etc/init.d/klogd stop
/etc/init.d/cron stop
mount /dev/hda / -t ext3 -o remount,ro || exit 1

#ulimit -t 20

for ((s=$1; s<1000000000; s++)); do
	umount /mnt
	echo '***** zzuffing *****' seed $s
	zzuf -r 0:0.03 -s $s </dev/hdc >/dev/hdb || exit
	mount /dev/hdb /mnt -t ext2 -o errors=continue || continue
	cd /mnt || continue
	timeout 30 cp -r doc doc2 >&/dev/null
	timeout 30 find -xdev >&/dev/null
	timeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- 2>/dev/null
	timeout 30 mkdir tmp >&/dev/null
	timeout 30 echo whoah >tmp/filu 2>/dev/null
	timeout 30 rm -rf /mnt/* >&/dev/null
	cd /
done
------------------------------------------------------------
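The property that makes this loop useful is that each seed deterministically reproduces one corrupted image, so a crashing seed can be replayed later. A minimal sketch of that idea in C follows. It is not zzuf's algorithm (zzuf has its own PRNG, and the script's `-r 0:0.03` appears to request a ratio range rather than a fixed ratio), and all names are hypothetical: it flips each bit of a buffer independently with a fixed probability.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* splitmix64: spreads a small seed (0, 1, 2, ...) into a well-mixed state. */
static uint64_t splitmix64(uint64_t x)
{
    x += 0x9E3779B97F4A7C15ull;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ull;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBull;
    return x ^ (x >> 31);
}

/* xorshift64 step (shift triple 13, 7, 17); state must never be zero. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Uniform double in [0, 1) built from the top 53 bits. */
static double next_unit(uint64_t *state)
{
    return (double)(xorshift64(state) >> 11) * (1.0 / 9007199254740992.0);
}

/*
 * Flip each bit of buf independently with probability ratio.
 * The same (buf contents, len, ratio, seed) always gives the same result.
 * Returns the number of bits flipped.
 */
static size_t fuzz_bits(unsigned char *buf, size_t len, double ratio,
                        uint64_t seed)
{
    uint64_t state = splitmix64(seed);
    size_t flipped = 0;

    assert(ratio >= 0.0 && ratio <= 1.0);
    if (state == 0)
        state = 1;              /* xorshift would get stuck at zero */
    for (size_t i = 0; i < len; i++) {
        for (unsigned bit = 0; bit < 8; bit++) {
            if (next_unit(&state) < ratio) {
                buf[i] ^= (unsigned char)(1u << bit);
                flipped++;
            }
        }
    }
    return flipped;
}
```

With a 4096-byte buffer and a ratio of 0.03, about 3% of the 32768 bits (roughly 980) are flipped on average, and calling it again with the same seed reproduces exactly the same bytes.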

Sami

2008-08-19 21:50:35

by Sami Liedes

Subject: Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks

On Mon, Aug 18, 2008 at 04:58:41PM +0200, Jan Kara wrote:
> From 06953717138efe3ad535e78343beb7204ac0d274 Mon Sep 17 00:00:00 2001
> From: Jan Kara <[email protected]>
> Date: Mon, 18 Aug 2008 16:45:11 +0200
> Subject: [PATCH] ext2: Check for corrupted group descriptor before using data in it
>
> We have to check whether a group descriptor isn't corrupted in
> read_block_bitmap(). Otherwise ext2_valid_block_bitmap() will try
> to access bits outside of bitmap and Oops happens.

I think something similar is needed for ext3, or at least the
backtrace looks similar to me (tell me if you want me to file a
separate bug for it):

------------------------------------------------------------
[ 1303.485714] EXT3-fs unexpected failure: !jh->b_committed_data;
[ 1303.485714] inconsistent data on disk
[ 1303.485714] BUG: unable to handle kernel paging request at c7edfffc
[ 1303.485714] IP: [<c02ddca9>] read_block_bitmap+0xa3/0x147
[ 1303.485714] *pde = 00007067 *pte = 07edf160
[ 1303.485714] Oops: 0000 [#1] DEBUG_PAGEALLOC
[ 1303.485714]
[ 1303.485714] Pid: 17001, comm: rm Not tainted (2.6.27-rc3 #2)
[ 1303.485714] EIP: 0060:[<c02ddca9>] EFLAGS: 00000246 CPU: 0
[ 1303.485714] EIP is at read_block_bitmap+0xa3/0x147
[ 1303.485714] EAX: ffffffff EBX: c7ee0800 ECX: c7ee0000 EDX: 00000001
[ 1303.485714] ESI: c3c40690 EDI: c7abd000 EBP: c79c4c9c ESP: c79c4c6c
[ 1303.485714] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
[ 1303.485714] Process rm (pid: 17001, ti=c79c4000 task=c79189a0 task.ti=c79c4000)
[ 1303.485714] Stack: 00000246 00000001 00000246 c7abda3c c7413aa0 c5d7f800 00000000 00000000
[ 1303.485714] c7ee0000 00000000 00000000 c3c25064 c79c4cf4 c02dde1f c3c405b0 c79c4ccc
[ 1303.485714] c0317987 00000001 c0314a9b 00000029 0000002a c7abd000 c7440000 c5d7f8ac
[ 1303.485714] Call Trace:
[ 1303.485714] [<c02dde1f>] ? ext3_free_blocks_sb+0x93/0x3d6
[ 1303.485714] [<c0317987>] ? journal_revoke+0x81/0xe3
[ 1303.485714] [<c0314a9b>] ? do_get_write_access+0x381/0x49c
[ 1303.485714] [<c02ed428>] ? __ext3_journal_revoke+0x1e/0x44
[ 1303.485714] [<c02de18d>] ? ext3_free_blocks+0x2b/0x7f
[ 1303.485714] [<c02e3694>] ? ext3_clear_blocks+0x11f/0x141
[ 1303.485714] [<c02e377a>] ? ext3_free_data+0xc4/0x133
[ 1303.485714] [<c02e3a0e>] ? ext3_free_branches+0x225/0x22d
[ 1303.485714] [<c02e3891>] ? ext3_free_branches+0xa8/0x22d
[ 1303.485714] [<c02e3891>] ? ext3_free_branches+0xa8/0x22d
[ 1303.485714] [<c02e407d>] ? ext3_truncate+0x667/0x8af
[ 1303.485714] [<c03153e2>] ? journal_start+0xb2/0x112
[ 1303.485714] [<c031540d>] ? journal_start+0xdd/0x112
[ 1303.485714] [<c03153e2>] ? journal_start+0xb2/0x112
[ 1303.485714] [<c02eb243>] ? ext3_journal_start_sb+0x29/0x4a
[ 1303.485714] [<c02e4389>] ? ext3_delete_inode+0xc4/0xdb
[ 1303.485714] [<c02e42c5>] ? ext3_delete_inode+0x0/0xdb
[ 1303.485714] [<c0276c2b>] ? generic_delete_inode+0x62/0xd5
[ 1303.485714] [<c0276db1>] ? generic_drop_inode+0x113/0x162
[ 1303.485714] [<c0275d3c>] ? iput+0x47/0x4e
[ 1303.485714] [<c02737a7>] ? dentry_iput+0x6b/0xb1
[ 1303.485714] [<c0273859>] ? d_kill+0x1d/0x37
[ 1303.485714] [<c027519b>] ? dput+0x58/0x10a
[ 1303.485714] [<c026d2a4>] ? do_rmdir+0xa4/0xc3
[ 1303.485714] [<c026d2f4>] ? sys_unlinkat+0x31/0x36
[ 1303.485714] [<c0202f3e>] ? syscall_call+0x7/0xb
[ 1303.485714] =======================
[ 1303.485714] Code: 26 00 0f 88 94 00 00 00 8b 87 8c 02 00 00 89 45 e4 8b 55 e8 0f af 50 10 8b 40 34 03 50 14 8b 03 89 45 ec 8b 4e 14 89 4d f0 29 d0 <0f> a3 01 19 c0 85 c0 74 11 8b 43 04 89 45 ec 29 d0 0f a3 01 19
[ 1303.485714] EIP: [<c02ddca9>] read_block_bitmap+0xa3/0x147 SS:ESP 0068:c79c4c6c
[ 1303.485714] ---[ end trace ba199677255b7e73 ]---
------------------------------------------------------------
$ addr2line -e vmlinux -i 0xc02ddca9
include/asm/bitops.h:305
fs/ext3/balloc.c:98
fs/ext3/balloc.c:167

98 if (!ext3_test_bit(offset, bh->b_data))
99 /* bad block bitmap */
100 goto err_out;
------------------------------------------------------------
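The register dump makes the negative-offset theory checkable. The faulting bytes `<0f> a3 01` decode to `bt %eax,(%ecx)`: test bit EAX of the memory at ECX (presumably bh->b_data). When the bit offset comes from a register, x86 treats it as a signed 32-bit value and, per the Intel manual's description of BT, accesses the dword at ECX + 4 * floor(EAX / 32). The hypothetical helper below reproduces only that address calculation in userspace; it is not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Address of the 32-bit word that "bt %reg, (base)" reads when the bit
 * offset is in a register: base + 4 * floor(bit_offset / 32), with the
 * offset treated as signed and the sum wrapping at 32 bits.
 */
static uint32_t bt_dword_address(uint32_t base, int32_t bit_offset)
{
    int32_t word = bit_offset / 32;     /* C division truncates toward zero */
    if (bit_offset % 32 != 0 && bit_offset < 0)
        word--;                         /* adjust to floor for negative offsets */
    return base + (uint32_t)word * 4u;  /* unsigned math wraps like the CPU */
}
```

With EAX = ffffffff (-1) and ECX = c7ee0000 it yields c7edfffc, exactly the faulting address above; any offset from -32 to -1 lands on ECX-4, matching the pattern Jan saw in the ext2 oopses.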

Sami

2008-08-20 10:25:34

by Jan Kara

Subject: Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks

> On Tue, Aug 19, 2008 at 11:13:39AM +0200, Jan Kara wrote:
> > > Isn't equivalent checking done in ext2_check_descriptors()? It would make
> > > sense to abstract out the "check one group and return error" code and use
> > > it in both places.
> > Actually yes, it is. Good point. Sami, is it the case that you have
> > mounted the filesystem, then intentionally corrupted it and after that
> > the kernel oopsed (as opposed to first corrupting the filesystem image and
> > mounting it after that)? That would explain how corrupted values could get
> > to read_block_bitmap() even though ext2_check_descriptors() checked them.
>
> No, that's not what I do. I corrupt the fs before mounting it, then
> mount it, perform normal filesystem operations on it and unmount it.
OK, thanks. Then we must somehow corrupt the group descriptor block during
the operation, because I'm pretty sure it *is* corrupted: the oops
is "unable to handle kernel paging request at c7e95ffc". If we look at the
registers, we see ECX has c7e96000 (which is probably bh->b_data). In
the second oops it's exactly the same: ECX has c11e4000 and the oops is at
address c11e3ffc. So in both cases the fault is at ECX-4, which means we
somehow managed to pass a negative offset into ext2_test_bit(). But as
Andreas pointed out, when we load the descriptors into memory,
ext2_check_descriptors() checks that both bitmaps and the inode table lie
inside the group... The other possibility would be that we managed to
corrupt s_first_data_block in the superblock. Anyway, neither possibility
looks very likely. I'll try to reproduce the problem and maybe get more
insight... How large is your filesystem, BTW?

> Here's the most current script I use (zzuf is the fuzzer):
>
> ------------------------------------------------------------
> #!/bin/sh
>
> if [ "`hostname`" != "fstest" ]; then
> echo "This is a dangerous script."
> echo "Set your hostname to \`fstest\' if you want to use it."
> exit 1
> fi
>
> umount /dev/hdb
> umount /dev/hdc
> /etc/init.d/sysklogd stop
> /etc/init.d/klogd stop
> /etc/init.d/cron stop
> mount /dev/hda / -t ext3 -o remount,ro || exit 1
>
> #ulimit -t 20
>
> for ((s=$1; s<1000000000; s++)); do
> umount /mnt
> echo '***** zzuffing *****' seed $s
> zzuf -r 0:0.03 -s $s </dev/hdc >/dev/hdb || exit
> mount /dev/hdb /mnt -t ext2 -o errors=continue || continue
> cd /mnt || continue
> timeout 30 cp -r doc doc2 >&/dev/null
> timeout 30 find -xdev >&/dev/null
> timeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- 2>/dev/null
> timeout 30 mkdir tmp >&/dev/null
> timeout 30 echo whoah >tmp/filu 2>/dev/null
> timeout 30 rm -rf /mnt/* >&/dev/null
> cd /
> done
> ------------------------------------------------------------

Honza
--
Jan Kara <[email protected]>
SuSE CR Labs

2008-08-20 13:43:28

by Sami Liedes

Subject: Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks

On Wed, Aug 20, 2008 at 12:25:33PM +0200, Jan Kara wrote:
> OK, thanks. Then we must somehow corrupt group descriptor block during
> the operation. Because I'm pretty sure it *is* corrupted - the oops
> is: unable to handle kernel paging request at c7e95ffc. If we look into
> registers, we see ECX has c7e96000 (which is probably bh->b_data). In
> the second oops it's exactly the same - ECX has c11e4000, the oops is at
> address c11e3ffc. So in both cases it is ECX-4. So somehow we managed to
> pass negative offset into ext2_test_bit(). But as Andreas pointed out,
> when we load descriptors into memory, we check that both bitmaps and
> inode table is in ext2_check_descriptors()... The other possibility
> would be that we managed to corrupts s_first_data_block in the
> superblock. Anyway, both possibilities don't look very likely. I'll try
> to reproduce the problem and maybe get more insight... How large is your
> filesystem BTW?

My FS is 10 MiB and tries to be diverse in its contents. It has a copy
of my /dev and a small partial copy of /usr/share/doc.

I put the pristine (non-corrupted) filesystem at

http://www.hut.fi/~sliedes/fsdebug-hdc-ext2.bz2

(520k compressed).

I've been thinking I should write a script to prepare the root
filesystem for the tests, but haven't got that far yet. Basically
(unless I forget some step) I use debootstrap to bootstrap a minimal
Debian system, create some needed devices in it (hd[abc], ttyS0 at
least), set the hostname to fstest, configure getty to listen to
ttyS0, copy the script to /root/runtest (the script's first parameter
is the seed) and install some Debian packages (zzuf and timeout at
least).

Then I make four copies of the images and run four qemus in parallel
since I have four cpus, modifying the first parameter (initial seed)
of the runtest script, e.g. 0, 10M, 20M, 30M.

I guess the approach might be useful for those who write the code too
(or people closer to them than me), since I've already found a fair
number of bugs with it in a fairly short period of time (#10871,
#10882, #10976, #11250, #11253, #11266 for ext[23] bugs, also one ext4
bug I hit when an ext3 fs was detected as ext4; search bugzilla for my
email to see the rest of the bugs).

The current root filesystem is 144M compressed (yeah, there's a lot of
stuff irrelevant to the tests there), I could upload it somewhere if
that helps. After that running the tests is a matter of running
something like

qemu -kernel bzImage -append 'root=/dev/hda console=ttyS0,115200n8' \
-hda hda -hdb hdb -hdc hdc -nographic -serial pty

, attaching a screen session to the allocated pty, logging in as root
and running ./runtest $seed.

Also, the tests are not as comprehensive as I'd like. As an example,
some years ago I stress tested reiser4 (it was already "ready") with
pretty mundane operations (without corrupting the fs) and it worked,
but I managed to break it badly three separate times, in separate
ways, just by using Debian's aptitude normally (the breakage was in
flock(), and the current tests don't test flock()). Other things to
test would be at least hard links and fifos...

The level of automation isn't quite what I'd like either; optimally
there would be a single script that takes the kernel image,
filesystem type and number of parallel instances as arguments and runs
the tests.

Sami

2008-08-20 19:07:38

by Andreas Dilger

Subject: Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks

On Aug 20, 2008 12:25 +0200, Jan Kara wrote:
> > On Tue, Aug 19, 2008 at 11:13:39AM +0200, Jan Kara wrote:
> > > > Isn't equivalent checking done in ext2_check_descriptors()? It would make
> > > > sense to abstract out the "check one group and return error" code and use
> > > > it in both places.
> > > Actually yes, it is. Good point. Sami, is it the case that you have
> > > mounted the filesystem, then intentionally corrupted it and after that
> > > the kernel oopsed (as opposed to first corrupting the filesystem image and
> > > mounting it after that)? That would explain how corrupted values could get
> > > to read_block_bitmap() even though ext2_check_descriptors() checked them.
> >
> > No, that's not what I do. I corrupt the fs before mounting it, then
> > mount it, perform normal filesystem operations on it and unmount it.

> OK, thanks. Then we must somehow corrupt group descriptor block during
> the operation.

Oh, interesting... The data in the journal is probably corrupt, but all
of the superblock/gdt sanity checks are done BEFORE the journal is replayed.

It would seem that the ext*_fill_super() code should do the sanity checks,
and then recheck the superblock and group descriptors after the journal
is replayed. The superblock checking code could be moved out of
ext*_fill_super() into a helper function such as ext*_check_super(), and
then ext*_check_super() and ext*_check_descriptors() could be called again
after journal replay.

Having journal checksums enabled (ext4) would also detect this problem
before the journal replay corrupts the filesystem metadata.

It doesn't look possible that we can do journal recovery before loading
the GDT because ext*_load_journal()->ext*_get_journal() is doing iget()
and this needs the GDT to read the journal inode.

It might also make sense to just clean up the superblock and group descriptor
table and goto the beginning of fill_super() because in some cases the
superblock contents may have changed in important ways (e.g. crash after
resize of the filesystem which is only in the journal).
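The proposed ordering can be illustrated with a toy model. Everything here (`struct toy_fs`, `toy_check_descriptors()`, `toy_replay_journal()`, `toy_fill_super()`) is hypothetical and far simpler than ext3's real mount path: a single pending journal record overwrites one in-memory descriptor field, and only the block-bitmap field is checked. It shows why a check that runs only before replay can be bypassed, and that a second check after replay catches the bad value. As Andreas notes further down, this concerns ext3 rather than Sami's ext2 runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_GROUPS 4

/* Hypothetical toy filesystem state; only the block-bitmap field is modelled. */
struct toy_fs {
    uint32_t first_data_block;
    uint32_t blocks_per_group;
    size_t ngroups;
    uint32_t block_bitmap[MAX_GROUPS];  /* per-group descriptor field */
    /* One pending journal record that rewrites a descriptor field. */
    bool has_record;
    size_t rec_group;
    uint32_t rec_block_bitmap;
};

/* Analogue of ext*_check_descriptors(): every bitmap must lie in its group. */
static bool toy_check_descriptors(const struct toy_fs *fs)
{
    assert(fs->ngroups <= MAX_GROUPS && fs->blocks_per_group > 0);
    for (size_t i = 0; i < fs->ngroups; i++) {
        uint64_t first = fs->first_data_block + (uint64_t)i * fs->blocks_per_group;
        if (fs->block_bitmap[i] < first ||
            fs->block_bitmap[i] >= first + fs->blocks_per_group)
            return false;
    }
    return true;
}

/* Toy replay: copy the logged value over the in-memory descriptor. */
static void toy_replay_journal(struct toy_fs *fs)
{
    if (fs->has_record && fs->rec_group < fs->ngroups)
        fs->block_bitmap[fs->rec_group] = fs->rec_block_bitmap;
    fs->has_record = false;
}

/* Mount order with a re-check after replay; 0 = mount ok, -1 = reject. */
static int toy_fill_super(struct toy_fs *fs)
{
    if (!toy_check_descriptors(fs))
        return -1;              /* corrupt before replay: caught as today */
    toy_replay_journal(fs);     /* may bring in bad metadata */
    if (!toy_check_descriptors(fs))
        return -1;              /* the additional post-replay check */
    return 0;
}
```

Without the second `toy_check_descriptors()` call, a journal record that sets a bitmap location to 0 would pass the mount-time check and only surface later in the bitmap read path.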

> Because I'm pretty sure it *is* corrupted - the oops
> is: unable to handle kernel paging request at c7e95ffc. If we look into
> registers, we see ECX has c7e96000 (which is probably bh->b_data). In
> the second oops it's exactly the same - ECX has c11e4000, the oops is at
> address c11e3ffc. So in both cases it is ECX-4. So somehow we managed to
> pass negative offset into ext2_test_bit(). But as Andreas pointed out,
> when we load descriptors into memory, we check that both bitmaps and
> inode table is in ext2_check_descriptors()... The other possibility
> would be that we managed to corrupts s_first_data_block in the
> superblock. Anyway, both possibilities don't look very likely. I'll try
> to reproduce the problem and maybe get more insight... How large is your
> filesystem BTW?
>
> > Here's the most current script I use (zzuf is the fuzzer):
> >
> > ------------------------------------------------------------
> > #!/bin/sh
> >
> > if [ "`hostname`" != "fstest" ]; then
> > echo "This is a dangerous script."
> > echo "Set your hostname to \`fstest\' if you want to use it."
> > exit 1
> > fi
> >
> > umount /dev/hdb
> > umount /dev/hdc
> > /etc/init.d/sysklogd stop
> > /etc/init.d/klogd stop
> > /etc/init.d/cron stop
> > mount /dev/hda / -t ext3 -o remount,ro || exit 1
> >
> > #ulimit -t 20
> >
> > for ((s=$1; s<1000000000; s++)); do
> > umount /mnt
> > echo '***** zzuffing *****' seed $s
> > zzuf -r 0:0.03 -s $s </dev/hdc >/dev/hdb || exit
> > mount /dev/hdb /mnt -t ext2 -o errors=continue || continue
> > cd /mnt || continue
> > timeout 30 cp -r doc doc2 >&/dev/null
> > timeout 30 find -xdev >&/dev/null
> > timeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- 2>/dev/null
> > timeout 30 mkdir tmp >&/dev/null
> > timeout 30 echo whoah >tmp/filu 2>/dev/null
> > timeout 30 rm -rf /mnt/* >&/dev/null
> > cd /
> > done
> > ------------------------------------------------------------

Oh, hmm, this is ext2 and not ext3, so no journal... I guess my bug is
still valid, but just not this one?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

2008-11-02 05:48:35

by Sami Liedes

Subject: Re: [Bugme-new] [Bug 11266] New: unable to handle kernel paging request in ext2_free_blocks

[Sorry for the duplicates; I forgot to use email instead of the bugzilla web
interface.]

I have now found an ext3 filesystem on which this bug happens fairly
reproducibly on 2.6.27.4. Increasing the commit interval seems to make it more
likely to happen; otherwise the journal can be aborted, and then the bug no
longer happens. I do realize that this report is for the ext2 bug, but I hope
finding a similar bug on ext3 might help (and even if this is a separate bug,
this information should help resolve it).

Here's how to do it:

1. bunzip2 the attached filesystem image hdb.10000097.bz2

(I did the following inside qemu, hence /dev/hdb)

2. mount /dev/hdb /mnt -t ext3 -o errors=continue,commit=300
3. cd /mnt
4. timeout 30 cp -r doc doc2 >&/dev/null (or interrupt cp manually after 30
seconds; it gets stuck anyway)
5. find -xdev -print0 2>/dev/null |xargs -0 touch -- 2>/dev/null
6. mkdir tmp >&/dev/null
7. echo whoah >tmp/filu 2>/dev/null
8. rm -rf /mnt/* >&/dev/null
9. While rm -rf is running, the following oops occurs:

------------------------------------------------------------
EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1
EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1
EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1
EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1
EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1
EXT3-fs unexpected failure: !jh->b_committed_data;
inconsistent data on disk
EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks in system zones - Block = 8234, count = 1
EXT3-fs unexpected failure: !jh->b_committed_data;
inconsistent data on disk
ext3_forget: aborting transaction: IO failure in __ext3_journal_forget
EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks in system zones - Block = 42, count = 3
EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone - block = 25630524, count = 1
EXT3-fs error (device hdb) in ext3_free_blocks_sb: Readonly filesystem
EXT3-fs unexpected failure: !jh->b_committed_data;
inconsistent data on disk
BUG: unable to handle kernel paging request at c13fbbfc
IP: [<c02de4f9>] read_block_bitmap+0xa3/0x147
*pde = 07886163 *pte = 013fb160
Oops: 0000 [#1] DEBUG_PAGEALLOC

Pid: 817, comm: rm Not tainted (2.6.27.4 #1)
EIP: 0060:[<c02de4f9>] EFLAGS: 00000206 CPU: 0
EIP is at read_block_bitmap+0xa3/0x147
EAX: ffffdfff EBX: c13fc820 ECX: c13fc000 EDX: 00002001
ESI: c74b15b0 EDI: c7aae400 EBP: c7b7acd0 ESP: c7b7aca0
DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Process rm (pid: 817, ti=c7b7a000 task=c78a1ce0 task.ti=c7b7a000)
Stack: 00000001 00000000 00000000 c7aaf1c0 00000246 c79cdc00 00000001 00000000
c13fc000 00000000 00000001 c163b37c c7b7ad28 c02de66f c0315003 c740aadc
c7b7ad10 c7440000 c7aaf1c0 00000029 0000202a c7aae400 c7440000 c79cdcac
Call Trace:
[<c02de66f>] ? ext3_free_blocks_sb+0x93/0x3d6
[<c0315003>] ? journal_forget+0xff/0x1aa
[<c02edd83>] ? __ext3_journal_forget+0x19/0x3f
[<c02de9dd>] ? ext3_free_blocks+0x2b/0x7f
[<c02e3f8c>] ? ext3_clear_blocks+0x137/0x159
[<c02e4072>] ? ext3_free_data+0xc4/0x133
[<c02e4320>] ? ext3_free_branches+0x23f/0x247
[<c02e4189>] ? ext3_free_branches+0xa8/0x247
[<c02e4189>] ? ext3_free_branches+0xa8/0x247
[<c02e498d>] ? ext3_truncate+0x665/0x8ad
[<c0316062>] ? journal_start+0xb2/0x112
[<c031608d>] ? journal_start+0xdd/0x112
[<c0316062>] ? journal_start+0xb2/0x112
[<c02ebb53>] ? ext3_journal_start_sb+0x29/0x4a
[<c02e4ca4>] ? ext3_delete_inode+0xcf/0xdb
[<c02e4bd5>] ? ext3_delete_inode+0x0/0xdb
[<c02774b3>] ? generic_delete_inode+0x62/0xd5
[<c0277639>] ? generic_drop_inode+0x113/0x16a
[<c02765ac>] ? iput+0x47/0x4e
[<c026d9f4>] ? do_unlinkat+0xc3/0x13d
[<c054484f>] ? mutex_unlock+0x8/0xa
[<c026fb0b>] ? vfs_readdir+0x60/0x85
[<c026f84c>] ? filldir64+0x0/0xd7
[<c026fbc7>] ? sys_getdents64+0x97/0xa1
[<c026db66>] ? sys_unlinkat+0x23/0x36
[<c0202f1e>] ? syscall_call+0x7/0xb
=======================
Code: 26 00 0f 88 94 00 00 00 8b 87 8c 02 00 00 89 45 e4 8b 55 e8 0f af 50 10 8b 40 34 03 50 14 8b 03 89 45 ec 8b 4e 14 89 4d f0 29 d0 <0f> a3 01 19 c0 85 c0 74 11 8b 43 04 89 45 ec 29 d0 0f a3 01 19
EIP: [<c02de4f9>] read_block_bitmap+0xa3/0x147 SS:ESP 0068:c7b7aca0
---[ end trace 780108b88e07a03e ]---
------------------------------------------------------------
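The registers in this oops also hint at what the corrupted descriptor contained. The bytes before the fault appear to compute EDX = block_group * blocks_per_group + s_first_data_block (`0f af 50 10` looks like an `imul` and `03 50 14` like an `add`), after which `29 d0` (`sub %edx,%eax`) leaves EAX = bitmap_blk - group_first_block. Adding EDX back therefore recovers bitmap_blk. The helper names below and the group geometry (1 KiB blocks, 8192 blocks per group, first data block 1, typical for a small filesystem like this one) are assumptions for illustration, not facts stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Undo "sub %edx,%eax": with EDX = group_first_block and
 * EAX = bitmap_blk - EDX (mod 2^32), bitmap_blk = EAX + EDX (mod 2^32).
 */
static uint32_t recover_bitmap_blk(uint32_t eax, uint32_t edx)
{
    return eax + edx;
}

/* Group index whose first block is first_block, for an assumed geometry. */
static uint32_t group_of_first_block(uint32_t first_block,
                                     uint32_t first_data_block,
                                     uint32_t blocks_per_group)
{
    assert(blocks_per_group > 0 && first_block >= first_data_block);
    return (first_block - first_data_block) / blocks_per_group;
}
```

For this oops, EAX = ffffdfff and EDX = 00002001 give bitmap_blk = 0 in group 1; the August ext3 oops (EAX = ffffffff, EDX = 00000001) likewise gives bitmap_blk = 0, in group 0. A zero block-bitmap location would fail the mount-time descriptor checks, which is consistent with the suspicion that the descriptor was corrupted after those checks ran.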

Sami