LinuxLists.cc - mballoc errors

2008-03-26 21:41:23

Subject: mballoc errors

This report just came in on the fedora list...

EXT4-fs error (device dm-0): ext4_mb_generate_buddy: EXT4-fs: group 717:
23410 blocks in bitmap, 23411 in gd

EXT4-fs error (device dm-0): ext4_mb_generate_buddy: EXT4-fs: group 721:
19309 blocks in bitmap, 19333 in gd

has anyone else seen this problem in testing? I guess it means
freespace accounting got out of sync...?

Thanks,
-Eric

2008-03-26 23:23:56

by Bernd Schubert

[permalink] [raw]

Subject: Re: mballoc errors

On Wed, Mar 26, 2008 at 04:41:22PM -0500, Eric Sandeen wrote:
> This report just came in on the fedora list...
>
> EXT4-fs error (device dm-0): ext4_mb_generate_buddy: EXT4-fs: group 717:
> 23410 blocks in bitmap, 23411 in gd
>
> EXT4-fs error (device dm-0): ext4_mb_generate_buddy: EXT4-fs: group 721:
> 19309 blocks in bitmap, 19333 in gd
>
> has anyone else seen this problem in testing? I guess it means
> freespace accounting got out of sync...?

We have seen this on ldiskfs, one time a hardware raid was writing data
garbage and one time it was after a Lustre upgrade from 1.4 to 1.6 with
a filesystem that probably already had errors before the upgrade.

So maybe this message is a good indicator for generic problems?

Cheers,
Bernd

2008-03-26 23:31:35

by Mingming Cao

[permalink] [raw]

Subject: Re: mballoc errors

On Wed, 2008-03-26 at 16:41 -0500, Eric Sandeen wrote:
> This report just came in on the fedora list...
>
> EXT4-fs error (device dm-0): ext4_mb_generate_buddy: EXT4-fs: group 717:
> 23410 blocks in bitmap, 23411 in gd
>
> EXT4-fs error (device dm-0): ext4_mb_generate_buddy: EXT4-fs: group 721:
> 19309 blocks in bitmap, 19333 in gd
>
> has anyone else seen this problem in testing? I guess it means
> freespace accounting got out of sync...?
>
No, I haven't seen this before. The related code is
ext4_mb_generate_buddy(..)
{
....
if (free != grp->bb_free) {
ext4_error(sb, __func__,
"EXT4-fs: group %lu: %u blocks in bitmap, %u in gd\n",
group, free, grp->bb_free);
/*
* If we intent to continue, we consider group descritor
* corrupt and update bb_free using bitmap value
*/
grp->bb_free = free;
}
}

I guess the free blocks counter in in-memory group descriptor is not
sync with the on-disk bitmap. Looks like mballoc introduced in-core
block group descriptor(struct ext4_group_info) and keep track of free
blocks counter in the in-core block group
descriptor(ext4_group_info.bb_free), while the non-mballoc allocation
accounting is done via the on-disk block group descriptor, which is
likely, the metadata block(s) (e.g index extent block, xattr block) are
all allocated in the old way (ext4_new_blocks_old()).

I guess this out-of-sync happens already without notice, only when the
buddy info for this block group is re-generated after being pushed out
of memory under pressure this check is being hit.

> Thanks,
> -Eric
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2008-03-27 03:48:11

by Aneesh Kumar K.V

[permalink] [raw]

Subject: Re: mballoc errors

On Wed, Mar 26, 2008 at 04:41:22PM -0500, Eric Sandeen wrote:
> This report just came in on the fedora list...
>
> EXT4-fs error (device dm-0): ext4_mb_generate_buddy: EXT4-fs: group 717:
> 23410 blocks in bitmap, 23411 in gd
>
> EXT4-fs error (device dm-0): ext4_mb_generate_buddy: EXT4-fs: group 721:
> 19309 blocks in bitmap, 19333 in gd
>
> has anyone else seen this problem in testing? I guess it means
> freespace accounting got out of sync...?
>

The bitmap or group desc is corrupt. The number of free blocks counted
via bitmap and found via desc->bg_free_blocks_count are not same. We
fall back to one calculated the bitmap.

-aneesh

2008-03-27 07:53:05

by Solofo.Ramangalahy

[permalink] [raw]

Subject: mballoc errors

Eric Sandeen writes:
> has anyone else seen this problem in testing?

This was reported in relation with fsfuzzer testing:
http://www.mail-archive.com/linux-ext4%40vger.kernel.org/msg04889.html

--
solofo

2008-03-28 14:03:19

by Valerie Clement

[permalink] [raw]

Subject: Re: mballoc errors

Eric Sandeen wrote:
> This report just came in on the fedora list...
>
> EXT4-fs error (device dm-0): ext4_mb_generate_buddy: EXT4-fs: group 717:
> 23410 blocks in bitmap, 23411 in gd
>
> EXT4-fs error (device dm-0): ext4_mb_generate_buddy: EXT4-fs: group 721:
> 19309 blocks in bitmap, 19333 in gd
>
> has anyone else seen this problem in testing? I guess it means
> freespace accounting got out of sync...?
>
I've got no problem running tests on filesystems with a 4K blocksize.
But since I work on FS with 1K blocksize, I've got the following errors
while running fsstress:

EXT4-fs error (device sdc1): ext4_ext_find_extent: bad header in inode #4407333:
invalid magic - magic dada, entries 56026, max 56026(0), depth 56026(0)

or

EXT4-fs error (device sdc1): ext4_valid_block_bitmap: Invalid block bitmap -
block_group = 6767, block = 55435267

I already found a bug in the patch ext4-fallocate-full-fs-ENOSPC-handling.
In the function, ext4_ext_zeroout()
/* convert ee_pblock in 512 byte sector */
ee_pblock = ee_pblock << (blkbits >> 9);
should be
ee_pblock = ee_pblock << (blkbits - 9);

But after doing this change, the problems occur more quickly.
I'm not sure that the function ext4_ext_zeroout() is correct if
blocksize < pagesize.
Aneesh, could you check that please?
Thanks,
Valerie

How I reproduce the problem:
# mkfs.ext3 -I256 -E test_fs -b 1024 /dev/sdc1
# mount -t ext4dev /dev/sdc1 /mnt/test
# fsstress -d /mnt/test -n1000 -p1000

2008-03-31 20:18:30

by Aneesh Kumar K.V

[permalink] [raw]

Subject: Re: mballoc errors

On Mon, Mar 31, 2008 at 04:58:06PM +0200, Valerie Clement wrote:
> Aneesh Kumar K.V wrote:
>> I looked at the code. Nothing appears straight. I am now running tests.
>> meanwhile you can dump the block ee_pblock in the ext4_ext_zeroout
>> and see if we are zeroing some wrong blocks that would be great.
>>
> I didn't see anything incorrect here for the moment, but adding traces
> in the code often changes the behavior so that I can't reproduce the problem
> running the same test.
>
>>
>>> How I reproduce the problem:
>>> # mkfs.ext3 -I256 -E test_fs -b 1024 /dev/sdc1
>>> # mount -t ext4dev /dev/sdc1 /mnt/test
>>> # fsstress -d /mnt/test -n1000 -p1000
>>>
>>
>> Does the fsstress you are using have fallocate support ?. If so can you
>> send me the patch so that i can run the same test.
> No, the fsstress I'm using doesn't support fallocate.

That means it is not due to ext4_ext_zeroout. Which implies we have a
generic file system corruption.

>
>> Also can you disable delalloc and try.
> OK, done. The tests are still running.
>
>> ENOSPC handling with delalloc is not yet done
> I often got the problem when the disk is filled to 10% of its capacity.
>
>

I actually added fallocate to fsstress and created the file filesystem
as you suggested. I am able to reproduce the problem once. Currently
doing a code audit. Will let you know if i make any progress.

-aneesh
>

2008-03-31 20:50:41

by Aneesh Kumar K.V

[permalink] [raw]

Subject: [PATCH] fix file system corruption [ was Re: mballoc errors ]

On Tue, Apr 01, 2008 at 01:48:19AM +0530, Aneesh Kumar K.V wrote:
> On Mon, Mar 31, 2008 at 04:58:06PM +0200, Valerie Clement wrote:
>
> I actually added fallocate to fsstress and created the file filesystem
> as you suggested. I am able to reproduce the problem once. Currently
> doing a code audit. Will let you know if i make any progress.
>

I found one case where it could fail. We are returning total extent size
in case of converting from uninitialized extent to initialized one.
Now if we have the below layout

[p --------x--------]

where p is the start block and x is the location where we intent to
write. If converting the above uninit extent resulted in

[p zeroed-out][uninit].

We should not be returning the size of
zeroed-out-extent. That is because in ext4_ext_get_blocks we cache
the extent information as below

out_new:
....
/* Cache only when it is _not_ an uninitialized extent */
if (create != EXT4_CREATE_UNINITIALIZED_EXT)
ext4_ext_put_in_cache(inode, iblock, allocated, newblock,
EXT4_EXT_CACHE_EXTENT);
out:

That would mean we are caching an extent starting from newblock of
size allocated where allocated would be the size of zeroout extent,
which is actually wrong.

I am yet to test the patch. If you have time can you test the below
change

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 964d2c1..91f8f72 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2264,7 +2264,8 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
ex->ee_len = orig_ex.ee_len;
ext4_ext_store_pblock(ex, ext_pblock(&orig_ex));
ext4_ext_dirty(handle, inode, path + depth);
- return le16_to_cpu(ex->ee_len);
+ /* zeroed the full extent */
+ return allocated;
}

/* ex1: ee_block to iblock - 1 : uninitialized */
@@ -2309,11 +2310,13 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
ex->ee_len = orig_ex.ee_len;
ext4_ext_store_pblock(ex, ext_pblock(&orig_ex));
ext4_ext_dirty(handle, inode, path + depth);
- return le16_to_cpu(ex->ee_len);
+ /* zeroed the full extent */
+ return allocated;

} else if (err)
goto fix_extent_len;

+ /* zeroed the second half */
return allocated;
}
ex3 = &newex;
@@ -2331,7 +2334,8 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
ex->ee_len = orig_ex.ee_len;
ext4_ext_store_pblock(ex, ext_pblock(&orig_ex));
ext4_ext_dirty(handle, inode, path + depth);
- return le16_to_cpu(ex->ee_len);
+ /* zeroed the full extent */
+ return allocated;

} else if (err)
goto fix_extent_len;
@@ -2379,7 +2383,8 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
ex->ee_len = orig_ex.ee_len;
ext4_ext_store_pblock(ex, ext_pblock(&orig_ex));
ext4_ext_dirty(handle, inode, path + depth);
- return le16_to_cpu(ex->ee_len);
+ /* zero out the first half */
+ return allocated;
}
}
/*
@@ -2446,7 +2451,8 @@ insert:
ex->ee_len = orig_ex.ee_len;
ext4_ext_store_pblock(ex, ext_pblock(&orig_ex));
ext4_ext_dirty(handle, inode, path + depth);
- return le16_to_cpu(ex->ee_len);
+ /* zero out the first half */
+ return allocated;
} else if (err)
goto fix_extent_len;
out:

2008-04-01 08:38:31

by Valerie Clement

[permalink] [raw]

Subject: Re: [PATCH] fix file system corruption [ was Re: mballoc errors ]

Aneesh Kumar K.V wrote:
> I found one case where it could fail. We are returning total extent size
> in case of converting from uninitialized extent to initialized one.
> Now if we have the below layout
>
> [p --------x--------]
>
> where p is the start block and x is the location where we intent to
> write. If converting the above uninit extent resulted in
>
> [p zeroed-out][uninit].
>
> We should not be returning the size of
> zeroed-out-extent. That is because in ext4_ext_get_blocks we cache
> the extent information as below
>
> out_new:
> ....
> /* Cache only when it is _not_ an uninitialized extent */
> if (create != EXT4_CREATE_UNINITIALIZED_EXT)
> ext4_ext_put_in_cache(inode, iblock, allocated, newblock,
> EXT4_EXT_CACHE_EXTENT);
> out:
>
>
> That would mean we are caching an extent starting from newblock of
> size allocated where allocated would be the size of zeroout extent,
> which is actually wrong.
>
> I am yet to test the patch. If you have time can you test the below
> change

Hi Aneesh,
I tested your patch but unfortunately I still reproduced the problem with it:
EXT4-fs error (device sdc1): ext4_valid_block_bitmap: Invalid block bitmap -
block_group = 6233, block = 51060737

Is there another thing I could do to help debugging the problem?
Valerie

>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 964d2c1..91f8f72 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -2264,7 +2264,8 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
> ex->ee_len = orig_ex.ee_len;
> ext4_ext_store_pblock(ex, ext_pblock(&orig_ex));
> ext4_ext_dirty(handle, inode, path + depth);
> - return le16_to_cpu(ex->ee_len);
> + /* zeroed the full extent */
> + return allocated;
> }
>
> /* ex1: ee_block to iblock - 1 : uninitialized */
> @@ -2309,11 +2310,13 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
> ex->ee_len = orig_ex.ee_len;
> ext4_ext_store_pblock(ex, ext_pblock(&orig_ex));
> ext4_ext_dirty(handle, inode, path + depth);
> - return le16_to_cpu(ex->ee_len);
> + /* zeroed the full extent */
> + return allocated;
>
> } else if (err)
> goto fix_extent_len;
>
> + /* zeroed the second half */
> return allocated;
> }
> ex3 = &newex;
> @@ -2331,7 +2334,8 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
> ex->ee_len = orig_ex.ee_len;
> ext4_ext_store_pblock(ex, ext_pblock(&orig_ex));
> ext4_ext_dirty(handle, inode, path + depth);
> - return le16_to_cpu(ex->ee_len);
> + /* zeroed the full extent */
> + return allocated;
>
> } else if (err)
> goto fix_extent_len;
> @@ -2379,7 +2383,8 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
> ex->ee_len = orig_ex.ee_len;
> ext4_ext_store_pblock(ex, ext_pblock(&orig_ex));
> ext4_ext_dirty(handle, inode, path + depth);
> - return le16_to_cpu(ex->ee_len);
> + /* zero out the first half */
> + return allocated;
> }
> }
> /*
> @@ -2446,7 +2451,8 @@ insert:
> ex->ee_len = orig_ex.ee_len;
> ext4_ext_store_pblock(ex, ext_pblock(&orig_ex));
> ext4_ext_dirty(handle, inode, path + depth);
> - return le16_to_cpu(ex->ee_len);
> + /* zero out the first half */
> + return allocated;
> } else if (err)
> goto fix_extent_len;
> out:
>
>

2008-04-01 09:02:00

by Aneesh Kumar K.V

[permalink] [raw]

Subject: Re: [PATCH] fix file system corruption [ was Re: mballoc errors ]

On Tue, Apr 01, 2008 at 10:38:50AM +0200, Valerie Clement wrote:
>
> Hi Aneesh,
> I tested your patch but unfortunately I still reproduced the problem with it:
> EXT4-fs error (device sdc1): ext4_valid_block_bitmap: Invalid block
> bitmap - block_group = 6233, block = 51060737
>

Below is the patch i am testing. It is running fine for me here. I am
testing the patch with a modified fsstress that use fallocate.

Regarding the test case that you have i am not yet sure what is causing
the file system corruption. It cannot be ext4_ext_zeroout or
ext4_ext_convert_to_initialized because they are used only when we are
writing to falloc area and you are not preallocating anything in your
test. Did you make sure you are not using delalloc ? All my tests are
run with stable branch.

You would also need the single line change you found in ext4_ext_zeroout
It is already a part of patch queue.

commit c664eac338ae9c6139ffec72fca84b222bf142b2
Author: Aneesh Kumar K.V <[email protected]>
Date: Tue Apr 1 13:23:21 2008 +0530

ext4: Cache the correct extent length for uninit extent.

When we convert uninitialized extent to initialized extent
we need to make sure we return the number of blocks in the
extent from the file system block corresponding to logical
file block. Other wise we cache wrong extent details and this
results in file system corruption.

Signed-off-by: Aneesh Kumar K.V <[email protected]>

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 964d2c1..30f0f99 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2264,7 +2264,8 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
ex->ee_len = orig_ex.ee_len;
ext4_ext_store_pblock(ex, ext_pblock(&orig_ex));
ext4_ext_dirty(handle, inode, path + depth);
- return le16_to_cpu(ex->ee_len);
+ /* zeroed the full extent */
+ return allocated;
}

/* ex1: ee_block to iblock - 1 : uninitialized */
@@ -2309,11 +2310,44 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
ex->ee_len = orig_ex.ee_len;
ext4_ext_store_pblock(ex, ext_pblock(&orig_ex));
ext4_ext_dirty(handle, inode, path + depth);
- return le16_to_cpu(ex->ee_len);
+ /* zeroed the full extent */
+ return allocated;

} else if (err)
goto fix_extent_len;

+ /* We need to zero out the second half because
+ * fallocate request can update file size and
+ * converting the second half to initialized extent
+ * implies that we can leak some junk data to user
+ * space
+ */
+ err = ext4_ext_zeroout(inode, ex3);
+ if (err) {
+ /*
+ * We should actually mark the
+ * second half as uninit and return error
+ * Insert would have changed the extent
+ */
+ depth = ext_depth(inode);
+ ext4_ext_drop_refs(path);
+ path = ext4_ext_find_extent(inode,
+ iblock, path);
+ if (IS_ERR(path)) {
+ err = PTR_ERR(path);
+ return err;
+ }
+ ex = path[depth].p_ext;
+ err = ext4_ext_get_access(handle, inode,
+ path + depth);
+ if (err)
+ return err;
+ ext4_ext_mark_uninitialized(ex);
+ ext4_ext_dirty(handle, inode, path + depth);
+ return err;
+ }
+
+ /* zeroed the second half */
return allocated;
}
ex3 = &newex;
@@ -2331,7 +2365,8 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
ex->ee_len = orig_ex.ee_len;
ext4_ext_store_pblock(ex, ext_pblock(&orig_ex));
ext4_ext_dirty(handle, inode, path + depth);
- return le16_to_cpu(ex->ee_len);
+ /* zeroed the full extent */
+ return allocated;

} else if (err)
goto fix_extent_len;
@@ -2379,7 +2414,8 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
ex->ee_len = orig_ex.ee_len;
ext4_ext_store_pblock(ex, ext_pblock(&orig_ex));
ext4_ext_dirty(handle, inode, path + depth);
- return le16_to_cpu(ex->ee_len);
+ /* zero out the first half */
+ return allocated;
}
}
/*
@@ -2446,7 +2482,8 @@ insert:
ex->ee_len = orig_ex.ee_len;
ext4_ext_store_pblock(ex, ext_pblock(&orig_ex));
ext4_ext_dirty(handle, inode, path + depth);
- return le16_to_cpu(ex->ee_len);
+ /* zero out the first half */
+ return allocated;
} else if (err)
goto fix_extent_len;
out:

2008-04-01 09:45:36

by Valerie Clement

[permalink] [raw]

Subject: Re: [PATCH] fix file system corruption [ was Re: mballoc errors ]

Aneesh Kumar K.V wrote:
> Regarding the test case that you have i am not yet sure what is causing
> the file system corruption. It cannot be ext4_ext_zeroout or
> ext4_ext_convert_to_initialized because they are used only when we are
> writing to falloc area and you are not preallocating anything in your
> test. Did you make sure you are not using delalloc ? All my tests are
> run with stable branch.

Oh I'm sorry, I was not enough precise in my previous mails.
When I disable the delalloc feature, I don't reproduce the problem.
So, there is another problem. I will look into the delalloc code.
Valerie