LinuxLists.cc - Ext4 tree backports for 2.6.27.10 and 2.6.28

2009-01-17 18:43:57

Subject: Ext4 tree backports for 2.6.27.10 and 2.6.28

I've created a couple of ext4 backport branches which have been uploaded
to the ext4 git tree:

git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git
http://git.kernel.org/?p=linux/kernel/git/tytso/ext4.git

The master branch contains the latest ext4 patch queue, against Linus's
tip; currently, this is versus 2.6.29-rc2. The 'for_linus' tag is
located on this branch, and currently contains a number of urgent fixes
that I plan to push to Linus after he gets back from
linux.conf.au. They are mostly fixes that prevent oopses or CPU lockups
when ext4 mounts an intentionally corrupted filesystem.

The ext4-stable branch is based off of 2.6.28, and contains all of the
patches which were pushed to Linus during the 2.6.29-rc1 merge window,
plus the fixes listed above in the 'master' branch. It is designed for
people who want the very latest in the ext4 tree versus a stable kernel.

The for-stable branch is currently based off of 2.6.28, and contains a
candidate set of patches to be included in the 2.6.28.y stable tree.
Ext4 developers --- I would appreciate it if you could review the
patches on the ext4-stable tree, and see if I missed any patches which
in your opinion should be pushed to the 2.6.28.y tree. Furthermore, if
some folks could test the for-stable branch and let me know whether or
not you found it stable, I would appreciate it. Some of the patches
were relatively painful to backport, given the desire to remove
"non-critical" patches, so I'm not 100% certain I got the backports
completely right. Please test!

The for-stable-2.6.27 branch is currently based off of 2.6.27.11, and it
contains a candidate set of patches to be included in the 2.6.27.y tree.
It was even more difficult to backport these patches to 2.6.27.y, and so
I would **really** appreciate it if some folks could review and test this
branch. In addition, a number of changes (in particular some of
Aneesh's resize race condition patches) were painful enough that I
decided to abort and not try to do the backport. It was late, and I was
getting tired.... If someone would like to try to backport some of
these missing patches, I would appreciate it; Aneesh, you might have
better luck since they were originally your patches, and they were
complicated enough that I was worried that there might have been
prerequisites that I had missed so they would function correctly.

The patch backports are summarized in the table below. It contains
the original mainline commit ID, the commit ID in the 2.6.28 for-stable
branch, and the commit ID in the for-stable-2.6.27 branch. If you see
"-----" in the 2.6.27 column, those are patches
which I did not backport due to lack of valor/courage at 11pm.

- Ted

mainline 2.6.28 2.6.27
commit-description
-------------------------------------------------------------------------
f99b2589 485f02f 11599d0
ext4: Add support for non-native signed/unsigned htree hash algorithms

2a21e37e 7426272 8443aef
ext4: tone down ext4_da_writepages warnings

791b7f08 2efd58c c7eef47
ext4: Fix the delalloc writepages to allocate blocks at the right offset.

565a9617 ce99b0d b2a193d
ext4: avoid ext4_error when mounting a fs with a single bg

ff7ef329 3a04ef3 626e5b9
ext4: Widen type of ext4_sb_info.s_mb_maxs[]

fd98496f f97e641 dc270b3
jbd2: Add barrier not supported test to journal_wait_on_commit_record

032115fc b9475aa c1944c2
ext4: Don't overwrite allocation_context ac_status

e21675d4 c31a2b2 -----
ext4: Add blocks added during resize to bitmap

920313a7 2a4f6ca -----
ext4: Use EXT4_GROUP_INFO_NEED_INIT_BIT during resize

c3a326a6 66364e6 24a5c92
ext4: cleanup mballoc header files

7a2fcbf7 39a0b8b -----
ext4: don't use blocks freed but not yet committed in buddy cache init

e8134b27 83a082c c712c85
ext4: Fix race between read_block_bitmap() and mark_diskspace_used()

39341867 8e53df4 4d3302c
ext4: Fix the race between read_inode_bitmap() and ext4_new_inode()

e97fcd95 20d6100 7e081e8
jbd2: Add BH_JBDPrivateStart

2ccb5fb9 92f1c0e -----
ext4: Use new buffer_head flag to check uninit group bitmaps initialization

648f5879 51eef9f 469a48a
ext4: mark the blocks/inode bitmap beyond end of group as used

8556e8f3 5a2c7ad 686beef
ext4: Don't allow new groups to be added during block allocation

29eaf024 39d994e 0c56383
ext4: Init the complete page while building buddy cache

0087d9fb 808dfdb -----
ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc

4ec11028 cf7da20 -----
ext4: Add sanity checks for the superblock before mounting the filesystem

2009-01-17 22:16:57

From: Aneesh Kumar K.V <[email protected]>
Subject: [PATCH] ext4: don't use blocks freed but not yet committed in buddy cache init

When we generate the buddy cache (especially during resize) we need to
make sure we don't use blocks that were freed but not yet committed. This
makes sure we have the right value of free blocks count in the group
info and also in the bitmap. This also ensures ordered mode
consistency.

Signed-off-by: Aneesh Kumar K.V <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>

---
fs/ext4/mballoc.c | 82 ++++++++++++++++++++++++++++++++++++++--------------
1 files changed, 60 insertions(+), 22 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 93e4c09..ea13c5e 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -335,6 +335,8 @@ static struct kmem_cache *ext4_ac_cachep;
static struct kmem_cache *ext4_free_ext_cachep;
static void ext4_mb_generate_from_pa(struct super_block *sb, void *bitmap,
ext4_group_t group);
+static void ext4_mb_generate_from_freelist(struct super_block *sb, void *bitmap,
+ ext4_group_t group);
static int ext4_mb_init_per_dev_proc(struct super_block *sb);
static int ext4_mb_destroy_per_dev_proc(struct super_block *sb);
static void ext4_mb_free_committed_blocks(struct super_block *);
@@ -859,7 +861,9 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
/*
* incore got set to the group block bitmap below
*/
+ ext4_lock_group(sb, group);
ext4_mb_generate_buddy(sb, data, incore, group);
+ ext4_unlock_group(sb, group);
incore = NULL;
} else {
/* this is block of bitmap */
@@ -873,6 +877,7 @@ static int ext4_mb_init_cache(struct page *page, char *incore)

/* mark all preallocated blks used in in-core bitmap */
ext4_mb_generate_from_pa(sb, data, group);
+ ext4_mb_generate_from_freelist(sb, data, group);
ext4_unlock_group(sb, group);

/* set incore so that the buddy information can be
@@ -3583,6 +3588,32 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
}

/*
+ * the function goes through all block freed in the group
+ * but not yet committed and marks them used in in-core bitmap.
+ * buddy must be generated from this bitmap
+ * Need to be called with ext4 group lock (ext4_lock_group)
+ */
+static void ext4_mb_generate_from_freelist(struct super_block *sb, void *bitmap,
+ ext4_group_t group)
+{
+ struct rb_node *n;
+ struct ext4_group_info *grp;
+ struct ext4_free_data *entry;
+
+ grp = ext4_get_group_info(sb, group);
+ n = rb_first(&(grp->bb_free_root));
+
+ while (n) {
+ entry = rb_entry(n, struct ext4_free_data, node);
+ mb_set_bits(sb_bgl_lock(EXT4_SB(sb), group),
+ bitmap, entry->start_blk,
+ entry->count);
+ n = rb_next(n);
+ }
+ return;
+}
+
+/*
* the function goes through all preallocation in this group and marks them
* used in in-core bitmap. buddy must be generated from this bitmap
* Need to be called with ext4 group lock (ext4_lock_group)
@@ -4720,27 +4751,22 @@ static int can_merge(struct ext4_free_data *entry1,

static noinline_for_stack int
ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b,
- ext4_group_t group, ext4_grpblk_t block, int count)
+ struct ext4_free_data *new_entry)
{
+ ext4_grpblk_t block;
+ struct ext4_free_data *entry;
struct ext4_group_info *db = e4b->bd_info;
struct super_block *sb = e4b->bd_sb;
struct ext4_sb_info *sbi = EXT4_SB(sb);
- struct ext4_free_data *entry, *new_entry;
struct rb_node **n = &db->bb_free_root.rb_node, *node;
struct rb_node *parent = NULL, *new_node;

-
BUG_ON(e4b->bd_bitmap_page == NULL);
BUG_ON(e4b->bd_buddy_page == NULL);

- new_entry = kmem_cache_alloc(ext4_free_ext_cachep, GFP_NOFS);
- new_entry->start_blk = block;
- new_entry->group = group;
- new_entry->count = count;
- new_entry->t_tid = handle->h_transaction->t_tid;
new_node = &new_entry->node;
+ block = new_entry->start_blk;

- ext4_lock_group(sb, group);
if (!*n) {
/* first free block exent. We need to
protect buddy cache from being freed,
@@ -4799,7 +4825,6 @@ ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b,
spin_lock(&sbi->s_md_lock);
list_add(&new_entry->list, &sbi->s_active_transaction);
spin_unlock(&sbi->s_md_lock);
- ext4_unlock_group(sb, group);
return 0;
}

@@ -4906,15 +4931,6 @@ do_more:
BUG_ON(!mb_test_bit(bit + i, bitmap_bh->b_data));
}
#endif
- mb_clear_bits(sb_bgl_lock(sbi, block_group), bitmap_bh->b_data,
- bit, count);
-
- /* We dirtied the bitmap block */
- BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
- err = ext4_journal_dirty_metadata(handle, bitmap_bh);
- if (err)
- goto error_return;
-
if (ac) {
ac->ac_b_ex.fe_group = block_group;
ac->ac_b_ex.fe_start = bit;
@@ -4926,11 +4942,29 @@ do_more:
if (err)
goto error_return;
if (metadata) {
- /* blocks being freed are metadata. these blocks shouldn't
- * be used until this transaction is committed */
- ext4_mb_free_metadata(handle, &e4b, block_group, bit, count);
+ struct ext4_free_data *new_entry;
+ /*
+ * blocks being freed are metadata. these blocks shouldn't
+ * be used until this transaction is committed
+ */
+ new_entry = kmem_cache_alloc(ext4_free_ext_cachep, GFP_NOFS);
+ new_entry->start_blk = bit;
+ new_entry->group = block_group;
+ new_entry->count = count;
+ new_entry->t_tid = handle->h_transaction->t_tid;
+ ext4_lock_group(sb, block_group);
+ mb_clear_bits(sb_bgl_lock(sbi, block_group), bitmap_bh->b_data,
+ bit, count);
+ ext4_mb_free_metadata(handle, &e4b, new_entry);
+ ext4_unlock_group(sb, block_group);
} else {
ext4_lock_group(sb, block_group);
+ /* need to update group_info->bb_free and bitmap
+ * with group lock held. generate_buddy look at
+ * them with group lock_held
+ */
+ mb_clear_bits(sb_bgl_lock(sbi, block_group), bitmap_bh->b_data,
+ bit, count);
mb_free_blocks(inode, &e4b, bit, count);
ext4_mb_return_to_preallocation(inode, &e4b, block, count);
ext4_unlock_group(sb, block_group);
@@ -4953,6 +4987,10 @@ do_more:

*freed += count;

+ /* We dirtied the bitmap block */
+ BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
+ err = ext4_journal_dirty_metadata(handle, bitmap_bh);
+
/* And the group descriptor block */
BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
ret = ext4_journal_dirty_metadata(handle, gd_bh);
--
tg: (7830455..) aneesh-7-dont-use-blocks-freed-but-not-yet-committed-in-buddy-cache-init (depends on: use-rb-tree-for-free-blocks-tracking)
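
As a reading aid for the patch above, here is a minimal userspace sketch of the idea behind ext4_mb_generate_from_freelist(): before the buddy data is built, every extent that was freed in a not-yet-committed transaction is marked as in use in the in-core bitmap, so the allocator cannot hand those blocks out again. This is not kernel code. The names (free_extent, build_incore_bitmap, GROUP_BLOCKS) are hypothetical, a linked list stands in for the rb-tree, and locking is left out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GROUP_BLOCKS 64                  /* toy group size, chosen for the example */
#define BITMAP_BYTES (GROUP_BLOCKS / 8)

/* Hypothetical stand-in for struct ext4_free_data: an extent freed in a
 * transaction that has not committed yet. The kernel keeps these in an
 * rb-tree per group; a singly linked list is enough for the sketch. */
struct free_extent {
    unsigned start;                      /* first block within the group */
    unsigned count;                      /* number of blocks in the extent */
    const struct free_extent *next;
};

/* Mark `count` blocks starting at `start` as in use (bit = 1). */
void set_bits(uint8_t *bitmap, unsigned start, unsigned count)
{
    assert(start + count <= GROUP_BLOCKS);
    for (unsigned i = start; i < start + count; i++)
        bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
}

/* Mark `count` blocks starting at `start` as free (bit = 0). */
void clear_bits(uint8_t *bitmap, unsigned start, unsigned count)
{
    assert(start + count <= GROUP_BLOCKS);
    for (unsigned i = start; i < start + count; i++)
        bitmap[i / 8] &= (uint8_t)~(1u << (i % 8));
}

/* Count the blocks a bitmap reports as free. */
unsigned count_free(const uint8_t *bitmap)
{
    unsigned nfree = 0;
    for (unsigned i = 0; i < GROUP_BLOCKS; i++)
        if (!(bitmap[i / 8] & (1u << (i % 8))))
            nfree++;
    return nfree;
}

/* Copy the on-disk bitmap, then re-mark every pending (freed but
 * uncommitted) extent as used. The buddy data would be generated from
 * `incore`, so these blocks stay invisible until the commit. */
void build_incore_bitmap(const uint8_t *ondisk,
                         const struct free_extent *pending,
                         uint8_t *incore)
{
    memcpy(incore, ondisk, BITMAP_BYTES);
    for (const struct free_extent *e = pending; e != NULL; e = e->next)
        set_bits(incore, e->start, e->count);
}
```

If the on-disk bitmap shows 12 free blocks and 4 of them belong to a pending extent, count_free() on the in-core copy reports 8. That difference is the free-count discrepancy the patch description refers to.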

2009-01-20 08:29:33

by Aneesh Kumar K.V

Subject: [PATCH] ext4: Add blocks added during resize to bitmap

2009-01-22 19:50:12

by Greg KH

On Sunday 30 August 2009 10:24:56 Nick Dokos wrote:
> > Hi,
> >
> > I am running 2.6.31-rc8+ and have ext4 corruption that will not go away.
> >
> > My root fs is ext4 on sdb3. I have moved the directory with corruption into lost+found and booted to a rescue
> > system (arch linux) and run fsck.ext4 on the filesystem, which then reports it's clean... Booting back into my
> > gentoo system and attempting to remove the xx directory from lost+found gives:
> >
> > [ 172.408799] EXT4-fs error (device sdb3): ext4_ext_check_inode: bad header/extent in inode #706801: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
> > [ 172.429410] EXT4-fs error (device sdb3): ext4_ext_check_inode: bad header/extent in inode #706801: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
> > [ 172.449920] EXT4-fs error (device sdb3): ext4_ext_check_inode: bad header/extent in inode #706801: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
> >
> > The above is repeatable.
> >
> > How can I _really_ clean this fs? What info is needed to help the process?
> >
>
> The first two pieces of information needed would be the version of e2fsprogs
> that you are running (e2fsck -V) and the stat of inode 706801:
>
> debugfs -R 'stat <706801>' /dev/sdb3
>

e2fsck 1.41.9 (22-Aug-2009)
Using EXT2FS Library version 1.41.9, 22-Aug-2009

Inode: 706801 Type: regular Mode: 0644 Flags: 0x80000
Generation: 28075061 Version: 0x00000000:00000001
User: 0 Group: 0 Size: 1442
File ACL: 0 Directory ACL: 0
Links: 1 Blockcount: 0
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x4a8beb6a:a8c2e6e4 -- Wed Aug 19 08:09:14 2009
atime: 0x4a8beb6a:00000000 -- Wed Aug 19 08:09:14 2009
mtime: 0x4a8b17d9:00000000 -- Tue Aug 18 17:06:33 2009
crtime: 0x4a8beb6a:9e456724 -- Wed Aug 19 08:09:14 2009
Size of extra inode fields: 28
EXTENTS:

and of 706804, which also has a problem

Inode: 706804 Type: regular Mode: 0644 Flags: 0x80000
Generation: 4140131203 Version: 0x00000000:00000001
User: 1000 Group: 100 Size: 523
File ACL: 0 Directory ACL: 0
Links: 1 Blockcount: 0
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x4a95c957:348db0b8 -- Wed Aug 26 19:46:31 2009
atime: 0x4a991f2b:cf3a606c -- Sat Aug 29 08:29:31 2009
mtime: 0x4a95c957:348db0b8 -- Wed Aug 26 19:46:31 2009
crtime: 0x4a95c957:348db0b8 -- Wed Aug 26 19:46:31 2009
Size of extra inode fields: 28
EXTENTS:

Hope this helps
Ed
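
For readers following the debugfs output above: Flags 0x80000 is the extents flag (EXT4_EXTENTS_FL), so the kernel expects an extent header at the start of the inode's i_block area, and that header should begin with the magic number 0xF30A. The empty EXTENTS: listing, together with "magic 0, entries 0, max 0" in the kernel message, would be consistent with that area being all zeroes. The sketch below illustrates the kind of check involved. It is a simplified illustration, not the kernel's ext4_ext_check_inode(); the function names are hypothetical and only the first four header fields are decoded.

```c
#include <stdint.h>

#define EXTENTS_FL 0x00080000u   /* value of EXT4_EXTENTS_FL */
#define EXT_MAGIC  0xF30Au       /* value of EXT4_EXT_MAGIC */

/* First four fields of the on-disk extent header, in on-disk order.
 * The real header also carries a 32-bit generation field, omitted here. */
struct extent_header {
    uint16_t magic;
    uint16_t entries;   /* number of valid entries */
    uint16_t max;       /* capacity of this node */
    uint16_t depth;     /* 0 means this node holds leaf extents */
};

/* Read a little-endian 16-bit value, as ext4 stores it on disk. */
uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Does the inode claim to use extents? */
int inode_uses_extents(uint32_t i_flags)
{
    return (i_flags & EXTENTS_FL) != 0;
}

/* Decode the header from the start of the inode's i_block area. */
struct extent_header decode_header(const uint8_t *i_block)
{
    struct extent_header h;
    h.magic   = le16(i_block + 0);
    h.entries = le16(i_block + 2);
    h.max     = le16(i_block + 4);
    h.depth   = le16(i_block + 6);
    return h;
}

/* Simplified sanity check (assumption: the kernel performs more checks
 * than this). Returns 1 if the header looks usable, 0 otherwise. */
int header_ok(const struct extent_header *h)
{
    if (h->magic != EXT_MAGIC)
        return 0;                /* e.g. an all-zero i_block: magic is 0 */
    if (h->max == 0 || h->entries > h->max)
        return 0;
    return 1;
}
```

With a zeroed i_block, decode_header() yields magic 0 and header_ok() rejects it, which matches the shape of the error reported for inode 706801.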