2007-10-01 22:09:33

by Avantika Mathur

[permalink] [raw]
Subject: Ext4 devel interlock meeting minutes (October 1, 2007)

Ext4 Developer Interlock Call: 10/1/07 Meeting Minutes

Attendees: Mingming Cao, Aneesh Veetil, Dave Kleikamp, Akira Fujita, Ted
Ts'o, Jose Santos, Avantika Mathur, Jean-Pierre Dion

Minutes can be accessed at:
http://ext4.wiki.kernel.org/index.php/Ext4_Developer%27s_Conference_Call

EXT4 PATCH QUEUE

Multiple Block Allocation:
- Main outstanding issue is still need for better documentation of the code.
- Aneesh has marked the places that need explanation with fixme, and
Alex will be adding the needed comments. If this gets done quickly, we
can try to push this feature to 2.6.24
- Aneesh has asked for recommendations for a benchmark to run which will
test the mballoc allocator. Mingming suggested using dd and dbench, as
was done in the past for basic mballoc patches. To test the in-core
preallocation, kernel untar tests. Look at the file fragmentation after
running the test.

Large Block Patches:
- Mingming will post these patches to lkml for review

Uninitialized block groups:
- the uninitialized block group patches were posted to lkml and are in
the -mm tree.

JBD-stats through procfs patches:
- Ted will look at these patches and see if they are ready to push upstream

generic-find-next-le-bit:
- This function is only used in multiple block allocation patches. It
would not be a good idea to post this before mballoc, since the function
would be unused and may be cleaned up later.

Delayed Allocation:
- There has been an lkml thread about these patches.
- We have and approach that works for ext4, implemented at the vfs
level, but unless we can prove it can work for other filesystems, it
will not be accepted.
- Christoph Hellwig has commented that these patches will not work for
XFS.
- Mingming will try the patches with ext2/3, and Shaggy will try to find
time to test with JFS.
- If this approach works for other filesystems, we can try to push to
mainline as a placeholder.

truncate mutex patch:
- There is a performance concern with these patches. Mingming suggested
doing some testing to see what the actual performance regression is.
- Ted suggests breaking into two patches:
-- Patch 1 - Change the name of the mutex to block_mapped_mutex and
update comments to reflect the usage of the mutex
-- Patch 2 - Change to a read/write mutex

GENERAL
- Those patches we would still like to see in 2.6.24 need to be pushed
very quickly
- We should avoid having ext2/3 patches in the ext4-patch-queue and git
tree.
- Some of the patches that we just send to lkml for review end up being
pulled into the -mm tree, then Ted pulls them to the ext4-git tree, and
Andrew must remove from -mm. When we post for comments, we should use
[PATCH,RFC] to avoid this.
- Mingming suggested having a patch queue for the ext4-git tree for
easier review. Ted is also considering having a pu branch for the git
tree. Ted will think about what will work best.
- In the ext4-patch qDescription of patch include which version of -mm
the patch is in


2007-10-02 05:58:37

by Andreas Dilger

[permalink] [raw]
Subject: Re: Ext4 devel interlock meeting minutes (October 1, 2007)

On Oct 01, 2007 15:10 -0700, Avantika Mathur wrote:
> Delayed Allocation:
> - There has been an lkml thread about these patches.
> - We have and approach that works for ext4, implemented at the vfs
> level, but unless we can prove it can work for other filesystems, it
> will not be accepted.
> - Christoph Hellwig has commented that these patches will not work for
> XFS.

Hmm, but I thought that Christoph also agreed that it would be OK to
get the ext4 delalloc code merged separately, so long as it doesn't
need big/any changes to the VFS to implement it. It might be that the
ext4 and XFS code is different enough that they cannot share the
delayed allocation code.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

2007-10-02 09:24:59

by Valerie Clement

[permalink] [raw]
Subject: Re: Ext4 devel interlock meeting minutes (October 1, 2007)

Avantika Mathur wrote:
> Multiple Block Allocation:
> - Main outstanding issue is still need for better documentation of the
> code.
> - Aneesh has marked the places that need explanation with fixme, and
> Alex will be adding the needed comments. If this gets done quickly, we
> can try to push this feature to 2.6.24
> - Aneesh has asked for recommendations for a benchmark to run which will
> test the mballoc allocator. Mingming suggested using dd and dbench, as
> was done in the past for basic mballoc patches. To test the in-core
> preallocation, kernel untar tests. Look at the file fragmentation after
> running the test.

Currently, the mballoc feature is not compatible with the uninit_groups
feature. I have just tried a simple test which failed. Isn't this a problem?

Another point that is not clear for me, is mballoc compatible with
delalloc now, or is there still a restriction?

Val?rie

2007-10-02 11:37:18

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: Ext4 devel interlock meeting minutes (October 1, 2007)



Valerie Clement wrote:
> Avantika Mathur wrote:
>> Multiple Block Allocation:
>> - Main outstanding issue is still need for better documentation of the
>> code.
>> - Aneesh has marked the places that need explanation with fixme, and
>> Alex will be adding the needed comments. If this gets done quickly,
>> we can try to push this feature to 2.6.24
>> - Aneesh has asked for recommendations for a benchmark to run which
>> will test the mballoc allocator. Mingming suggested using dd and
>> dbench, as was done in the past for basic mballoc patches. To test
>> the in-core preallocation, kernel untar tests. Look at the file
>> fragmentation after running the test.
>
> Currently, the mballoc feature is not compatible with the uninit_groups
> feature. I have just tried a simple test which failed. Isn't this a
> problem?
>
> Another point that is not clear for me, is mballoc compatible with
> delalloc now, or is there still a restriction?
>

I thought i had merged the patch from Andreas.

http://marc.info/?l=linux-ext4&m=118833298930860&w=2

But i don't see it in mballoc-core.patch

Will update the mballoc patch

-aneesh

2007-10-02 14:12:40

by Andreas Dilger

[permalink] [raw]
Subject: Re: Ext4 devel interlock meeting minutes (October 1, 2007)

On Oct 02, 2007 11:24 +0200, Valerie Clement wrote:
> Currently, the mballoc feature is not compatible with the uninit_groups
> feature. I have just tried a simple test which failed. Isn't this a problem?

I thought Avantika posted the incremental patch to ext4 mballoc to work
with the uninit groups patch? This is what we have in our ext3 patch:


Index: linux-rhel5/fs/ext3/mballoc.c
===================================================================
--- linux-rhel5.orig/fs/ext3/mballoc.c 2007-07-18 17:32:04.000000000 +0200
+++ linux-rhel5/fs/ext3/mballoc.c 2007-07-18 17:32:15.000000000 +0200
@@ -36,6 +36,8 @@
#include <linux/seq_file.h>
#include <linux/version.h>

+#include "group.h"
+
/*
* MUSTDO:
* - test ext3_ext_search_left() and ext3_ext_search_right()
@@ -323,6 +325,7 @@ struct ext3_group_info {
unsigned long bb_state;
unsigned long bb_tid;
struct ext3_free_metadata *bb_md_cur;
+ struct ext3_group_desc *bb_gdp;
unsigned short bb_first_free;
unsigned short bb_free;
unsigned short bb_fragments;
@@ -943,10 +946,7 @@ static int ext3_mb_init_cache(struct pag
if (first_group + i >= EXT3_SB(sb)->s_groups_count)
break;

- err = -EIO;
- desc = ext3_get_group_desc(sb, first_group + i, NULL);
- if (desc == NULL)
- goto out;
+ desc = EXT3_GROUP_INFO(sb, first_group + i)->bb_gdp;

err = -ENOMEM;
bh[i] = sb_getblk(sb, le32_to_cpu(desc->bg_block_bitmap));
@@ -961,7 +961,12 @@ static int ext3_mb_init_cache(struct pag
unlock_buffer(bh[i]);
continue;
}
-
+ if (desc->bg_flags & cpu_to_le16(EXT3_BG_BLOCK_UNINIT)) {
+ ext3_init_block_bitmap(sb, bh[i], first_group + i,desc);
+ set_buffer_uptodate(bh[i]);
+ unlock_buffer(bh[i]);
+ continue;
+ }
get_bh(bh[i]);
bh[i]->b_end_io = end_buffer_read_sync;
submit_bh(READ, bh[i]);
@@ -1732,6 +1737,10 @@ static int ext3_mb_good_group(struct ext
switch (cr) {
case 0:
BUG_ON(ac->ac_2order == 0);
+ /* If this group is uninitialized, skip it initially */
+ if (grp->bb_gdp->bg_flags &
+ cpu_to_le16(EXT3_BG_BLOCK_UNINIT))
+ return 0;
bits = ac->ac_sb->s_blocksize_bits + 1;
for (i = ac->ac_2order; i <= bits; i++)
if (grp->bb_counters[i] > 0)
@@ -1825,7 +1834,9 @@ repeat:
}

ac->ac_groups_scanned++;
- if (cr == 0)
+ if (cr == 0 || (e3b.bd_info->bb_gdp->bg_flags &
+ cpu_to_le16(EXT3_BG_BLOCK_UNINIT) &&
+ ac->ac_2order != 0))
ext3_mb_simple_scan_group(ac, &e3b);
else if (cr == 1 && ac->ac_g_ex.fe_len == sbi->s_stripe)
ext3_mb_scan_aligned(ac, &e3b);
@@ -2304,12 +2315,13 @@ int ext3_mb_init_backend(struct super_bl
i--;
goto err_freebuddy;
}
+ memset(meta_group_info[j], 0, len);
desc = ext3_get_group_desc(sb, i, NULL);
+ meta_group_info[j]->bb_gdp = desc;
if (desc == NULL) {
printk(KERN_ERR"EXT3-fs: can't read descriptor %u\n",i);
goto err_freebuddy;
}
- memset(meta_group_info[j], 0, len);
set_bit(EXT3_GROUP_INFO_NEED_INIT_BIT,
&meta_group_info[j]->bb_state);

@@ -2958,9 +2970,17 @@ int ext3_mb_mark_diskspace_used(struct e
mb_set_bits(bitmap_bh->b_data, ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);

spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
+ if (gdp->bg_flags & cpu_to_le16(EXT3_BG_BLOCK_UNINIT)) {
+ gdp->bg_flags &= cpu_to_le16(~EXT3_BG_BLOCK_UNINIT);
+ gdp->bg_free_blocks_count =
+ cpu_to_le16(ext3_free_blocks_after_init(sb,
+ ac->ac_b_ex.fe_group,
+ gdp));
+ }
gdp->bg_free_blocks_count =
cpu_to_le16(le16_to_cpu(gdp->bg_free_blocks_count)
- ac->ac_b_ex.fe_len);
+ gdp->bg_checksum = ext3_group_desc_csum(sbi, ac->ac_b_ex.fe_group, gdp);
spin_unlock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
percpu_counter_mod(&sbi->s_freeblocks_counter, - ac->ac_b_ex.fe_len);

@@ -4346,6 +4366,7 @@ do_more:
spin_lock(sb_bgl_lock(sbi, block_group));
gdp->bg_free_blocks_count =
cpu_to_le16(le16_to_cpu(gdp->bg_free_blocks_count) + count);
+ gdp->bg_checksum = ext3_group_desc_csum(sbi, block_group, gdp);
spin_unlock(sb_bgl_lock(sbi, block_group));
percpu_counter_mod(&sbi->s_freeblocks_counter, count);


Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

2007-10-02 14:23:50

by Eric Sandeen

[permalink] [raw]
Subject: Re: Ext4 devel interlock meeting minutes (October 1, 2007)

Andreas Dilger wrote:
> On Oct 01, 2007 15:10 -0700, Avantika Mathur wrote:
>> Delayed Allocation:
>> - There has been an lkml thread about these patches.
>> - We have and approach that works for ext4, implemented at the vfs
>> level, but unless we can prove it can work for other filesystems, it
>> will not be accepted.
>> - Christoph Hellwig has commented that these patches will not work for
>> XFS.
>
> Hmm, but I thought that Christoph also agreed that it would be OK to
> get the ext4 delalloc code merged separately, so long as it doesn't
> need big/any changes to the VFS to implement it. It might be that the
> ext4 and XFS code is different enough that they cannot share the
> delayed allocation code.

IIRC there is only one change to the VFS, a test for buffer_delay() in
__block_write_full_page(). XFS is the only current caller of
set_buffer_delay, and XFS does not even use the block_write_full_page path.

The rest was a huge chunk dropped into mpage.c, but not interfering with
anything else.

So, I don't see any conflict.

-Eric

2007-10-02 18:10:49

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: Ext4 devel interlock meeting minutes (October 1, 2007)



Andreas Dilger wrote:
> On Oct 02, 2007 11:24 +0200, Valerie Clement wrote:
>> Currently, the mballoc feature is not compatible with the uninit_groups
>> feature. I have just tried a simple test which failed. Isn't this a problem?
>
> I thought Avantika posted the incremental patch to ext4 mballoc to work
> with the uninit groups patch? This is what we have in our ext3 patch:
>
>
>

This is what i have right now. The balloc.c and group.h changes should actually
go in uninitialized block group patch. This patch is slightly different from the
one posted by Andreas. Here instead of adding ext4_block_group descriptor again
to ext4_group_info i am explicitly calling ext4_get_group_desc.

-aneesh


diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 321ad1e..a8aebf2 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -106,6 +106,16 @@ unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
for (bit = (ext4_inode_table(sb, gdp) - start),
bit_max = bit + sbi->s_itb_per_group; bit < bit_max; bit++)
ext4_set_bit(bit, bh->b_data);
+
+ /*
+ * Also if the number of blocks within the group is less than the
+ * blocksize * 8 ( which is the size of bitmap ), set rest of the
+ * block bitmap to 1
+ */
+ for (bit = EXT4_BLOCKS_PER_GROUP(sb);
+ bit < sb->s_blocksize * 8; bit++) {
+ ext4_set_bit(bit, bh->b_data);
+ }
}

return free_blocks - sbi->s_itb_per_group - 2;
diff --git a/fs/ext4/group.h b/fs/ext4/group.h
index 9310979..5165311 100644
--- a/fs/ext4/group.h
+++ b/fs/ext4/group.h
@@ -8,9 +8,6 @@

#ifndef _LINUX_EXT4_GROUP_H
#define _LINUX_EXT4_GROUP_H
-#if defined(CONFIG_CRC16)
-#include <linux/crc16.h>
-#endif

extern __le16 ext4_group_desc_csum(struct ext4_sb_info *sbi, __u32 group,
struct ext4_group_desc *gdp);
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 5ffc80b..28ad4fc 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -34,6 +34,7 @@
#include <linux/pagemap.h>
#include <linux/seq_file.h>
#include <linux/version.h>
+#include "group.h"

/*
* MUSTDO:
@@ -893,6 +894,14 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
continue;
}

+ if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
+
+ ext4_init_block_bitmap(sb, bh[i],
+ first_group + i, desc);
+ set_buffer_uptodate(bh[i]);
+ unlock_buffer(bh[i]);
+ continue;
+ }
get_bh(bh[i]);
bh[i]->b_end_io = end_buffer_read_sync;
submit_bh(READ, bh[i]);
@@ -1702,11 +1711,10 @@ static void ext4_mb_scan_aligned(struct ext4_allocation_context *ac,
static int ext4_mb_good_group(struct ext4_allocation_context *ac,
int group, int cr)
{
+ unsigned free, fragments;
+ unsigned i, bits;
+ struct ext4_group_desc *desc;
struct ext4_group_info *grp = EXT4_GROUP_INFO(ac->ac_sb, group);
- unsigned free;
- unsigned fragments;
- unsigned i;
- unsigned bits;

BUG_ON(cr < 0 || cr >= 4);
BUG_ON(EXT4_MB_GRP_NEED_INIT(grp));
@@ -1721,6 +1729,11 @@ static int ext4_mb_good_group(struct ext4_allocation_context *ac,
switch (cr) {
case 0:
BUG_ON(ac->ac_2order == 0);
+ /* If this group is uninitialized, skip it initially */
+ desc = ext4_get_group_desc(ac->ac_sb, group, NULL);
+ if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))
+ return 0;
+
bits = ac->ac_sb->s_blocksize_bits + 1;
for (i = ac->ac_2order; i <= bits; i++)
if (grp->bb_counters[i] > 0)
@@ -1805,6 +1818,7 @@ repeat:
ac->ac_criteria = cr;
for (i = 0; i < EXT4_SB(sb)->s_groups_count; group++, i++) {
struct ext4_group_info *grp;
+ struct ext4_group_desc *desc;

if (group == EXT4_SB(sb)->s_groups_count)
group = 0;
@@ -1844,12 +1858,16 @@ repeat:
}

ac->ac_groups_scanned++;
- if (cr == 0)
+ desc = ext4_get_group_desc(sb, group, NULL);
+ if (cr == 0 || (desc->bg_flags &
+ cpu_to_le16(EXT4_BG_BLOCK_UNINIT) &&
+ ac->ac_2order != 0)) {
ext4_mb_simple_scan_group(ac, &e4b);
- else if (cr == 1 && ac->ac_g_ex.fe_len == sbi->s_stripe)
+ } else if (cr == 1 && ac->ac_g_ex.fe_len == sbi->s_stripe) {
ext4_mb_scan_aligned(ac, &e4b);
- else
+ } else {
ext4_mb_complex_scan_group(ac, &e4b);
+ }

ext4_unlock_group(sb, group);
ext4_mb_release_desc(&e4b);
@@ -2267,11 +2285,8 @@ static void ext4_mb_store_history(struct ext4_allocation_context *ac)

static int ext4_mb_init_backend(struct super_block *sb)
{
+ int i, j, len, metalen;
struct ext4_sb_info *sbi = EXT4_SB(sb);
- int i;
- int j;
- int len;
- int metalen;
int num_meta_group_infos =
(sbi->s_groups_count + EXT4_DESC_PER_BLOCK(sb) - 1) >>
EXT4_DESC_PER_BLOCK_BITS(sb);
@@ -2321,7 +2336,7 @@ static int ext4_mb_init_backend(struct super_block *sb)
sbi->s_group_info[i >> EXT4_DESC_PER_BLOCK_BITS(sb)];
j = i & (EXT4_DESC_PER_BLOCK(sb) - 1);

- meta_group_info[j] = kmalloc(len, GFP_KERNEL);
+ meta_group_info[j] = kzalloc(len, GFP_KERNEL);
if (meta_group_info[j] == NULL) {
printk(KERN_ERR "EXT4-fs: can't allocate buddy mem\n");
i--;
@@ -2333,7 +2348,6 @@ static int ext4_mb_init_backend(struct super_block *sb)
"EXT4-fs: can't read descriptor %u\n", i);
goto err_freebuddy;
}
- memset(meta_group_info[j], 0, len);
set_bit(EXT4_GROUP_INFO_NEED_INIT_BIT,
&meta_group_info[j]->bb_state);

@@ -2919,9 +2933,17 @@ static int ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
ac->ac_b_ex.fe_len);

spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
+ if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
+ gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
+ gdp->bg_free_blocks_count =
+ cpu_to_le16(ext4_free_blocks_after_init(sb,
+ ac->ac_b_ex.fe_group,
+ gdp));
+ }
gdp->bg_free_blocks_count =
cpu_to_le16(le16_to_cpu(gdp->bg_free_blocks_count)
- ac->ac_b_ex.fe_len);
+ gdp->bg_checksum = ext4_group_desc_csum(sbi, ac->ac_b_ex.fe_group, gdp);
spin_unlock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
percpu_counter_mod(&sbi->s_freeblocks_counter, - ac->ac_b_ex.fe_len);

@@ -4066,7 +4088,7 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle,
#if 0
static int ext4_mballoc_warning = 0;
if (ext4_mballoc_warning++ == 0)
- printk(KERN_ERR "EXT3-fs: multiblock request with "
+ printk(KERN_ERR "EXT4-fs: multiblock request with "
"mballoc disabled!\n");
ar->len = 1;
#endif
@@ -4353,6 +4375,7 @@ do_more:
spin_lock(sb_bgl_lock(sbi, block_group));
gdp->bg_free_blocks_count =
cpu_to_le16(le16_to_cpu(gdp->bg_free_blocks_count) + count);
+ gdp->bg_checksum = ext4_group_desc_csum(sbi, block_group, gdp);
spin_unlock(sb_bgl_lock(sbi, block_group));
percpu_counter_mod(&sbi->s_freeblocks_counter, count);