LinuxLists.cc - [PATCH V6 RESEND] ext4: add new online resize

2011-12-30 14:41:33

by Yongqiang Yang

[permalink] [raw]

Subject: [PATCH V6 RESEND] ext4: add new online resize

Hi all,

Here is the 6th veriosn of patch series adding new resize implementation
to ext4.

-- What's new resize implementation?
It is a new online resize interface for ext4. It can be used via
ioctl with EXT4_IOC_RESIZE_FS and a 64 bit integer indicating size
of the resized fs in block.

-- Difference between current resize and new resize.
New resize lets kernel do all work, like allocating bitmaps and
inode tables and can support flex_bg and BLOCK_UNINIT features.
Besides these, new resize is much faster than current resize.

Below are benchmarks I made on my personal computer, fses with
flex_bg size = 16 were resized to 230GB evry time. The first
row shows the size of a fs from which the fs was resized to 230GB.
The datas were collected by 'time resize2fs'.

new resize
20GB 50GB 100GB
real 0m3.558s 0m2.891s 0m0.394s
user 0m0.004s 0m0.000s 0m0.394s
sys 0m0.048s 0m0.048s 0m0.028s

current resize
20GB 50GB 100GB
real 5m2.770s 4m43.757s 3m14.840s
user 0m0.040s 0m0.032s 0m0.024s
sys 0m0.464s 0m0.432s 0m0.324s

According to data above, new resize is faster than current resize in both
user and sys time. New resize performs well in sys time, because it
supports BLOCK_UNINIT and adds multi-groups each time.

-- About supporting new features.
YES! New resize can support new feature like bigalloc and exclude bitmap
easily. Because it lets kernel do all work.

V5->V6:
add code initializing inode bitmap and inode tables back.

V4->V5:
release resizing error lock in error case of IOC_RESIZEFS
Thanks Djalal <[email protected]> for pointing it out and his
patch for IOC_GROUP_EXTEND and IOC_GROUP_ADD.

V3->V4:
rename __ext4_group_extend to ext4_group_extend_no_check

V2->V3:
initialize block bitmap of last group.
remove code initializing inode bitmap and inode tables.

Yongqiang.

2011-12-30 14:41:36

by Yongqiang Yang

[permalink] [raw]

Subject: [PATCH V6 RESEND 01/15] ext4: add a function which extends a group without checking parameters

From: Yongqiang Yang <[email protected]>

This patch added a function named ext4_group_extend_no_check() whose code
is copied from ext4_group_extend(). ext4_group_extend_no_check() assumes
the parameter is valid and has been checked by caller.

Signed-off-by: Yongqiang Yang <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/resize.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 53 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 996780a..ac5565c 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -969,6 +969,59 @@ exit_put:
} /* ext4_group_add */

/*
+ * extend a group without checking assuming that checking has been done.
+ */
+static int ext4_group_extend_no_check(struct super_block *sb,
+ ext4_fsblk_t o_blocks_count, ext4_grpblk_t add)
+{
+ struct ext4_super_block *es = EXT4_SB(sb)->s_es;
+ handle_t *handle;
+ int err = 0, err2;
+
+ /* We will update the superblock, one block bitmap, and
+ * one group descriptor via ext4_group_add_blocks().
+ */
+ handle = ext4_journal_start_sb(sb, 3);
+ if (IS_ERR(handle)) {
+ err = PTR_ERR(handle);
+ ext4_warning(sb, "error %d on journal start", err);
+ goto out;
+ }
+
+ err = ext4_journal_get_write_access(handle, EXT4_SB(sb)->s_sbh);
+ if (err) {
+ ext4_warning(sb, "error %d on journal write access", err);
+ ext4_journal_stop(handle);
+ goto out;
+ }
+
+ ext4_blocks_count_set(es, o_blocks_count + add);
+ ext4_debug("freeing blocks %llu through %llu\n", o_blocks_count,
+ o_blocks_count + add);
+ /* We add the blocks to the bitmap and set the group need init bit */
+ err = ext4_group_add_blocks(handle, sb, o_blocks_count, add);
+ if (err)
+ goto exit_journal;
+ ext4_handle_dirty_super(handle, sb);
+ ext4_debug("freed blocks %llu through %llu\n", o_blocks_count,
+ o_blocks_count + add);
+exit_journal:
+ err2 = ext4_journal_stop(handle);
+ if (err2 && !err)
+ err = err2;
+
+ if (!err) {
+ if (test_opt(sb, DEBUG))
+ printk(KERN_DEBUG "EXT4-fs: extended group to %llu "
+ "blocks\n", ext4_blocks_count(es));
+ update_backups(sb, EXT4_SB(sb)->s_sbh->b_blocknr, (char *)es,
+ sizeof(struct ext4_super_block));
+ }
+out:
+ return err;
+}
+
+/*
* Extend the filesystem to the new number of blocks specified. This entry
* point is only used to extend the current filesystem to the end of the last
* existing group. It can be accessed via ioctl, or by "remount,resize=<size>"
--
1.7.5.1

2011-12-30 14:41:40

by Yongqiang Yang

[permalink] [raw]

Subject: [PATCH V6 RESEND 03/15] ext4: add a function which sets up a new group desc

From: Yongqiang Yang <[email protected]>

This patch adds a function named ext4_setup_new_desc() which sets
up a new group descriptor and whose code is copied from ext4_group_add().

The function will be used by new resize implementation.

Signed-off-by: Yongqiang Yang <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/resize.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 55 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 9a5486e..a50eab3 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -777,6 +777,61 @@ out:
return err;
}

+/*
+ * ext4_setup_new_desc() sets up group descriptors specified by @input.
+ *
+ * @handle: journal handle
+ * @sb: super block
+ */
+static int ext4_setup_new_desc(handle_t *handle, struct super_block *sb,
+ struct ext4_new_group_data *input)
+{
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ ext4_group_t group;
+ struct ext4_group_desc *gdp;
+ struct buffer_head *gdb_bh;
+ int gdb_off, gdb_num, err = 0;
+
+ group = input->group;
+
+ gdb_off = group % EXT4_DESC_PER_BLOCK(sb);
+ gdb_num = group / EXT4_DESC_PER_BLOCK(sb);
+
+ /*
+ * get_write_access() has been called on gdb_bh by ext4_add_new_desc().
+ */
+ gdb_bh = sbi->s_group_desc[gdb_num];
+ /* Update group descriptor block for new group */
+ gdp = (struct ext4_group_desc *)((char *)gdb_bh->b_data +
+ gdb_off * EXT4_DESC_SIZE(sb));
+
+ memset(gdp, 0, EXT4_DESC_SIZE(sb));
+ /* LV FIXME */
+ memset(gdp, 0, EXT4_DESC_SIZE(sb));
+ ext4_block_bitmap_set(sb, gdp, input->block_bitmap); /* LV FIXME */
+ ext4_inode_bitmap_set(sb, gdp, input->inode_bitmap); /* LV FIXME */
+ ext4_inode_table_set(sb, gdp, input->inode_table); /* LV FIXME */
+ ext4_free_group_clusters_set(sb, gdp,
+ EXT4_B2C(sbi, input->free_blocks_count));
+ ext4_free_inodes_set(sb, gdp, EXT4_INODES_PER_GROUP(sb));
+ gdp->bg_flags = cpu_to_le16(EXT4_BG_INODE_ZEROED);
+ gdp->bg_checksum = ext4_group_desc_csum(sbi, input->group, gdp);
+
+ err = ext4_handle_dirty_metadata(handle, NULL, gdb_bh);
+ if (unlikely(err)) {
+ ext4_std_error(sb, err);
+ return err;
+ }
+
+ /*
+ * We can allocate memory for mb_alloc based on the new group
+ * descriptor
+ */
+ err = ext4_mb_add_groupinfo(sb, group, gdp);
+
+ return err;
+}
+
/* Add group descriptor data to an existing or new group descriptor block.
* Ensure we handle all possible error conditions _before_ we start modifying
* the filesystem, because we cannot abort the transaction and not have it
--
1.7.5.1

2011-12-30 14:41:38

by Yongqiang Yang

[permalink] [raw]

Subject: [PATCH V6 RESEND 02/15] ext4: add a function which adds a new desc to a fs

From: Yongqiang Yang <[email protected]>

This patch adds a function named ext4_add_new_desc() which adds
a new desc to a fs and whose code is copied from ext4_group_add().

The function will be used by new resize implementation.

Signed-off-by: Yongqiang Yang <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/resize.c | 42 ++++++++++++++++++++++++++++++++++++++++++
1 files changed, 42 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index ac5565c..9a5486e 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -735,6 +735,48 @@ exit_err:
}
}

+/*
+ * ext4_add_new_desc() adds group descriptor of group @group
+ *
+ * @handle: journal handle
+ * @sb; super block
+ * @group: the group no. of the first group desc to be added
+ * @resize_inode: the resize inode
+ */
+static int ext4_add_new_desc(handle_t *handle, struct super_block *sb,
+ ext4_group_t group, struct inode *resize_inode)
+{
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ struct ext4_super_block *es = sbi->s_es;
+ struct buffer_head *gdb_bh;
+ int gdb_off, gdb_num, err = 0;
+ int reserved_gdb = ext4_bg_has_super(sb, group) ?
+ le16_to_cpu(es->s_reserved_gdt_blocks) : 0;
+
+ gdb_off = group % EXT4_DESC_PER_BLOCK(sb);
+ gdb_num = group / EXT4_DESC_PER_BLOCK(sb);
+
+ /*
+ * We will only either add reserved group blocks to a backup group
+ * or remove reserved blocks for the first group in a new group block.
+ * Doing both would be mean more complex code, and sane people don't
+ * use non-sparse filesystems anymore. This is already checked above.
+ */
+ if (gdb_off) {
+ gdb_bh = sbi->s_group_desc[gdb_num];
+ err = ext4_journal_get_write_access(handle, gdb_bh);
+ if (err)
+ goto out;
+
+ if (reserved_gdb && ext4_bg_num_gdb(sb, group))
+ err = reserve_backup_gdb(handle, resize_inode, group);
+ } else
+ err = add_new_gdb(handle, resize_inode, group);
+
+out:
+ return err;
+}
+
/* Add group descriptor data to an existing or new group descriptor block.
* Ensure we handle all possible error conditions _before_ we start modifying
* the filesystem, because we cannot abort the transaction and not have it
--
1.7.5.1

2011-12-30 14:41:42

by Yongqiang Yang

[permalink] [raw]

Subject: [PATCH V6 RESEND 04/15] ext4: add a function which updates super block

From: Yongqiang Yang <[email protected]>

This patch adds a function named ext4_update_super() which updates
super block and whose code is copied from ext4_group_add().

The function will be used by new resize implementation.

Signed-off-by: Yongqiang Yang <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/resize.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 72 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index a50eab3..859b63f 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -832,6 +832,78 @@ static int ext4_setup_new_desc(handle_t *handle, struct super_block *sb,
return err;
}

+/*
+ * ext4_update_super() updates super so that new the added group can be seen
+ * by the filesystem.
+ *
+ * @sb: super block
+ */
+static void ext4_update_super(struct super_block *sb,
+ struct ext4_new_group_data *input)
+{
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ struct ext4_super_block *es = sbi->s_es;
+
+ /*
+ * Make the new blocks and inodes valid next. We do this before
+ * increasing the group count so that once the group is enabled,
+ * all of its blocks and inodes are already valid.
+ *
+ * We always allocate group-by-group, then block-by-block or
+ * inode-by-inode within a group, so enabling these
+ * blocks/inodes before the group is live won't actually let us
+ * allocate the new space yet.
+ */
+ ext4_blocks_count_set(es, ext4_blocks_count(es) +
+ input->blocks_count);
+ le32_add_cpu(&es->s_inodes_count, EXT4_INODES_PER_GROUP(sb));
+
+ /*
+ * We need to protect s_groups_count against other CPUs seeing
+ * inconsistent state in the superblock.
+ *
+ * The precise rules we use are:
+ *
+ * * Writers must perform a smp_wmb() after updating all dependent
+ * data and before modifying the groups count
+ *
+ * * Readers must perform an smp_rmb() after reading the groups count
+ * and before reading any dependent data.
+ *
+ * NB. These rules can be relaxed when checking the group count
+ * while freeing data, as we can only allocate from a block
+ * group after serialising against the group count, and we can
+ * only then free after serialising in turn against that
+ * allocation.
+ */
+ smp_wmb();
+
+ /* Update the global fs size fields */
+ sbi->s_groups_count++;
+
+ /* Update the reserved block counts only once the new group is
+ * active. */
+ ext4_r_blocks_count_set(es, ext4_r_blocks_count(es) +
+ input->reserved_blocks);
+
+ /* Update the free space counts */
+ percpu_counter_add(&sbi->s_freeclusters_counter,
+ EXT4_B2C(sbi, input->free_blocks_count));
+ percpu_counter_add(&sbi->s_freeinodes_counter,
+ EXT4_INODES_PER_GROUP(sb));
+
+ if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG) &&
+ sbi->s_log_groups_per_flex) {
+ ext4_group_t flex_group;
+ flex_group = ext4_flex_group(sbi, input->group);
+ atomic_add(EXT4_B2C(sbi, input->free_blocks_count),
+ &sbi->s_flex_groups[flex_group].free_clusters);
+ atomic_add(EXT4_INODES_PER_GROUP(sb),
+ &sbi->s_flex_groups[flex_group].free_inodes);
+ }
+
+}
+
/* Add group descriptor data to an existing or new group descriptor block.
* Ensure we handle all possible error conditions _before_ we start modifying
* the filesystem, because we cannot abort the transaction and not have it
--
1.7.5.1

2011-12-30 14:41:44

by Yongqiang Yang

[permalink] [raw]

Subject: [PATCH V6 RESEND 05/15] ext4: add a structure which will be used by 64bit-resize interface

From: Yongqiang Yang <[email protected]>

This patch adds a structure which will be used by 64bit-resize interface.
Two functions which allocate and destroy the structure respectively are
added.

Signed-off-by: Yongqiang Yang <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/resize.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 55 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 859b63f..78f3bab 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -134,6 +134,61 @@ static int verify_group_input(struct super_block *sb,
return err;
}

+/*
+ * ext4_new_flex_group_data is used by 64bit-resize interface to add a flex
+ * group each time.
+ */
+struct ext4_new_flex_group_data {
+ struct ext4_new_group_data *groups; /* new_group_data for groups
+ in the flex group */
+ __u16 *bg_flags; /* block group flags of groups
+ in @groups */
+ ext4_group_t count; /* number of groups in @groups
+ */
+};
+
+/*
+ * alloc_flex_gd() allocates a ext4_new_flex_group_data with size of
+ * @flexbg_size.
+ *
+ * Returns NULL on failure otherwise address of the allocated structure.
+ */
+static struct ext4_new_flex_group_data *alloc_flex_gd(unsigned long flexbg_size)
+{
+ struct ext4_new_flex_group_data *flex_gd;
+
+ flex_gd = kmalloc(sizeof(*flex_gd), GFP_NOFS);
+ if (flex_gd == NULL)
+ goto out3;
+
+ flex_gd->count = flexbg_size;
+
+ flex_gd->groups = kmalloc(sizeof(struct ext4_new_group_data) *
+ flexbg_size, GFP_NOFS);
+ if (flex_gd->groups == NULL)
+ goto out2;
+
+ flex_gd->bg_flags = kmalloc(flexbg_size * sizeof(__u16), GFP_NOFS);
+ if (flex_gd->bg_flags == NULL)
+ goto out1;
+
+ return flex_gd;
+
+out1:
+ kfree(flex_gd->groups);
+out2:
+ kfree(flex_gd);
+out3:
+ return NULL;
+}
+
+void free_flex_gd(struct ext4_new_flex_group_data *flex_gd)
+{
+ kfree(flex_gd->bg_flags);
+ kfree(flex_gd->groups);
+ kfree(flex_gd);
+}
+
static struct buffer_head *bclean(handle_t *handle, struct super_block *sb,
ext4_fsblk_t blk)
{
--
1.7.5.1

2011-12-30 14:41:46

by Yongqiang Yang

[permalink] [raw]

Subject: [PATCH V6 RESEND 06/15] ext4: add a function which sets up group blocks of a flex groups

From: Yongqiang Yang <[email protected]>

This patch adds a function named setup_new_flex_group_blocks() which
sets up group blocks of a flex groups.

Signed-off-by: Yongqiang Yang <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/ext4.h | 8 ++
fs/ext4/resize.c | 250 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 258 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 0e43bba..05058e2 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -511,6 +511,14 @@ struct ext4_new_group_data {
__u32 free_blocks_count;
};

+/* Indexes used to index group tables in ext4_new_group_data */
+enum {
+ BLOCK_BITMAP = 0, /* block bitmap */
+ INODE_BITMAP, /* inode bitmap */
+ INODE_TABLE, /* inode tables */
+ GROUP_TABLE_COUNT,
+};
+
/*
* Flags used by ext4_map_blocks()
*/
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 78f3bab..3024769 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -234,6 +234,256 @@ static int extend_or_restart_transaction(handle_t *handle, int thresh)
}

/*
+ * set_flexbg_block_bitmap() mark @count blocks starting from @block used.
+ *
+ * Helper function for ext4_setup_new_group_blocks() which set .
+ *
+ * @sb: super block
+ * @handle: journal handle
+ * @flex_gd: flex group data
+ */
+static int set_flexbg_block_bitmap(struct super_block *sb, handle_t *handle,
+ struct ext4_new_flex_group_data *flex_gd,
+ ext4_fsblk_t block, ext4_group_t count)
+{
+ ext4_group_t count2;
+
+ ext4_debug("mark blocks [%llu/%u] used\n", block, count);
+ for (count2 = count; count > 0; count -= count2, block += count2) {
+ ext4_fsblk_t start;
+ struct buffer_head *bh;
+ ext4_group_t group;
+ int err;
+
+ ext4_get_group_no_and_offset(sb, block, &group, NULL);
+ start = ext4_group_first_block_no(sb, group);
+ group -= flex_gd->groups[0].group;
+
+ count2 = sb->s_blocksize * 8 - (block - start);
+ if (count2 > count)
+ count2 = count;
+
+ if (flex_gd->bg_flags[group] & EXT4_BG_BLOCK_UNINIT) {
+ BUG_ON(flex_gd->count > 1);
+ continue;
+ }
+
+ err = extend_or_restart_transaction(handle, 1);
+ if (err)
+ return err;
+
+ bh = sb_getblk(sb, flex_gd->groups[group].block_bitmap);
+ if (!bh)
+ return -EIO;
+
+ err = ext4_journal_get_write_access(handle, bh);
+ if (err)
+ return err;
+ ext4_debug("mark block bitmap %#04llx (+%llu/%u)\n", block,
+ block - start, count2);
+ ext4_set_bits(bh->b_data, block - start, count2);
+
+ err = ext4_handle_dirty_metadata(handle, NULL, bh);
+ if (unlikely(err))
+ return err;
+ brelse(bh);
+ }
+
+ return 0;
+}
+
+/*
+ * Set up the block and inode bitmaps, and the inode table for the new groups.
+ * This doesn't need to be part of the main transaction, since we are only
+ * changing blocks outside the actual filesystem. We still do journaling to
+ * ensure the recovery is correct in case of a failure just after resize.
+ * If any part of this fails, we simply abort the resize.
+ *
+ * setup_new_flex_group_blocks handles a flex group as follow:
+ * 1. copy super block and GDT, and initialize group tables if necessary.
+ * In this step, we only set bits in blocks bitmaps for blocks taken by
+ * super block and GDT.
+ * 2. allocate group tables in block bitmaps, that is, set bits in block
+ * bitmap for blocks taken by group tables.
+ */
+static int setup_new_flex_group_blocks(struct super_block *sb,
+ struct ext4_new_flex_group_data *flex_gd)
+{
+ int group_table_count[] = {1, 1, EXT4_SB(sb)->s_itb_per_group};
+ ext4_fsblk_t start;
+ ext4_fsblk_t block;
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ struct ext4_super_block *es = sbi->s_es;
+ struct ext4_new_group_data *group_data = flex_gd->groups;
+ __u16 *bg_flags = flex_gd->bg_flags;
+ handle_t *handle;
+ ext4_group_t group, count;
+ struct buffer_head *bh = NULL;
+ int reserved_gdb, i, j, err = 0, err2;
+
+ BUG_ON(!flex_gd->count || !group_data ||
+ group_data[0].group != sbi->s_groups_count);
+
+ reserved_gdb = le16_to_cpu(es->s_reserved_gdt_blocks);
+
+ /* This transaction may be extended/restarted along the way */
+ handle = ext4_journal_start_sb(sb, EXT4_MAX_TRANS_DATA);
+ if (IS_ERR(handle))
+ return PTR_ERR(handle);
+
+ group = group_data[0].group;
+ for (i = 0; i < flex_gd->count; i++, group++) {
+ unsigned long gdblocks;
+
+ gdblocks = ext4_bg_num_gdb(sb, group);
+ start = ext4_group_first_block_no(sb, group);
+
+ /* Copy all of the GDT blocks into the backup in this group */
+ for (j = 0, block = start + 1; j < gdblocks; j++, block++) {
+ struct buffer_head *gdb;
+
+ ext4_debug("update backup group %#04llx\n", block);
+ err = extend_or_restart_transaction(handle, 1);
+ if (err)
+ goto out;
+
+ gdb = sb_getblk(sb, block);
+ if (!gdb) {
+ err = -EIO;
+ goto out;
+ }
+
+ err = ext4_journal_get_write_access(handle, gdb);
+ if (err) {
+ brelse(gdb);
+ goto out;
+ }
+ memcpy(gdb->b_data, sbi->s_group_desc[j]->b_data,
+ gdb->b_size);
+ set_buffer_uptodate(gdb);
+
+ err = ext4_handle_dirty_metadata(handle, NULL, gdb);
+ if (unlikely(err)) {
+ brelse(gdb);
+ goto out;
+ }
+ brelse(gdb);
+ }
+
+ /* Zero out all of the reserved backup group descriptor
+ * table blocks
+ */
+ if (ext4_bg_has_super(sb, group)) {
+ err = sb_issue_zeroout(sb, gdblocks + start + 1,
+ reserved_gdb, GFP_NOFS);
+ if (err)
+ goto out;
+ }
+
+ /* Initialize group tables of the grop @group */
+ if (!(bg_flags[i] & EXT4_BG_INODE_ZEROED))
+ goto handle_bb;
+
+ /* Zero out all of the inode table blocks */
+ block = group_data[i].inode_table;
+ ext4_debug("clear inode table blocks %#04llx -> %#04lx\n",
+ block, sbi->s_itb_per_group);
+ err = sb_issue_zeroout(sb, block, sbi->s_itb_per_group,
+ GFP_NOFS);
+ if (err)
+ goto out;
+
+handle_bb:
+ if (bg_flags[i] & EXT4_BG_BLOCK_UNINIT)
+ goto handle_ib;
+
+ /* Initialize block bitmap of the @group */
+ block = group_data[i].block_bitmap;
+ err = extend_or_restart_transaction(handle, 1);
+ if (err)
+ goto out;
+
+ bh = bclean(handle, sb, block);
+ if (IS_ERR(bh)) {
+ err = PTR_ERR(bh);
+ goto out;
+ }
+ if (ext4_bg_has_super(sb, group)) {
+ ext4_debug("mark backup superblock %#04llx (+0)\n",
+ start);
+ ext4_set_bits(bh->b_data, 0, gdblocks + reserved_gdb +
+ 1);
+ }
+ ext4_mark_bitmap_end(group_data[i].blocks_count,
+ sb->s_blocksize * 8, bh->b_data);
+ err = ext4_handle_dirty_metadata(handle, NULL, bh);
+ if (err)
+ goto out;
+ brelse(bh);
+
+handle_ib:
+ if (bg_flags[i] & EXT4_BG_INODE_UNINIT)
+ continue;
+
+ /* Initialize inode bitmap of the @group */
+ block = group_data[i].inode_bitmap;
+ err = extend_or_restart_transaction(handle, 1);
+ if (err)
+ goto out;
+ /* Mark unused entries in inode bitmap used */
+ bh = bclean(handle, sb, block);
+ if (IS_ERR(bh)) {
+ err = PTR_ERR(bh);
+ goto out;
+ }
+
+ ext4_mark_bitmap_end(EXT4_INODES_PER_GROUP(sb),
+ sb->s_blocksize * 8, bh->b_data);
+ err = ext4_handle_dirty_metadata(handle, NULL, bh);
+ if (err)
+ goto out;
+ brelse(bh);
+ }
+ bh = NULL;
+
+ /* Mark group tables in block bitmap */
+ for (j = 0; j < GROUP_TABLE_COUNT; j++) {
+ count = group_table_count[j];
+ start = (&group_data[0].block_bitmap)[j];
+ block = start;
+ for (i = 1; i < flex_gd->count; i++) {
+ block += group_table_count[j];
+ if (block == (&group_data[i].block_bitmap)[j]) {
+ count += group_table_count[j];
+ continue;
+ }
+ err = set_flexbg_block_bitmap(sb, handle,
+ flex_gd, start, count);
+ if (err)
+ goto out;
+ count = group_table_count[j];
+ start = group_data[i].block_bitmap;
+ block = start;
+ }
+
+ if (count) {
+ err = set_flexbg_block_bitmap(sb, handle,
+ flex_gd, start, count);
+ if (err)
+ goto out;
+ }
+ }
+
+out:
+ brelse(bh);
+ err2 = ext4_journal_stop(handle);
+ if (err2 && !err)
+ err = err2;
+
+ return err;
+}
+
+/*
* Set up the block and inode bitmaps, and the inode table for the new group.
* This doesn't need to be part of the main transaction, since we are only
* changing blocks outside the actual filesystem. We still do journaling to
--
1.7.5.1

2011-12-30 14:41:48

by Yongqiang Yang

[permalink] [raw]

Subject: [PATCH V6 RESEND 07/15] ext4: add a function which adds several group descriptors

From: Yongqiang Yang <[email protected]>

This patch adds a functon named ext4_add_new_descs() which adds
several group descriptors each time.

Signed-off-by: Yongqiang Yang <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/resize.c | 25 +++++++++++++++++++++++++
1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 3024769..d0d0ade 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1083,6 +1083,31 @@ out:
}

/*
+ * ext4_add_new_descs() adds @count group descriptor of groups
+ * starting at @group
+ *
+ * @handle: journal handle
+ * @sb; super block
+ * @group: the group no. of the first group desc to be added
+ * @resize_inode: the resize inode
+ * @count: number of group descriptors to be added
+ */
+static int ext4_add_new_descs(handle_t *handle, struct super_block *sb,
+ ext4_group_t group, struct inode *resize_inode,
+ ext4_group_t count)
+{
+ int i, err = 0;
+
+ for (i = 0; i < count; i++) {
+ err = ext4_add_new_desc(handle, sb, group + i, resize_inode);
+ if (err)
+ return err;
+ }
+
+ return err;
+}
+
+/*
* ext4_setup_new_desc() sets up group descriptors specified by @input.
*
* @handle: journal handle
--
1.7.5.1

2011-12-30 14:41:51

by Yongqiang Yang

[permalink] [raw]

Subject: [PATCH V6 RESEND 08/15] ext4: add a function which sets up a flex groups each time

From: Yongqiang Yang <[email protected]>

This patch adds a function named ext4_setup_new_descs() which sets up
a flex groups each time.

Signed-off-by: Yongqiang Yang <[email protected]>
---
fs/ext4/resize.c | 25 +++++++++++++++++++++++--
1 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index d0d0ade..f82cbee 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1114,7 +1114,8 @@ static int ext4_add_new_descs(handle_t *handle, struct super_block *sb,
* @sb: super block
*/
static int ext4_setup_new_desc(handle_t *handle, struct super_block *sb,
- struct ext4_new_group_data *input)
+ struct ext4_new_group_data *input,
+ __u16 bg_flags)
{
struct ext4_sb_info *sbi = EXT4_SB(sb);
ext4_group_t group;
@@ -1144,7 +1145,7 @@ static int ext4_setup_new_desc(handle_t *handle, struct super_block *sb,
ext4_free_group_clusters_set(sb, gdp,
EXT4_B2C(sbi, input->free_blocks_count));
ext4_free_inodes_set(sb, gdp, EXT4_INODES_PER_GROUP(sb));
- gdp->bg_flags = cpu_to_le16(EXT4_BG_INODE_ZEROED);
+ gdp->bg_flags = cpu_to_le16(bg_flags);
gdp->bg_checksum = ext4_group_desc_csum(sbi, input->group, gdp);

err = ext4_handle_dirty_metadata(handle, NULL, gdb_bh);
@@ -1163,6 +1164,26 @@ static int ext4_setup_new_desc(handle_t *handle, struct super_block *sb,
}

/*
+ * ext4_setup_new_descs setups group descriptors of a flex groups
+ */
+static int ext4_setup_new_descs(handle_t *handle, struct super_block *sb,
+ struct ext4_new_flex_group_data *flex_gd)
+{
+ struct ext4_new_group_data *group_data = flex_gd->groups;
+ __u16 *bg_flags = flex_gd->bg_flags;
+ int i, err = 0;
+
+ for (i = 0; i < flex_gd->count; i++) {
+ err = ext4_setup_new_desc(handle, sb, group_data + i,
+ bg_flags[i]);
+ if (err)
+ return err;
+ }
+
+ return err;
+}
+
+/*
* ext4_update_super() updates super so that new the added group can be seen
* by the filesystem.
*
--
1.7.5.1

2011-12-30 14:41:53

by Yongqiang Yang

[permalink] [raw]

Subject: [PATCH V6 RESEND 09/15] ext4: enable ext4_update_super() to handle a flex groups

From: Yongqiang Yang <[email protected]>

This patch enables ext4_update_super() to handle a flex groups.

Signed-off-by: Yongqiang Yang <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/resize.c | 58 +++++++++++++++++++++++++++++++++++++----------------
1 files changed, 40 insertions(+), 18 deletions(-)

diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index f82cbee..bcc236f 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1184,17 +1184,24 @@ static int ext4_setup_new_descs(handle_t *handle, struct super_block *sb,
}

/*
- * ext4_update_super() updates super so that new the added group can be seen
- * by the filesystem.
+ * ext4_update_super() updates super block so that new added groups can be seen
+ * by the filesystem.
*
* @sb: super block
+ * @flex_gd: new added groups
*/
static void ext4_update_super(struct super_block *sb,
- struct ext4_new_group_data *input)
+ struct ext4_new_flex_group_data *flex_gd)
{
+ ext4_fsblk_t blocks_count = 0;
+ ext4_fsblk_t free_blocks = 0;
+ ext4_fsblk_t reserved_blocks = 0;
+ struct ext4_new_group_data *group_data = flex_gd->groups;
struct ext4_sb_info *sbi = EXT4_SB(sb);
struct ext4_super_block *es = sbi->s_es;
+ int i;

+ BUG_ON(flex_gd->count == 0 || group_data == NULL);
/*
* Make the new blocks and inodes valid next. We do this before
* increasing the group count so that once the group is enabled,
@@ -1205,9 +1212,19 @@ static void ext4_update_super(struct super_block *sb,
* blocks/inodes before the group is live won't actually let us
* allocate the new space yet.
*/
- ext4_blocks_count_set(es, ext4_blocks_count(es) +
- input->blocks_count);
- le32_add_cpu(&es->s_inodes_count, EXT4_INODES_PER_GROUP(sb));
+ for (i = 0; i < flex_gd->count; i++) {
+ blocks_count += group_data[i].blocks_count;
+ free_blocks += group_data[i].free_blocks_count;
+ }
+
+ reserved_blocks = ext4_r_blocks_count(es) * 100;
+ do_div(reserved_blocks, ext4_blocks_count(es));
+ reserved_blocks *= blocks_count;
+ do_div(reserved_blocks, 100);
+
+ ext4_blocks_count_set(es, ext4_blocks_count(es) + blocks_count);
+ le32_add_cpu(&es->s_inodes_count, EXT4_INODES_PER_GROUP(sb) *
+ flex_gd->count);

/*
* We need to protect s_groups_count against other CPUs seeing
@@ -1215,11 +1232,11 @@ static void ext4_update_super(struct super_block *sb,
*
* The precise rules we use are:
*
- * * Writers must perform a smp_wmb() after updating all dependent
- * data and before modifying the groups count
+ * * Writers must perform a smp_wmb() after updating all
+ * dependent data and before modifying the groups count
*
- * * Readers must perform an smp_rmb() after reading the groups count
- * and before reading any dependent data.
+ * * Readers must perform an smp_rmb() after reading the groups
+ * count and before reading any dependent data.
*
* NB. These rules can be relaxed when checking the group count
* while freeing data, as we can only allocate from a block
@@ -1230,29 +1247,34 @@ static void ext4_update_super(struct super_block *sb,
smp_wmb();

/* Update the global fs size fields */
- sbi->s_groups_count++;
+ sbi->s_groups_count += flex_gd->count;

/* Update the reserved block counts only once the new group is
* active. */
ext4_r_blocks_count_set(es, ext4_r_blocks_count(es) +
- input->reserved_blocks);
+ reserved_blocks);

/* Update the free space counts */
percpu_counter_add(&sbi->s_freeclusters_counter,
- EXT4_B2C(sbi, input->free_blocks_count));
+ EXT4_B2C(sbi, free_blocks));
percpu_counter_add(&sbi->s_freeinodes_counter,
- EXT4_INODES_PER_GROUP(sb));
+ EXT4_INODES_PER_GROUP(sb) * flex_gd->count);

- if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG) &&
+ if (EXT4_HAS_INCOMPAT_FEATURE(sb,
+ EXT4_FEATURE_INCOMPAT_FLEX_BG) &&
sbi->s_log_groups_per_flex) {
ext4_group_t flex_group;
- flex_group = ext4_flex_group(sbi, input->group);
- atomic_add(EXT4_B2C(sbi, input->free_blocks_count),
+ flex_group = ext4_flex_group(sbi, group_data[0].group);
+ atomic_add(EXT4_B2C(sbi, free_blocks),
&sbi->s_flex_groups[flex_group].free_clusters);
- atomic_add(EXT4_INODES_PER_GROUP(sb),
+ atomic_add(EXT4_INODES_PER_GROUP(sb) * flex_gd->count,
&sbi->s_flex_groups[flex_group].free_inodes);
}

+ if (test_opt(sb, DEBUG))
+ printk(KERN_DEBUG "EXT4-fs: added group %u:"
+ "%llu blocks(%llu free %llu reserved)\n", flex_gd->count,
+ blocks_count, free_blocks, reserved_blocks);
}

/* Add group descriptor data to an existing or new group descriptor block.
--
1.7.5.1

2011-12-30 14:41:55

by Yongqiang Yang

[permalink] [raw]

Subject: [PATCH V6 RESEND 10/15] ext4: pass verify_reserved_gdb() the number of group decriptors

From: Yongqiang Yang <[email protected]>

The 64bit resizer adds a flex group each time, so verify_reserved_gdb
can not use s_groups_count directly, it should use the number of group
decriptors before the added group.

Signed-off-by: Yongqiang Yang <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/resize.c | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index bcc236f..f094373 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -656,10 +656,10 @@ static unsigned ext4_list_backups(struct super_block *sb, unsigned *three,
* groups in current filesystem that have BACKUPS, or -ve error code.
*/
static int verify_reserved_gdb(struct super_block *sb,
+ ext4_group_t end,
struct buffer_head *primary)
{
const ext4_fsblk_t blk = primary->b_blocknr;
- const ext4_group_t end = EXT4_SB(sb)->s_groups_count;
unsigned three = 1;
unsigned five = 5;
unsigned seven = 7;
@@ -734,7 +734,7 @@ static int add_new_gdb(handle_t *handle, struct inode *inode,
if (!gdb_bh)
return -EIO;

- gdbackups = verify_reserved_gdb(sb, gdb_bh);
+ gdbackups = verify_reserved_gdb(sb, group, gdb_bh);
if (gdbackups < 0) {
err = gdbackups;
goto exit_bh;
@@ -897,7 +897,8 @@ static int reserve_backup_gdb(handle_t *handle, struct inode *inode,
err = -EIO;
goto exit_bh;
}
- if ((gdbackups = verify_reserved_gdb(sb, primary[res])) < 0) {
+ gdbackups = verify_reserved_gdb(sb, group, primary[res]);
+ if (gdbackups < 0) {
brelse(primary[res]);
err = gdbackups;
goto exit_bh;
--
1.7.5.1

2011-12-30 14:41:57

by Yongqiang Yang

[permalink] [raw]

Subject: [PATCH V6 RESEND 11/15] ext4: add a new function which allocates bitmaps and inode tables

From: Yongqiang Yang <[email protected]>

This patch adds a new function named ext4_allocates_group_table() which
allcoates block bitmaps, inode bitmaps and inode tables for a flex groups and is
used by resize code.

Signed-off-by: Yongqiang Yang <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
---
fs/ext4/resize.c | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 111 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index f094373..e3da3bf 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -189,6 +189,117 @@ void free_flex_gd(struct ext4_new_flex_group_data *flex_gd)
kfree(flex_gd);
}

+/*
+ * ext4_alloc_group_tables() allocates block bitmaps, inode bitmaps
+ * and inode tables for a flex group.
+ *
+ * This function is used by 64bit-resize. Note that this function allocates
+ * group tables from the 1st group of groups contained by @flexgd, which may
+ * be a partial of a flex group.
+ *
+ * @sb: super block of fs to which the groups belongs
+ */
+static void ext4_alloc_group_tables(struct super_block *sb,
+ struct ext4_new_flex_group_data *flex_gd,
+ int flexbg_size)
+{
+ struct ext4_new_group_data *group_data = flex_gd->groups;
+ struct ext4_super_block *es = EXT4_SB(sb)->s_es;
+ ext4_fsblk_t start_blk;
+ ext4_fsblk_t last_blk;
+ ext4_group_t src_group;
+ ext4_group_t bb_index = 0;
+ ext4_group_t ib_index = 0;
+ ext4_group_t it_index = 0;
+ ext4_group_t group;
+ ext4_group_t last_group;
+ unsigned overhead;
+
+ BUG_ON(flex_gd->count == 0 || group_data == NULL);
+
+ src_group = group_data[0].group;
+ last_group = src_group + flex_gd->count - 1;
+
+ BUG_ON((flexbg_size > 1) && ((src_group & ~(flexbg_size - 1)) !=
+ (last_group & ~(flexbg_size - 1))));
+next_group:
+ group = group_data[0].group;
+ start_blk = ext4_group_first_block_no(sb, src_group);
+ last_blk = start_blk + group_data[src_group - group].blocks_count;
+
+ overhead = ext4_bg_has_super(sb, src_group) ?
+ (1 + ext4_bg_num_gdb(sb, src_group) +
+ le16_to_cpu(es->s_reserved_gdt_blocks)) : 0;
+
+ start_blk += overhead;
+
+ BUG_ON(src_group >= group_data[0].group + flex_gd->count);
+ /* We collect contiguous blocks as much as possible. */
+ src_group++;
+ for (; src_group <= last_group; src_group++)
+ if (!ext4_bg_has_super(sb, src_group))
+ last_blk += group_data[src_group - group].blocks_count;
+ else
+ break;
+
+ /* Allocate block bitmaps */
+ for (; bb_index < flex_gd->count; bb_index++) {
+ if (start_blk >= last_blk)
+ goto next_group;
+ group_data[bb_index].block_bitmap = start_blk++;
+ ext4_get_group_no_and_offset(sb, start_blk - 1, &group, NULL);
+ group -= group_data[0].group;
+ group_data[group].free_blocks_count--;
+ if (flexbg_size > 1)
+ flex_gd->bg_flags[group] &= ~EXT4_BG_BLOCK_UNINIT;
+ }
+
+ /* Allocate inode bitmaps */
+ for (; ib_index < flex_gd->count; ib_index++) {
+ if (start_blk >= last_blk)
+ goto next_group;
+ group_data[ib_index].inode_bitmap = start_blk++;
+ ext4_get_group_no_and_offset(sb, start_blk - 1, &group, NULL);
+ group -= group_data[0].group;
+ group_data[group].free_blocks_count--;
+ if (flexbg_size > 1)
+ flex_gd->bg_flags[group] &= ~EXT4_BG_BLOCK_UNINIT;
+ }
+
+ /* Allocate inode tables */
+ for (; it_index < flex_gd->count; it_index++) {
+ if (start_blk + EXT4_SB(sb)->s_itb_per_group > last_blk)
+ goto next_group;
+ group_data[it_index].inode_table = start_blk;
+ ext4_get_group_no_and_offset(sb, start_blk, &group, NULL);
+ group -= group_data[0].group;
+ group_data[group].free_blocks_count -=
+ EXT4_SB(sb)->s_itb_per_group;
+ if (flexbg_size > 1)
+ flex_gd->bg_flags[group] &= ~EXT4_BG_BLOCK_UNINIT;
+
+ start_blk += EXT4_SB(sb)->s_itb_per_group;
+ }
+
+ if (test_opt(sb, DEBUG)) {
+ int i;
+ group = group_data[0].group;
+
+ printk(KERN_DEBUG "EXT4-fs: adding a flex group with "
+ "%d groups, flexbg size is %d:\n", flex_gd->count,
+ flexbg_size);
+
+ for (i = 0; i < flex_gd->count; i++) {
+ printk(KERN_DEBUG "adding %s group %u: %u "
+ "blocks (%d free)\n",
+ ext4_bg_has_super(sb, group + i) ? "normal" :
+ "no-super", group + i,
+ group_data[i].blocks_count,
+ group_data[i].free_blocks_count);
+ }
+ }
+}
+
static struct buffer_head *bclean(handle_t *handle, struct super_block *sb,
ext4_fsblk_t blk)
{
--
1.7.5.1

2011-12-30 14:41:59

by Yongqiang Yang

[permalink] [raw]

Subject: [PATCH V6 RESEND 12/15] ext4: add a new function which adds a flex group to a fs

From: Yongqiang Yang <[email protected]>

This patch adds a new function named ext4_flex_group_add() which adds a
flex group to a fs. The function is used by 64bit-resize interface.

Signed-off-by: Yongqiang Yang <[email protected]>
---
fs/ext4/resize.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 82 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index e3da3bf..3d56e70 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1389,6 +1389,88 @@ static void ext4_update_super(struct super_block *sb,
blocks_count, free_blocks, reserved_blocks);
}

+/* Add a flex group to an fs. Ensure we handle all possible error conditions
+ * _before_ we start modifying the filesystem, because we cannot abort the
+ * transaction and not have it write the data to disk.
+ */
+static int ext4_flex_group_add(struct super_block *sb,
+ struct inode *resize_inode,
+ struct ext4_new_flex_group_data *flex_gd)
+{
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ struct ext4_super_block *es = sbi->s_es;
+ ext4_fsblk_t o_blocks_count;
+ ext4_grpblk_t last;
+ ext4_group_t group;
+ handle_t *handle;
+ unsigned reserved_gdb;
+ int err = 0, err2 = 0, credit;
+
+ BUG_ON(!flex_gd->count || !flex_gd->groups || !flex_gd->bg_flags);
+
+ reserved_gdb = le16_to_cpu(es->s_reserved_gdt_blocks);
+ o_blocks_count = ext4_blocks_count(es);
+ ext4_get_group_no_and_offset(sb, o_blocks_count, &group, &last);
+ BUG_ON(last);
+
+ err = setup_new_flex_group_blocks(sb, flex_gd);
+ if (err)
+ goto exit;
+ /*
+ * We will always be modifying at least the superblock and GDT
+ * block. If we are adding a group past the last current GDT block,
+ * we will also modify the inode and the dindirect block. If we
+ * are adding a group with superblock/GDT backups we will also
+ * modify each of the reserved GDT dindirect blocks.
+ */
+ credit = flex_gd->count * 4 + reserved_gdb;
+ handle = ext4_journal_start_sb(sb, credit);
+ if (IS_ERR(handle)) {
+ err = PTR_ERR(handle);
+ goto exit;
+ }
+
+ err = ext4_journal_get_write_access(handle, sbi->s_sbh);
+ if (err)
+ goto exit_journal;
+
+ group = flex_gd->groups[0].group;
+ BUG_ON(group != EXT4_SB(sb)->s_groups_count);
+ err = ext4_add_new_descs(handle, sb, group,
+ resize_inode, flex_gd->count);
+ if (err)
+ goto exit_journal;
+
+ err = ext4_setup_new_descs(handle, sb, flex_gd);
+ if (err)
+ goto exit_journal;
+
+ ext4_update_super(sb, flex_gd);
+
+ err = ext4_handle_dirty_super(handle, sb);
+
+exit_journal:
+ err2 = ext4_journal_stop(handle);
+ if (!err)
+ err = err2;
+
+ if (!err) {
+ int i;
+ update_backups(sb, sbi->s_sbh->b_blocknr, (char *)es,
+ sizeof(struct ext4_super_block));
+ for (i = 0; i < flex_gd->count; i++, group++) {
+ struct buffer_head *gdb_bh;
+ int gdb_num;
+ gdb_num = group / EXT4_BLOCKS_PER_GROUP(sb);
+ gdb_bh = sbi->s_group_desc[gdb_num];
+ update_backups(sb, gdb_bh->b_blocknr, gdb_bh->b_data,
+ gdb_bh->b_size);
+ }
+ }
+exit:
+ return err;
+}
+
/* Add group descriptor data to an existing or new group descriptor block.
* Ensure we handle all possible error conditions _before_ we start modifying
* the filesystem, because we cannot abort the transaction and not have it
--
1.7.5.1

2011-12-30 14:42:03

by Yongqiang Yang

[permalink] [raw]

Subject: [PATCH V6 RESEND 14/15] ext4: let ext4_group_extend() use common code

From: Yongqiang Yang <[email protected]>

ext4_group_extend_no_check() is moved out from ext4_group_extend(),
this patch lets ext4_group_extend() call ext4_group_extentd_no_check()
instead.

Signed-off-by: Yongqiang Yang <[email protected]>
---
fs/ext4/resize.c | 41 ++---------------------------------------
1 files changed, 2 insertions(+), 39 deletions(-)

diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 29abd66..c862c6b 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1838,8 +1838,7 @@ int ext4_group_extend(struct super_block *sb, struct ext4_super_block *es,
ext4_grpblk_t last;
ext4_grpblk_t add;
struct buffer_head *bh;
- handle_t *handle;
- int err, err2;
+ int err;
ext4_group_t group;

o_blocks_count = ext4_blocks_count(es);
@@ -1895,43 +1894,7 @@ int ext4_group_extend(struct super_block *sb, struct ext4_super_block *es,
}
brelse(bh);

- /* We will update the superblock, one block bitmap, and
- * one group descriptor via ext4_free_blocks().
- */
- handle = ext4_journal_start_sb(sb, 3);
- if (IS_ERR(handle)) {
- err = PTR_ERR(handle);
- ext4_warning(sb, "error %d on journal start", err);
- goto exit_put;
- }
-
- if ((err = ext4_journal_get_write_access(handle,
- EXT4_SB(sb)->s_sbh))) {
- ext4_warning(sb, "error %d on journal write access", err);
- ext4_journal_stop(handle);
- goto exit_put;
- }
- ext4_blocks_count_set(es, o_blocks_count + add);
- ext4_debug("freeing blocks %llu through %llu\n", o_blocks_count,
- o_blocks_count + add);
- /* We add the blocks to the bitmap and set the group need init bit */
- err = ext4_group_add_blocks(handle, sb, o_blocks_count, add);
- ext4_handle_dirty_super(handle, sb);
- ext4_debug("freed blocks %llu through %llu\n", o_blocks_count,
- o_blocks_count + add);
- err2 = ext4_journal_stop(handle);
- if (!err && err2)
- err = err2;
-
- if (err)
- goto exit_put;

2011-12-30 14:42:01

by Yongqiang Yang

[permalink] [raw]

Subject: [PATCH V6 RESEND 13/15] ext4: add new online resize interface

From: Yongqiang Yang <[email protected]>

This patch adds new online resize interface, whose input argument is a
64-bit integer indicating how many blocks there are in the resized fs.

In new resize impelmentation, all work like allocating group tables are done
by kernel side, so the new resize interface can support flex_bg feature and
prepares ground for suppoting resize with features like bigalloc and exclude
bitmap. Besides these, user-space tools just passes in the new number of blocks.

We delay initializing the bitmaps and inode tables of added groups if possible
and add multi groups (a flex groups) each time, so new resize is very fast like
mkfs.

Signed-off-by: Yongqiang Yang <[email protected]>
---
Documentation/filesystems/ext4.txt | 7 ++
fs/ext4/ext4.h | 2 +
fs/ext4/ioctl.c | 42 +++++++++
fs/ext4/resize.c | 177 ++++++++++++++++++++++++++++++++++++
4 files changed, 228 insertions(+), 0 deletions(-)

diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt
index 4917cf2..10ec463 100644
--- a/Documentation/filesystems/ext4.txt
+++ b/Documentation/filesystems/ext4.txt
@@ -581,6 +581,13 @@ Table of Ext4 specific ioctls
behaviour may change in the future as it is
not necessary and has been done this way only
for sake of simplicity.
+
+ EXT4_IOC_RESIZE_FS Resize the filesystem to a new size. The number
+ of blocks of resized filesystem is passed in via
+ 64 bit integer argument. The kernel allocates
+ bitmaps and inode table, the userspace tool thus
+ just passes the new number of blocks.
+
..............................................................................

References
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 05058e2..4bc0e82 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -583,6 +583,7 @@ enum {
/* note ioctl 11 reserved for filesystem-independent FIEMAP ioctl */
#define EXT4_IOC_ALLOC_DA_BLKS _IO('f', 12)
#define EXT4_IOC_MOVE_EXT _IOWR('f', 15, struct move_extent)
+#define EXT4_IOC_RESIZE_FS _IOW('f', 16, __u64)

#if defined(__KERNEL__) && defined(CONFIG_COMPAT)
/*
@@ -1929,6 +1930,7 @@ extern int ext4_group_add(struct super_block *sb,
extern int ext4_group_extend(struct super_block *sb,
struct ext4_super_block *es,
ext4_fsblk_t n_blocks_count);
+extern int ext4_resize_fs(struct super_block *sb, ext4_fsblk_t n_blocks_count);

/* super.c */
extern void *ext4_kvmalloc(size_t size, gfp_t flags);
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index ff1aab7..d639132 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -329,6 +329,47 @@ mext_out:
return err;
}

+ case EXT4_IOC_RESIZE_FS: {
+ ext4_fsblk_t n_blocks_count;
+ struct super_block *sb = inode->i_sb;
+ int err = 0, err2 = 0;
+
+ if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
+ EXT4_FEATURE_RO_COMPAT_BIGALLOC)) {
+ ext4_msg(sb, KERN_ERR,
+ "Online resizing not supported with bigalloc");
+ return -EOPNOTSUPP;
+ }
+
+ err = ext4_resize_begin(sb);
+ if (err)
+ return err;
+
+ if (copy_from_user(&n_blocks_count, (__u64 __user *)arg,
+ sizeof(__u64))) {
+ err = -EFAULT;
+ goto resizefs_out;
+ }
+
+ err = mnt_want_write(filp->f_path.mnt);
+ if (err)
+ goto resizefs_out;
+
+ err = ext4_resize_fs(sb, n_blocks_count);
+ if (EXT4_SB(sb)->s_journal) {
+ jbd2_journal_lock_updates(EXT4_SB(sb)->s_journal);
+ err2 = jbd2_journal_flush(EXT4_SB(sb)->s_journal);
+ jbd2_journal_unlock_updates(EXT4_SB(sb)->s_journal);
+ }
+ if (err == 0)
+ err = err2;
+ mnt_drop_write(filp->f_path.mnt);
+resizefs_out:
+ ext4_resize_end(sb);
+
+ return err;
+ }
+
case FITRIM:
{
struct request_queue *q = bdev_get_queue(sb->s_bdev);
@@ -427,6 +468,7 @@ long ext4_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
}
case EXT4_IOC_MOVE_EXT:
case FITRIM:
+ case EXT4_IOC_RESIZE_FS:
break;
default:
return -ENOIOCTLCMD;
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 3d56e70..29abd66 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1471,6 +1471,70 @@ exit:
return err;
}

+static int ext4_setup_next_flex_gd(struct super_block *sb,
+ struct ext4_new_flex_group_data *flex_gd,
+ ext4_fsblk_t n_blocks_count,
+ unsigned long flexbg_size)
+{
+ struct ext4_super_block *es = EXT4_SB(sb)->s_es;
+ struct ext4_new_group_data *group_data = flex_gd->groups;
+ ext4_fsblk_t o_blocks_count;
+ ext4_group_t n_group;
+ ext4_group_t group;
+ ext4_group_t last_group;
+ ext4_grpblk_t last;
+ ext4_grpblk_t blocks_per_group;
+ unsigned long i;
+
+ blocks_per_group = EXT4_BLOCKS_PER_GROUP(sb);
+
+ o_blocks_count = ext4_blocks_count(es);
+
+ if (o_blocks_count == n_blocks_count)
+ return 0;
+
+ ext4_get_group_no_and_offset(sb, o_blocks_count, &group, &last);
+ BUG_ON(last);
+ ext4_get_group_no_and_offset(sb, n_blocks_count - 1, &n_group, &last);
+
+ last_group = group | (flexbg_size - 1);
+ if (last_group > n_group)
+ last_group = n_group;
+
+ flex_gd->count = last_group - group + 1;
+
+ for (i = 0; i < flex_gd->count; i++) {
+ int overhead;
+
+ group_data[i].group = group + i;
+ group_data[i].blocks_count = blocks_per_group;
+ overhead = ext4_bg_has_super(sb, group + i) ?
+ (1 + ext4_bg_num_gdb(sb, group + i) +
+ le16_to_cpu(es->s_reserved_gdt_blocks)) : 0;
+ group_data[i].free_blocks_count = blocks_per_group - overhead;
+ if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
+ EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
+ flex_gd->bg_flags[i] = EXT4_BG_BLOCK_UNINIT |
+ EXT4_BG_INODE_UNINIT;
+ else
+ flex_gd->bg_flags[i] = EXT4_BG_INODE_ZEROED;
+ }
+
+ if (last_group == n_group &&
+ EXT4_HAS_RO_COMPAT_FEATURE(sb,
+ EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
+ /* We need to initialize block bitmap of last group. */
+ flex_gd->bg_flags[i - 1] &= ~EXT4_BG_BLOCK_UNINIT;
+
+ if ((last_group == n_group) && (last != blocks_per_group - 1)) {
+ group_data[i - 1].blocks_count = last + 1;
+ group_data[i - 1].free_blocks_count -= blocks_per_group-
+ last - 1;
+ }
+
+ return 1;
+}
+
/* Add group descriptor data to an existing or new group descriptor block.
* Ensure we handle all possible error conditions _before_ we start modifying
* the filesystem, because we cannot abort the transaction and not have it
@@ -1870,3 +1934,116 @@ int ext4_group_extend(struct super_block *sb, struct ext4_super_block *es,
exit_put:
return err;
} /* ext4_group_extend */
+
+/*
+ * ext4_resize_fs() resizes a fs to new size specified by @n_blocks_count
+ *
+ * @sb: super block of the fs to be resized
+ * @n_blocks_count: the number of blocks resides in the resized fs
+ */
+int ext4_resize_fs(struct super_block *sb, ext4_fsblk_t n_blocks_count)
+{
+ struct ext4_new_flex_group_data *flex_gd = NULL;
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ struct ext4_super_block *es = sbi->s_es;
+ struct buffer_head *bh;
+ struct inode *resize_inode;
+ ext4_fsblk_t o_blocks_count;
+ ext4_group_t o_group;
+ ext4_group_t n_group;
+ ext4_grpblk_t offset;
+ unsigned long n_desc_blocks;
+ unsigned long o_desc_blocks;
+ unsigned long desc_blocks;
+ int err = 0, flexbg_size = 1;
+
+ o_blocks_count = ext4_blocks_count(es);
+
+ if (test_opt(sb, DEBUG))
+ printk(KERN_DEBUG "EXT4-fs: resizing filesystem from %llu "
+ "upto %llu blocks\n", o_blocks_count, n_blocks_count);
+
+ if (n_blocks_count < o_blocks_count) {
+ /* On-line shrinking not supported */
+ ext4_warning(sb, "can't shrink FS - resize aborted");
+ return -EINVAL;
+ }
+
+ if (n_blocks_count == o_blocks_count)
+ /* Nothing need to do */
+ return 0;
+
+ ext4_get_group_no_and_offset(sb, n_blocks_count - 1, &n_group, &offset);
+ ext4_get_group_no_and_offset(sb, o_blocks_count, &o_group, &offset);
+
+ n_desc_blocks = (n_group + EXT4_DESC_PER_BLOCK(sb)) /
+ EXT4_DESC_PER_BLOCK(sb);
+ o_desc_blocks = (sbi->s_groups_count + EXT4_DESC_PER_BLOCK(sb) - 1) /
+ EXT4_DESC_PER_BLOCK(sb);
+ desc_blocks = n_desc_blocks - o_desc_blocks;
+
+ if (desc_blocks &&
+ (!EXT4_HAS_COMPAT_FEATURE(sb, EXT4_FEATURE_COMPAT_RESIZE_INODE) ||
+ le16_to_cpu(es->s_reserved_gdt_blocks) < desc_blocks)) {
+ ext4_warning(sb, "No reserved GDT blocks, can't resize");
+ return -EPERM;
+ }
+
+ resize_inode = ext4_iget(sb, EXT4_RESIZE_INO);
+ if (IS_ERR(resize_inode)) {
+ ext4_warning(sb, "Error opening resize inode");
+ return PTR_ERR(resize_inode);
+ }
+
+ /* See if the device is actually as big as what was requested */
+ bh = sb_bread(sb, n_blocks_count - 1);
+ if (!bh) {
+ ext4_warning(sb, "can't read last block, resize aborted");
+ return -ENOSPC;
+ }
+ brelse(bh);
+
+ if (offset != 0) {
+ /* extend the last group */
+ ext4_grpblk_t add;
+ add = EXT4_BLOCKS_PER_GROUP(sb) - offset;
+ err = ext4_group_extend_no_check(sb, o_blocks_count, add);
+ if (err)
+ goto out;
+ }
+
+ if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG) &&
+ es->s_log_groups_per_flex)
+ flexbg_size = 1 << es->s_log_groups_per_flex;
+
+ o_blocks_count = ext4_blocks_count(es);
+ if (o_blocks_count == n_blocks_count)
+ goto out;
+
+ flex_gd = alloc_flex_gd(flexbg_size);
+ if (flex_gd == NULL) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ /* Add flex groups. Note that a regular group is a
+ * flex group with 1 group.
+ */
+ while (ext4_setup_next_flex_gd(sb, flex_gd, n_blocks_count,
+ flexbg_size)) {
+ ext4_alloc_group_tables(sb, flex_gd, flexbg_size);
+ err = ext4_flex_group_add(sb, resize_inode, flex_gd);
+ if (unlikely(err))
+ break;
+ }
+
+out:
+ if (flex_gd)
+ free_flex_gd(flex_gd);
+
+ iput(resize_inode);
+ if (test_opt(sb, DEBUG))
+ printk(KERN_DEBUG "EXT4-fs: resized filesystem from %llu "
+ "upto %llu blocks\n", o_blocks_count, n_blocks_count);
+ return err;
+}
--
1.7.5.1

2011-12-30 14:42:06

by Yongqiang Yang

[permalink] [raw]

Subject: [PATCH V6 RESEND 15/15] ext4: let ext4_group_add() use common code

From: Yongqiang Yang <[email protected]>

This patch lets ext4_group_add() call ext4_flex_group_add().

Signed-off-by: Yongqiang Yang <[email protected]>
---
fs/ext4/resize.c | 309 ++----------------------------------------------------
1 files changed, 10 insertions(+), 299 deletions(-)

diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index c862c6b..2c78f09 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -595,137 +595,6 @@ out:
}

/*
- * Set up the block and inode bitmaps, and the inode table for the new group.
- * This doesn't need to be part of the main transaction, since we are only
- * changing blocks outside the actual filesystem. We still do journaling to
- * ensure the recovery is correct in case of a failure just after resize.
- * If any part of this fails, we simply abort the resize.
- */
-static int setup_new_group_blocks(struct super_block *sb,
- struct ext4_new_group_data *input)
-{
- struct ext4_sb_info *sbi = EXT4_SB(sb);
- ext4_fsblk_t start = ext4_group_first_block_no(sb, input->group);
- int reserved_gdb = ext4_bg_has_super(sb, input->group) ?
- le16_to_cpu(sbi->s_es->s_reserved_gdt_blocks) : 0;
- unsigned long gdblocks = ext4_bg_num_gdb(sb, input->group);
- struct buffer_head *bh;
- handle_t *handle;
- ext4_fsblk_t block;
- ext4_grpblk_t bit;
- int i;
- int err = 0, err2;
-
- /* This transaction may be extended/restarted along the way */
- handle = ext4_journal_start_sb(sb, EXT4_MAX_TRANS_DATA);
-
- if (IS_ERR(handle))
- return PTR_ERR(handle);
-
- BUG_ON(input->group != sbi->s_groups_count);
-
- /* Copy all of the GDT blocks into the backup in this group */
- for (i = 0, bit = 1, block = start + 1;
- i < gdblocks; i++, block++, bit++) {
- struct buffer_head *gdb;
-
- ext4_debug("update backup group %#04llx (+%d)\n", block, bit);
- err = extend_or_restart_transaction(handle, 1);
- if (err)
- goto exit_journal;
-
- gdb = sb_getblk(sb, block);
- if (!gdb) {
- err = -EIO;
- goto exit_journal;
- }
- if ((err = ext4_journal_get_write_access(handle, gdb))) {
- brelse(gdb);
- goto exit_journal;
- }
- memcpy(gdb->b_data, sbi->s_group_desc[i]->b_data, gdb->b_size);
- set_buffer_uptodate(gdb);
- err = ext4_handle_dirty_metadata(handle, NULL, gdb);
- if (unlikely(err)) {
- brelse(gdb);
- goto exit_journal;
- }
- brelse(gdb);
- }
-
- /* Zero out all of the reserved backup group descriptor table blocks */
- ext4_debug("clear inode table blocks %#04llx -> %#04lx\n",
- block, sbi->s_itb_per_group);
- err = sb_issue_zeroout(sb, gdblocks + start + 1, reserved_gdb,
- GFP_NOFS);
- if (err)
- goto exit_journal;
-
- err = extend_or_restart_transaction(handle, 2);
- if (err)
- goto exit_journal;
-
- bh = bclean(handle, sb, input->block_bitmap);
- if (IS_ERR(bh)) {
- err = PTR_ERR(bh);
- goto exit_journal;
- }
-
- if (ext4_bg_has_super(sb, input->group)) {
- ext4_debug("mark backup group tables %#04llx (+0)\n", start);
- ext4_set_bits(bh->b_data, 0, gdblocks + reserved_gdb + 1);
- }
-
- ext4_debug("mark block bitmap %#04llx (+%llu)\n", input->block_bitmap,
- input->block_bitmap - start);
- ext4_set_bit(input->block_bitmap - start, bh->b_data);
- ext4_debug("mark inode bitmap %#04llx (+%llu)\n", input->inode_bitmap,
- input->inode_bitmap - start);
- ext4_set_bit(input->inode_bitmap - start, bh->b_data);
-
- /* Zero out all of the inode table blocks */
- block = input->inode_table;
- ext4_debug("clear inode table blocks %#04llx -> %#04lx\n",
- block, sbi->s_itb_per_group);
- err = sb_issue_zeroout(sb, block, sbi->s_itb_per_group, GFP_NOFS);
- if (err)
- goto exit_bh;
- ext4_set_bits(bh->b_data, input->inode_table - start,
- sbi->s_itb_per_group);
-
-
- ext4_mark_bitmap_end(input->blocks_count, sb->s_blocksize * 8,
- bh->b_data);
- err = ext4_handle_dirty_metadata(handle, NULL, bh);
- if (unlikely(err)) {
- ext4_std_error(sb, err);
- goto exit_bh;
- }
- brelse(bh);
- /* Mark unused entries in inode bitmap used */
- ext4_debug("clear inode bitmap %#04llx (+%llu)\n",
- input->inode_bitmap, input->inode_bitmap - start);
- if (IS_ERR(bh = bclean(handle, sb, input->inode_bitmap))) {
- err = PTR_ERR(bh);
- goto exit_journal;
- }
-
- ext4_mark_bitmap_end(EXT4_INODES_PER_GROUP(sb), sb->s_blocksize * 8,
- bh->b_data);
- err = ext4_handle_dirty_metadata(handle, NULL, bh);
- if (unlikely(err))
- ext4_std_error(sb, err);
-exit_bh:
- brelse(bh);
-
-exit_journal:
- if ((err2 = ext4_journal_stop(handle)) && !err)
- err = err2;
-
- return err;
-}
-
-/*
* Iterate through the groups which hold BACKUP superblock/GDT copies in an
* ext4 filesystem. The counters should be initialized to 1, 5, and 7 before
* calling this for the first time. In a sparse filesystem it will be the
@@ -1550,16 +1419,15 @@ static int ext4_setup_next_flex_gd(struct super_block *sb,
*/
int ext4_group_add(struct super_block *sb, struct ext4_new_group_data *input)
{
+ struct ext4_new_flex_group_data flex_gd;
struct ext4_sb_info *sbi = EXT4_SB(sb);
struct ext4_super_block *es = sbi->s_es;
int reserved_gdb = ext4_bg_has_super(sb, input->group) ?
le16_to_cpu(es->s_reserved_gdt_blocks) : 0;
- struct buffer_head *primary = NULL;
- struct ext4_group_desc *gdp;
struct inode *inode = NULL;
- handle_t *handle;
int gdb_off, gdb_num;
- int err, err2;
+ int err;
+ __u16 bg_flags = 0;

gdb_num = input->group / EXT4_DESC_PER_BLOCK(sb);
gdb_off = input->group % EXT4_DESC_PER_BLOCK(sb);
@@ -1598,172 +1466,15 @@ int ext4_group_add(struct super_block *sb, struct ext4_new_group_data *input)
}

- if ((err = verify_group_input(sb, input)))
- goto exit_put;
-
- if ((err = setup_new_group_blocks(sb, input)))
- goto exit_put;
-
- /*
- * We will always be modifying at least the superblock and a GDT
- * block. If we are adding a group past the last current GDT block,
- * we will also modify the inode and the dindirect block. If we
- * are adding a group with superblock/GDT backups we will also
- * modify each of the reserved GDT dindirect blocks.
- */
- handle = ext4_journal_start_sb(sb,
- ext4_bg_has_super(sb, input->group) ?
- 3 + reserved_gdb : 4);
- if (IS_ERR(handle)) {
- err = PTR_ERR(handle);
- goto exit_put;
- }
-
- if ((err = ext4_journal_get_write_access(handle, sbi->s_sbh)))
- goto exit_journal;
-
- /*
- * We will only either add reserved group blocks to a backup group
- * or remove reserved blocks for the first group in a new group block.
- * Doing both would be mean more complex code, and sane people don't
- * use non-sparse filesystems anymore. This is already checked above.
- */
- if (gdb_off) {
- primary = sbi->s_group_desc[gdb_num];
- if ((err = ext4_journal_get_write_access(handle, primary)))
- goto exit_journal;
-
- if (reserved_gdb && ext4_bg_num_gdb(sb, input->group)) {
- err = reserve_backup_gdb(handle, inode, input->group);
- if (err)
- goto exit_journal;
- }
- } else {
- /*
- * Note that we can access new group descriptor block safely
- * only if add_new_gdb() succeeds.
- */
- err = add_new_gdb(handle, inode, input->group);
- if (err)
- goto exit_journal;
- primary = sbi->s_group_desc[gdb_num];
- }
-
- /*
- * OK, now we've set up the new group. Time to make it active.
- *
- * so we have to be safe wrt. concurrent accesses the group
- * data. So we need to be careful to set all of the relevant
- * group descriptor data etc. *before* we enable the group.
- *
- * The key field here is sbi->s_groups_count: as long as
- * that retains its old value, nobody is going to access the new
- * group.
- *
- * So first we update all the descriptor metadata for the new
- * group; then we update the total disk blocks count; then we
- * update the groups count to enable the group; then finally we
- * update the free space counts so that the system can start
- * using the new disk blocks.
- */
-
- /* Update group descriptor block for new group */
- gdp = (struct ext4_group_desc *)((char *)primary->b_data +
- gdb_off * EXT4_DESC_SIZE(sb));
-
- memset(gdp, 0, EXT4_DESC_SIZE(sb));
- ext4_block_bitmap_set(sb, gdp, input->block_bitmap); /* LV FIXME */
- ext4_inode_bitmap_set(sb, gdp, input->inode_bitmap); /* LV FIXME */
- ext4_inode_table_set(sb, gdp, input->inode_table); /* LV FIXME */
- ext4_free_group_clusters_set(sb, gdp, input->free_blocks_count);
- ext4_free_inodes_set(sb, gdp, EXT4_INODES_PER_GROUP(sb));
- gdp->bg_flags = cpu_to_le16(EXT4_BG_INODE_ZEROED);
- gdp->bg_checksum = ext4_group_desc_csum(sbi, input->group, gdp);
-
- /*
- * We can allocate memory for mb_alloc based on the new group
- * descriptor
- */
- err = ext4_mb_add_groupinfo(sb, input->group, gdp);
+ err = verify_group_input(sb, input);
if (err)
- goto exit_journal;
-
- /*
- * Make the new blocks and inodes valid next. We do this before
- * increasing the group count so that once the group is enabled,
- * all of its blocks and inodes are already valid.
- *
- * We always allocate group-by-group, then block-by-block or
- * inode-by-inode within a group, so enabling these
- * blocks/inodes before the group is live won't actually let us
- * allocate the new space yet.
- */
- ext4_blocks_count_set(es, ext4_blocks_count(es) +
- input->blocks_count);
- le32_add_cpu(&es->s_inodes_count, EXT4_INODES_PER_GROUP(sb));
-
- /*
- * We need to protect s_groups_count against other CPUs seeing
- * inconsistent state in the superblock.
- *
- * The precise rules we use are:
- *
- * * Writers must perform a smp_wmb() after updating all dependent
- * data and before modifying the groups count
- *
- * * Readers must perform an smp_rmb() after reading the groups count
- * and before reading any dependent data.
- *
- * NB. These rules can be relaxed when checking the group count
- * while freeing data, as we can only allocate from a block
- * group after serialising against the group count, and we can
- * only then free after serialising in turn against that
- * allocation.
- */
- smp_wmb();
-
- /* Update the global fs size fields */
- sbi->s_groups_count++;
-
- err = ext4_handle_dirty_metadata(handle, NULL, primary);
- if (unlikely(err)) {
- ext4_std_error(sb, err);
- goto exit_journal;
- }
-
- /* Update the reserved block counts only once the new group is
- * active. */
- ext4_r_blocks_count_set(es, ext4_r_blocks_count(es) +
- input->reserved_blocks);
-
- /* Update the free space counts */
- percpu_counter_add(&sbi->s_freeclusters_counter,
- EXT4_B2C(sbi, input->free_blocks_count));
- percpu_counter_add(&sbi->s_freeinodes_counter,
- EXT4_INODES_PER_GROUP(sb));
-
- if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG) &&
- sbi->s_log_groups_per_flex) {
- ext4_group_t flex_group;
- flex_group = ext4_flex_group(sbi, input->group);
- atomic_add(EXT4_B2C(sbi, input->free_blocks_count),
- &sbi->s_flex_groups[flex_group].free_clusters);
- atomic_add(EXT4_INODES_PER_GROUP(sb),
- &sbi->s_flex_groups[flex_group].free_inodes);
- }

2011-12-31 20:04:02

by Andreas Dilger

[permalink] [raw]

Subject: Re: [PATCH V6 RESEND] ext4: add new online resize

The main issue I have with this patch is that the series starts by adding a bunch of unused functions, and only wires them I to the existing resize code at the very end.

A better approach would be to have the early patches refactor the existing functions so that they are used by the current resize code, and the new resize functionality is enabled at the end.

That allows verifying that the existing code continues to work through the entire series, and that the functions being added actually work properly even before the new ioctl is added.

Cheers, Andreas

On 2011-12-30, at 3:59, Yongqiang Yang <[email protected]> wrote:

> Hi all,
>
> Here is the 6th veriosn of patch series adding new resize implementation
> to ext4.
>
> -- What's new resize implementation?
> It is a new online resize interface for ext4. It can be used via
> ioctl with EXT4_IOC_RESIZE_FS and a 64 bit integer indicating size
> of the resized fs in block.
>
> -- Difference between current resize and new resize.
> New resize lets kernel do all work, like allocating bitmaps and
> inode tables and can support flex_bg and BLOCK_UNINIT features.
> Besides these, new resize is much faster than current resize.
>
> Below are benchmarks I made on my personal computer, fses with
> flex_bg size = 16 were resized to 230GB evry time. The first
> row shows the size of a fs from which the fs was resized to 230GB.
> The datas were collected by 'time resize2fs'.
>
> new resize
> 20GB 50GB 100GB
> real 0m3.558s 0m2.891s 0m0.394s
> user 0m0.004s 0m0.000s 0m0.394s
> sys 0m0.048s 0m0.048s 0m0.028s
>
> current resize
> 20GB 50GB 100GB
> real 5m2.770s 4m43.757s 3m14.840s
> user 0m0.040s 0m0.032s 0m0.024s
> sys 0m0.464s 0m0.432s 0m0.324s
>
> According to data above, new resize is faster than current resize in both
> user and sys time. New resize performs well in sys time, because it
> supports BLOCK_UNINIT and adds multi-groups each time.
>
> -- About supporting new features.
> YES! New resize can support new feature like bigalloc and exclude bitmap
> easily. Because it lets kernel do all work.
>
> V5->V6:
> add code initializing inode bitmap and inode tables back.
>
> V4->V5:
> release resizing error lock in error case of IOC_RESIZEFS
> Thanks Djalal <[email protected]> for pointing it out and his
> patch for IOC_GROUP_EXTEND and IOC_GROUP_ADD.
>
> V3->V4:
> rename __ext4_group_extend to ext4_group_extend_no_check
>
> V2->V3:
> initialize block bitmap of last group.
> remove code initializing inode bitmap and inode tables.
>
> Yongqiang.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2012-01-04 03:49:33

by Theodore Ts'o

[permalink] [raw]

Subject: Re: [PATCH V6 RESEND 07/15] ext4: add a function which adds several group descriptors

On Fri, Dec 30, 2011 at 07:00:04PM +0800, Yongqiang Yang wrote:
> From: Yongqiang Yang <[email protected]>
>
> This patch adds a functon named ext4_add_new_descs() which adds
> several group descriptors each time.

This is why it's less desirable to add functions one at a time. I had
to look through all of the patches to determine that
ext4_add_new_desc() is _only_ being called by ext4_add_new_descs().
And since ext4_add_new_descs() adds so little new value, it's better
to combine these two functions together into single function.

I'll add a new patch which folds these two functions together.

- Ted

2012-01-04 03:49:33

by Theodore Ts'o

[permalink] [raw]

Subject: Re: [PATCH V6 RESEND 01/15] ext4: add a function which extends a group without checking parameters

On Fri, Dec 30, 2011 at 06:59:58PM +0800, Yongqiang Yang wrote:
> +static int ext4_group_extend_no_check(struct super_block *sb,
> + ext4_fsblk_t o_blocks_count, ext4_grpblk_t add)

I fixed the whitespace here (nit-picky, I know)

> + handle = ext4_journal_start_sb(sb, 3);
> + if (IS_ERR(handle)) {
> + err = PTR_ERR(handle);
> + ext4_warning(sb, "error %d on journal start", err);
> + goto out;
> + }

There's only two calls "goto out", and out: currently is just "return
err;". So I just changed this to be "return err", which is clearer.
I tend to place more imporance for clarity of the code than rigid
rules which say that there should only be one control flow such that
"return err" must only occur once at the end of the function.

> + err = ext4_journal_get_write_access(handle, EXT4_SB(sb)->s_sbh);
> + if (err) {
> + ext4_warning(sb, "error %d on journal write access", err);
> + ext4_journal_stop(handle);
> + goto out;
> + }

Instead of calling ext4_journal_stop() here, it's simpler just to jump
to the "exit_journal" label, which is consistent with what we do
everywhere else. Now all of the error gotos are to "exit_journal",
which I've renamed to "errout".

These are minor stylistic changes, but I thought I would mention them
so that other people understand what is considered the best way to do
things, I've just made these changes myself to avoid asking you to
respin the patches.

Regards,

- Ted

2012-01-04 03:49:33

by Theodore Ts'o

[permalink] [raw]

Subject: Re: [PATCH V6 RESEND] ext4: add new online resize

On Sat, Dec 31, 2011 at 10:40:24AM -0700, Andreas Dilger wrote:
> The main issue I have with this patch is that the series starts by
> adding a bunch of unused functions, and only wires them I to the
> existing resize code at the very end.
>
> A better approach would be to have the early patches refactor the
> existing functions so that they are used by the current resize code,
> and the new resize functionality is enabled at the end.
>
> That allows verifying that the existing code continues to work
> through the entire series, and that the functions being added
> actually work properly even before the new ioctl is added.

I agree with Andreas that in general this is better way to ago,
although as long as regression testing is done with both the new
(v1.42) resize2fs as well as the old (v1.41.x) resize2fs, it's OK.
(Regressions with the v1.41.x resize2fs are just as important because
many people will be upgrading to the new kernel without necessarily
upgrading their e2fsprogs userspace programs, and we can't let them
break.)

I've done some quick regression testing using both the old and new
resize2fs programs, however, and it looks good. I'm going to take a
quick final scan over the patches, but I think they are ready for
inclusion.

- Ted

2012-01-04 03:49:33

by Theodore Ts'o

[permalink] [raw]

Subject: Re: [PATCH V6 RESEND 05/15] ext4: add a structure which will be used by 64bit-resize interface

On Fri, Dec 30, 2011 at 07:00:02PM +0800, Yongqiang Yang wrote:
> +void free_flex_gd(struct ext4_new_flex_group_data *flex_gd)

This should be static, so we don't leak symbols that don't begin with
"ext4_" into kernel namespace. I've checked and future users of this
function are in fs/ext4/resize.c, so there's no reason for this not to
be non-static.

(A good thing to do while doing development is to run "nm ext4.ko" and
check for exported symbols that you aren't expecting.)

Again, I'll fix this as a minor nit.

- Ted

2012-01-04 03:49:37

by Theodore Ts'o

[permalink] [raw]

Subject: Re: [PATCH V6 RESEND 08/15] ext4: add a function which sets up a flex groups each time

On Fri, Dec 30, 2011 at 07:00:05PM +0800, Yongqiang Yang wrote:
> From: Yongqiang Yang <[email protected]>
>
> This patch adds a function named ext4_setup_new_descs() which sets up
> a flex groups each time.

Similarly, ext4_setup_new_desc() and ext4_setup_new_descs() should be
folded together.

- Ted

2012-01-04 03:49:37

by Theodore Ts'o

[permalink] [raw]

Subject: Re: [PATCH V6 RESEND 03/15] ext4: add a function which sets up a new group desc

On Fri, Dec 30, 2011 at 07:00:00PM +0800, Yongqiang Yang wrote:
> +
> + memset(gdp, 0, EXT4_DESC_SIZE(sb));
> + /* LV FIXME */
> + memset(gdp, 0, EXT4_DESC_SIZE(sb));
> + ext4_block_bitmap_set(sb, gdp, input->block_bitmap); /* LV FIXME */
> + ext4_inode_bitmap_set(sb, gdp, input->inode_bitmap); /* LV FIXME */
> + ext4_inode_table_set(sb, gdp, input->inode_table); /* LV FIXME */

There is a duplicated (and unneeded) call to memset above. Also, the
LV FIXME was a comment introduced by Laurent Vivier in commit bd81d8ee
because at the time input->block_bitmap was a 32-bit value. In the
new ext4_new_group_data structure, input->block_bitmap is now a 64-bit
field, so the need for "LV FIXME" is gone, and should be removed.

I'll remove the duplicate memset() calls and the "LV FIXME".

For future reference, this is why it's better to use something
descriptive about the FIXME "64-bit FIXME", as opposed to your
initials. And it's best if there's an explicit comment explaining why
the fixme is there (and what the consequences of leaving partially
broken code in the kernel :-), so that future code maintainers won't
have to do code archeology to figure out what the FIXME is all about.

- Ted

2012-01-04 03:59:25

by Yongqiang Yang

[permalink] [raw]

Subject: Re: [PATCH V6 RESEND 01/15] ext4: add a function which extends a group without checking parameters

On Wed, Jan 4, 2012 at 5:07 AM, Ted Ts'o <[email protected]> wrote:
> On Fri, Dec 30, 2011 at 06:59:58PM +0800, Yongqiang Yang wrote:
>> +static int ext4_group_extend_no_check(struct super_block *sb,
>> + ? ? ? ? ? ? ? ? ? ? ext4_fsblk_t o_blocks_count, ext4_grpblk_t add)
>
> I fixed the whitespace here (nit-picky, I know)
>
>> + ? ? handle = ext4_journal_start_sb(sb, 3);
>> + ? ? if (IS_ERR(handle)) {
>> + ? ? ? ? ? ? err = PTR_ERR(handle);
>> + ? ? ? ? ? ? ext4_warning(sb, "error %d on journal start", err);
>> + ? ? ? ? ? ? goto out;
>> + ? ? }
>
> There's only two calls "goto out", and out: currently is just "return
> err;". ?So I just changed this to be "return err", which is clearer.
> I tend to place more imporance for clarity of the code than rigid
> rules which say that there should only be one control flow such that
> "return err" must only occur once at the end of the function.
>
>> + ? ? err = ext4_journal_get_write_access(handle, EXT4_SB(sb)->s_sbh);
>> + ? ? if (err) {
>> + ? ? ? ? ? ? ext4_warning(sb, "error %d on journal write access", err);
>> + ? ? ? ? ? ? ext4_journal_stop(handle);
>> + ? ? ? ? ? ? goto out;
>> + ? ? }
>
> Instead of calling ext4_journal_stop() here, it's simpler just to jump
> to the "exit_journal" label, which is consistent with what we do
> everywhere else. ?Now all of the error gotos are to "exit_journal",
> which I've renamed to "errout".
>
> These are minor stylistic changes, but I thought I would mention them
> so that other people understand what is considered the best way to do
> things, I've just made these changes myself to avoid asking you to
> respin the patches.
Thanks!!!

Yongqiang.
>
> Regards,
>
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?- Ted

--
Best Wishes
Yongqiang Yang

2012-01-04 04:02:52

by Yongqiang Yang

[permalink] [raw]

Subject: Re: [PATCH V6 RESEND 03/15] ext4: add a function which sets up a new group desc

On Wed, Jan 4, 2012 at 10:24 AM, Ted Ts'o <[email protected]> wrote:
> On Fri, Dec 30, 2011 at 07:00:00PM +0800, Yongqiang Yang wrote:
>> +
>> + ? ? memset(gdp, 0, EXT4_DESC_SIZE(sb));
>> + ? ? ?/* LV FIXME */
>> + ? ? memset(gdp, 0, EXT4_DESC_SIZE(sb));
>> + ? ? ext4_block_bitmap_set(sb, gdp, input->block_bitmap); /* LV FIXME */
>> + ? ? ext4_inode_bitmap_set(sb, gdp, input->inode_bitmap); /* LV FIXME */
>> + ? ? ext4_inode_table_set(sb, gdp, input->inode_table); /* LV FIXME */
>
> There is a duplicated (and unneeded) call to memset above. ?Also, the
> LV FIXME was a comment introduced by Laurent Vivier in commit bd81d8ee
> because at the time input->block_bitmap was a 32-bit value. ?In the
> new ext4_new_group_data structure, input->block_bitmap is now a 64-bit
> field, so the need for "LV FIXME" is gone, and should be removed.
>
> I'll remove the duplicate memset() calls and the "LV FIXME".
>
> For future reference, this is why it's better to use something
> descriptive about the FIXME "64-bit FIXME", as opposed to your
> initials. ?And it's best if there's an explicit comment explaining why
> the fixme is there (and what the consequences of leaving partially
> broken code in the kernel :-), so that future code maintainers won't
> have to do code archeology to figure out what the FIXME is all about.
Thanks for your explanation. I did not figure out what 'LV FIXME'
means, so I left them. Now I am understood.

Yongqiang.
>
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? - Ted

--
Best Wishes
Yongqiang Yang

2012-01-04 05:03:35

by Theodore Ts'o

[permalink] [raw]

Subject: Re: [PATCH V6 RESEND 13/15] ext4: add new online resize interface

On Fri, Dec 30, 2011 at 07:00:10PM +0800, Yongqiang Yang wrote:
> + if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
> + EXT4_FEATURE_RO_COMPAT_BIGALLOC)) {
> + ext4_msg(sb, KERN_ERR,
> + "Online resizing not supported with bigalloc");
> + return -EOPNOTSUPP;
> + }

I've also added checks to disable on-line resizing if the meta_bg file
system feature is set, since the on-line resizing code in this patch
series does not support the meta_bg block group descriptor layout
(which is required for file systems > 32-bits).

In addition, I've added a check so that if the requested number of
blocks is greater than 2**32-1, and the 64-bit option is not set, to
bail out with an error.

In theory the resize2fs userspace code will be doing these checks, but
(a) at some point in the future the meta_bg support will be added, and
so if the kernel doesn't support meta_bg, it needs to fail, and not
depend on userspace to do this check. Also (b) we want to make sure
this code doesn't do something crazy in case someone calls the ioctl
directly.

- Ted