Hi all,
This patch series adds a new resize implementation to ext4.
-- What is the new resize implementation?
It is a new online resize interface for ext4. It can be used via
ioctl with EXT4_IOC_RESIZE_FS and a 64-bit integer indicating the size
of the resized fs in blocks.
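As an illustration of the interface described above, here is a minimal
userspace sketch. It assumes the ioctl number that patch 13 of this series
adds to fs/ext4/ext4.h (_IOW('f', 16, __u64)); resize_ext4() is a
hypothetical helper name and is not part of the patches.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/types.h>

/* Assumption: mirrors the definition added to fs/ext4/ext4.h by patch 13. */
#ifndef EXT4_IOC_RESIZE_FS
#define EXT4_IOC_RESIZE_FS _IOW('f', 16, __u64)
#endif

/*
 * Hypothetical helper: ask the kernel to grow the ext4 fs mounted at
 * @mountpoint to @new_blocks blocks. Returns 0 on success and -1 on
 * failure, with errno set by open() or ioctl().
 */
int resize_ext4(const char *mountpoint, uint64_t new_blocks)
{
	__u64 n_blocks_count = new_blocks;
	int fd, ret, saved_errno;

	assert(mountpoint != NULL);

	fd = open(mountpoint, O_RDONLY);
	if (fd < 0)
		return -1;

	/* The kernel reads the new block count through this pointer. */
	ret = ioctl(fd, EXT4_IOC_RESIZE_FS, &n_blocks_count);

	saved_errno = errno;	/* close() may overwrite errno */
	close(fd);
	errno = saved_errno;

	return ret < 0 ? -1 : 0;
}
```

A caller would pass the mount point and the target size in filesystem
blocks, for example resize_ext4("/mnt", new_size_in_blocks), and check
errno on failure.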
-- Differences between the current resize and the new resize.
The new resize lets the kernel do all the work, such as allocating bitmaps
and inode tables, and it supports the flex_bg and BLOCK_UNINIT features.
Besides this, the new resize is much faster than the current resize.
Below are benchmarks I ran on my personal computer. Filesystems with
flex_bg size = 16 were resized to 230GB every time. The header row
shows the original size of the fs before it was resized to 230GB.
The data were collected with 'time resize2fs'.
new resize
        20GB        50GB        100GB
real    0m3.558s    0m2.891s    0m0.394s
user    0m0.004s    0m0.000s    0m0.394s
sys     0m0.048s    0m0.048s    0m0.028s

current resize
        20GB        50GB        100GB
real    5m2.770s    4m43.757s   3m14.840s
user    0m0.040s    0m0.032s    0m0.024s
sys     0m0.464s    0m0.432s    0m0.324s
According to the data above, the new resize is faster than the current
resize in both user and sys time. The new resize performs well in sys time
because it supports BLOCK_UNINIT and adds multiple groups each time.
-- About supporting new features.
YES! The new resize can easily support new features like bigalloc and
exclude bitmap, because it lets the kernel do all the work.
[PATCH 01/13] ext4: add a function which extends a group without
[PATCH 02/13] ext4: add a function which adds a new desc to a fs
[PATCH 03/13] ext4: add a function which sets up a new group desc
[PATCH 04/13] ext4: add a function which updates super block
[PATCH 05/13] ext4: add a structure which will be used by
[PATCH 06/13] ext4: add a function which sets up group blocks of a
[PATCH 07/13] ext4: add a function which adds several group
[PATCH 08/13] ext4: add a function which sets up a flex groups each
[PATCH 09/13] ext4: enable ext4_update_super() to handle a flex
[PATCH 10/13] ext4: pass verify_reserved_gdb() the number of group
[PATCH 11/13] ext4: add a new function which allocates bitmaps and
[PATCH 12/13] ext4: add a new function which adds a flex group to a
[PATCH 13/13] ext4: add new online resize interface
This patch adds a function named __ext4_group_extend() whose code
is copied from ext4_group_extend(). __ext4_group_extend() assumes
the parameters are valid and have been checked by the caller.
__ext4_group_extend() will be used by the new resize implementation. It
could also be used by ext4_group_extend(), but this patch series does
not do this.
Signed-off-by: Yongqiang Yang <[email protected]>
---
fs/ext4/resize.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 53 insertions(+), 0 deletions(-)
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 707d3f1..6ffbdb6 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -969,6 +969,59 @@ exit_put:
} /* ext4_group_add */
/*
+ * Extend a group without checking; the caller must have validated the input.
+ */
+static int __ext4_group_extend(struct super_block *sb,
+ ext4_fsblk_t o_blocks_count, ext4_grpblk_t add)
+{
+ struct ext4_super_block *es = EXT4_SB(sb)->s_es;
+ handle_t *handle;
+ int err = 0, err2;
+
+ /* We will update the superblock, one block bitmap, and
+ * one group descriptor via ext4_group_add_blocks().
+ */
+ handle = ext4_journal_start_sb(sb, 3);
+ if (IS_ERR(handle)) {
+ err = PTR_ERR(handle);
+ ext4_warning(sb, "error %d on journal start", err);
+ goto out;
+ }
+
+ err = ext4_journal_get_write_access(handle, EXT4_SB(sb)->s_sbh);
+ if (err) {
+ ext4_warning(sb, "error %d on journal write access", err);
+ ext4_journal_stop(handle);
+ goto out;
+ }
+
+ ext4_blocks_count_set(es, o_blocks_count + add);
+ ext4_debug("freeing blocks %llu through %llu\n", o_blocks_count,
+ o_blocks_count + add);
+ /* We add the blocks to the bitmap and set the group need init bit */
+ err = ext4_group_add_blocks(handle, sb, o_blocks_count, add);
+ if (err)
+ goto exit_journal;
+ ext4_handle_dirty_super(handle, sb);
+ ext4_debug("freed blocks %llu through %llu\n", o_blocks_count,
+ o_blocks_count + add);
+exit_journal:
+ err2 = ext4_journal_stop(handle);
+ if (err2 && !err)
+ err = err2;
+
+ if (!err) {
+ if (test_opt(sb, DEBUG))
+ printk(KERN_DEBUG "EXT4-fs: extended group to %llu "
+ "blocks\n", ext4_blocks_count(es));
+ update_backups(sb, EXT4_SB(sb)->s_sbh->b_blocknr, (char *)es,
+ sizeof(struct ext4_super_block));
+ }
+out:
+ return err;
+}
+
+/*
* Extend the filesystem to the new number of blocks specified. This entry
* point is only used to extend the current filesystem to the end of the last
* existing group. It can be accessed via ioctl, or by "remount,resize=<size>"
--
1.7.5.1
This patch adds a function named ext4_add_new_desc() which adds
a new desc to a fs and whose code is copied from ext4_group_add().
The function will be used by new resize implementation.
Signed-off-by: Yongqiang Yang <[email protected]>
---
fs/ext4/resize.c | 42 ++++++++++++++++++++++++++++++++++++++++++
1 files changed, 42 insertions(+), 0 deletions(-)
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 6ffbdb6..4fcd515 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -735,6 +735,48 @@ exit_err:
}
}
+/*
+ * ext4_add_new_desc() adds group descriptor of group @group
+ *
+ * @handle: journal handle
+ * @sb: super block
+ * @group: the group no. of the first group desc to be added
+ * @resize_inode: the resize inode
+ */
+static int ext4_add_new_desc(handle_t *handle, struct super_block *sb,
+ ext4_group_t group, struct inode *resize_inode)
+{
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ struct ext4_super_block *es = sbi->s_es;
+ struct buffer_head *gdb_bh;
+ int gdb_off, gdb_num, err = 0;
+ int reserved_gdb = ext4_bg_has_super(sb, group) ?
+ le16_to_cpu(es->s_reserved_gdt_blocks) : 0;
+
+ gdb_off = group % EXT4_DESC_PER_BLOCK(sb);
+ gdb_num = group / EXT4_DESC_PER_BLOCK(sb);
+
+ /*
+ * We will only either add reserved group blocks to a backup group
+ * or remove reserved blocks for the first group in a new group block.
+ * Doing both would mean more complex code, and sane people don't
+ * use non-sparse filesystems anymore. This is already checked above.
+ */
+ if (gdb_off) {
+ gdb_bh = sbi->s_group_desc[gdb_num];
+ err = ext4_journal_get_write_access(handle, gdb_bh);
+ if (err)
+ goto out;
+
+ if (reserved_gdb && ext4_bg_num_gdb(sb, group))
+ err = reserve_backup_gdb(handle, resize_inode, group);
+ } else
+ err = add_new_gdb(handle, resize_inode, group);
+
+out:
+ return err;
+}
+
/* Add group descriptor data to an existing or new group descriptor block.
* Ensure we handle all possible error conditions _before_ we start modifying
* the filesystem, because we cannot abort the transaction and not have it
--
1.7.5.1
This patch adds a function named ext4_setup_new_desc() which sets
up a new group descriptor and whose code is copied from ext4_group_add().
The function will be used by the new resize implementation.
Signed-off-by: Yongqiang Yang <[email protected]>
---
fs/ext4/resize.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 54 insertions(+), 0 deletions(-)
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 4fcd515..6320baa 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -777,6 +777,60 @@ out:
return err;
}
+/*
+ * ext4_setup_new_desc() sets up group descriptors specified by @input.
+ *
+ * @handle: journal handle
+ * @sb: super block
+ */
+static int ext4_setup_new_desc(handle_t *handle, struct super_block *sb,
+ struct ext4_new_group_data *input)
+{
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ ext4_group_t group;
+ struct ext4_group_desc *gdp;
+ struct buffer_head *gdb_bh;
+ int gdb_off, gdb_num, err = 0;
+
+ group = input->group;
+
+ gdb_off = group % EXT4_DESC_PER_BLOCK(sb);
+ gdb_num = group / EXT4_DESC_PER_BLOCK(sb);
+
+ /*
+ * get_write_access() has been called on gdb_bh by ext4_add_new_desc().
+ */
+ gdb_bh = sbi->s_group_desc[gdb_num];
+ /* Update group descriptor block for new group */
+ gdp = (struct ext4_group_desc *)((char *)gdb_bh->b_data +
+ gdb_off * EXT4_DESC_SIZE(sb));
+
+ memset(gdp, 0, EXT4_DESC_SIZE(sb));
+ /* LV FIXME */
+ memset(gdp, 0, EXT4_DESC_SIZE(sb));
+ ext4_block_bitmap_set(sb, gdp, input->block_bitmap); /* LV FIXME */
+ ext4_inode_bitmap_set(sb, gdp, input->inode_bitmap); /* LV FIXME */
+ ext4_inode_table_set(sb, gdp, input->inode_table); /* LV FIXME */
+ ext4_free_blks_set(sb, gdp, input->free_blocks_count);
+ ext4_free_inodes_set(sb, gdp, EXT4_INODES_PER_GROUP(sb));
+ gdp->bg_flags = cpu_to_le16(EXT4_BG_INODE_ZEROED);
+ gdp->bg_checksum = ext4_group_desc_csum(sbi, input->group, gdp);
+
+ err = ext4_handle_dirty_metadata(handle, NULL, gdb_bh);
+ if (unlikely(err)) {
+ ext4_std_error(sb, err);
+ return err;
+ }
+
+ /*
+ * We can allocate memory for mb_alloc based on the new group
+ * descriptor
+ */
+ err = ext4_mb_add_groupinfo(sb, group, gdp);
+
+ return err;
+}
+
/* Add group descriptor data to an existing or new group descriptor block.
* Ensure we handle all possible error conditions _before_ we start modifying
* the filesystem, because we cannot abort the transaction and not have it
--
1.7.5.1
This patch adds a function named ext4_update_super() which updates
super block and whose code is copied from ext4_group_add().
The function will be used by new resize implementation.
Signed-off-by: Yongqiang Yang <[email protected]>
---
fs/ext4/resize.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 72 insertions(+), 0 deletions(-)
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 6320baa..14be865 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -831,6 +831,78 @@ static int ext4_setup_new_desc(handle_t *handle, struct super_block *sb,
return err;
}
+/*
+ * ext4_update_super() updates super so that new the added group can be seen
+ * by the filesystem.
+ *
+ * @sb: super block
+ */
+static void ext4_update_super(struct super_block *sb,
+ struct ext4_new_group_data *input)
+{
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ struct ext4_super_block *es = sbi->s_es;
+
+ /*
+ * Make the new blocks and inodes valid next. We do this before
+ * increasing the group count so that once the group is enabled,
+ * all of its blocks and inodes are already valid.
+ *
+ * We always allocate group-by-group, then block-by-block or
+ * inode-by-inode within a group, so enabling these
+ * blocks/inodes before the group is live won't actually let us
+ * allocate the new space yet.
+ */
+ ext4_blocks_count_set(es, ext4_blocks_count(es) +
+ input->blocks_count);
+ le32_add_cpu(&es->s_inodes_count, EXT4_INODES_PER_GROUP(sb));
+
+ /*
+ * We need to protect s_groups_count against other CPUs seeing
+ * inconsistent state in the superblock.
+ *
+ * The precise rules we use are:
+ *
+ * * Writers must perform a smp_wmb() after updating all dependent
+ * data and before modifying the groups count
+ *
+ * * Readers must perform an smp_rmb() after reading the groups count
+ * and before reading any dependent data.
+ *
+ * NB. These rules can be relaxed when checking the group count
+ * while freeing data, as we can only allocate from a block
+ * group after serialising against the group count, and we can
+ * only then free after serialising in turn against that
+ * allocation.
+ */
+ smp_wmb();
+
+ /* Update the global fs size fields */
+ sbi->s_groups_count++;
+
+ /* Update the reserved block counts only once the new group is
+ * active. */
+ ext4_r_blocks_count_set(es, ext4_r_blocks_count(es) +
+ input->reserved_blocks);
+
+ /* Update the free space counts */
+ percpu_counter_add(&sbi->s_freeblocks_counter,
+ input->free_blocks_count);
+ percpu_counter_add(&sbi->s_freeinodes_counter,
+ EXT4_INODES_PER_GROUP(sb));
+
+ if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG) &&
+ sbi->s_log_groups_per_flex) {
+ ext4_group_t flex_group;
+ flex_group = ext4_flex_group(sbi, input->group);
+ atomic_add(input->free_blocks_count,
+ &sbi->s_flex_groups[flex_group].free_blocks);
+ atomic_add(EXT4_INODES_PER_GROUP(sb),
+ &sbi->s_flex_groups[flex_group].free_inodes);
+ }
+
+}
+
/* Add group descriptor data to an existing or new group descriptor block.
* Ensure we handle all possible error conditions _before_ we start modifying
* the filesystem, because we cannot abort the transaction and not have it
--
1.7.5.1
This patch adds a function named setup_new_flex_group_blocks() which
sets up the group blocks of a flex group.
Signed-off-by: Yongqiang Yang <[email protected]>
---
fs/ext4/ext4.h | 8 ++
fs/ext4/resize.c | 249 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 257 insertions(+), 0 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index e717dfd..334525d 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -491,6 +491,14 @@ struct ext4_new_group_data {
__u32 free_blocks_count;
};
+/* Indexes used to index group tables in ext4_new_group_data */
+enum {
+ BLOCK_BITMAP = 0, /* block bitmap */
+ INODE_BITMAP, /* inode bitmap */
+ INODE_TABLE, /* inode tables */
+ GROUP_TABLE_COUNT,
+};
+
/*
* Flags used by ext4_map_blocks()
*/
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index c586e51..4acf7a8 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -235,6 +235,255 @@ static int extend_or_restart_transaction(handle_t *handle, int thresh)
}
/*
+ * set_flexbg_block_bitmap() marks @count blocks starting from @block as used.
+ *
+ * Helper function for setup_new_flex_group_blocks().
+ *
+ * @sb: super block
+ * @handle: journal handle
+ * @flex_gd: flex group data
+ */
+static int set_flexbg_block_bitmap(struct super_block *sb, handle_t *handle,
+ struct ext4_new_flex_group_data *flex_gd,
+ ext4_fsblk_t block, ext4_group_t count)
+{
+ ext4_group_t count2;
+
+ ext4_debug("mark blocks [%llu/%u] used\n", block, count);
+ for (count2 = count; count > 0; count -= count2, block += count2) {
+ ext4_fsblk_t start;
+ struct buffer_head *bh;
+ ext4_group_t group;
+ int err;
+
+ ext4_get_group_no_and_offset(sb, block, &group, NULL);
+ start = ext4_group_first_block_no(sb, group);
+ group -= flex_gd->groups[0].group;
+
+ count2 = sb->s_blocksize * 8 - (block - start);
+ if (count2 > count)
+ count2 = count;
+
+ if (flex_gd->bg_flags[group] & EXT4_BG_BLOCK_UNINIT) {
+ BUG_ON(flex_gd->count > 1);
+ continue;
+ }
+
+ err = extend_or_restart_transaction(handle, 1);
+ if (err)
+ return err;
+
+ bh = sb_getblk(sb, flex_gd->groups[group].block_bitmap);
+ if (!bh)
+ return -EIO;
+
+ err = ext4_journal_get_write_access(handle, bh);
+ if (err)
+ return err;
+ ext4_debug("mark block bitmap %#04llx (+%llu/%u)\n", block,
+ block - start, count2);
+ ext4_set_bits(bh->b_data, block - start, count2);
+
+ err = ext4_handle_dirty_metadata(handle, NULL, bh);
+ if (unlikely(err))
+ return err;
+ brelse(bh);
+ }
+
+ return 0;
+}
+
+/*
+ * Set up the block and inode bitmaps, and the inode table for the new groups.
+ * This doesn't need to be part of the main transaction, since we are only
+ * changing blocks outside the actual filesystem. We still do journaling to
+ * ensure the recovery is correct in case of a failure just after resize.
+ * If any part of this fails, we simply abort the resize.
+ *
+ * setup_new_flex_group_blocks handles a flex group as follows:
+ * 1. copy super block and GDT, and initialize group tables if necessary.
+ * In this step, we only set bits in blocks bitmaps for blocks taken by
+ * super block and GDT.
+ * 2. allocate group tables in block bitmaps, that is, set bits in block
+ * bitmap for blocks taken by group tables.
+ */
+static int setup_new_flex_group_blocks(struct super_block *sb,
+ struct ext4_new_flex_group_data *flex_gd)
+{
+ int group_table_count[] = {1, 1, EXT4_SB(sb)->s_itb_per_group};
+ ext4_fsblk_t start;
+ ext4_fsblk_t block;
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ struct ext4_super_block *es = sbi->s_es;
+ struct ext4_new_group_data *group_data = flex_gd->groups;
+ __u16 *bg_flags = flex_gd->bg_flags;
+ handle_t *handle;
+ ext4_group_t group, count;
+ struct buffer_head *bh = NULL;
+ int reserved_gdb, i, j, err = 0, err2;
+
+ BUG_ON(!flex_gd->count || !group_data ||
+ group_data[0].group != sbi->s_groups_count);
+
+ reserved_gdb = le16_to_cpu(es->s_reserved_gdt_blocks);
+
+ /* This transaction may be extended/restarted along the way */
+ handle = ext4_journal_start_sb(sb, EXT4_MAX_TRANS_DATA);
+ if (IS_ERR(handle))
+ return PTR_ERR(handle);
+
+ group = group_data[0].group;
+ for (i = 0; i < flex_gd->count; i++, group++) {
+ unsigned long gdblocks;
+
+ gdblocks = ext4_bg_num_gdb(sb, group);
+ start = ext4_group_first_block_no(sb, group);
+
+ /* Copy all of the GDT blocks into the backup in this group */
+ for (j = 0, block = start + 1; j < gdblocks; j++, block++) {
+ struct buffer_head *gdb;
+
+ ext4_debug("update backup group %#04llx\n", block);
+ err = extend_or_restart_transaction(handle, 1);
+ if (err)
+ goto out;
+
+ gdb = sb_getblk(sb, block);
+ if (!gdb) {
+ err = -EIO;
+ goto out;
+ }
+
+ err = ext4_journal_get_write_access(handle, gdb);
+ if (err) {
+ brelse(gdb);
+ goto out;
+ }
+ memcpy(gdb->b_data, sbi->s_group_desc[j]->b_data,
+ gdb->b_size);
+ set_buffer_uptodate(gdb);
+
+ err = ext4_handle_dirty_metadata(handle, NULL, gdb);
+ if (unlikely(err)) {
+ brelse(gdb);
+ goto out;
+ }
+ brelse(gdb);
+ }
+
+ /* Zero out all of the reserved backup group descriptor
+ * table blocks
+ */
+ if (ext4_bg_has_super(sb, group)) {
+ err = sb_issue_zeroout(sb, gdblocks + start + 1,
+ reserved_gdb, GFP_NOFS);
+ if (err)
+ goto out;
+ }
+
+ /* Initialize group tables of the group @group */
+ if (!(bg_flags[i] & EXT4_BG_INODE_ZEROED))
+ goto handle_bb;
+
+ /* Zero out all of the inode table blocks */
+ block = group_data[i].inode_table;
+ ext4_debug("clear inode table blocks %#04llx -> %#04lx\n",
+ block, sbi->s_itb_per_group);
+ err = sb_issue_zeroout(sb, block, sbi->s_itb_per_group,
+ GFP_NOFS);
+ if (err)
+ goto out;
+
+handle_bb:
+ if (bg_flags[i] & EXT4_BG_BLOCK_UNINIT)
+ goto handle_ib;
+
+ /* Initialize block bitmap of the @group */
+ block = group_data[i].block_bitmap;
+ err = extend_or_restart_transaction(handle, 1);
+ if (err)
+ goto out;
+
+ bh = bclean(handle, sb, block);
+ if (IS_ERR(bh)) {
+ err = PTR_ERR(bh);
+ goto out;
+ }
+ if (ext4_bg_has_super(sb, group)) {
+ ext4_debug("mark backup superblock %#04llx (+0)\n",
+ start);
+ ext4_set_bits(bh->b_data, 0, gdblocks + reserved_gdb +
+ 1);
+ }
+ ext4_mark_bitmap_end(group_data[i].blocks_count,
+ sb->s_blocksize * 8, bh->b_data);
+ err = ext4_handle_dirty_metadata(handle, NULL, bh);
+ if (err)
+ goto out;
+ brelse(bh);
+
+handle_ib:
+ if (bg_flags[i] & EXT4_BG_INODE_UNINIT)
+ continue;
+
+ /* Initialize inode bitmap of the @group */
+ block = group_data[i].inode_bitmap;
+ err = extend_or_restart_transaction(handle, 1);
+ if (err)
+ goto out;
+ /* Mark unused entries in inode bitmap used */
+ bh = bclean(handle, sb, block);
+ if (IS_ERR(bh)) {
+ err = PTR_ERR(bh);
+ goto out;
+ }
+ ext4_mark_bitmap_end(EXT4_INODES_PER_GROUP(sb),
+ sb->s_blocksize * 8, bh->b_data);
+ err = ext4_handle_dirty_metadata(handle, NULL, bh);
+ if (err)
+ goto out;
+ brelse(bh);
+ }
+ bh = NULL;
+
+ /* Mark group tables in block bitmap */
+ for (j = 0; j < GROUP_TABLE_COUNT; j++) {
+ count = group_table_count[j];
+ start = (&group_data[0].block_bitmap)[j];
+ block = start;
+ for (i = 1; i < flex_gd->count; i++) {
+ block += group_table_count[j];
+ if (block == (&group_data[i].block_bitmap)[j]) {
+ count += group_table_count[j];
+ continue;
+ }
+ err = set_flexbg_block_bitmap(sb, handle,
+ flex_gd, start, count);
+ if (err)
+ goto out;
+ count = group_table_count[j];
+ start = (&group_data[i].block_bitmap)[j];
+ block = start;
+ }
+
+ if (count) {
+ err = set_flexbg_block_bitmap(sb, handle,
+ flex_gd, start, count);
+ if (err)
+ goto out;
+ }
+ }
+
+out:
+ brelse(bh);
+ err2 = ext4_journal_stop(handle);
+ if (err2 && !err)
+ err = err2;
+
+ return err;
+}
+
+/*
* Set up the block and inode bitmaps, and the inode table for the new group.
* This doesn't need to be part of the main transaction, since we are only
* changing blocks outside the actual filesystem. We still do journaling to
--
1.7.5.1
This patch adds a structure which will be used by 64bit-resize interface.
Two functions which allocate and destroy the structure respectively are
added.
Signed-off-by: Yongqiang Yang <[email protected]>
---
fs/ext4/resize.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 56 insertions(+), 0 deletions(-)
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 14be865..c586e51 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -134,6 +134,62 @@ static int verify_group_input(struct super_block *sb,
return err;
}
+/*
+ * ext4_new_flex_group_data is used by 64bit-resize interface to add a flex
+ * group each time.
+ */
+struct ext4_new_flex_group_data {
+ struct ext4_new_group_data *groups; /* new_group_data for groups
+ in the flex group */
+ __u16 *bg_flags; /* block group flags of groups
+ in @groups */
+ ext4_group_t count; /* number of groups in @groups
+ */
+};
+
+/*
+ * alloc_flex_gd() allocates an ext4_new_flex_group_data with size of
+ * @flexbg_size.
+ *
+ * Returns NULL on failure otherwise address of the allocated structure.
+ */
+static struct ext4_new_flex_group_data *alloc_flex_gd(unsigned long flexbg_size)
+{
+ struct ext4_new_flex_group_data *flex_gd;
+
+ flex_gd = kmalloc(sizeof(*flex_gd), GFP_NOFS);
+ if (flex_gd == NULL)
+ goto out3;
+
+ flex_gd->count = flexbg_size;
+
+ flex_gd->groups = kmalloc(sizeof(struct ext4_new_group_data) *
+ flexbg_size, GFP_NOFS);
+ if (flex_gd->groups == NULL)
+ goto out2;
+
+ flex_gd->bg_flags = kmalloc(flexbg_size * sizeof(__u16), GFP_NOFS);
+ if (flex_gd->bg_flags == NULL)
+ goto out1;
+
+ return flex_gd;
+
+out1:
+ kfree(flex_gd->bg_flags);
+out2:
+ kfree(flex_gd->groups);
+out3:
+ kfree(flex_gd);
+ return NULL;
+}
+
+void free_flex_gd(struct ext4_new_flex_group_data *flex_gd)
+{
+ kfree(flex_gd->bg_flags);
+ kfree(flex_gd->groups);
+ kfree(flex_gd);
+}
+
static struct buffer_head *bclean(handle_t *handle, struct super_block *sb,
ext4_fsblk_t blk)
{
--
1.7.5.1
The 64bit resizer adds a flex group each time, so verify_reserved_gdb() can
not use s_groups_count directly; it should use the number of group descriptors
before the added group.
Signed-off-by: Yongqiang Yang <[email protected]>
---
fs/ext4/resize.c | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 606de3a..86edf19 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -656,10 +656,10 @@ static unsigned ext4_list_backups(struct super_block *sb, unsigned *three,
* groups in current filesystem that have BACKUPS, or -ve error code.
*/
static int verify_reserved_gdb(struct super_block *sb,
+ ext4_group_t end,
struct buffer_head *primary)
{
const ext4_fsblk_t blk = primary->b_blocknr;
- const ext4_group_t end = EXT4_SB(sb)->s_groups_count;
unsigned three = 1;
unsigned five = 5;
unsigned seven = 7;
@@ -734,7 +734,7 @@ static int add_new_gdb(handle_t *handle, struct inode *inode,
if (!gdb_bh)
return -EIO;
- gdbackups = verify_reserved_gdb(sb, gdb_bh);
+ gdbackups = verify_reserved_gdb(sb, group, gdb_bh);
if (gdbackups < 0) {
err = gdbackups;
goto exit_bh;
@@ -897,7 +897,8 @@ static int reserve_backup_gdb(handle_t *handle, struct inode *inode,
err = -EIO;
goto exit_bh;
}
- if ((gdbackups = verify_reserved_gdb(sb, primary[res])) < 0) {
+ gdbackups = verify_reserved_gdb(sb, group, primary[res]);
+ if (gdbackups < 0) {
brelse(primary[res]);
err = gdbackups;
goto exit_bh;
--
1.7.5.1
This patch enables ext4_update_super() to handle a flex group.
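For reference, the reserved-block scaling that this patch performs with
do_div() can be sketched in plain C as follows. This is only an
illustration under assumptions: scaled_reserved_blocks() is a hypothetical
name, and ordinary 64-bit division stands in for the kernel's do_div().
Note that the reserved ratio is truncated to a whole percentage before it
is applied to the added blocks.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute how many of @added_blocks should be
 * reserved so that the reserved percentage of the fs stays the same.
 *
 * @cur_reserved: reserved blocks currently recorded in the super block
 * @cur_blocks:   total blocks currently in the fs (must be non-zero)
 * @added_blocks: blocks contributed by the groups being added
 */
uint64_t scaled_reserved_blocks(uint64_t cur_reserved, uint64_t cur_blocks,
				uint64_t added_blocks)
{
	uint64_t percent;

	assert(cur_blocks != 0);

	/* Integer percentage of reserved blocks; the fraction is dropped. */
	percent = cur_reserved * 100 / cur_blocks;

	/* Apply the same percentage to the newly added blocks. */
	return percent * added_blocks / 100;
}
```

Because of the truncation, a fs with 4.9% reserved blocks reserves only 4%
of the added blocks in this scheme.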
Signed-off-by: Yongqiang Yang <[email protected]>
---
fs/ext4/resize.c | 58 +++++++++++++++++++++++++++++++++++++----------------
1 files changed, 40 insertions(+), 18 deletions(-)
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 5939b62..606de3a 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1183,17 +1183,24 @@ static int ext4_setup_new_descs(handle_t *handle, struct super_block *sb,
}
/*
- * ext4_update_super() updates super so that new the added group can be seen
- * by the filesystem.
+ * ext4_update_super() updates super block so that new added groups can be seen
+ * by the filesystem.
*
* @sb: super block
+ * @flex_gd: new added groups
*/
static void ext4_update_super(struct super_block *sb,
- struct ext4_new_group_data *input)
+ struct ext4_new_flex_group_data *flex_gd)
{
+ ext4_fsblk_t blocks_count = 0;
+ ext4_fsblk_t free_blocks = 0;
+ ext4_fsblk_t reserved_blocks = 0;
+ struct ext4_new_group_data *group_data = flex_gd->groups;
struct ext4_sb_info *sbi = EXT4_SB(sb);
struct ext4_super_block *es = sbi->s_es;
+ int i;
+ BUG_ON(flex_gd->count == 0 || group_data == NULL);
/*
* Make the new blocks and inodes valid next. We do this before
* increasing the group count so that once the group is enabled,
@@ -1204,9 +1211,19 @@ static void ext4_update_super(struct super_block *sb,
* blocks/inodes before the group is live won't actually let us
* allocate the new space yet.
*/
- ext4_blocks_count_set(es, ext4_blocks_count(es) +
- input->blocks_count);
- le32_add_cpu(&es->s_inodes_count, EXT4_INODES_PER_GROUP(sb));
+ for (i = 0; i < flex_gd->count; i++) {
+ blocks_count += group_data[i].blocks_count;
+ free_blocks += group_data[i].free_blocks_count;
+ }
+
+ reserved_blocks = ext4_r_blocks_count(es) * 100;
+ do_div(reserved_blocks, ext4_blocks_count(es));
+ reserved_blocks *= blocks_count;
+ do_div(reserved_blocks, 100);
+
+ ext4_blocks_count_set(es, ext4_blocks_count(es) + blocks_count);
+ le32_add_cpu(&es->s_inodes_count, EXT4_INODES_PER_GROUP(sb) *
+ flex_gd->count);
/*
* We need to protect s_groups_count against other CPUs seeing
@@ -1214,11 +1231,11 @@ static void ext4_update_super(struct super_block *sb,
*
* The precise rules we use are:
*
- * * Writers must perform a smp_wmb() after updating all dependent
- * data and before modifying the groups count
+ * * Writers must perform a smp_wmb() after updating all
+ * dependent data and before modifying the groups count
*
- * * Readers must perform an smp_rmb() after reading the groups count
- * and before reading any dependent data.
+ * * Readers must perform an smp_rmb() after reading the groups
+ * count and before reading any dependent data.
*
* NB. These rules can be relaxed when checking the group count
* while freeing data, as we can only allocate from a block
@@ -1229,29 +1246,34 @@ static void ext4_update_super(struct super_block *sb,
smp_wmb();
/* Update the global fs size fields */
- sbi->s_groups_count++;
+ sbi->s_groups_count += flex_gd->count;
/* Update the reserved block counts only once the new group is
* active. */
ext4_r_blocks_count_set(es, ext4_r_blocks_count(es) +
- input->reserved_blocks);
+ reserved_blocks);
/* Update the free space counts */
percpu_counter_add(&sbi->s_freeblocks_counter,
- input->free_blocks_count);
+ free_blocks);
percpu_counter_add(&sbi->s_freeinodes_counter,
- EXT4_INODES_PER_GROUP(sb));
+ EXT4_INODES_PER_GROUP(sb) * flex_gd->count);
- if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG) &&
+ if (EXT4_HAS_INCOMPAT_FEATURE(sb,
+ EXT4_FEATURE_INCOMPAT_FLEX_BG) &&
sbi->s_log_groups_per_flex) {
ext4_group_t flex_group;
- flex_group = ext4_flex_group(sbi, input->group);
- atomic_add(input->free_blocks_count,
+ flex_group = ext4_flex_group(sbi, group_data[0].group);
+ atomic_add(free_blocks,
&sbi->s_flex_groups[flex_group].free_blocks);
- atomic_add(EXT4_INODES_PER_GROUP(sb),
+ atomic_add(EXT4_INODES_PER_GROUP(sb) * flex_gd->count,
&sbi->s_flex_groups[flex_group].free_inodes);
}
+ if (test_opt(sb, DEBUG))
+ printk(KERN_DEBUG "EXT4-fs: added group %u:"
+ "%llu blocks(%llu free %llu reserved)\n", flex_gd->count,
+ blocks_count, free_blocks, reserved_blocks);
}
/* Add group descriptor data to an existing or new group descriptor block.
--
1.7.5.1
This patch adds a new online resize interface, whose input argument is a
64-bit integer indicating how many blocks there are in the resized fs.
In the new resize implementation, all work, like allocating group tables, is
done on the kernel side, so the new resize interface can support the flex_bg
feature and prepares the ground for supporting resize with features like
bigalloc and exclude bitmap. Besides this, user-space tools just pass in the
new number of blocks.
We delay initializing the bitmaps and inode tables of added groups if possible
and add multiple groups (a flex group) each time, so the new resize is very
fast, like mkfs.
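To illustrate what "a flex group each time" means, the sketch below mirrors
the batching arithmetic used by ext4_setup_next_flex_gd() in this patch
(last_group = group | (flexbg_size - 1), clamped to the final group). The
function names are hypothetical, and the sketch assumes flexbg_size is a
power of two.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: last group of the batch that starts at
 * @first_group, i.e. the end of the flex group containing @first_group,
 * clamped to @final_group (the last group of the resized fs).
 */
uint32_t batch_last_group(uint32_t first_group, uint32_t flexbg_size,
			  uint32_t final_group)
{
	uint32_t last;

	/* The bitwise OR below only works for a power-of-two size. */
	assert(flexbg_size != 0 && (flexbg_size & (flexbg_size - 1)) == 0);

	last = first_group | (flexbg_size - 1);
	return last > final_group ? final_group : last;
}

/*
 * Hypothetical helper: number of batches needed to add groups
 * @first_group..@final_group. Assumes @final_group < UINT32_MAX.
 */
uint32_t count_batches(uint32_t first_group, uint32_t flexbg_size,
		       uint32_t final_group)
{
	uint32_t n = 0;

	while (first_group <= final_group) {
		first_group = batch_last_group(first_group, flexbg_size,
					       final_group) + 1;
		n++;
	}
	return n;
}
```

With illustrative numbers (not taken from the benchmarks), adding groups
160 through 1839 with flexbg_size = 16 takes 105 batches instead of 1680
single-group additions.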
Signed-off-by: Yongqiang Yang <[email protected]>
---
Documentation/filesystems/ext4.txt | 7 ++
fs/ext4/ext4.h | 2 +
fs/ext4/ioctl.c | 31 +++++++
fs/ext4/resize.c | 171 ++++++++++++++++++++++++++++++++++++
4 files changed, 211 insertions(+), 0 deletions(-)
diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt
index 232a575..d1548ab 100644
--- a/Documentation/filesystems/ext4.txt
+++ b/Documentation/filesystems/ext4.txt
@@ -590,6 +590,13 @@ Table of Ext4 specific ioctls
behaviour may change in the future as it is
not necessary and has been done this way only
for sake of simplicity.
+
+ EXT4_IOC_RESIZE_FS Resize the filesystem to a new size. The number
+ of blocks of resized filesystem is passed in via
+ 64 bit integer argument. The kernel allocates
+ bitmaps and inode table, the userspace tool thus
+ just passes the new number of blocks.
+
..............................................................................
References
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 334525d..4c36b92 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -557,6 +557,7 @@ enum {
/* note ioctl 11 reserved for filesystem-independent FIEMAP ioctl */
#define EXT4_IOC_ALLOC_DA_BLKS _IO('f', 12)
#define EXT4_IOC_MOVE_EXT _IOWR('f', 15, struct move_extent)
+#define EXT4_IOC_RESIZE_FS _IOW('f', 16, __u64)
#if defined(__KERNEL__) && defined(CONFIG_COMPAT)
/*
@@ -1880,6 +1881,7 @@ extern int ext4_group_add(struct super_block *sb,
extern int ext4_group_extend(struct super_block *sb,
struct ext4_super_block *es,
ext4_fsblk_t n_blocks_count);
+extern int ext4_resize_fs(struct super_block *sb, ext4_fsblk_t n_blocks_count);
/* super.c */
extern void *ext4_kvmalloc(size_t size, gfp_t flags);
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index f18bfe3..eeaa1a4 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -335,6 +335,37 @@ mext_out:
return err;
}
+ case EXT4_IOC_RESIZE_FS: {
+ ext4_fsblk_t n_blocks_count;
+ struct super_block *sb = inode->i_sb;
+ int err = 0, err2 = 0;
+
+ err = ext4_resize_begin(sb);
+ if (err)
+ return err;
+
+ if (copy_from_user(&n_blocks_count, (__u64 __user *)arg,
+ sizeof(__u64)))
+ return -EFAULT;
+
+ err = mnt_want_write(filp->f_path.mnt);
+ if (err)
+ return err;
+
+ err = ext4_resize_fs(sb, n_blocks_count);
+ if (EXT4_SB(sb)->s_journal) {
+ jbd2_journal_lock_updates(EXT4_SB(sb)->s_journal);
+ err2 = jbd2_journal_flush(EXT4_SB(sb)->s_journal);
+ jbd2_journal_unlock_updates(EXT4_SB(sb)->s_journal);
+ }
+ if (err == 0)
+ err = err2;
+ mnt_drop_write(filp->f_path.mnt);
+ ext4_resize_end(sb);
+
+ return err;
+ }
+
case FITRIM:
{
struct super_block *sb = inode->i_sb;
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 14082c0..84e5ea3 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1869,3 +1869,174 @@ exit_journal:
exit:
return err;
}
+
+static int ext4_setup_next_flex_gd(struct super_block *sb,
+ struct ext4_new_flex_group_data *flex_gd,
+ ext4_fsblk_t n_blocks_count,
+ unsigned long flexbg_size)
+{
+ struct ext4_super_block *es = EXT4_SB(sb)->s_es;
+ struct ext4_new_group_data *group_data = flex_gd->groups;
+ ext4_fsblk_t o_blocks_count;
+ ext4_group_t n_group;
+ ext4_group_t group;
+ ext4_group_t last_group;
+ ext4_grpblk_t last;
+ ext4_grpblk_t blocks_per_group;
+ unsigned long i;
+
+ blocks_per_group = EXT4_BLOCKS_PER_GROUP(sb);
+
+ o_blocks_count = ext4_blocks_count(es);
+
+ if (o_blocks_count == n_blocks_count)
+ return 0;
+
+ ext4_get_group_no_and_offset(sb, o_blocks_count, &group, &last);
+ BUG_ON(last);
+ ext4_get_group_no_and_offset(sb, n_blocks_count - 1, &n_group, &last);
+
+ last_group = group | (flexbg_size - 1);
+ if (last_group > n_group)
+ last_group = n_group;
+
+ flex_gd->count = last_group - group + 1;
+
+ for (i = 0; i < flex_gd->count; i++) {
+ int overhead;
+
+ group_data[i].group = group + i;
+ group_data[i].blocks_count = blocks_per_group;
+ overhead = ext4_bg_has_super(sb, group + i) ?
+ (1 + ext4_bg_num_gdb(sb, group + i) +
+ le16_to_cpu(es->s_reserved_gdt_blocks)) : 0;
+ group_data[i].free_blocks_count = blocks_per_group - overhead;
+ if (EXT4_HAS_RO_COMPAT_FEATURE(sb,
+ EXT4_FEATURE_RO_COMPAT_GDT_CSUM))
+ flex_gd->bg_flags[i] = EXT4_BG_BLOCK_UNINIT |
+ EXT4_BG_INODE_UNINIT;
+ else
+ flex_gd->bg_flags[i] = EXT4_BG_INODE_ZEROED;
+ }
+
+ if ((last_group == n_group) && (last != blocks_per_group - 1)) {
+ group_data[i - 1].blocks_count = last + 1;
+ group_data[i - 1].free_blocks_count -= blocks_per_group-
+ last - 1;
+ }
+
+ return 1;
+}
+
+/*
+ * ext4_resize_fs() resizes a fs to the new size specified by @n_blocks_count
+ *
+ * @sb: super block of the fs to be resized
+ * @n_blocks_count: the number of blocks in the resized fs
+ */
+int ext4_resize_fs(struct super_block *sb, ext4_fsblk_t n_blocks_count)
+{
+ struct ext4_new_flex_group_data *flex_gd = NULL;
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ struct ext4_super_block *es = sbi->s_es;
+ struct buffer_head *bh;
+ struct inode *resize_inode;
+ ext4_fsblk_t o_blocks_count;
+ ext4_group_t o_group;
+ ext4_group_t n_group;
+ ext4_grpblk_t offset;
+ unsigned long n_desc_blocks;
+ unsigned long o_desc_blocks;
+ unsigned long desc_blocks;
+ int err = 0, flexbg_size = 1;
+
+ o_blocks_count = ext4_blocks_count(es);
+
+ if (test_opt(sb, DEBUG))
+ printk(KERN_DEBUG "EXT4-fs: resizing filesystem from %llu "
+ "to %llu blocks\n", o_blocks_count, n_blocks_count);
+
+ if (n_blocks_count < o_blocks_count) {
+ /* On-line shrinking not supported */
+ ext4_warning(sb, "can't shrink FS - resize aborted");
+ return -EINVAL;
+ }
+
+ if (n_blocks_count == o_blocks_count)
+ /* Nothing to do */
+ return 0;
+
+ ext4_get_group_no_and_offset(sb, n_blocks_count - 1, &n_group, &offset);
+ ext4_get_group_no_and_offset(sb, o_blocks_count, &o_group, &offset);
+
+ n_desc_blocks = (n_group + EXT4_DESC_PER_BLOCK(sb)) /
+ EXT4_DESC_PER_BLOCK(sb);
+ o_desc_blocks = (sbi->s_groups_count + EXT4_DESC_PER_BLOCK(sb) - 1) /
+ EXT4_DESC_PER_BLOCK(sb);
+ desc_blocks = n_desc_blocks - o_desc_blocks;
+
+ if (desc_blocks &&
+ (!EXT4_HAS_COMPAT_FEATURE(sb, EXT4_FEATURE_COMPAT_RESIZE_INODE) ||
+ le16_to_cpu(es->s_reserved_gdt_blocks) < desc_blocks)) {
+ ext4_warning(sb, "No reserved GDT blocks, can't resize");
+ return -EPERM;
+ }
+
+ resize_inode = ext4_iget(sb, EXT4_RESIZE_INO);
+ if (IS_ERR(resize_inode)) {
+ ext4_warning(sb, "Error opening resize inode");
+ return PTR_ERR(resize_inode);
+ }
+
+ /* See if the device is actually as big as what was requested */
+ bh = sb_bread(sb, n_blocks_count - 1);
+ if (!bh) {
+ ext4_warning(sb, "can't read last block, resize aborted");
+ return -ENOSPC;
+ }
+ brelse(bh);
+
+ if (offset != 0) {
+ /* extend the last group */
+ ext4_grpblk_t add;
+ add = EXT4_BLOCKS_PER_GROUP(sb) - offset;
+ err = __ext4_group_extend(sb, o_blocks_count, add);
+ if (err)
+ goto out;
+ }
+
+ if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FLEX_BG) &&
+ es->s_log_groups_per_flex)
+ flexbg_size = 1 << es->s_log_groups_per_flex;
+
+ o_blocks_count = ext4_blocks_count(es);
+ if (o_blocks_count == n_blocks_count)
+ goto out;
+
+ flex_gd = alloc_flex_gd(flexbg_size);
+ if (flex_gd == NULL) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ /* Add flex groups. Note that a regular group is a
+ * flex group with 1 group.
+ */
+ while (ext4_setup_next_flex_gd(sb, flex_gd, n_blocks_count,
+ flexbg_size)) {
+ ext4_alloc_group_tables(sb, flex_gd, flexbg_size);
+ err = ext4_flex_group_add(sb, resize_inode, flex_gd);
+ if (unlikely(err))
+ break;
+ }
+
+out:
+ if (flex_gd)
+ free_flex_gd(flex_gd);
+
+ iput(resize_inode);
+ if (test_opt(sb, DEBUG))
+ printk(KERN_DEBUG "EXT4-fs: resized filesystem from %llu "
+ "to %llu blocks\n", o_blocks_count, n_blocks_count);
+ return err;
+}
--
1.7.5.1
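The batching done by ext4_setup_next_flex_gd() in the patch above can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel code: next_batch() and struct batch are hypothetical names, and it assumes, as the patch does, that flexbg_size is a power of two. It only shows how a batch is cut at the end of the flex group containing the first new group, or at the last group of the resized fs, whichever comes first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for one batch of groups to add. */
struct batch {
	uint32_t first;	/* first group in the batch */
	uint32_t count;	/* number of groups in the batch */
};

/*
 * Compute the next batch of groups, starting at @group. The batch never
 * goes past @n_group (the last group of the resized fs) or past the end
 * of the flex group that contains @group.
 * Assumes @flexbg_size is a power of two.
 */
static struct batch next_batch(uint32_t group, uint32_t n_group,
			       uint32_t flexbg_size)
{
	struct batch b;
	uint32_t last_group;

	assert(flexbg_size && (flexbg_size & (flexbg_size - 1)) == 0);
	assert(group <= n_group);

	/* Setting the low bits gives the last group of this flex group. */
	last_group = group | (flexbg_size - 1);
	if (last_group > n_group)
		last_group = n_group;

	b.first = group;
	b.count = last_group - group + 1;
	return b;
}
```

With a flex_bg size of 16, a resize that starts at group 20 and ends at group 100 would be split into batches covering groups 20-31, 32-47, and so on, finishing with the partial batch 96-100.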
This patch adds a function named ext4_add_new_descs() which adds
several group descriptors each time.
Signed-off-by: Yongqiang Yang <[email protected]>
---
fs/ext4/resize.c | 25 +++++++++++++++++++++++++
1 files changed, 25 insertions(+), 0 deletions(-)
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 4acf7a8..7e91d58 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1083,6 +1083,31 @@ out:
}
/*
+ * ext4_add_new_descs() adds @count group descriptors for the groups
+ * starting at @group
+ *
+ * @handle: journal handle
+ * @sb: super block
+ * @group: the group no. of the first group desc to be added
+ * @resize_inode: the resize inode
+ * @count: number of group descriptors to be added
+ */
+static int ext4_add_new_descs(handle_t *handle, struct super_block *sb,
+ ext4_group_t group, struct inode *resize_inode,
+ ext4_group_t count)
+{
+ int i, err = 0;
+
+ for (i = 0; i < count; i++) {
+ err = ext4_add_new_desc(handle, sb, group + i, resize_inode);
+ if (err)
+ return err;
+ }
+
+ return err;
+}
+
+/*
* ext4_setup_new_desc() sets up group descriptors specified by @input.
*
* @handle: journal handle
--
1.7.5.1
This patch adds a new function named ext4_alloc_group_tables() which
allocates block bitmaps, inode bitmaps and inode tables for a flex group and is
used by resize code.
Signed-off-by: Yongqiang Yang <[email protected]>
---
fs/ext4/resize.c | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 111 insertions(+), 0 deletions(-)
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 86edf19..d4b892f 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -190,6 +190,117 @@ void free_flex_gd(struct ext4_new_flex_group_data *flex_gd)
kfree(flex_gd);
}
+/*
+ * ext4_alloc_group_tables() allocates block bitmaps, inode bitmaps and
+ * inode tables for a flex group.
+ *
+ * This function is used by 64bit-resize. Note that this function allocates
+ * group tables from the 1st group of groups contained by @flex_gd, which may
+ * be only part of a flex group.
+ *
+ * @sb: super block of fs to which the groups belong
+ */
+static void ext4_alloc_group_tables(struct super_block *sb,
+ struct ext4_new_flex_group_data *flex_gd,
+ int flexbg_size)
+{
+ struct ext4_new_group_data *group_data = flex_gd->groups;
+ struct ext4_super_block *es = EXT4_SB(sb)->s_es;
+ ext4_fsblk_t start_blk;
+ ext4_fsblk_t last_blk;
+ ext4_group_t src_group;
+ ext4_group_t bb_index = 0;
+ ext4_group_t ib_index = 0;
+ ext4_group_t it_index = 0;
+ ext4_group_t group;
+ ext4_group_t last_group;
+ unsigned overhead;
+
+ BUG_ON(flex_gd->count == 0 || group_data == NULL);
+
+ src_group = group_data[0].group;
+ last_group = src_group + flex_gd->count - 1;
+
+ BUG_ON((flexbg_size > 1) && ((src_group & ~(flexbg_size - 1)) !=
+ (last_group & ~(flexbg_size - 1))));
+next_group:
+ group = group_data[0].group;
+ start_blk = ext4_group_first_block_no(sb, src_group);
+ last_blk = start_blk + group_data[src_group - group].blocks_count;
+
+ overhead = ext4_bg_has_super(sb, src_group) ?
+ (1 + ext4_bg_num_gdb(sb, src_group) +
+ le16_to_cpu(es->s_reserved_gdt_blocks)) : 0;
+
+ start_blk += overhead;
+
+ BUG_ON(src_group >= group_data[0].group + flex_gd->count);
+ /* We collect contiguous blocks as much as possible. */
+ src_group++;
+ for (; src_group <= last_group; src_group++)
+ if (!ext4_bg_has_super(sb, src_group))
+ last_blk += group_data[src_group - group].blocks_count;
+ else
+ break;
+
+ /* Allocate block bitmaps */
+ for (; bb_index < flex_gd->count; bb_index++) {
+ if (start_blk >= last_blk)
+ goto next_group;
+ group_data[bb_index].block_bitmap = start_blk++;
+ ext4_get_group_no_and_offset(sb, start_blk - 1, &group, NULL);
+ group -= group_data[0].group;
+ group_data[group].free_blocks_count--;
+ if (flexbg_size > 1)
+ flex_gd->bg_flags[group] &= ~EXT4_BG_BLOCK_UNINIT;
+ }
+
+ /* Allocate inode bitmaps */
+ for (; ib_index < flex_gd->count; ib_index++) {
+ if (start_blk >= last_blk)
+ goto next_group;
+ group_data[ib_index].inode_bitmap = start_blk++;
+ ext4_get_group_no_and_offset(sb, start_blk - 1, &group, NULL);
+ group -= group_data[0].group;
+ group_data[group].free_blocks_count--;
+ if (flexbg_size > 1)
+ flex_gd->bg_flags[group] &= ~EXT4_BG_BLOCK_UNINIT;
+ }
+
+ /* Allocate inode tables */
+ for (; it_index < flex_gd->count; it_index++) {
+ if (start_blk + EXT4_SB(sb)->s_itb_per_group > last_blk)
+ goto next_group;
+ group_data[it_index].inode_table = start_blk;
+ ext4_get_group_no_and_offset(sb, start_blk, &group, NULL);
+ group -= group_data[0].group;
+ group_data[group].free_blocks_count -=
+ EXT4_SB(sb)->s_itb_per_group;
+ if (flexbg_size > 1)
+ flex_gd->bg_flags[group] &= ~EXT4_BG_BLOCK_UNINIT;
+
+ start_blk += EXT4_SB(sb)->s_itb_per_group;
+ }
+
+ if (test_opt(sb, DEBUG)) {
+ int i;
+ group = group_data[0].group;
+
+ printk(KERN_DEBUG "EXT4-fs: adding a flex group with "
+ "%d groups, flexbg size is %d:\n", flex_gd->count,
+ flexbg_size);
+
+ for (i = 0; i < flex_gd->count; i++) {
+ printk(KERN_DEBUG "adding %s group %u: %u "
+ "blocks (%d free)\n",
+ ext4_bg_has_super(sb, group + i) ? "normal" :
+ "no-super", group + i,
+ group_data[i].blocks_count,
+ group_data[i].free_blocks_count);
+ }
+ }
+}
+
static struct buffer_head *bclean(handle_t *handle, struct super_block *sb,
ext4_fsblk_t blk)
{
--
1.7.5.1
This patch adds a new function named ext4_flex_group_add() which adds a
flex group to a fs. The function is used by 64bit-resize interface.
Signed-off-by: Yongqiang Yang <[email protected]>
---
fs/ext4/resize.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 82 insertions(+), 0 deletions(-)
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index d4b892f..14082c0 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1787,3 +1787,85 @@ int ext4_group_extend(struct super_block *sb, struct ext4_super_block *es,
exit_put:
return err;
} /* ext4_group_extend */
+
+/* Add a flex group to an fs. Ensure we handle all possible error conditions
+ * _before_ we start modifying the filesystem, because we cannot abort the
+ * transaction and not have it write the data to disk.
+ */
+static int ext4_flex_group_add(struct super_block *sb,
+ struct inode *resize_inode,
+ struct ext4_new_flex_group_data *flex_gd)
+{
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ struct ext4_super_block *es = sbi->s_es;
+ ext4_fsblk_t o_blocks_count;
+ ext4_grpblk_t last;
+ ext4_group_t group;
+ handle_t *handle;
+ unsigned reserved_gdb;
+ int err = 0, err2 = 0, credit;
+
+ BUG_ON(!flex_gd->count || !flex_gd->groups || !flex_gd->bg_flags);
+
+ reserved_gdb = le16_to_cpu(es->s_reserved_gdt_blocks);
+ o_blocks_count = ext4_blocks_count(es);
+ ext4_get_group_no_and_offset(sb, o_blocks_count, &group, &last);
+ BUG_ON(last);
+
+ err = setup_new_flex_group_blocks(sb, flex_gd);
+ if (err)
+ goto exit;
+ /*
+ * We will always be modifying at least the superblock and GDT
+ * block. If we are adding a group past the last current GDT block,
+ * we will also modify the inode and the dindirect block. If we
+ * are adding a group with superblock/GDT backups we will also
+ * modify each of the reserved GDT dindirect blocks.
+ */
+ credit = flex_gd->count * 4 + reserved_gdb;
+ handle = ext4_journal_start_sb(sb, credit);
+ if (IS_ERR(handle)) {
+ err = PTR_ERR(handle);
+ goto exit;
+ }
+
+ err = ext4_journal_get_write_access(handle, sbi->s_sbh);
+ if (err)
+ goto exit_journal;
+
+ group = flex_gd->groups[0].group;
+ BUG_ON(group != EXT4_SB(sb)->s_groups_count);
+ err = ext4_add_new_descs(handle, sb, group,
+ resize_inode, flex_gd->count);
+ if (err)
+ goto exit_journal;
+
+ err = ext4_setup_new_descs(handle, sb, flex_gd);
+ if (err)
+ goto exit_journal;
+
+ ext4_update_super(sb, flex_gd);
+
+ err = ext4_handle_dirty_super(handle, sb);
+
+exit_journal:
+ err2 = ext4_journal_stop(handle);
+ if (!err)
+ err = err2;
+
+ if (!err) {
+ int i;
+ update_backups(sb, sbi->s_sbh->b_blocknr, (char *)es,
+ sizeof(struct ext4_super_block));
+ for (i = 0; i < flex_gd->count; i++, group++) {
+ struct buffer_head *gdb_bh;
+ int gdb_num;
+ gdb_num = group / EXT4_DESC_PER_BLOCK(sb);
+ gdb_bh = sbi->s_group_desc[gdb_num];
+ update_backups(sb, gdb_bh->b_blocknr, gdb_bh->b_data,
+ gdb_bh->b_size);
+ }
+ }
+exit:
+ return err;
+}
--
1.7.5.1
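A note on the descriptor arithmetic used in this series: ext4_add_new_desc() and ext4_setup_new_desc() locate a group's descriptor by dividing the group number by EXT4_DESC_PER_BLOCK(sb) and taking the remainder, and ext4_resize_fs() counts descriptor blocks with the same divisor. The sketch below is a userspace illustration only. The names locate_desc(), desc_blocks_for() and struct gdb_pos are hypothetical, and the value 128 in the note after the code assumes 4 KiB blocks with 32-byte descriptors.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical position of a group descriptor. */
struct gdb_pos {
	uint32_t gdb_num;	/* index of the descriptor block */
	uint32_t gdb_off;	/* index of the descriptor inside that block */
};

/* Map a group number to its descriptor block and slot. */
static struct gdb_pos locate_desc(uint32_t group, uint32_t desc_per_block)
{
	struct gdb_pos pos;

	assert(desc_per_block != 0);

	pos.gdb_num = group / desc_per_block;
	pos.gdb_off = group % desc_per_block;
	return pos;
}

/* Number of descriptor blocks needed to cover groups 0..@last_group. */
static uint32_t desc_blocks_for(uint32_t last_group, uint32_t desc_per_block)
{
	assert(desc_per_block != 0);

	/* 64-bit arithmetic avoids overflow for very large group numbers. */
	return (uint32_t)(((uint64_t)last_group + desc_per_block) /
			  desc_per_block);
}
```

With 128 descriptors per block, group 128 is the first descriptor of block 1, and covering groups 0 through 128 takes two descriptor blocks.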
This patch adds a function named ext4_setup_new_descs() which sets up
the group descriptors of a flex group each time.
Signed-off-by: Yongqiang Yang <[email protected]>
---
fs/ext4/resize.c | 25 +++++++++++++++++++++++--
1 files changed, 23 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 7e91d58..5939b62 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1114,7 +1114,8 @@ static int ext4_add_new_descs(handle_t *handle, struct super_block *sb,
* @sb: super block
*/
static int ext4_setup_new_desc(handle_t *handle, struct super_block *sb,
- struct ext4_new_group_data *input)
+ struct ext4_new_group_data *input,
+ __u16 bg_flags)
{
struct ext4_sb_info *sbi = EXT4_SB(sb);
ext4_group_t group;
@@ -1143,7 +1144,7 @@ static int ext4_setup_new_desc(handle_t *handle, struct super_block *sb,
ext4_inode_table_set(sb, gdp, input->inode_table); /* LV FIXME */
ext4_free_blks_set(sb, gdp, input->free_blocks_count);
ext4_free_inodes_set(sb, gdp, EXT4_INODES_PER_GROUP(sb));
- gdp->bg_flags = cpu_to_le16(EXT4_BG_INODE_ZEROED);
+ gdp->bg_flags = cpu_to_le16(bg_flags);
gdp->bg_checksum = ext4_group_desc_csum(sbi, input->group, gdp);
err = ext4_handle_dirty_metadata(handle, NULL, gdb_bh);
@@ -1162,6 +1163,26 @@ static int ext4_setup_new_desc(handle_t *handle, struct super_block *sb,
}
/*
+ * ext4_setup_new_descs() sets up the group descriptors of a flex group
+ */
+static int ext4_setup_new_descs(handle_t *handle, struct super_block *sb,
+ struct ext4_new_flex_group_data *flex_gd)
+{
+ struct ext4_new_group_data *group_data = flex_gd->groups;
+ __u16 *bg_flags = flex_gd->bg_flags;
+ int i, err = 0;
+
+ for (i = 0; i < flex_gd->count; i++) {
+ err = ext4_setup_new_desc(handle, sb, group_data + i,
+ bg_flags[i]);
+ if (err)
+ return err;
+ }
+
+ return err;
+}
+
+/*
* ext4_update_super() updates super so that the newly added group can be seen
* by the filesystem.
*
--
1.7.5.1
On 2011-08-10, at 9:28 PM, Yongqiang Yang wrote:
> This patch added a function named __ext4_group_extend() whose code
> is copied from ext4_group_extend(). __ext4_group_extend() assumes
> the parameter is valid and has been checked by caller.
>
> __ext4_group_extend() will be used by new resize implementation. It
> can also be used by ext4_group_extend(), but this patch series does
> not do this.
Since this is duplicating a lot of code from ext4_group_extend(), this
patch should be written in such a way that this new function is added,
and the duplicate code is removed from ext4_group_extend() and calls
the new function instead.
It looks like all of these patches are adding a completely duplicate set
of functions for doing the resizing, even though they are largely the
same as the existing code, and it will mean duplicate efforts to maintain
both copies of the code.
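One possible shape for the refactoring suggested here, sketched as a userspace toy rather than kernel code: a checked entry point validates the request and then delegates to an unchecked helper, so the body exists only once. All names (toy_fs, toy_extend, toy_extend_unchecked) are hypothetical, and the clamping rule is an assumption chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields the resize code cares about. */
struct toy_fs {
	uint64_t blocks_count;		/* current size in blocks */
	uint32_t blocks_per_group;	/* group size in blocks */
};

/* Unchecked helper: assumes the caller has already validated @add. */
static int toy_extend_unchecked(struct toy_fs *fs, uint64_t add)
{
	fs->blocks_count += add;
	return 0;
}

/*
 * Checked entry point: rejects shrinking, clamps the request to the end
 * of the last (partial) group, then delegates to the unchecked helper.
 */
static int toy_extend(struct toy_fs *fs, uint64_t n_blocks_count)
{
	uint64_t used, room, add;

	assert(fs->blocks_per_group != 0);

	if (n_blocks_count < fs->blocks_count)
		return -EINVAL;

	used = fs->blocks_count % fs->blocks_per_group;
	room = used ? fs->blocks_per_group - used : 0;
	add = n_blocks_count - fs->blocks_count;
	if (add > room)
		add = room;

	return toy_extend_unchecked(fs, add);
}
```

A new code path that has already done its own validation would call the unchecked helper directly, while the existing interface keeps its checks and shares the same body.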
> Signed-off-by: Yongqiang Yang <[email protected]>
> ---
> fs/ext4/resize.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 53 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
> index 707d3f1..6ffbdb6 100644
> --- a/fs/ext4/resize.c
> +++ b/fs/ext4/resize.c
> @@ -969,6 +969,59 @@ exit_put:
> } /* ext4_group_add */
>
> /*
> + * extend a group without checking assuming that checking has been done.
> + */
> +static int __ext4_group_extend(struct super_block *sb,
> + ext4_fsblk_t o_blocks_count, ext4_grpblk_t add)
> +{
> + struct ext4_super_block *es = EXT4_SB(sb)->s_es;
> + handle_t *handle;
> + int err = 0, err2;
> +
> + /* We will update the superblock, one block bitmap, and
> + * one group descriptor via ext4_ext4_group_add_blocks().
Typo here: "ext4_ext4"
> + */
> + handle = ext4_journal_start_sb(sb, 3);
> + if (IS_ERR(handle)) {
> + err = PTR_ERR(handle);
> + ext4_warning(sb, "error %d on journal start", err);
> + goto out;
> + }
> +
> + err = ext4_journal_get_write_access(handle, EXT4_SB(sb)->s_sbh);
> + if (err) {
> + ext4_warning(sb, "error %d on journal write access", err);
> + ext4_journal_stop(handle);
> + goto out;
> + }
> +
> + ext4_blocks_count_set(es, o_blocks_count + add);
> + ext4_debug("freeing blocks %llu through %llu\n", o_blocks_count,
> + o_blocks_count + add);
> + /* We add the blocks to the bitmap and set the group need init bit */
> + err = ext4_group_add_blocks(handle, sb, o_blocks_count, add);
> + if (err)
> + goto exit_journal;
> + ext4_handle_dirty_super(handle, sb);
> + ext4_debug("freed blocks %llu through %llu\n", o_blocks_count,
> + o_blocks_count + add);
> +exit_journal:
> + err2 = ext4_journal_stop(handle);
> + if (err2 && !err)
> + err = err2;
> +
> + if (!err) {
> + if (test_opt(sb, DEBUG))
> + printk(KERN_DEBUG "EXT4-fs: extended group to %llu "
> + "blocks\n", ext4_blocks_count(es));
> + update_backups(sb, EXT4_SB(sb)->s_sbh->b_blocknr, (char *)es,
> + sizeof(struct ext4_super_block));
> + }
> +out:
> + return err;
> +}
> +
> +/*
> * Extend the filesystem to the new number of blocks specified. This entry
> * point is only used to extend the current filesystem to the end of the last
> * existing group. It can be accessed via ioctl, or by "remount,resize=<size>"
> --
> 1.7.5.1
>
Cheers, Andreas
On 2011-08-10, at 9:28 PM, Yongqiang Yang wrote:
> This patch adds a function named ext4_add_new_desc() which adds
> a new desc to a fs and whose code is copied from ext4_group_add().
>
> The function will be used by new resize implementation.
Similarly, this is duplicating a hunk of code from the middle of
ext4_group_add(), and instead of just adding a second copy of that
code, ext4_group_add() should be changed to call this new function
to avoid the code duplication.
> Signed-off-by: Yongqiang Yang <[email protected]>
> ---
> fs/ext4/resize.c | 42 ++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 42 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
> index 6ffbdb6..4fcd515 100644
> --- a/fs/ext4/resize.c
> +++ b/fs/ext4/resize.c
> @@ -735,6 +735,48 @@ exit_err:
> }
> }
>
> +/*
> + * ext4_add_new_desc() adds group descriptor of group @group
> + *
> + * @handle: journal handle
> + * @sb: super block
> + * @group: the group no. of the first group desc to be added
> + * @resize_inode: the resize inode
> + */
> +static int ext4_add_new_desc(handle_t *handle, struct super_block *sb,
> + ext4_group_t group, struct inode *resize_inode)
> +{
> + struct ext4_sb_info *sbi = EXT4_SB(sb);
> + struct ext4_super_block *es = sbi->s_es;
> + struct buffer_head *gdb_bh;
> + int gdb_off, gdb_num, err = 0;
> + int reserved_gdb = ext4_bg_has_super(sb, group) ?
> + le16_to_cpu(es->s_reserved_gdt_blocks) : 0;
> +
> + gdb_off = group % EXT4_DESC_PER_BLOCK(sb);
> + gdb_num = group / EXT4_DESC_PER_BLOCK(sb);
> +
> + /*
> + * We will only either add reserved group blocks to a backup group
> + * or remove reserved blocks for the first group in a new group block.
> + * Doing both would be mean more complex code, and sane people don't
> + * use non-sparse filesystems anymore. This is already checked above.
> + */
> + if (gdb_off) {
> + gdb_bh = sbi->s_group_desc[gdb_num];
> + err = ext4_journal_get_write_access(handle, gdb_bh);
> + if (err)
> + goto out;
> +
> + if (reserved_gdb && ext4_bg_num_gdb(sb, group))
> + err = reserve_backup_gdb(handle, resize_inode, group);
> + } else
> + err = add_new_gdb(handle, resize_inode, group);
> +
> +out:
> + return err;
> +}
> +
> /* Add group descriptor data to an existing or new group descriptor block.
> * Ensure we handle all possible error conditions _before_ we start modifying
> * the filesystem, because we cannot abort the transaction and not have it
> --
> 1.7.5.1
>
Cheers, Andreas
On Thu, Aug 11, 2011 at 1:47 PM, Andreas Dilger <[email protected]> wrote:
> On 2011-08-10, at 9:28 PM, Yongqiang Yang wrote:
>> This patch added a function named __ext4_group_extend() whose code
>> is copied from ext4_group_extend(). __ext4_group_extend() assumes
>> the parameter is valid and has been checked by caller.
>>
>> __ext4_group_extend() will be used by new resize implementation. It
>> can also be used by ext4_group_extend(), but this patch series does
>> not do this.
>
> Since this is duplicating a lot of code from ext4_group_extend(), this
> patch should be written in such a way that this new function is added,
> and the duplicate code is removed from ext4_group_extend() and calls
> the new function instead.
>
> It looks like all of these patches are adding a completely duplicate set
> of functions for doing the resizing, even though they are largely the
> same as the existing code, and it will mean duplicate efforts to maintain
> both copies of the code.
Yes, I would like some feedback on this. I assumed the old resize interface
would be removed once the new resize interface went upstream, so I did not
touch the old interface. If we are going to keep the old interface, I can
send patches that make the old resize use the common code instead.
Thank you for your review.
Yongqiang.
>
>> Signed-off-by: Yongqiang Yang <[email protected]>
>> ---
>> fs/ext4/resize.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 files changed, 53 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
>> index 707d3f1..6ffbdb6 100644
>> --- a/fs/ext4/resize.c
>> +++ b/fs/ext4/resize.c
>> @@ -969,6 +969,59 @@ exit_put:
>> } /* ext4_group_add */
>>
>> /*
>> + * extend a group without checking assuming that checking has been done.
>> + */
>> +static int __ext4_group_extend(struct super_block *sb,
>> + ext4_fsblk_t o_blocks_count, ext4_grpblk_t add)
>> +{
>> + struct ext4_super_block *es = EXT4_SB(sb)->s_es;
>> + handle_t *handle;
>> + int err = 0, err2;
>> +
>> + /* We will update the superblock, one block bitmap, and
>> + * one group descriptor via ext4_ext4_group_add_blocks().
>
> Typo here: "ext4_ext4"
>
>> + */
>> + handle = ext4_journal_start_sb(sb, 3);
>> + if (IS_ERR(handle)) {
>> + err = PTR_ERR(handle);
>> + ext4_warning(sb, "error %d on journal start", err);
>> + goto out;
>> + }
>> +
>> + err = ext4_journal_get_write_access(handle, EXT4_SB(sb)->s_sbh);
>> + if (err) {
>> + ext4_warning(sb, "error %d on journal write access", err);
>> + ext4_journal_stop(handle);
>> + goto out;
>> + }
>> +
>> + ext4_blocks_count_set(es, o_blocks_count + add);
>> + ext4_debug("freeing blocks %llu through %llu\n", o_blocks_count,
>> + o_blocks_count + add);
>> + /* We add the blocks to the bitmap and set the group need init bit */
>> + err = ext4_group_add_blocks(handle, sb, o_blocks_count, add);
>> + if (err)
>> + goto exit_journal;
>> + ext4_handle_dirty_super(handle, sb);
>> + ext4_debug("freed blocks %llu through %llu\n", o_blocks_count,
>> + o_blocks_count + add);
>> +exit_journal:
>> + err2 = ext4_journal_stop(handle);
>> + if (err2 && !err)
>> + err = err2;
>> +
>> + if (!err) {
>> + if (test_opt(sb, DEBUG))
>> + printk(KERN_DEBUG "EXT4-fs: extended group to %llu "
>> + "blocks\n", ext4_blocks_count(es));
>> + update_backups(sb, EXT4_SB(sb)->s_sbh->b_blocknr, (char *)es,
>> + sizeof(struct ext4_super_block));
>> + }
>> +out:
>> + return err;
>> +}
>> +
>> +/*
>> * Extend the filesystem to the new number of blocks specified. This entry
>> * point is only used to extend the current filesystem to the end of the last
>> * existing group. It can be accessed via ioctl, or by "remount,resize=<size>"
>> --
>> 1.7.5.1
>>
>
>
> Cheers, Andreas
>
>
>
>
>
>
--
Best Wishes
Yongqiang Yang
On 2011-08-10, at 9:28 PM, Yongqiang Yang wrote:
> This patch adds a function named ext4_setup_new_desc() which sets
> up a new group descriptor and whose code is copied from ext4_group_add().
>
> The function will be used by new resize implementation.
Again, duplicating a big hunk of ext4_group_add(). Similar comments apply.
Another question is whether this new resize code is safe from crashes?
One of the original design goals of the resize code is that it would never
leave a filesystem inconsistent if it crashed in the middle.
The way that these patches are looking, it seems that they may not be safe
in this regard, and possibly leave the filesystem in an inconsistent state
if they crash in the middle. Maybe I'm missing something?
> Signed-off-by: Yongqiang Yang <[email protected]>
> ---
> fs/ext4/resize.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 54 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
> index 4fcd515..6320baa 100644
> --- a/fs/ext4/resize.c
> +++ b/fs/ext4/resize.c
> @@ -777,6 +777,60 @@ out:
> return err;
> }
>
> +/*
> + * ext4_setup_new_desc() sets up group descriptors specified by @input.
> + *
> + * @handle: journal handle
> + * @sb: super block
> + */
> +static int ext4_setup_new_desc(handle_t *handle, struct super_block *sb,
> + struct ext4_new_group_data *input)
> +{
> + struct ext4_sb_info *sbi = EXT4_SB(sb);
> + ext4_group_t group;
> + struct ext4_group_desc *gdp;
> + struct buffer_head *gdb_bh;
> + int gdb_off, gdb_num, err = 0;
> +
> + group = input->group;
> +
> + gdb_off = group % EXT4_DESC_PER_BLOCK(sb);
> + gdb_num = group / EXT4_DESC_PER_BLOCK(sb);
> +
> + /*
> + * get_write_access() has been called on gdb_bh by ext4_add_new_desc().
> + */
> + gdb_bh = sbi->s_group_desc[gdb_num];
> + /* Update group descriptor block for new group */
> + gdp = (struct ext4_group_desc *)((char *)gdb_bh->b_data +
> + gdb_off * EXT4_DESC_SIZE(sb));
> +
> + memset(gdp, 0, EXT4_DESC_SIZE(sb));
> + /* LV FIXME */
> + memset(gdp, 0, EXT4_DESC_SIZE(sb));
> + ext4_block_bitmap_set(sb, gdp, input->block_bitmap); /* LV FIXME */
> + ext4_inode_bitmap_set(sb, gdp, input->inode_bitmap); /* LV FIXME */
> + ext4_inode_table_set(sb, gdp, input->inode_table); /* LV FIXME */
> + ext4_free_blks_set(sb, gdp, input->free_blocks_count);
> + ext4_free_inodes_set(sb, gdp, EXT4_INODES_PER_GROUP(sb));
> + gdp->bg_flags = cpu_to_le16(EXT4_BG_INODE_ZEROED);
> + gdp->bg_checksum = ext4_group_desc_csum(sbi, input->group, gdp);
> +
> + err = ext4_handle_dirty_metadata(handle, NULL, gdb_bh);
> + if (unlikely(err)) {
> + ext4_std_error(sb, err);
> + return err;
> + }
> +
> + /*
> + * We can allocate memory for mb_alloc based on the new group
> + * descriptor
> + */
> + err = ext4_mb_add_groupinfo(sb, group, gdp);
> +
> + return err;
> +}
> +
> /* Add group descriptor data to an existing or new group descriptor block.
> * Ensure we handle all possible error conditions _before_ we start modifying
> * the filesystem, because we cannot abort the transaction and not have it
> --
> 1.7.5.1
>
Cheers, Andreas
TO: Yongqiang
2011/8/11 Yongqiang Yang <[email protected]>:
> This patch adds a structure which will be used by 64bit-resize interface.
> Two functions which allocate and destroy the structure respectively are
> added.
>
> Signed-off-by: Yongqiang Yang <[email protected]>
> ---
> fs/ext4/resize.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 56 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
> index 14be865..c586e51 100644
> --- a/fs/ext4/resize.c
> +++ b/fs/ext4/resize.c
> @@ -134,6 +134,62 @@ static int verify_group_input(struct super_block *sb,
> return err;
> }
>
> +/*
> + * ext4_new_flex_group_data is used by 64bit-resize interface to add a flex
> + * group each time.
> + */
> +struct ext4_new_flex_group_data {
> + struct ext4_new_group_data *groups; /* new_group_data for groups
> + in the flex group */
> + __u16 *bg_flags; /* block group flags of groups
> + in @groups */
> + ext4_group_t count; /* number of groups in @groups
> + */
> +};
> +
> +/*
> + * alloc_flex_gd() allocates a ext4_new_flex_group_data with size of
> + * @flexbg_size.
> + *
> + * Returns NULL on failure otherwise address of the allocated structure.
> + */
> +static struct ext4_new_flex_group_data *alloc_flex_gd(unsigned long flexbg_size)
> +{
> + struct ext4_new_flex_group_data *flex_gd;
> +
> + flex_gd = kmalloc(sizeof(*flex_gd), GFP_NOFS);
> + if (flex_gd == NULL) {
printk(KERN_WARNING "not enough memory for flex_gd\n");
> + goto out3;
}
> +
> + flex_gd->count = flexbg_size;
> +
> + flex_gd->groups = kmalloc(sizeof(struct ext4_new_group_data) *
> + flexbg_size, GFP_NOFS);
> + if (flex_gd->groups == NULL) {
printk(KERN_WARNING "not enough memory for flex_gd->groups\n");
> + goto out2;
}
> +
> + flex_gd->bg_flags = kmalloc(flexbg_size * sizeof(__u16), GFP_NOFS);
> + if (flex_gd->bg_flags == NULL) {
printk(KERN_WARNING "not enough memory for flex_gd->bg_flags\n");
> + goto out1;
}
> +
> + return flex_gd;
> +
out1:
    kfree(flex_gd->groups);
> +out2:
> + kfree(flex_gd);
out3:
> + return NULL;
> +}
> +
> +void free_flex_gd(struct ext4_new_flex_group_data *flex_gd)
> +{
> + kfree(flex_gd->bg_flags);
> + kfree(flex_gd->groups);
> + kfree(flex_gd);
> +}
> +
> static struct buffer_head *bclean(handle_t *handle, struct super_block *sb,
> ext4_fsblk_t blk)
> {
How about adding a message for each kmalloc failure, with goto labels
like the ones above?
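The error-handling pattern proposed above can be sketched in userspace C, assuming calloc/malloc and fprintf as stand-ins for kmalloc and printk. The names flex_data, alloc_flex_data and free_flex_data are hypothetical, and the groups array is reduced to plain integers. Each failure prints its own message and jumps to a label that frees only what was already allocated.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace analogue of struct ext4_new_flex_group_data. */
struct flex_data {
	uint32_t *groups;	/* stand-in for the per-group data array */
	uint16_t *bg_flags;	/* one flags word per group */
	unsigned long count;	/* number of entries in both arrays */
};

static struct flex_data *alloc_flex_data(unsigned long n)
{
	struct flex_data *fd;

	assert(n > 0);	/* calloc(0, ...) may legitimately return NULL */

	fd = malloc(sizeof(*fd));
	if (fd == NULL) {
		fprintf(stderr, "not enough memory for flex data\n");
		goto out3;
	}

	fd->count = n;

	fd->groups = calloc(n, sizeof(*fd->groups));
	if (fd->groups == NULL) {
		fprintf(stderr, "not enough memory for groups\n");
		goto out2;
	}

	fd->bg_flags = calloc(n, sizeof(*fd->bg_flags));
	if (fd->bg_flags == NULL) {
		fprintf(stderr, "not enough memory for bg_flags\n");
		goto out1;
	}

	return fd;

out1:
	free(fd->groups);
out2:
	free(fd);
out3:
	return NULL;
}

static void free_flex_data(struct flex_data *fd)
{
	if (fd == NULL)
		return;
	free(fd->bg_flags);
	free(fd->groups);
	free(fd);
}
```

The labels unwind in reverse order of allocation, so adding another allocation later only needs one more label at the top of the cleanup chain.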
On Thu, Aug 11, 2011 at 2:42 PM, Andreas Dilger <[email protected]> wrote:
> On 2011-08-10, at 9:28 PM, Yongqiang Yang wrote:
>> This patch adds a function named ext4_setup_new_desc() which sets
>> up a new group descriptor and whose code is copied from ext4_group_add().
>>
>> The function will be used by new resize implementation.
>
> Again, duplicating a big hunk of ext4_group_add(). Similar comments apply.
>
> Another question is whether this new resize code is safe from crashes?
> One of the original design goals of the resize code is that it would never
> leave a filesystem inconsistent if it crashed in the middle.
>
> The way that these patches are looking, it seems that they may not be safe
> in this regard, and possibly leave the filesystem in an inconsistent state
> if they crash in the middle. Maybe I'm missing something?
If a journal is used, the journal can bring the crashed fs back to a
consistent state, as with the old resize. The logic of the new resize is the
same as the old resize, except that it adds multiple groups each time.
I will check it further.
Thanks!
Yongqiang.
>
>> Signed-off-by: Yongqiang Yang <[email protected]>
>> ---
>> fs/ext4/resize.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 files changed, 54 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
>> index 4fcd515..6320baa 100644
>> --- a/fs/ext4/resize.c
>> +++ b/fs/ext4/resize.c
>> @@ -777,6 +777,60 @@ out:
>> return err;
>> }
>>
>> +/*
>> + * ext4_setup_new_desc() sets up group descriptors specified by @input.
>> + *
>> + * @handle: journal handle
>> + * @sb: super block
>> + */
>> +static int ext4_setup_new_desc(handle_t *handle, struct super_block *sb,
>> +                               struct ext4_new_group_data *input)
>> +{
>> +       struct ext4_sb_info *sbi = EXT4_SB(sb);
>> +       ext4_group_t group;
>> +       struct ext4_group_desc *gdp;
>> +       struct buffer_head *gdb_bh;
>> +       int gdb_off, gdb_num, err = 0;
>> +
>> +       group = input->group;
>> +
>> +       gdb_off = group % EXT4_DESC_PER_BLOCK(sb);
>> +       gdb_num = group / EXT4_DESC_PER_BLOCK(sb);
>> +
>> +       /*
>> +        * get_write_access() has been called on gdb_bh by ext4_add_new_desc().
>> +        */
>> +       gdb_bh = sbi->s_group_desc[gdb_num];
>> +       /* Update group descriptor block for new group */
>> +       gdp = (struct ext4_group_desc *)((char *)gdb_bh->b_data +
>> +                                        gdb_off * EXT4_DESC_SIZE(sb));
>> +
>> +       memset(gdp, 0, EXT4_DESC_SIZE(sb));
>> +        /* LV FIXME */
>> +       memset(gdp, 0, EXT4_DESC_SIZE(sb));
>> +       ext4_block_bitmap_set(sb, gdp, input->block_bitmap); /* LV FIXME */
>> +       ext4_inode_bitmap_set(sb, gdp, input->inode_bitmap); /* LV FIXME */
>> +       ext4_inode_table_set(sb, gdp, input->inode_table); /* LV FIXME */
>> +       ext4_free_blks_set(sb, gdp, input->free_blocks_count);
>> +       ext4_free_inodes_set(sb, gdp, EXT4_INODES_PER_GROUP(sb));
>> +       gdp->bg_flags = cpu_to_le16(EXT4_BG_INODE_ZEROED);
>> +       gdp->bg_checksum = ext4_group_desc_csum(sbi, input->group, gdp);
>> +
>> +       err = ext4_handle_dirty_metadata(handle, NULL, gdb_bh);
>> +       if (unlikely(err)) {
>> +               ext4_std_error(sb, err);
>> +               return err;
>> +       }
>> +
>> +       /*
>> +        * We can allocate memory for mb_alloc based on the new group
>> +        * descriptor
>> +        */
>> +       err = ext4_mb_add_groupinfo(sb, group, gdp);
>> +
>> +       return err;
>> +}
>> +
>> /* Add group descriptor data to an existing or new group descriptor block.
>>  * Ensure we handle all possible error conditions _before_ we start modifying
>>  * the filesystem, because we cannot abort the transaction and not have it
>> --
>> 1.7.5.1
>>
>
>
> Cheers, Andreas
>
--
Best Wishes
Yongqiang Yang
On Tue, Aug 16, 2011 at 8:22 AM, Justin Maggard <[email protected]> wrote:
> Hi,
Hi Justin,
> I saw your patch, and I am excited to see online resize support of very
> large filesystems.  I was hoping you could answer a few additional questions
> for me.
My pleasure.
> Does this patch set combined with your e2fsprogs patch add 64-bit resize
> support now, or does it just make it easier to add later?
YES. The e2fsprogs patch is ready too.
> If I am making a 64-bit ext4 filesystem today (20TB), and hoping to resize
> it next year to 30TB what features should I set?  In my searching it sounded
> like maybe I would need meta_bg, but it is not compatible with the default
> resize_inode.
meta_bg is explained at http://linuxsoftware.co.nz/wiki/ext4.
Currently, ext4 with meta_bg does not support resize; that is on ext4's TODO list.
The feature you should set is resize_inode.
> Also, if I am making a <16TB filesystem today, should I turn on the 64-bit
> flag in order to expand to >16TB in the future?
Yes. You should turn on the 64-bit feature. If block numbers are 32
bits wide, the maximum size is 2^32 * 2^(log2 of the block size). With a 4K
block size, for example, the maximum filesystem size is 2^32 *
2^12 = 2^44 = 16TB.
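As a quick check of that arithmetic, here is a minimal C sketch. The helper
name max_fs_bytes() is made up for illustration and is not an ext4 or
e2fsprogs function; it only multiplies the number of addressable blocks by
the block size.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not an ext4 or e2fsprogs function): largest
 * filesystem size in bytes when block numbers are addr_bits wide and
 * each block is (1 << log_block_size) bytes.
 * Assumes addr_bits + log_block_size < 64 so the shift does not overflow.
 */
static uint64_t max_fs_bytes(unsigned int addr_bits, unsigned int log_block_size)
{
	return (uint64_t)1 << (addr_bits + log_block_size);
}
```

max_fs_bytes(32, 12) gives 2^44 bytes, i.e. 16TB with 4K blocks; with 1K
blocks (log_block_size = 10) the same 32-bit limit is 4TB.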
> Thank you for your time,
You are welcome.
> -Justin
--
Best Wishes
Yongqiang Yang
On Thu, Aug 11, 2011 at 02:27:41PM +0800, Yongqiang Yang wrote:
> Yes! This needs some feedback. I thought the old resize interface
> would be removed once the new resize interface went upstream, so
> I did not touch the old interface. If we are keeping the old interface,
> I can write some patches that let the old resize use the common code
> instead.
Yes, the backwards compatibility requirements for the kernel mean
that we have to keep the old ioctls working for at least 2-3 years...
- Ted
On 2011-08-21, at 11:07 AM, Ted Ts'o <[email protected]> wrote:
> On Thu, Aug 11, 2011 at 02:27:41PM +0800, Yongqiang Yang wrote:
>> Yes! This needs some feedback. I thought the old resize interface
>> would be removed once the new resize interface went upstream, so
>> I did not touch the old interface. If we are keeping the old interface,
>> I can write some patches that let the old resize use the common code
>> instead.
>
> Yes, the backwards compatibility requirements for the kernel mean
> that we have to keep the old ioctls working for at least 2-3 years...
I'd be happy with just wiring up the old ioctls to the new code, and having the kernel ignore the passed parameters and just do its own thing to resize to the end of the group.
Cheers, Andreas
On Wed, Aug 17, 2011 at 12:28 AM, Yongqiang Yang <[email protected]> wrote:
> On Tue, Aug 16, 2011 at 8:22 AM, Justin Maggard <[email protected]> wrote:
> > Does this patch set combined with your e2fsprogs patch add 64-bit resize
> > support now, or does it just make it easier to add later?
> YES. The e2fsprogs patch is ready too.
So I finally got around to gathering the hardware and patching all the
software components to try out this 64-bit expansion code.  The first
thing I noticed is that there is still a check to make sure the block
count is 32 bits.  However, I can get around it by specifying a size
string (something like "20T") rather than a block count, in which case
it will actually try the expansion.
> > If I am making a 64-bit ext4 filesystem today (20TB), and hoping to resize
> > it next year to 30TB what features should I set?  In my searching it sounded
> > like maybe I would need meta_bg, but it is not compatible with the default
> > resize_inode.
> meta_bg is explained at http://linuxsoftware.co.nz/wiki/ext4.
> Currently, ext4 with meta_bg does not support resize; that is on ext4's TODO list.
> The feature you should set is resize_inode.
>
> > Also, if I am making a <16TB filesystem today, should I turn on the 64-bit
> > flag in order to expand to >16TB in the future?
> Yes. You should turn on the 64-bit feature. If block numbers are 32
> bits wide, the maximum size is 2^32 * 2^(log2 of the block size). With a 4K
> block size, for example, the maximum filesystem size is 2^32 *
> 2^12 = 2^44 = 16TB.
I think this is where the real problem is with this 64-bit resize
support.  With the 64-bit flag set, the most I can expand by online is
just 8TB over the life of the filesystem, because my reserved GDT
blocks get used up twice as fast as with a 32-bit filesystem.  Is
there any way around this?
-Justin
On Fri, Aug 26, 2011 at 10:06 PM, Justin Maggard <[email protected]> wrote:
> On Wed, Aug 17, 2011 at 12:28 AM, Yongqiang Yang <[email protected]> wrote:
>> On Tue, Aug 16, 2011 at 8:22 AM, Justin Maggard <[email protected]> wrote:
>> > Does this patch set combined with your e2fsprogs patch add 64-bit resize
>> > support now, or does it just make it easier to add later?
>> YES. The e2fsprogs patch is ready too.
>
> So I finally got around to gathering the hardware and patching all the
> software components to try out this 64-bit expansion code.  The first
> thing I noticed is that there is still a check to make sure the block
> count is 32 bits.  However, I can get around it by specifying a size
> string (something like "20T") rather than a block count, in which case
> it will actually try the expansion.
The fact that you can get around this check is a bug.
As you have observed, things won't be pretty if you try to resize over 16TB
using resize inode and I don't think it is intended to work.
>
>> > If I am making a 64-bit ext4 filesystem today (20TB), and hoping to resize
>> > it next year to 30TB what features should I set?  In my searching it sounded
>> > like maybe I would need meta_bg, but it is not compatible with the default
>> > resize_inode.
>> meta_bg is explained at http://linuxsoftware.co.nz/wiki/ext4.
>> Currently, ext4 with meta_bg does not support resize; that is on ext4's TODO list.
>> The feature you should set is resize_inode.
>>
>> > Also, if I am making a <16TB filesystem today, should I turn on the 64-bit
>> > flag in order to expand to >16TB in the future?
>> Yes. You should turn on the 64-bit feature. If block numbers are 32
>> bits wide, the maximum size is 2^32 * 2^(log2 of the block size). With a 4K
>> block size, for example, the maximum filesystem size is 2^32 *
>> 2^12 = 2^44 = 16TB.
>
> I think this is where the real problem is with this 64-bit resize
> support.  With the 64-bit flag set, the most I can expand by online is
> just 8TB over the life of the filesystem, because my reserved GDT
> blocks get used up twice as fast as with a 32-bit filesystem.  Is
> there any way around this?
The maximum number of reserved GDT blocks is EXT2_ADDR_PER_BLOCK(sb),
which is 1024 by default, just enough to expand a 64-bit fs by 8TB,
as you have observed.
But also, the resize inode cannot store 64-bit block addresses of GDT backups
beyond 16TB, so your fs (the resize inode in particular) is most likely corrupted.
There is no point in getting around these issues.
We should get on top of them and implement online resize of meta_bg.
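For reference, the 8TB figure can be reproduced with the rough calculation
below. It is only a sketch: the function name is invented, and it assumes
the common defaults of a 4K block size, 8 * blocksize blocks per group, and
group descriptors of 32 bytes without the 64bit feature and 64 bytes with it.

```c
#include <stdint.h>

/*
 * Hypothetical helper: how many bytes of growth the reserved GDT blocks
 * can describe. Assumptions (typical defaults, not read from a real
 * superblock):
 *   - at most block_size / 4 reserved GDT blocks (EXT2_ADDR_PER_BLOCK)
 *   - each block group spans 8 * block_size blocks (one bitmap block)
 */
static uint64_t max_online_growth_bytes(uint32_t block_size, uint32_t desc_size)
{
	uint64_t reserved_gdt_blocks = block_size / 4;
	uint64_t descs_per_block = block_size / desc_size;
	uint64_t group_bytes = (uint64_t)8 * block_size * block_size;

	return reserved_gdt_blocks * descs_per_block * group_bytes;
}
```

With block_size = 4096, a 64-byte descriptor gives 2^43 bytes (8TB) and a
32-byte descriptor gives 2^44 bytes (16TB), which matches the "used up twice
as fast" observation.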
If your intention is to create a 20TB fs today and resize it in the
(not very near) future, then you should probably use meta_bg instead of
resize_inode.
Amir.