2021-09-14 19:12:02

by Theodore Ts'o

Subject: [PATCH 1/3] resize2fs: attempt to keep the # of inodes valid by removing the last bg

If a 10GB file system (with the default inode ratio of 16k)
is resized to 64TB, the number of inodes will become 2**32, one
above the maximum allowed number of inodes of 2**32-1. In
adjust_fs_info(), we already try to drop the last block group if there
isn't sufficient space in the last block group to support the metadata
for that block group. So if dropping the last block group allows the
number of inodes to be valid, we should try that as well. In some cases
this will mean resizing a file system to 64TB will result in it being
resized to a size of 64TB - 128MB, which is close enough for
government work.

Addresses-Google-Bug: 199105099
Signed-off-by: Theodore Ts'o <[email protected]>
---
resize/resize2fs.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index daaa3d49..770d2d06 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -757,6 +757,15 @@ retry:
*/
new_inodes =(unsigned long long) fs->super->s_inodes_per_group * fs->group_desc_count;
if (new_inodes > ~0U) {
+ new_inodes = (unsigned long long) fs->super->s_inodes_per_group * (fs->group_desc_count - 1);
+ if (new_inodes <= ~0U) {
+ unsigned long long new_blocks =
+ ((unsigned long long) fs->super->s_blocks_per_group *
+ (fs->group_desc_count - 1)) + fs->super->s_first_data_block;
+
+ ext2fs_blocks_count_set(fs->super, new_blocks);
+ goto retry;
+ }
fprintf(stderr, _("inodes (%llu) must be less than %u\n"),
(unsigned long long) new_inodes, ~0U);
return EXT2_ET_TOO_MANY_INODES;
--
2.31.0
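
The arithmetic behind this patch can be checked with a small standalone
sketch. It is not code from e2fsprogs: it assumes 4k blocks, 32768
blocks per group (128MB groups, which matches the 64TB - 128MB figure
in the commit message) and the 16k inode ratio, and every sketch_* and
SKETCH_* name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry for illustration; not read from a real superblock. */
#define SKETCH_BLOCK_SIZE        4096ULL
#define SKETCH_BLOCKS_PER_GROUP  32768ULL       /* 128MB groups */
#define SKETCH_INODE_RATIO       16384ULL       /* one inode per 16k */
#define SKETCH_MAX_INODES        0xFFFFFFFFULL  /* 2**32 - 1 */

/* Inodes in a file system made of `groups` full block groups. */
uint64_t sketch_inodes(uint64_t groups)
{
	uint64_t per_group = SKETCH_BLOCKS_PER_GROUP * SKETCH_BLOCK_SIZE /
			     SKETCH_INODE_RATIO;

	return per_group * groups;
}

/*
 * Return how many groups to keep: the same number if the inode count
 * fits, one fewer if dropping the last group is enough, or 0 if even
 * that does not bring the count under the limit.
 */
uint64_t sketch_fit_groups(uint64_t groups)
{
	assert(groups > 0);	/* a file system has at least one group */

	if (sketch_inodes(groups) <= SKETCH_MAX_INODES)
		return groups;
	if (sketch_inodes(groups - 1) <= SKETCH_MAX_INODES)
		return groups - 1;
	return 0;
}
```

Under these assumptions a 64TB file system has 524288 groups of 8192
inodes each, so sketch_inodes(524288) is 4294967296 (2**32), and
sketch_fit_groups(524288) returns 524287, which gives 4294959104 inodes
and a file system one 128MB group short of 64TB.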


2021-09-14 19:12:02

by Theodore Ts'o

Subject: [PATCH 2/3] resize2fs: adjust new size of the file system to allow a successful resize

The previous commit in this series (commit 50088b1996cc: "resize2fs:
attempt to keep the # of inodes valid by removing the last bg") allows
a successful off-line resize of a file system with the default 16k
inode ratio to be grown to support a 64TB storage device by dropping
the last block group so the number of inodes is just below the maximum
2**32-1 number of inodes.

However, this is not a complete solution, for two reasons. First,
this adjustment happens after resize2fs has started potentially making
changes to the file system in the off-line (unmounted) case, which
means resize2fs will do a lot of unnecessary work. Secondly, in the
on-line resize case, passing the original requested size to the kernel
causes the kernel to fail the online resize request.

So teach resize2fs to adjust the new size of the file system much
earlier, which avoids both problems.

Signed-off-by: Theodore Ts'o <[email protected]>
---
resize/main.c | 16 +++++---
resize/resize2fs.c | 79 ++++++++++++++++++++++++++++++++++++++
resize/resize2fs.h | 1 +
tests/r_move_itable/expect | 6 +--
4 files changed, 93 insertions(+), 9 deletions(-)

diff --git a/resize/main.c b/resize/main.c
index 8621d101..bceaa167 100644
--- a/resize/main.c
+++ b/resize/main.c
@@ -611,12 +611,16 @@ int main (int argc, char ** argv)
"feature.\n"));
goto errout;
}
- } else if (new_size == ext2fs_blocks_count(fs->super)) {
- fprintf(stderr, _("The filesystem is already %llu (%dk) "
- "blocks long. Nothing to do!\n\n"),
- (unsigned long long) new_size,
- blocksize / 1024);
- goto success_exit;
+ } else {
+ adjust_new_size(fs, &new_size);
+ if (new_size == ext2fs_blocks_count(fs->super)) {
+ fprintf(stderr, _("The filesystem is already "
+ "%llu (%dk) blocks long. "
+ "Nothing to do!\n\n"),
+ (unsigned long long) new_size,
+ blocksize / 1024);
+ goto success_exit;
+ }
}
if ((flags & RESIZE_ENABLE_64BIT) &&
ext2fs_has_feature_64bit(fs->super)) {
diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index 770d2d06..5ed0c9ee 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -1028,6 +1028,85 @@ errout:
return (retval);
}

+/*
+ * Replicate the first part of adjust_fs_info to determine what the
+ * new size of the file system should be. This allows resize2fs to
+ * exit early if we aren't going to make any changes to the file
+ * system.
+ */
+void adjust_new_size(ext2_filsys fs, blk64_t *sizep)
+{
+ blk64_t size, rem, overhead = 0;
+ unsigned long desc_blocks;
+ dgrp_t group_desc_count;
+ int has_bg;
+ unsigned long long new_inodes; /* u64 to check for overflow */
+
+ size = *sizep;
+retry:
+ group_desc_count = ext2fs_div64_ceil(size -
+ fs->super->s_first_data_block,
+ EXT2_BLOCKS_PER_GROUP(fs->super));
+ if (group_desc_count == 0)
+ return;
+ desc_blocks = ext2fs_div_ceil(group_desc_count,
+ EXT2_DESC_PER_BLOCK(fs->super));
+
+ /*
+ * Overhead is the number of bookkeeping blocks per group. It
+ * includes the superblock backup, the group descriptor
+ * backups, the inode bitmap, the block bitmap, and the inode
+ * table.
+ */
+ overhead = (int) (2 + fs->inode_blocks_per_group);
+
+ has_bg = 0;
+ if (ext2fs_has_feature_sparse_super2(fs->super)) {
+ /*
+ * We have to do this manually since
+ * super->s_backup_bgs hasn't been set up yet.
+ */
+ if (group_desc_count == 2)
+ has_bg = fs->super->s_backup_bgs[0] != 0;
+ else
+ has_bg = fs->super->s_backup_bgs[1] != 0;
+ } else
+ has_bg = ext2fs_bg_has_super(fs, group_desc_count - 1);
+ if (has_bg)
+ overhead += 1 + desc_blocks +
+ fs->super->s_reserved_gdt_blocks;
+
+ /*
+ * See if the last group is big enough to support the
+ * necessary data structures. If not, we need to get rid of
+ * it.
+ */
+ rem = (size - fs->super->s_first_data_block) %
+ fs->super->s_blocks_per_group;
+ if ((group_desc_count == 1) && rem && (rem < overhead))
+ return;
+ if ((group_desc_count > 1) && rem && (rem < overhead+50)) {
+ size -= rem;
+ goto retry;
+ }
+
+ /*
+ * If we need to reduce the size by no more than a block
+ * group to avoid overrunning the max inode limit, do it.
+ */
+ new_inodes =(unsigned long long) fs->super->s_inodes_per_group * group_desc_count;
+ if (new_inodes > ~0U) {
+ new_inodes = (unsigned long long) fs->super->s_inodes_per_group * (group_desc_count - 1);
+ if (new_inodes > ~0U)
+ return;
+ size = ((unsigned long long) fs->super->s_blocks_per_group *
+ (group_desc_count - 1)) + fs->super->s_first_data_block;
+
+ goto retry;
+ }
+ *sizep = size;
+}
+
/*
* This routine adjusts the superblock and other data structures, both
* in disk as well as in memory...
diff --git a/resize/resize2fs.h b/resize/resize2fs.h
index f9f58f20..96a878a7 100644
--- a/resize/resize2fs.h
+++ b/resize/resize2fs.h
@@ -150,6 +150,7 @@ extern errcode_t adjust_fs_info(ext2_filsys fs, ext2_filsys old_fs,
ext2fs_block_bitmap reserve_blocks,
blk64_t new_size);
extern blk64_t calculate_minimum_resize_size(ext2_filsys fs, int flags);
+extern void adjust_new_size(ext2_filsys fs, blk64_t *sizep);


/* extent.c */
diff --git a/tests/r_move_itable/expect b/tests/r_move_itable/expect
index b0a4873c..74a00fe0 100644
--- a/tests/r_move_itable/expect
+++ b/tests/r_move_itable/expect
@@ -61,7 +61,7 @@ Group 3: (Blocks 769-1023)
Free blocks: 781-1023
Free inodes: 97-128
resize2fs -p test.img 10000
-Resizing the filesystem on test.img to 10000 (1k) blocks.
+Resizing the filesystem on test.img to 9985 (1k) blocks.
Begin pass 1 (max = 35)
Extending the inode table ----------------------------------------XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Begin pass 2 (max = 1)
@@ -354,7 +354,7 @@ Group 38: (Blocks 9729-9984)
Free inodes: 1217-1248
--------------------------------
resize2fs -p test.img 20000
-Resizing the filesystem on test.img to 20000 (1k) blocks.
+Resizing the filesystem on test.img to 19969 (1k) blocks.
Begin pass 1 (max = 39)
Extending the inode table ----------------------------------------XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Begin pass 2 (max = 1)
@@ -884,7 +884,7 @@ Group 77: (Blocks 19713-19968)
Free inodes: 2465-2496
--------------------------------
resize2fs -p test.img 30000
-Resizing the filesystem on test.img to 30000 (1k) blocks.
+Resizing the filesystem on test.img to 29953 (1k) blocks.
Begin pass 1 (max = 39)
Extending the inode table ----------------------------------------XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Begin pass 2 (max = 8)
--
2.31.0
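
The changed sizes in tests/r_move_itable/expect follow from the
last-group check in adjust_new_size(). The sketch below models only
that one check, not the inode limit or the overhead calculation. The
geometry (1k blocks, first data block 1, 256 blocks per group) is
inferred from the group ranges printed in the expect file, and the
overhead argument is a placeholder: the remainders in these three cases
are all below 50, so any overhead value gives the same result.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Trim a requested size (in blocks) when the final block group would
 * be too small to be worth keeping.  Mirrors the "rem < overhead+50"
 * test in the patch; the function name and signature are hypothetical.
 */
uint64_t sketch_trim_last_group(uint64_t size, uint64_t first_data_block,
				uint64_t blocks_per_group, uint64_t overhead)
{
	uint64_t groups, rem;

	assert(blocks_per_group > 0);
	if (size <= first_data_block)
		return size;

	/* Number of groups, rounding up for a partial last group. */
	groups = (size - first_data_block + blocks_per_group - 1) /
		 blocks_per_group;
	/* Blocks that would land in the partial last group. */
	rem = (size - first_data_block) % blocks_per_group;

	if (groups > 1 && rem && rem < overhead + 50)
		size -= rem;	/* drop the undersized last group */
	return size;
}
```

With first_data_block = 1 and blocks_per_group = 256, requested sizes
of 10000, 20000 and 30000 blocks leave remainders of 15, 31 and 47
blocks, so the sketch returns 9985, 19969 and 29953, the same sizes
that now appear in the expected output.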

2021-09-14 19:13:31

by Theodore Ts'o

Subject: [PATCH 3/3] resize2fs: optimize resize2fs_calculate_summary_stats()

Speed up an off-line resize of a 10GB file system to 64TB located on
tmpfs from 90 seconds to 16 seconds by extracting each block group's
bitmap and using a population count function to count the blocks in
use, instead of checking each bit in the block bitmap.

Signed-off-by: Theodore Ts'o <[email protected]>
---
resize/resize2fs.c | 74 ++++++++++++++--------------------------------
1 file changed, 23 insertions(+), 51 deletions(-)

diff --git a/resize/resize2fs.c b/resize/resize2fs.c
index 5ed0c9ee..f7ffaac5 100644
--- a/resize/resize2fs.c
+++ b/resize/resize2fs.c
@@ -2844,67 +2844,39 @@ errout:
*/
static errcode_t resize2fs_calculate_summary_stats(ext2_filsys fs)
{
- blk64_t blk;
+ errcode_t retval;
+ blk64_t blk = fs->super->s_first_data_block;
ext2_ino_t ino;
- unsigned int group = 0;
- unsigned int count = 0;
+ unsigned int n, c, group, count;
blk64_t total_blocks_free = 0;
int total_inodes_free = 0;
int group_free = 0;
int uninit = 0;
- blk64_t super_blk, old_desc_blk, new_desc_blk;
- int old_desc_blocks;
+ char *bitmap_buf;

/*
* First calculate the block statistics
*/
- uninit = ext2fs_bg_flags_test(fs, group, EXT2_BG_BLOCK_UNINIT);
- ext2fs_super_and_bgd_loc2(fs, group, &super_blk, &old_desc_blk,
- &new_desc_blk, 0);
- if (ext2fs_has_feature_meta_bg(fs->super))
- old_desc_blocks = fs->super->s_first_meta_bg;
- else
- old_desc_blocks = fs->desc_blocks +
- fs->super->s_reserved_gdt_blocks;
- for (blk = B2C(fs->super->s_first_data_block);
- blk < ext2fs_blocks_count(fs->super);
- blk += EXT2FS_CLUSTER_RATIO(fs)) {
- if ((uninit &&
- !(EQ_CLSTR(blk, super_blk) ||
- ((old_desc_blk && old_desc_blocks &&
- GE_CLSTR(blk, old_desc_blk) &&
- LT_CLSTR(blk, old_desc_blk + old_desc_blocks))) ||
- ((new_desc_blk && EQ_CLSTR(blk, new_desc_blk))) ||
- EQ_CLSTR(blk, ext2fs_block_bitmap_loc(fs, group)) ||
- EQ_CLSTR(blk, ext2fs_inode_bitmap_loc(fs, group)) ||
- ((GE_CLSTR(blk, ext2fs_inode_table_loc(fs, group)) &&
- LT_CLSTR(blk, ext2fs_inode_table_loc(fs, group)
- + fs->inode_blocks_per_group))))) ||
- (!ext2fs_fast_test_block_bitmap2(fs->block_map, blk))) {
- group_free++;
- total_blocks_free++;
- }
- count++;
- if ((count == fs->super->s_clusters_per_group) ||
- EQ_CLSTR(blk, ext2fs_blocks_count(fs->super)-1)) {
- ext2fs_bg_free_blocks_count_set(fs, group, group_free);
- ext2fs_group_desc_csum_set(fs, group);
- group++;
- if (group >= fs->group_desc_count)
- break;
- count = 0;
- group_free = 0;
- uninit = ext2fs_bg_flags_test(fs, group, EXT2_BG_BLOCK_UNINIT);
- ext2fs_super_and_bgd_loc2(fs, group, &super_blk,
- &old_desc_blk,
- &new_desc_blk, 0);
- if (ext2fs_has_feature_meta_bg(fs->super))
- old_desc_blocks = fs->super->s_first_meta_bg;
- else
- old_desc_blocks = fs->desc_blocks +
- fs->super->s_reserved_gdt_blocks;
+ bitmap_buf = malloc(fs->blocksize);
+ if (!bitmap_buf)
+ return ENOMEM;
+ for (group = 0; group < fs->group_desc_count;
+ group++) {
+ retval = ext2fs_get_block_bitmap_range2(fs->block_map,
+ C2B(blk), fs->super->s_clusters_per_group, bitmap_buf);
+ if (retval) {
+ free(bitmap_buf);
+ return retval;
}
- }
+ n = ext2fs_bitcount(bitmap_buf,
+ fs->super->s_clusters_per_group / 8);
+ group_free = fs->super->s_clusters_per_group - n;
+ total_blocks_free += group_free;
+ ext2fs_bg_free_blocks_count_set(fs, group, group_free);
+ ext2fs_group_desc_csum_set(fs, group);
+ blk += EXT2FS_NUM_B2C(fs, fs->super->s_clusters_per_group);
+ }
+ free(bitmap_buf);
total_blocks_free = C2B(total_blocks_free);
ext2fs_free_blocks_count_set(fs->super, total_blocks_free);

--
2.31.0
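
The difference between the two counting strategies can be sketched in
isolation. This is not the ext2fs_bitcount() implementation, which is
not shown in this thread; it is a generic word-at-a-time population
count set next to a bit-by-bit loop, with hypothetical sketch_* names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count set bits by testing every bit in turn (the slow approach). */
unsigned int sketch_count_bits_slow(const unsigned char *buf, size_t nbytes)
{
	unsigned int n = 0;
	size_t i;
	int bit;

	for (i = 0; i < nbytes; i++)
		for (bit = 0; bit < 8; bit++)
			if (buf[i] & (1u << bit))
				n++;
	return n;
}

/* Branch-free population count of one 32-bit word. */
static unsigned int sketch_popcount32(uint32_t x)
{
	x = x - ((x >> 1) & 0x55555555u);
	x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
	x = (x + (x >> 4)) & 0x0F0F0F0Fu;
	return (uint32_t)(x * 0x01010101u) >> 24;
}

/* Count set bits four bytes at a time, then handle any tail bytes. */
unsigned int sketch_count_bits_fast(const unsigned char *buf, size_t nbytes)
{
	unsigned int n = 0;
	uint32_t word;

	while (nbytes >= 4) {
		memcpy(&word, buf, 4);	/* avoids unaligned access */
		n += sketch_popcount32(word);
		buf += 4;
		nbytes -= 4;
	}
	while (nbytes--)
		n += sketch_popcount32(*buf++);
	return n;
}

/* Free clusters in one group: total minus the bits set in its bitmap. */
unsigned int sketch_group_free(const unsigned char *bitmap,
			       unsigned int clusters_per_group)
{
	assert(clusters_per_group % 8 == 0);
	return clusters_per_group -
	       sketch_count_bits_fast(bitmap, clusters_per_group / 8);
}
```

Both counting functions return the same value for any buffer; the fast
one does a fixed amount of work per word instead of one test per bit.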

2021-09-14 20:09:32

by Theodore Ts'o

Subject: Re: [PATCH 3/3] resize2fs: optimize resize2fs_calculate_summary_stats()

On Tue, Sep 14, 2021 at 03:11:04PM -0400, Theodore Ts'o wrote:
> Speed up an off-line resize of a 10GB file system to 64TB located on
> tmpfs from 90 seconds to 16 seconds by extracting block group bitmaps
> using a population count function to count the blocks in use instead
> of checking each bit in the block bitmap.
>
> Signed-off-by: Theodore Ts'o <[email protected]>
> ---
> resize/resize2fs.c | 74 ++++++++++++++--------------------------------
> 1 file changed, 23 insertions(+), 51 deletions(-)
>
> diff --git a/resize/resize2fs.c b/resize/resize2fs.c
> + retval = ext2fs_get_block_bitmap_range2(fs->block_map,
> + C2B(blk), fs->super->s_clusters_per_group, bitmap_buf);

Whoops, this should be B2C to convert a block number to a cluster
number.

Hmm, I note that this wasn't picked up by our regression tests.
Mental note to self --- we need to add more regression tests to better
exercise bigalloc resizes. Currently we have a big warning which gets
printed about off-line resizes of bigalloc file systems being
experimental, but we should aim to do better. :-)

- Ted
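
The B2C/C2B mix-up noted above is easy to see with stand-in helpers.
The functions below are hypothetical and only assume the usual bigalloc
arrangement in which a cluster is a power-of-two number of blocks, so
that the conversions are shifts by log2 of the cluster ratio.

```c
#include <assert.h>
#include <stdint.h>

/* Block number to cluster number: divide by the cluster ratio. */
uint64_t sketch_b2c(uint64_t blk, unsigned int ratio_bits)
{
	assert(ratio_bits < 64);
	return blk >> ratio_bits;
}

/* Cluster number to block number: multiply by the cluster ratio. */
uint64_t sketch_c2b(uint64_t cluster, unsigned int ratio_bits)
{
	assert(ratio_bits < 64);
	return cluster << ratio_bits;
}
```

With 16 blocks per cluster (ratio_bits = 4), block 32768 is cluster
2048, whereas applying the wrong conversion yields 524288. With one
block per cluster (ratio_bits = 0) the two functions return the same
value, which would be consistent with tests on non-bigalloc file
systems not noticing the difference.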

2021-09-17 06:13:08

by Leah Rumancik

Subject: Re: [PATCH 1/3] resize2fs: attempt to keep the # of inodes valid by removing the last bg

Looks good to me. Series compiled and ran without any test regressions
for me. Some lines are over 80 chars though, could wrap.

Signed-off-by: Leah Rumancik <[email protected]>