2022-06-30 02:25:50

by Kiselev, Oleg

[permalink] [raw]
Subject: [PATCH 1/2] ext4: reduce computation of overhead during resize

This patch avoids doing an O(n**2)-complexity walk through every flex group.
Instead, it uses the already computed overhead information for the newly
allocated space, and simply adds it to the previously calculated
overhead stored in the superblock. This drastically reduces the time
taken to resize very large bigalloc filesystems (from 3+ hours for a
64TB fs down to milliseconds).

Signed-off-by: Oleg Kiselev <[email protected]>
---
fs/ext4/resize.c | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 8b70a4701293..2acc9fca99ea 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1380,6 +1380,16 @@ static int ext4_setup_new_descs(handle_t *handle, struct super_block *sb,
return err;
}

+static void ext4_set_overhead(struct super_block *sb,
+ const ext4_grpblk_t overhead)
+{
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ struct ext4_super_block *es = sbi->s_es;
+ sbi->s_overhead += overhead;
+ es->s_overhead_clusters = cpu_to_le32((unsigned long) sbi->s_overhead);
+ smp_wmb();
+}
+
/*
* ext4_update_super() updates the super block so that the newly added
* groups can be seen by the filesystem.
@@ -1482,8 +1492,16 @@ static void ext4_update_super(struct super_block *sb,

/*
* Update the fs overhead information
+ *
+ * For bigalloc, if the superblock already has a properly calculated
+ * overhead, update it wth a value based on numbers already computed
+ * above for the newly allocated capacity.
*/
- ext4_calculate_overhead(sb);
+ if (ext4_has_feature_bigalloc(sb) && (sbi->s_overhead != 0))
+ ext4_set_overhead(sb,
+ EXT4_NUM_B2C(sbi, blocks_count - free_blocks));
+ else
+ ext4_calculate_overhead(sb);

if (test_opt(sb, DEBUG))
printk(KERN_DEBUG "EXT4-fs: added group %u:"
--
2.32.0


2022-07-14 13:50:42

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 1/2] ext4: reduce computation of overhead during resize

On Thu 30-06-22 02:17:21, Kiselev, Oleg wrote:
> This patch avoids doing an O(n**2)-complexity walk through every flex group.
> Instead, it uses the already computed overhead information for the newly
> allocated space, and simply adds it to the previously calculated
> overhead stored in the superblock. This drastically reduces the time
> taken to resize very large bigalloc filesystems (from 3+ hours for a
> 64TB fs down to milliseconds).
>
> Signed-off-by: Oleg Kiselev <[email protected]>
> ---
> fs/ext4/resize.c | 20 +++++++++++++++++++-
> 1 file changed, 19 insertions(+), 1 deletion(-)

Overall this looks fine, a few smaller comments below.

>
> diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
> index 8b70a4701293..2acc9fca99ea 100644
> --- a/fs/ext4/resize.c
> +++ b/fs/ext4/resize.c
> @@ -1380,6 +1380,16 @@ static int ext4_setup_new_descs(handle_t *handle, struct super_block *sb,
> return err;
> }
>
> +static void ext4_set_overhead(struct super_block *sb,
> + const ext4_grpblk_t overhead)
> +{

ext4_add_overhead() would be a better name I suppose. Also the 'overhead'
should rather be ext4_fsblk_t to be on the safe side...

> + struct ext4_sb_info *sbi = EXT4_SB(sb);
> + struct ext4_super_block *es = sbi->s_es;

Empty line between variable declarations and the code please.

> + sbi->s_overhead += overhead;
> + es->s_overhead_clusters = cpu_to_le32((unsigned long) sbi->s_overhead);
^^^ the typecast looks
bogus here...

> + smp_wmb();
> +}

The barrier without any comment makes me really wonder why it is here...
But I get ext4_calculate_overhead() has is as well so you're just keeping
it.

> +
> /*
> * ext4_update_super() updates the super block so that the newly added
> * groups can be seen by the filesystem.
> @@ -1482,8 +1492,16 @@ static void ext4_update_super(struct super_block *sb,
>
> /*
> * Update the fs overhead information
> + *
> + * For bigalloc, if the superblock already has a properly calculated
> + * overhead, update it wth a value based on numbers already computed
^^ with

> + * above for the newly allocated capacity.
> */
> - ext4_calculate_overhead(sb);
> + if (ext4_has_feature_bigalloc(sb) && (sbi->s_overhead != 0))
> + ext4_set_overhead(sb,
> + EXT4_NUM_B2C(sbi, blocks_count - free_blocks));
> + else
> + ext4_calculate_overhead(sb);
>
> if (test_opt(sb, DEBUG))
> printk(KERN_DEBUG "EXT4-fs: added group %u:"

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2022-07-14 20:04:14

by Kiselev, Oleg

[permalink] [raw]
Subject: Re: [PATCH 1/2] ext4: reduce computation of overhead during resize

Thanks for the review, Jan.

> On Jul 14, 2022, at 6:46 AM, Jan Kara <[email protected]> wrote:
>
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>
>
>
> On Thu 30-06-22 02:17:21, Kiselev, Oleg wrote:
>> This patch avoids doing an O(n**2)-complexity walk through every flex group.
>> Instead, it uses the already computed overhead information for the newly
>> allocated space, and simply adds it to the previously calculated
>> overhead stored in the superblock. This drastically reduces the time
>> taken to resize very large bigalloc filesystems (from 3+ hours for a
>> 64TB fs down to milliseconds).
>>
>> Signed-off-by: Oleg Kiselev <[email protected]>
>> ---
>> fs/ext4/resize.c | 20 +++++++++++++++++++-
>> 1 file changed, 19 insertions(+), 1 deletion(-)
>
> Overall this looks fine, a few smaller comments below.
>
>>
>> diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
>> index 8b70a4701293..2acc9fca99ea 100644
>> --- a/fs/ext4/resize.c
>> +++ b/fs/ext4/resize.c
>> @@ -1380,6 +1380,16 @@ static int ext4_setup_new_descs(handle_t *handle, struct super_block *sb,
>> return err;
>> }
>>
>> +static void ext4_set_overhead(struct super_block *sb,
>> + const ext4_grpblk_t overhead)
>> +{
>
> ext4_add_overhead() would be a better name I suppose. Also the 'overhead'
> should rather be ext4_fsblk_t to be on the safe side...

Done.

>
>> + struct ext4_sb_info *sbi = EXT4_SB(sb);
>> + struct ext4_super_block *es = sbi->s_es;
>
> Empty line between variable declarations and the code please.

Done

>
>> + sbi->s_overhead += overhead;
>> + es->s_overhead_clusters = cpu_to_le32((unsigned long) sbi->s_overhead);
> ^^^ the typecast looks
> bogus here...

This cast is the reverse of le32_to_cpu() cast done in fs/ext4/super.c:__ext4_fill_super():
sbi->s_overhead = le32_to_cpu(es->s_overhead_clusters);
And follows the logic of casting done in fs/ext4/ioctl.c:set_overhead() and fs/ext4/ioctl.c:ext4_update_overhead().

>
>> + smp_wmb();
>> +}
>
> The barrier without any comment makes me really wonder why it is here...
> But I get ext4_calculate_overhead() has is as well so you're just keeping
> it.

Yes, exactly. I am aware of only one place where s_overhead and s_overhead_clusters have to be coherent, in ext4_update_overhead(), which potentially may be called concurrently with the resize. Keeping the write barrier here seemed like a safe choice, since it’s not in the regular IO path, so the added expense is incurred only very rarely.

>
>> +
>> /*
>> * ext4_update_super() updates the super block so that the newly added
>> * groups can be seen by the filesystem.
>> @@ -1482,8 +1492,16 @@ static void ext4_update_super(struct super_block *sb,
>>
>> /*
>> * Update the fs overhead information
>> + *
>> + * For bigalloc, if the superblock already has a properly calculated
>> + * overhead, update it wth a value based on numbers already computed
> ^^ with
>
>> + * above for the newly allocated capacity.
>> */
>> - ext4_calculate_overhead(sb);
>> + if (ext4_has_feature_bigalloc(sb) && (sbi->s_overhead != 0))
>> + ext4_set_overhead(sb,
>> + EXT4_NUM_B2C(sbi, blocks_count - free_blocks));
>> + else
>> + ext4_calculate_overhead(sb);
>>
>> if (test_opt(sb, DEBUG))
>> printk(KERN_DEBUG "EXT4-fs: added group %u:"
>
> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR

I’ll resubmit the updated patch.

2022-07-15 09:29:15

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 1/2] ext4: reduce computation of overhead during resize

On Thu 14-07-22 19:53:38, Kiselev, Oleg wrote:
> >
> >> + sbi->s_overhead += overhead;
> >> + es->s_overhead_clusters = cpu_to_le32((unsigned long) sbi->s_overhead);
> > ^^^ the typecast looks
> > bogus here...
>
> This cast is the reverse of le32_to_cpu() cast done in fs/ext4/super.c:__ext4_fill_super():
> sbi->s_overhead = le32_to_cpu(es->s_overhead_clusters);
> And follows the logic of casting done in fs/ext4/ioctl.c:set_overhead() and fs/ext4/ioctl.c:ext4_update_overhead().

I didn't mean the cpu_to_le32() call but rather the (unsigned long) part.
That is pointless because sbi->s_overhead is already 'unsigned long' and
even if it was not, I have hard time seeing a reason why would casting to
unsigned long make any difference here.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2022-07-15 23:57:59

by Kiselev, Oleg

[permalink] [raw]
Subject: Re: [PATCH 1/2] ext4: reduce computation of overhead during resize


> On Jul 15, 2022, at 2:27 AM, Jan Kara <[email protected]> wrote:
>
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>
>
>
> On Thu 14-07-22 19:53:38, Kiselev, Oleg wrote:
>>>
>>>> + sbi->s_overhead += overhead;
>>>> + es->s_overhead_clusters = cpu_to_le32((unsigned long) sbi->s_overhead);
>>> ^^^ the typecast looks
>>> bogus here...
>>
>> This cast is the reverse of le32_to_cpu() cast done in fs/ext4/super.c:__ext4_fill_super():
>> sbi->s_overhead = le32_to_cpu(es->s_overhead_clusters);
>> And follows the logic of casting done in fs/ext4/ioctl.c:set_overhead() and fs/ext4/ioctl.c:ext4_update_overhead().
>
> I didn't mean the cpu_to_le32() call but rather the (unsigned long) part.
> That is pointless because sbi->s_overhead is already 'unsigned long' and
> even if it was not, I have hard time seeing a reason why would casting to
> unsigned long make any difference here.

Got it. You are right. The indent of your comment got mangled by mail, so it looked like it was directed to cpu_to_ie32()!

>
> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR