2020-02-19 03:11:01

by Suraj Jitindar Singh

[permalink] [raw]
Subject: [PATCH 0/3] ext4: Fix potential races when performing online resizing

This patch series fixes 2 additional races between array resizing and
array element access when performing online resizing of the arrays
s_group_info and s_flex_groups.

These patches apply on top of the patch:
[PATCH RFC] ext4: fix potential race between online resizing and write operations

The macro sbi_array_rcu_deref() is introduced for simplicity but can be
removed if undesired.

Tested by performing the following:
truncate -s 100G /tmp/foo
sudo bash -c 'while true; do dd if=/dev/zero of=/mnt/xxx bs=1M count=1; sync; \
rm /mnt/xxx; done' &
while true; do mkfs.ext4 -b 1024 -E resize=26213883 /tmp/foo 2096635 -F; \
sudo mount -o loop /tmp/foo /mnt; sudo resize2fs /dev/loop0 26213883; \
sudo umount /mnt; done

Suraj Jitindar Singh (3):
ext4: introduce macro sbi_array_rcu_deref() to access rcu protected
fields
ext4: fix potential race between s_group_info online resizing and
access
ext4: fix potential race between s_flex_groups online resizing and
access

fs/ext4/balloc.c | 11 +++++-----
fs/ext4/ext4.h | 25 +++++++++++++++++----
fs/ext4/ialloc.c | 21 +++++++++++-------
fs/ext4/mballoc.c | 19 ++++++++++------
fs/ext4/resize.c | 4 ++--
fs/ext4/super.c | 56 ++++++++++++++++++++++++++++++++---------------
6 files changed, 91 insertions(+), 45 deletions(-)

--
2.17.1


2020-02-19 03:11:02

by Suraj Jitindar Singh

[permalink] [raw]
Subject: [PATCH 3/3] ext4: fix potential race between s_flex_groups online resizing and access

During an online resize an array of s_flex_groups structures gets replaced
so it can get enlarged. If there is a concurrent access to the array and
this memory has been reused then this can lead to an invalid memory access.

The s_flex_group array has been converted into an array of pointers rather
than an array of structures. This is to ensure that the information
contained in the structures cannot get out of sync during a resize due to
an accessor updating the value in the old structure after it has been
copied but before the array pointer is updated. Since the structures them-
selves are no longer copied but only the pointers to them this case is
mitigated.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443
Signed-off-by: Suraj Jitindar Singh <[email protected]>
Cc: [email protected]
---
fs/ext4/ext4.h | 2 +-
fs/ext4/ialloc.c | 21 +++++++++++-------
fs/ext4/mballoc.c | 9 +++++---
fs/ext4/resize.c | 4 ++--
fs/ext4/super.c | 56 ++++++++++++++++++++++++++++++++---------------
5 files changed, 60 insertions(+), 32 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 3f4aaaae7da6..e8157ce5988b 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1512,7 +1512,7 @@ struct ext4_sb_info {
unsigned int s_extent_max_zeroout_kb;

unsigned int s_log_groups_per_flex;
- struct flex_groups *s_flex_groups;
+ struct flex_groups **s_flex_groups;
ext4_group_t s_flex_groups_allocated;

/* workqueue for reserved extent conversions (buffered io) */
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index c66e8f9451a2..9324552a2ac2 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -330,9 +330,11 @@ void ext4_free_inode(handle_t *handle, struct inode *inode)
if (sbi->s_log_groups_per_flex) {
ext4_group_t f = ext4_flex_group(sbi, block_group);

- atomic_inc(&sbi->s_flex_groups[f].free_inodes);
+ atomic_inc(&sbi_array_rcu_deref(sbi, s_flex_groups,
+ f)->free_inodes);
if (is_directory)
- atomic_dec(&sbi->s_flex_groups[f].used_dirs);
+ atomic_dec(&sbi_array_rcu_deref(sbi, s_flex_groups,
+ f)->used_dirs);
}
BUFFER_TRACE(bh2, "call ext4_handle_dirty_metadata");
fatal = ext4_handle_dirty_metadata(handle, NULL, bh2);
@@ -368,12 +370,13 @@ static void get_orlov_stats(struct super_block *sb, ext4_group_t g,
int flex_size, struct orlov_stats *stats)
{
struct ext4_group_desc *desc;
- struct flex_groups *flex_group = EXT4_SB(sb)->s_flex_groups;
+ struct flex_groups *flex_group = sbi_array_rcu_deref(EXT4_SB(sb),
+ s_flex_groups, g);

if (flex_size > 1) {
- stats->free_inodes = atomic_read(&flex_group[g].free_inodes);
- stats->free_clusters = atomic64_read(&flex_group[g].free_clusters);
- stats->used_dirs = atomic_read(&flex_group[g].used_dirs);
+ stats->free_inodes = atomic_read(&flex_group->free_inodes);
+ stats->free_clusters = atomic64_read(&flex_group->free_clusters);
+ stats->used_dirs = atomic_read(&flex_group->used_dirs);
return;
}

@@ -1054,7 +1057,8 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,
if (sbi->s_log_groups_per_flex) {
ext4_group_t f = ext4_flex_group(sbi, group);

- atomic_inc(&sbi->s_flex_groups[f].used_dirs);
+ atomic_inc(&sbi_array_rcu_deref(sbi, s_flex_groups,
+ f)->used_dirs);
}
}
if (ext4_has_group_desc_csum(sb)) {
@@ -1077,7 +1081,8 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir,

if (sbi->s_log_groups_per_flex) {
flex_group = ext4_flex_group(sbi, group);
- atomic_dec(&sbi->s_flex_groups[flex_group].free_inodes);
+ atomic_dec(&sbi_array_rcu_deref(sbi, s_flex_groups,
+ flex_group)->free_inodes);
}

inode->i_ino = ino + group * EXT4_INODES_PER_GROUP(sb);
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 0d9b17afc85f..0de1191e12a8 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3022,7 +3022,8 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
ext4_group_t flex_group = ext4_flex_group(sbi,
ac->ac_b_ex.fe_group);
atomic64_sub(ac->ac_b_ex.fe_len,
- &sbi->s_flex_groups[flex_group].free_clusters);
+ &sbi_array_rcu_deref(sbi, s_flex_groups,
+ flex_group)->free_clusters);
}

err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
@@ -4920,7 +4921,8 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode,
if (sbi->s_log_groups_per_flex) {
ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
atomic64_add(count_clusters,
- &sbi->s_flex_groups[flex_group].free_clusters);
+ &sbi_array_rcu_deref(sbi, s_flex_groups,
+ flex_group)->free_clusters);
}

/*
@@ -5077,7 +5079,8 @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
if (sbi->s_log_groups_per_flex) {
ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
atomic64_add(clusters_freed,
- &sbi->s_flex_groups[flex_group].free_clusters);
+ &sbi_array_rcu_deref(sbi, s_flex_groups,
+ flex_group)->free_clusters);
}

ext4_mb_unload_buddy(&e4b);
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 6fbe8607095f..941941a629a3 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1427,9 +1427,9 @@ static void ext4_update_super(struct super_block *sb,
ext4_group_t flex_group;
flex_group = ext4_flex_group(sbi, group_data[0].group);
atomic64_add(EXT4_NUM_B2C(sbi, free_blocks),
- &sbi->s_flex_groups[flex_group].free_clusters);
+ &sbi->s_flex_groups[flex_group]->free_clusters);
atomic_add(EXT4_INODES_PER_GROUP(sb) * flex_gd->count,
- &sbi->s_flex_groups[flex_group].free_inodes);
+ &sbi->s_flex_groups[flex_group]->free_inodes);
}

/*
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index f464dff09774..3a401f930bca 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1049,7 +1049,11 @@ static void ext4_put_super(struct super_block *sb)
for (i = 0; i < sbi->s_gdb_count; i++)
brelse(sbi->s_group_desc[i]);
kvfree(sbi->s_group_desc);
- kvfree(sbi->s_flex_groups);
+ if (sbi->s_flex_groups) {
+ for (i = 0; i < sbi->s_flex_groups_allocated; i++)
+ kvfree(sbi->s_flex_groups[i]);
+ kvfree(sbi->s_flex_groups);
+ }
percpu_counter_destroy(&sbi->s_freeclusters_counter);
percpu_counter_destroy(&sbi->s_freeinodes_counter);
percpu_counter_destroy(&sbi->s_dirs_counter);
@@ -2380,8 +2384,8 @@ static int ext4_setup_super(struct super_block *sb, struct ext4_super_block *es,
int ext4_alloc_flex_bg_array(struct super_block *sb, ext4_group_t ngroup)
{
struct ext4_sb_info *sbi = EXT4_SB(sb);
- struct flex_groups *new_groups;
- int size;
+ struct flex_groups **old_groups, **new_groups;
+ int size, i;

if (!sbi->s_log_groups_per_flex)
return 0;
@@ -2390,22 +2394,35 @@ int ext4_alloc_flex_bg_array(struct super_block *sb, ext4_group_t ngroup)
if (size <= sbi->s_flex_groups_allocated)
return 0;

- size = roundup_pow_of_two(size * sizeof(struct flex_groups));
- new_groups = kvzalloc(size, GFP_KERNEL);
+ new_groups = kvzalloc(roundup_pow_of_two(size *
+ sizeof(*sbi->s_flex_groups)), GFP_KERNEL);
if (!new_groups) {
- ext4_msg(sb, KERN_ERR, "not enough memory for %d flex groups",
- size / (int) sizeof(struct flex_groups));
+ ext4_msg(sb, KERN_ERR,
+ "not enough memory for %d flex group pointers", size);
return -ENOMEM;
}
-
- if (sbi->s_flex_groups) {
+ for (i = sbi->s_flex_groups_allocated; i < size; i++) {
+ new_groups[i] = kvzalloc(roundup_pow_of_two(
+ sizeof(struct flex_groups)),
+ GFP_KERNEL);
+ if (!new_groups[i]) {
+ for (i--; i >= sbi->s_flex_groups_allocated; i--)
+ kvfree(new_groups[i]);
+ kvfree(new_groups);
+ ext4_msg(sb, KERN_ERR,
+ "not enough memory for %d flex groups", size);
+ return -ENOMEM;
+ }
+ }
+ old_groups = sbi->s_flex_groups;
+ if (sbi->s_flex_groups)
memcpy(new_groups, sbi->s_flex_groups,
(sbi->s_flex_groups_allocated *
- sizeof(struct flex_groups)));
- kvfree(sbi->s_flex_groups);
- }
- sbi->s_flex_groups = new_groups;
- sbi->s_flex_groups_allocated = size / sizeof(struct flex_groups);
+ sizeof(struct flex_groups *)));
+ rcu_assign_pointer(sbi->s_flex_groups, new_groups);
+ sbi->s_flex_groups_allocated = size;
+ if (old_groups)
+ ext4_kvfree_array_rcu(old_groups);
return 0;
}

@@ -2431,11 +2448,11 @@ static int ext4_fill_flex_info(struct super_block *sb)

flex_group = ext4_flex_group(sbi, i);
atomic_add(ext4_free_inodes_count(sb, gdp),
- &sbi->s_flex_groups[flex_group].free_inodes);
+ &sbi->s_flex_groups[flex_group]->free_inodes);
atomic64_add(ext4_free_group_clusters(sb, gdp),
- &sbi->s_flex_groups[flex_group].free_clusters);
+ &sbi->s_flex_groups[flex_group]->free_clusters);
atomic_add(ext4_used_dirs_count(sb, gdp),
- &sbi->s_flex_groups[flex_group].used_dirs);
+ &sbi->s_flex_groups[flex_group]->used_dirs);
}

return 1;
@@ -4682,8 +4699,11 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
ext4_unregister_li_request(sb);
failed_mount6:
ext4_mb_release(sb);
- if (sbi->s_flex_groups)
+ if (sbi->s_flex_groups) {
+ for (i = 0; i < sbi->s_flex_groups_allocated; i++)
+ kvfree(sbi->s_flex_groups[i]);
kvfree(sbi->s_flex_groups);
+ }
percpu_counter_destroy(&sbi->s_freeclusters_counter);
percpu_counter_destroy(&sbi->s_freeinodes_counter);
percpu_counter_destroy(&sbi->s_dirs_counter);
--
2.17.1

2020-02-20 06:15:00

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH 0/3] ext4: Fix potential races when performing online resizing

Hi Suraj,

All of the patches to fix BZ 206443 are now on the ext4 git tree:

https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/log/?h=dev

I'm currently giving them a full regression test set using xfstests.
Could you run your tests to make sure it looks good for you?

I'm hoping to issue a pull request to Linus in time for 5.6-rc3 by
this weekend.

Also, if you can figure out a way to package up the repro as an
xfstests test case, that would be really excellent. I think the
challenge is that some of them took a *huge* amount of pounding before
they repro'ed, correct? I wasn't actually able to trigger the repro
using kvm, but I was only using a 2 CPU configuration.

Thanks,

- Ted

2020-02-21 00:09:18

by Suraj Jitindar Singh

[permalink] [raw]
Subject: Re: [PATCH 0/3] ext4: Fix potential races when performing online resizing

On Thu, 2020-02-20 at 01:14 -0500, Theodore Y. Ts'o wrote:
> Hi Suraj,
>
> All of the patches to fix BZ 206443 are now on the ext4 git tree:
>
>
https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/log/?h=dev
>
> I'm currently giving them a full regression test set using xfstests.
> Could you run your tests to make sure it looks good for you?

Hi,

Have run my repo case for 6 hours without issue.
Previously this reproduced 100% of the time within <30 mins.

>
> I'm hoping to issue a pull request to Linus in time for 5.6-rc3 by
> this weekend.
>
> Also, if you can figure out a way to package up the repro as an
> xfstests test case, that would be really excellent. I think the
> challenge is that some of them took a *huge* amount of pounding
> before
> they repro'ed, correct? I wasn't actually able to trigger the repro
> using kvm, but I was only using a 2 CPU configuration.

I will look into this.

Thanks,
Suraj

>
> Thanks,
>
> - Ted
>