Hello everyone,
This patchset is intended to avoid variables that can be modified via sysfs
from overflowing when stored or used and thus causing various problems.
"kvm-xfstests -c ext4/all -g auto" has been executed with no new failures.
V3->V4:
Patch 4: Corrects WARN_ON_ONCE added in V3.
V2->V3:
Add Reviewed-by tag from Jan Kara.
Patch 4: Trimming order before the for loop makes the logic easier to
understand.
V1->V2:
Patch 1: Use kstrtouint() as suggested by Alexey and Honza.
Patch 2: Adapted to patch 1 changes.
Patch 3: Add Reviewed-by tag.
Patch 4: Avoid useless loops as suggested by Ojaswin and rename
attr_group_prealloc to attr_clusters_in_group.
Patch 5: New patch added to limit mb_best_avail_max_trim_order < 64
as Honza's suggestion.
Patch 6: Reordered and updated description.
Patch 7: Add Reviewed-by tag.
Patch 8: Keep unrelated variables on different lines as suggested by Honza.
Patch 9: New patch to fix warnings found during compile checking.
[V1]: https://lore.kernel.org/all/[email protected]/
[V2]: https://lore.kernel.org/all/[email protected]/
[V3]: https://lore.kernel.org/all/[email protected]/
Baokun Li (9):
ext4: avoid overflow when setting values via sysfs
ext4: refactor out ext4_generic_attr_store()
ext4: refactor out ext4_generic_attr_show()
ext4: fix slab-out-of-bounds in
ext4_mb_find_good_group_avg_frag_lists()
ext4: add new attr pointer attr_mb_order
ext4: add positive int attr pointer to avoid sysfs variables overflow
ext4: set type of ac_groups_linear_remaining to __u32 to avoid
overflow
ext4: set the type of max_zeroout to unsigned int to avoid overflow
ext4: clean up s_mb_rb_lock to fix build warnings with C=1
fs/ext4/extents.c | 3 +-
fs/ext4/mballoc.c | 5 +-
fs/ext4/mballoc.h | 2 +-
fs/ext4/sysfs.c | 174 ++++++++++++++++++++++++++++------------------
4 files changed, 112 insertions(+), 72 deletions(-)
--
2.31.1
Running sparse (make C=1) on mballoc.c we get the following warning:
fs/ext4/mballoc.c:3194:13: warning: context imbalance in
'ext4_mb_seq_structs_summary_start' - wrong count at exit
This is because __acquires(&EXT4_SB(sb)->s_mb_rb_lock) was called in
ext4_mb_seq_structs_summary_start(), but s_mb_rb_lock was removed in commit
83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree"),
so remove the __acquires to silence the warning.
Signed-off-by: Baokun Li <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/ext4/mballoc.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index dbf04f91516c..c65fac9b8c72 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3190,7 +3190,6 @@ int ext4_seq_mb_stats_show(struct seq_file *seq, void *offset)
}
static void *ext4_mb_seq_structs_summary_start(struct seq_file *seq, loff_t *pos)
-__acquires(&EXT4_SB(sb)->s_mb_rb_lock)
{
struct super_block *sb = pde_data(file_inode(seq->file));
unsigned long position;
--
2.31.1
We can trigger a slab-out-of-bounds with the following commands:
mkfs.ext4 -F /dev/$disk 10G
mount /dev/$disk /tmp/test
echo 2147483647 > /sys/fs/ext4/$disk/mb_group_prealloc
echo test > /tmp/test/file && sync
==================================================================
BUG: KASAN: slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists+0x8a/0x200 [ext4]
Read of size 8 at addr ffff888121b9d0f0 by task kworker/u2:0/11
CPU: 0 PID: 11 Comm: kworker/u2:0 Tainted: GL 6.7.0-next-20240118 #521
Call Trace:
dump_stack_lvl+0x2c/0x50
kasan_report+0xb6/0xf0
ext4_mb_find_good_group_avg_frag_lists+0x8a/0x200 [ext4]
ext4_mb_regular_allocator+0x19e9/0x2370 [ext4]
ext4_mb_new_blocks+0x88a/0x1370 [ext4]
ext4_ext_map_blocks+0x14f7/0x2390 [ext4]
ext4_map_blocks+0x569/0xea0 [ext4]
ext4_do_writepages+0x10f6/0x1bc0 [ext4]
[...]
==================================================================
The flow of issue triggering is as follows:
// Set s_mb_group_prealloc to 2147483647 via sysfs
ext4_mb_new_blocks
ext4_mb_normalize_request
ext4_mb_normalize_group_request
ac->ac_g_ex.fe_len = EXT4_SB(sb)->s_mb_group_prealloc
ext4_mb_regular_allocator
ext4_mb_choose_next_group
ext4_mb_choose_next_group_best_avail
mb_avg_fragment_size_order
order = fls(len) - 2 = 29
ext4_mb_find_good_group_avg_frag_lists
frag_list = &sbi->s_mb_avg_fragment_size[order]
if (list_empty(frag_list)) // Trigger SOOB!
At 4k block size, the length of the s_mb_avg_fragment_size list is 14,
but an oversized s_mb_group_prealloc is set, causing slab-out-of-bounds
to be triggered by an attempt to access an element at index 29.
Add a new attr_id attr_clusters_in_group with values in the range
[0, sbi->s_clusters_per_group] and declare mb_group_prealloc as
that type to fix the issue. In addition avoid returning an order
from mb_avg_fragment_size_order() greater than MB_NUM_ORDERS(sb)
and reduce some useless loops.
Fixes: 7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
CC: [email protected]
Signed-off-by: Baokun Li <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Ojaswin Mujoo <[email protected]>
---
fs/ext4/mballoc.c | 4 ++++
fs/ext4/sysfs.c | 13 ++++++++++++-
2 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 12b3f196010b..dbf04f91516c 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -831,6 +831,8 @@ static int mb_avg_fragment_size_order(struct super_block *sb, ext4_grpblk_t len)
return 0;
if (order == MB_NUM_ORDERS(sb))
order--;
+ if (WARN_ON_ONCE(order > MB_NUM_ORDERS(sb)))
+ order = MB_NUM_ORDERS(sb) - 1;
return order;
}
@@ -1008,6 +1010,8 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
* goal length.
*/
order = fls(ac->ac_g_ex.fe_len) - 1;
+ if (WARN_ON_ONCE(order - 1 > MB_NUM_ORDERS(ac->ac_sb)))
+ order = MB_NUM_ORDERS(ac->ac_sb);
min_order = order - sbi->s_mb_best_avail_max_trim_order;
if (min_order < 0)
min_order = 0;
diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index 7f455b5f22c0..ddd71673176c 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -29,6 +29,7 @@ typedef enum {
attr_trigger_test_error,
attr_first_error_time,
attr_last_error_time,
+ attr_clusters_in_group,
attr_feature,
attr_pointer_ui,
attr_pointer_ul,
@@ -207,13 +208,14 @@ EXT4_ATTR_FUNC(sra_exceeded_retry_limit, 0444);
EXT4_ATTR_OFFSET(inode_readahead_blks, 0644, inode_readahead,
ext4_sb_info, s_inode_readahead_blks);
+EXT4_ATTR_OFFSET(mb_group_prealloc, 0644, clusters_in_group,
+ ext4_sb_info, s_mb_group_prealloc);
EXT4_RW_ATTR_SBI_UI(inode_goal, s_inode_goal);
EXT4_RW_ATTR_SBI_UI(mb_stats, s_mb_stats);
EXT4_RW_ATTR_SBI_UI(mb_max_to_scan, s_mb_max_to_scan);
EXT4_RW_ATTR_SBI_UI(mb_min_to_scan, s_mb_min_to_scan);
EXT4_RW_ATTR_SBI_UI(mb_order2_req, s_mb_order2_reqs);
EXT4_RW_ATTR_SBI_UI(mb_stream_req, s_mb_stream_request);
-EXT4_RW_ATTR_SBI_UI(mb_group_prealloc, s_mb_group_prealloc);
EXT4_RW_ATTR_SBI_UI(mb_max_linear_groups, s_mb_max_linear_groups);
EXT4_RW_ATTR_SBI_UI(extent_max_zeroout_kb, s_extent_max_zeroout_kb);
EXT4_ATTR(trigger_fs_error, 0200, trigger_test_error);
@@ -376,6 +378,7 @@ static ssize_t ext4_generic_attr_show(struct ext4_attr *a,
switch (a->attr_id) {
case attr_inode_readahead:
+ case attr_clusters_in_group:
case attr_pointer_ui:
if (a->attr_ptr == ptr_ext4_super_block_offset)
return sysfs_emit(buf, "%u\n", le32_to_cpup(ptr));
@@ -455,6 +458,14 @@ static ssize_t ext4_generic_attr_store(struct ext4_attr *a,
else
*((unsigned int *) ptr) = t;
return len;
+ case attr_clusters_in_group:
+ ret = kstrtouint(skip_spaces(buf), 0, &t);
+ if (ret)
+ return ret;
+ if (t > sbi->s_clusters_per_group)
+ return -EINVAL;
+ *((unsigned int *) ptr) = t;
+ return len;
case attr_pointer_ul:
ret = kstrtoul(skip_spaces(buf), 0, <);
if (ret)
--
2.31.1
Now ac_groups_linear_remaining is of type __u16 and s_mb_max_linear_groups
is of type unsigned int, so an overflow occurs when setting a value above
65535 through the mb_max_linear_groups sysfs interface. Therefore, the
type of ac_groups_linear_remaining is set to __u32 to avoid overflow.
Fixes: 196e402adf2e ("ext4: improve cr 0 / cr 1 group scanning")
CC: [email protected]
Signed-off-by: Baokun Li <[email protected]>
Reviewed-by: Zhang Yi <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/ext4/mballoc.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
index 56938532b4ce..7bfc5fb5a128 100644
--- a/fs/ext4/mballoc.h
+++ b/fs/ext4/mballoc.h
@@ -193,8 +193,8 @@ struct ext4_allocation_context {
ext4_grpblk_t ac_orig_goal_len;
__u32 ac_flags; /* allocation hints */
+ __u32 ac_groups_linear_remaining;
__u16 ac_groups_scanned;
- __u16 ac_groups_linear_remaining;
__u16 ac_found;
__u16 ac_cX_found[EXT4_MB_NUM_CRS];
__u16 ac_tail;
--
2.31.1
The max_zeroout is of type int and the s_extent_max_zeroout_kb is of
type uint, and the s_extent_max_zeroout_kb can be freely modified via
the sysfs interface. When the block size is 1024, max_zeroout may
overflow, so declare it as unsigned int to avoid overflow.
Signed-off-by: Baokun Li <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
---
fs/ext4/extents.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index e57054bdc5fd..e067f2dd0335 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3402,9 +3402,10 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
struct ext4_extent *ex, *abut_ex;
ext4_lblk_t ee_block, eof_block;
unsigned int ee_len, depth, map_len = map->m_len;
- int allocated = 0, max_zeroout = 0;
int err = 0;
int split_flag = EXT4_EXT_DATA_VALID2;
+ int allocated = 0;
+ unsigned int max_zeroout = 0;
ext_debug(inode, "logical block %llu, max_blocks %u\n",
(unsigned long long)map->m_lblk, map_len);
--
2.31.1
A gentle ping.
On 2024/3/19 19:33, Baokun Li wrote:
> Hello everyone,
>
> This patchset is intended to avoid variables that can be modified via sysfs
> from overflowing when stored or used and thus causing various problems.
>
> "kvm-xfstests -c ext4/all -g auto" has been executed with no new failures.
>
> V3->V4:
> Patch 4: Corrects WARN_ON_ONCE added in V3.
>
> V2->V3:
> Add Reviewed-by tag from Jan Kara.
> Patch 4: Trimming order before the for loop makes the logic easier to
> understand.
>
> V1->V2:
> Patch 1: Use kstrtouint() as suggested by Alexey and Honza.
> Patch 2: Adapted to patch 1 changes.
> Patch 3: Add Reviewed-by tag.
> Patch 4: Avoid useless loops as suggested by Ojaswin and rename
> attr_group_prealloc to attr_clusters_in_group.
> Patch 5: New patch added to limit mb_best_avail_max_trim_order < 64
> as Honza's suggestion.
> Patch 6: Reordered and updated description.
> Patch 7: Add Reviewed-by tag.
> Patch 8: Keep unrelated variables on different lines as suggested by Honza.
> Patch 9: New patch to fix warnings found during compile checking.
>
> [V1]: https://lore.kernel.org/all/[email protected]/
> [V2]: https://lore.kernel.org/all/[email protected]/
> [V3]: https://lore.kernel.org/all/[email protected]/
>
> Baokun Li (9):
> ext4: avoid overflow when setting values via sysfs
> ext4: refactor out ext4_generic_attr_store()
> ext4: refactor out ext4_generic_attr_show()
> ext4: fix slab-out-of-bounds in
> ext4_mb_find_good_group_avg_frag_lists()
> ext4: add new attr pointer attr_mb_order
> ext4: add positive int attr pointer to avoid sysfs variables overflow
> ext4: set type of ac_groups_linear_remaining to __u32 to avoid
> overflow
> ext4: set the type of max_zeroout to unsigned int to avoid overflow
> ext4: clean up s_mb_rb_lock to fix build warnings with C=1
>
> fs/ext4/extents.c | 3 +-
> fs/ext4/mballoc.c | 5 +-
> fs/ext4/mballoc.h | 2 +-
> fs/ext4/sysfs.c | 174 ++++++++++++++++++++++++++++------------------
> 4 files changed, 112 insertions(+), 72 deletions(-)
>
On Fri, May 03, 2024 at 10:03:04AM +0800, Baokun Li wrote:
> Hi Ted,
>
> Would you consider merging in this patchset in the current merge
> window? I would appreciate it if you could.
Yes, in fact it's next on my review list. I've been working through
the patches on ext4's patchwork site roughly in chronological order
(focusing first on fixes and those that have been reviewed by other
folks).
Cheers,
- Ted
On 2024/5/3 11:14, Theodore Ts'o wrote:
> On Fri, May 03, 2024 at 10:03:04AM +0800, Baokun Li wrote:
>> Hi Ted,
>>
>> Would you consider merging in this patchset in the current merge
>> window? I would appreciate it if you could.
> Yes, in fact it's next on my review list. I've been working through
> the patches on ext4's patchwork site roughly in chronological order
> (focusing first on fixes and those that have been reviewed by other
> folks).
>
> Cheers,
>
> - Ted
Thanks a million for your work!
Cheers,
Baokun
On Tue, 19 Mar 2024 19:33:16 +0800, Baokun Li wrote:
> This patchset is intended to avoid variables that can be modified via sysfs
> from overflowing when stored or used and thus causing various problems.
>
> "kvm-xfstests -c ext4/all -g auto" has been executed with no new failures.
>
> V3->V4:
> Patch 4: Corrects WARN_ON_ONCE added in V3.
>
> [...]
Applied, thanks!
[1/9] ext4: avoid overflow when setting values via sysfs
commit: 9e8e819f8f272c4e5dcd0bd6c7450e36481ed139
[2/9] ext4: refactor out ext4_generic_attr_store()
commit: f536808adcc37a546bf9cc819c349bd55a28159b
[3/9] ext4: refactor out ext4_generic_attr_show()
commit: 57341fe3179c7694c92dcf99e7f836cee4c800dd
[4/9] ext4: fix slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists()
commit: 13df4d44a3aaabe61cd01d277b6ee23ead2a5206
[5/9] ext4: add new attr pointer attr_mb_order
commit: b7b2a5799b8fafe95fcd5455c32ba2c643c86f99
[6/9] ext4: add positive int attr pointer to avoid sysfs variables overflow
commit: 63bfe841053f8dda09c9d059d543486d9dc16104
[7/9] ext4: set type of ac_groups_linear_remaining to __u32 to avoid overflow
commit: 9a9f3a9842927e4af7ca10c19c94dad83bebd713
[8/9] ext4: set the type of max_zeroout to unsigned int to avoid overflow
commit: 261341a932d9244cbcd372a3659428c8723e5a49
[9/9] ext4: clean up s_mb_rb_lock to fix build warnings with C=1
commit: e19089dff547c9e1f09712acc3536d7b0aa9ce3d
Best regards,
--
Theodore Ts'o <[email protected]>