Baokun Li <[email protected]> writes:
> When we calculate the end position of ext4_free_extent, this position may
> be exactly where ext4_lblk_t (i.e. uint) overflows. For example, if
> ac_g_ex.fe_logical is 4294965248 and ac_orig_goal_len is 2048, then the
> computed end is 0x100000000, which is 0. If ac->ac_o_ex.fe_logical is not
> the first case of adjusting the best extent, that is, new_bex_end > 0, the
> following BUG_ON will be triggered:
Nice spotting.
>
> =========================================================
> kernel BUG at fs/ext4/mballoc.c:5116!
> invalid opcode: 0000 [#1] PREEMPT SMP PTI
> CPU: 3 PID: 673 Comm: xfs_io Tainted: G E 6.5.0-rc1+ #279
> RIP: 0010:ext4_mb_new_inode_pa+0xc5/0x430
> Call Trace:
> <TASK>
> ext4_mb_use_best_found+0x203/0x2f0
> ext4_mb_try_best_found+0x163/0x240
> ext4_mb_regular_allocator+0x158/0x1550
> ext4_mb_new_blocks+0x86a/0xe10
> ext4_ext_map_blocks+0xb0c/0x13a0
> ext4_map_blocks+0x2cd/0x8f0
> ext4_iomap_begin+0x27b/0x400
> iomap_iter+0x222/0x3d0
> __iomap_dio_rw+0x243/0xcb0
> iomap_dio_rw+0x16/0x80
> =========================================================
>
> A simple reproducer demonstrating the problem:
>
> mkfs.ext4 -F /dev/sda -b 4096 100M
> mount /dev/sda /tmp/test
> fallocate -l1M /tmp/test/tmp
> fallocate -l10M /tmp/test/file
> fallocate -i -o 1M -l16777203M /tmp/test/file
> fsstress -d /tmp/test -l 0 -n 100000 -p 8 &
> sleep 10 && killall -9 fsstress
> rm -f /tmp/test/tmp
> xfs_io -c "open -ad /tmp/test/file" -c "pwrite -S 0xff 0 8192"
Could you please also add it into xfstests?
I think it's a nice test which can check the boundary conditions for
start and end of data types used in mballoc. I think it should even work
if you don't do fsstress but instead just fallocate some remaining space
in filesystem, so that when you open and try to write it to 0th offset,
if automatically hits this error in ext4_mb_new_inode_pa().
>
> We declare new_bex_start and new_bex_end as correct types and use fex_end()
> to avoid the problems caused by the ext4_lblk_t overflow above.
>
> Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
> Signed-off-by: Baokun Li <[email protected]>
> ---
> fs/ext4/mballoc.c | 11 +++++------
> 1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index eb7f5d35ef96..2090e5e7ba58 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -5076,8 +5076,8 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
> pa = ac->ac_pa;
>
> if (ac->ac_b_ex.fe_len < ac->ac_orig_goal_len) {
> - int new_bex_start;
> - int new_bex_end;
> + ext4_lblk_t new_bex_start;
> + loff_t new_bex_end;
>
> /* we can't allocate as much as normalizer wants.
> * so, found space must get proper lstart
> @@ -5096,8 +5096,7 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
> * still cover original start
> * 3. Else, keep the best ex at start of original request.
> */
> - new_bex_end = ac->ac_g_ex.fe_logical +
> - EXT4_C2B(sbi, ac->ac_orig_goal_len);
> + new_bex_end = fex_end(sbi, &ac->ac_g_ex, &ac->ac_orig_goal_len);
> new_bex_start = new_bex_end - EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
> if (ac->ac_o_ex.fe_logical >= new_bex_start)
> goto adjust_bex;
> @@ -5117,8 +5116,8 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
>
> BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical);
> BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len);
> - BUG_ON(new_bex_end > (ac->ac_g_ex.fe_logical +
> - EXT4_C2B(sbi, ac->ac_orig_goal_len)));
Ok so the right hand becomes 0 (because then end can go upto 1<<32 which
will be 0 for unsigned int). And the left (new_bex_end) might be
negative due to some operations above as I see it.
And comparing an int with unsigned int, it will promote new_bex_end to
unsigned int which will make it's value too large and will hit the
bug_on.
I would like to carefully review all such paths. I will soon review and
get back.
> + BUG_ON(new_bex_end >
> + fex_end(sbi, &ac->ac_g_ex, &ac->ac_orig_goal_len));
I am not sure whether using fex_end or pa_end is any helpful.
I think we can just typecast if needed and keep it simple rather
than adding helpers functions for addition operation.
(because of the fact that fex_end() can take a third parameter which
sometimes you pass as NULL. Hence it doesn't look clean, IMO)
> }
>
> pa->pa_lstart = ac->ac_b_ex.fe_logical;
> --
> 2.31.1
-ritesh