2022-07-13 19:11:05

by Tadeusz Struk

Subject: [PATCH] ext4: fix kernel BUG in ext4_free_blocks

Syzbot reported a BUG in ext4_free_blocks.
The issue is triggered from ext4_mb_clear_bb(). The block number passed
to ext4_get_group_no_and_offset() is 0 while es->s_first_data_block
is 1, so the block group number returned from
ext4_get_group_no_and_offset() is -1. This is then passed to
ext4_get_group_info() and hits a BUG:
BUG_ON(group >= EXT4_SB(sb)->s_groups_count),
as can be seen in the trace below.
This patch adds a check to ext4_get_group_no_and_offset() that clamps
the block number to es->s_first_data_block when it is smaller.
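
To illustrate the arithmetic described above, here is a minimal userspace
sketch, not kernel code. The names fsblk_t, group_t, model_group_no() and
model_hits_bug() are hypothetical stand-ins, cluster bits are ignored, and
the 8192 blocks per group used in the note below is an assumed value.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel types (assumption: ext4_fsblk_t is a
 * 64-bit unsigned type and ext4_group_t is a 32-bit unsigned type). */
typedef uint64_t fsblk_t;
typedef uint32_t group_t;

/* Hypothetical model of the group computation done in
 * ext4_get_group_no_and_offset(); the cluster shift is left out. */
static group_t model_group_no(fsblk_t blocknr, uint32_t first_data_block,
			      uint32_t blocks_per_group)
{
	assert(blocks_per_group != 0);
	/* Unsigned subtraction: 0 - 1 wraps around to 2^64 - 1. */
	blocknr -= first_data_block;
	/* The quotient is truncated to 32 bits when stored in group_t. */
	return (group_t)(blocknr / blocks_per_group);
}

/* Mirrors the condition of the BUG_ON() quoted in the description. */
static int model_hits_bug(group_t group, group_t groups_count)
{
	return group >= groups_count;
}
```

Under these assumptions, model_group_no(0, 1, 8192) yields 0xFFFFFFFF, the
unsigned representation of -1, so model_hits_bug() is true for any smaller
group count.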

kernel BUG at fs/ext4/ext4.h:3319!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 337 Comm: repro Not tainted 5.19.0-rc6-00105-g4e8e898e4107-dirty #14
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
RIP: 0010:ext4_mb_clear_bb+0x1bd6/0x1be0
Call Trace:
<TASK>
ext4_free_blocks+0x9b3/0xc90
ext4_clear_blocks+0x344/0x3b0
ext4_ind_truncate+0x967/0x1050
ext4_truncate+0xb1b/0x1210
ext4_evict_inode+0xf06/0x16f0
evict+0x2a3/0x630
iput+0x618/0x850
ext4_enable_quotas+0x578/0x920
ext4_orphan_cleanup+0x539/0x1200
ext4_fill_super+0x94d8/0x9bc0
get_tree_bdev+0x40c/0x630
ext4_get_tree+0x1c/0x20
vfs_get_tree+0x88/0x290
do_new_mount+0x289/0xac0
path_mount+0x607/0xfd0
__se_sys_mount+0x2c4/0x3b0
__x64_sys_mount+0xbf/0xd0
do_syscall_64+0x3d/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
</TASK>

Link: https://syzkaller.appspot.com/bug?id=5266d464285a03cee9dbfda7d2452a72c3c2ae7c
Reported-by: [email protected]
Signed-off-by: Tadeusz Struk <[email protected]>
---
fs/ext4/balloc.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index 78ee3ef795ae..1175750ad05f 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -56,6 +56,9 @@ void ext4_get_group_no_and_offset(struct super_block *sb, ext4_fsblk_t blocknr,
 	struct ext4_super_block *es = EXT4_SB(sb)->s_es;
 	ext4_grpblk_t offset;

+	if (blocknr < le32_to_cpu(es->s_first_data_block))
+		blocknr = le32_to_cpu(es->s_first_data_block);
+
 	blocknr = blocknr - le32_to_cpu(es->s_first_data_block);
 	offset = do_div(blocknr, EXT4_BLOCKS_PER_GROUP(sb)) >>
 		EXT4_SB(sb)->s_cluster_bits;
--
2.36.1


2022-07-14 09:59:40

by Lukas Czerner

Subject: Re: [PATCH] ext4: fix kernel BUG in ext4_free_blocks

On Wed, Jul 13, 2022 at 11:59:04AM -0700, Tadeusz Struk wrote:
> Syzbot reported a BUG in ext4_free_blocks.
> The issue is triggered from ext4_mb_clear_bb(). The block number passed
> to ext4_get_group_no_and_offset() is 0 while es->s_first_data_block
> is 1, so the block group number returned from
> ext4_get_group_no_and_offset() is -1. This is then passed to
> ext4_get_group_info() and hits a BUG:
> BUG_ON(group >= EXT4_SB(sb)->s_groups_count),
> as can be seen in the trace below.
> This patch adds a check to ext4_get_group_no_and_offset() that clamps
> the block number to es->s_first_data_block when it is smaller.
>
> kernel BUG at fs/ext4/ext4.h:3319!
> invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> CPU: 0 PID: 337 Comm: repro Not tainted 5.19.0-rc6-00105-g4e8e898e4107-dirty #14
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
> RIP: 0010:ext4_mb_clear_bb+0x1bd6/0x1be0
> Call Trace:
> <TASK>
> ext4_free_blocks+0x9b3/0xc90
> ext4_clear_blocks+0x344/0x3b0
> ext4_ind_truncate+0x967/0x1050
> ext4_truncate+0xb1b/0x1210
> ext4_evict_inode+0xf06/0x16f0
> evict+0x2a3/0x630
> iput+0x618/0x850
> ext4_enable_quotas+0x578/0x920
> ext4_orphan_cleanup+0x539/0x1200
> ext4_fill_super+0x94d8/0x9bc0
> get_tree_bdev+0x40c/0x630
> ext4_get_tree+0x1c/0x20
> vfs_get_tree+0x88/0x290
> do_new_mount+0x289/0xac0
> path_mount+0x607/0xfd0
> __se_sys_mount+0x2c4/0x3b0
> __x64_sys_mount+0xbf/0xd0
> do_syscall_64+0x3d/0x90
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> </TASK>
>
> Link: https://syzkaller.appspot.com/bug?id=5266d464285a03cee9dbfda7d2452a72c3c2ae7c
> Reported-by: [email protected]
> Signed-off-by: Tadeusz Struk <[email protected]>
> ---
> fs/ext4/balloc.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
> index 78ee3ef795ae..1175750ad05f 100644
> --- a/fs/ext4/balloc.c
> +++ b/fs/ext4/balloc.c
> @@ -56,6 +56,9 @@ void ext4_get_group_no_and_offset(struct super_block *sb, ext4_fsblk_t blocknr,
> struct ext4_super_block *es = EXT4_SB(sb)->s_es;
> ext4_grpblk_t offset;
>
> + if (blocknr < le32_to_cpu(es->s_first_data_block))
> + blocknr = le32_to_cpu(es->s_first_data_block);
> +

This does not seem right. We should never work with a block number smaller
than s_first_data_block. The first 1024 bytes of the file system are
unused and, with a 1k block size, the entire first block is
unused.

I guess the image we are working with here is corrupted. From the log it
seems that this was noticed correctly, so the question is why we still ended
up calling ext4_free_blocks(). It seems this should have been stopped
earlier by ext4_clear_blocks()?

I did notice that in ext4_mb_clear_bb() we call
ext4_get_group_no_and_offset() before ext4_inode_block_valid() but
again we should have caught this problem earlier.

Can you link me the file system image that generated this problem?

Thanks!
-Lukas


> blocknr = blocknr - le32_to_cpu(es->s_first_data_block);
> offset = do_div(blocknr, EXT4_BLOCKS_PER_GROUP(sb)) >>
> EXT4_SB(sb)->s_cluster_bits;
> --
> 2.36.1
>

2022-07-14 13:56:58

by Tadeusz Struk

Subject: Re: [PATCH] ext4: fix kernel BUG in ext4_free_blocks

On 7/14/22 05:23, Lukas Czerner wrote:
>> This does not seem right. We should never work with a block number smaller
>> than s_first_data_block. The first 1024 bytes of the file system are
>> unused and, with a 1k block size, the entire first block is
>> unused.
>>
>> I guess the image we are working with here is corrupted. From the log it
>> seems that this was noticed correctly, so the question is why we still ended
>> up calling ext4_free_blocks(). It seems this should have been stopped
>> earlier by ext4_clear_blocks()?
>>
>> I did notice that in ext4_mb_clear_bb() we call
>> ext4_get_group_no_and_offset() before ext4_inode_block_valid() but
>> again we should have caught this problem earlier.
>>
>> Can you link me the file system image that generated this problem?
> OK, I got the syzkaller C repro to work. The problem is that it's a
> bigalloc file system and the 'block' and 'count' to free in
> ext4_free_blocks() get adjusted after the ext4_inode_block_valid() check.
>
> We should make sure that if this happens we also clear
> EXT4_FREE_BLOCKS_VALIDATED. Additionally, ext4_inode_block_valid()
> in ext4_mb_clear_bb() should be called *before* the values are taken for
> granted. I'll prepare a patch to fix this.
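
The adjustment described above can be modeled with a small userspace
sketch. It assumes the default path in ext4_free_blocks(), where neither
EXT4_FREE_BLOCKS_NOFREE_FIRST_CLUSTER nor
EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER is set. struct blk_range and
model_align_to_clusters() are hypothetical names, and the cluster ratio
of 16 used in the note below is an assumed value.

```c
#include <stdint.h>

/* Hypothetical stand-in for a block range; not a kernel structure. */
struct blk_range {
	uint64_t block;
	uint64_t count;
};

/* Simplified model of how a range is widened to whole clusters on a
 * bigalloc file system. cluster_ratio must be a power of two. */
static struct blk_range model_align_to_clusters(struct blk_range r,
						uint32_t cluster_ratio)
{
	uint64_t mask = (uint64_t)cluster_ratio - 1;
	uint64_t overflow;

	/* Offset of the first block within its cluster. */
	overflow = r.block & mask;
	if (overflow) {
		r.block -= overflow;	/* pull the start back */
		r.count += overflow;
	}

	/* Partial cluster at the end of the range. */
	overflow = r.count & mask;
	if (overflow)
		r.count += cluster_ratio - overflow;

	return r;
}
```

With an assumed cluster ratio of 16, a range of 3 blocks starting at
block 5 widens to 16 blocks starting at block 0. A range that passed
validation can therefore end up covering block 0, which is below an
s_first_data_block of 1.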

Thank you for the feedback, Lukas. Please CC me on your patch so I can test it.
--
Thanks,
Tadeusz

2022-07-14 17:03:49

by Lukas Czerner

Subject: [PATCH] ext4: block range must be validated before use in ext4_mb_clear_bb()

The block range to free is validated in ext4_free_blocks() using
ext4_inode_block_valid() and then passed to ext4_mb_clear_bb().
However, in some situations on a bigalloc file system the range might be
adjusted after the validation in ext4_free_blocks(), which can lead to
trouble on corrupted file systems such as the one found by syzkaller that
resulted in the following BUG:

kernel BUG at fs/ext4/ext4.h:3319!
PREEMPT SMP NOPTI
CPU: 28 PID: 4243 Comm: repro Kdump: loaded Not tainted 5.19.0-rc6+ #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014
RIP: 0010:ext4_free_blocks+0x95e/0xa90
Call Trace:
<TASK>
? lock_timer_base+0x61/0x80
? __es_remove_extent+0x5a/0x760
? __mod_timer+0x256/0x380
? ext4_ind_truncate_ensure_credits+0x90/0x220
ext4_clear_blocks+0x107/0x1b0
ext4_free_data+0x15b/0x170
ext4_ind_truncate+0x214/0x2c0
? _raw_spin_unlock+0x15/0x30
? ext4_discard_preallocations+0x15a/0x410
? ext4_journal_check_start+0xe/0x90
? __ext4_journal_start_sb+0x2f/0x110
ext4_truncate+0x1b5/0x460
? __ext4_journal_start_sb+0x2f/0x110
ext4_evict_inode+0x2b4/0x6f0
evict+0xd0/0x1d0
ext4_enable_quotas+0x11f/0x1f0
ext4_orphan_cleanup+0x3de/0x430
? proc_create_seq_private+0x43/0x50
ext4_fill_super+0x295f/0x3ae0
? snprintf+0x39/0x40
? sget_fc+0x19c/0x330
? ext4_reconfigure+0x850/0x850
get_tree_bdev+0x16d/0x260
vfs_get_tree+0x25/0xb0
path_mount+0x431/0xa70
__x64_sys_mount+0xe2/0x120
do_syscall_64+0x5b/0x80
? do_user_addr_fault+0x1e2/0x670
? exc_page_fault+0x70/0x170
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fdf4e512ace

Fix it by making sure that the block range is properly validated before
use every time it changes in ext4_free_blocks() or ext4_mb_clear_bb().
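
The idea can be sketched in userspace as follows. This is only a model
under stated assumptions: MODEL_VALIDATED, model_range_valid(),
model_check_before_use() and model_range_changed() are hypothetical names,
and the validity test only checks the range against the first data block
and the total block count, whereas ext4_inode_block_valid() also considers
system zones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for EXT4_FREE_BLOCKS_VALIDATED. */
#define MODEL_VALIDATED 0x1u

/* Hypothetical check: the range must lie within
 * [first_data_block, blocks_count). */
static bool model_range_valid(uint64_t block, uint64_t count,
			      uint64_t first_data_block,
			      uint64_t blocks_count)
{
	return count != 0 && block >= first_data_block &&
	       block < blocks_count && count <= blocks_count - block;
}

/* Validate unless the flag says the range was already checked, then
 * record that the current range is validated. */
static bool model_check_before_use(unsigned int *flags, uint64_t block,
				   uint64_t count, uint64_t first_data_block,
				   uint64_t blocks_count)
{
	if (!(*flags & MODEL_VALIDATED) &&
	    !model_range_valid(block, count, first_data_block, blocks_count))
		return false;
	*flags |= MODEL_VALIDATED;
	return true;
}

/* Every change to the range drops the flag so the next use re-checks. */
static void model_range_changed(unsigned int *flags)
{
	*flags &= ~MODEL_VALIDATED;
}
```

In this model, a caller that forgets model_range_changed() after adjusting
the range keeps a stale flag, and an out-of-range value is then accepted
without a check. Clearing the flag on every change forces the re-check.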

Link: https://syzkaller.appspot.com/bug?id=5266d464285a03cee9dbfda7d2452a72c3c2ae7c
Reported-by: [email protected]
Signed-off-by: Lukas Czerner <[email protected]>
Cc: Tadeusz Struk <[email protected]>
---
fs/ext4/mballoc.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 9e06334771a3..38e7dc2531b1 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -5928,6 +5928,15 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,

 	sbi = EXT4_SB(sb);

+	if (!(flags & EXT4_FREE_BLOCKS_VALIDATED) &&
+	    !ext4_inode_block_valid(inode, block, count)) {
+		ext4_error(sb, "Freeing blocks in system zone - "
+			   "Block = %llu, count = %lu", block, count);
+		/* err = 0. ext4_std_error should be a no op */
+		goto error_return;
+	}
+	flags |= EXT4_FREE_BLOCKS_VALIDATED;
+
 do_more:
 	overflow = 0;
 	ext4_get_group_no_and_offset(sb, block, &block_group, &bit);
@@ -5944,6 +5953,8 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 		overflow = EXT4_C2B(sbi, bit) + count -
 			EXT4_BLOCKS_PER_GROUP(sb);
 		count -= overflow;
+		/* The range changed so it's no longer validated */
+		flags &= ~EXT4_FREE_BLOCKS_VALIDATED;
 	}
 	count_clusters = EXT4_NUM_B2C(sbi, count);
 	bitmap_bh = ext4_read_block_bitmap(sb, block_group);
@@ -5958,7 +5969,8 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 		goto error_return;
 	}

-	if (!ext4_inode_block_valid(inode, block, count)) {
+	if (!(flags & EXT4_FREE_BLOCKS_VALIDATED) &&
+	    !ext4_inode_block_valid(inode, block, count)) {
 		ext4_error(sb, "Freeing blocks in system zone - "
 			   "Block = %llu, count = %lu", block, count);
 		/* err = 0. ext4_std_error should be a no op */
@@ -6081,6 +6093,8 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 		block += count;
 		count = overflow;
 		put_bh(bitmap_bh);
+		/* The range changed so it's no longer validated */
+		flags &= ~EXT4_FREE_BLOCKS_VALIDATED;
 		goto do_more;
 	}
 error_return:
@@ -6127,6 +6141,7 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode,
 			   "block = %llu, count = %lu", block, count);
 		return;
 	}
+	flags |= EXT4_FREE_BLOCKS_VALIDATED;

 	ext4_debug("freeing block %llu\n", block);
 	trace_ext4_free_blocks(inode, block, count, flags);
@@ -6158,6 +6173,8 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode,
 			block -= overflow;
 			count += overflow;
 		}
+		/* The range changed so it's no longer validated */
+		flags &= ~EXT4_FREE_BLOCKS_VALIDATED;
 	}
 	overflow = EXT4_LBLK_COFF(sbi, count);
 	if (overflow) {
@@ -6168,6 +6185,8 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode,
 				return;
 		} else
 			count += sbi->s_cluster_ratio - overflow;
+		/* The range changed so it's no longer validated */
+		flags &= ~EXT4_FREE_BLOCKS_VALIDATED;
 	}

 	if (!bh && (flags & EXT4_FREE_BLOCKS_FORGET)) {
--
2.35.3

2022-07-14 17:44:45

by Tadeusz Struk

Subject: Re: [PATCH] ext4: block range must be validated before use in ext4_mb_clear_bb()

On 7/14/22 09:59, Lukas Czerner wrote:
> The block range to free is validated in ext4_free_blocks() using
> ext4_inode_block_valid() and then passed to ext4_mb_clear_bb().
> However, in some situations on a bigalloc file system the range might be
> adjusted after the validation in ext4_free_blocks(), which can lead to
> trouble on corrupted file systems such as the one found by syzkaller that
> resulted in the following BUG:
>
> kernel BUG at fs/ext4/ext4.h:3319!
> PREEMPT SMP NOPTI
> CPU: 28 PID: 4243 Comm: repro Kdump: loaded Not tainted 5.19.0-rc6+ #1
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014
> RIP: 0010:ext4_free_blocks+0x95e/0xa90
> Call Trace:
> <TASK>
> ? lock_timer_base+0x61/0x80
> ? __es_remove_extent+0x5a/0x760
> ? __mod_timer+0x256/0x380
> ? ext4_ind_truncate_ensure_credits+0x90/0x220
> ext4_clear_blocks+0x107/0x1b0
> ext4_free_data+0x15b/0x170
> ext4_ind_truncate+0x214/0x2c0
> ? _raw_spin_unlock+0x15/0x30
> ? ext4_discard_preallocations+0x15a/0x410
> ? ext4_journal_check_start+0xe/0x90
> ? __ext4_journal_start_sb+0x2f/0x110
> ext4_truncate+0x1b5/0x460
> ? __ext4_journal_start_sb+0x2f/0x110
> ext4_evict_inode+0x2b4/0x6f0
> evict+0xd0/0x1d0
> ext4_enable_quotas+0x11f/0x1f0
> ext4_orphan_cleanup+0x3de/0x430
> ? proc_create_seq_private+0x43/0x50
> ext4_fill_super+0x295f/0x3ae0
> ? snprintf+0x39/0x40
> ? sget_fc+0x19c/0x330
> ? ext4_reconfigure+0x850/0x850
> get_tree_bdev+0x16d/0x260
> vfs_get_tree+0x25/0xb0
> path_mount+0x431/0xa70
> __x64_sys_mount+0xe2/0x120
> do_syscall_64+0x5b/0x80
> ? do_user_addr_fault+0x1e2/0x670
> ? exc_page_fault+0x70/0x170
> entry_SYSCALL_64_after_hwframe+0x46/0xb0
> RIP: 0033:0x7fdf4e512ace
>
> Fix it by making sure that the block range is properly validated before
> use every time it changes in ext4_free_blocks() or ext4_mb_clear_bb().

That works for me. Once applied, it will need to be backported to stable
kernels. I will take care of that. Thanks!

Tested-by: Tadeusz Struk <[email protected]>

--
Thanks,
Tadeusz

2022-07-22 13:59:50

by Theodore Ts'o

Subject: Re: [PATCH] ext4: block range must be validated before use in ext4_mb_clear_bb()

On Thu, 14 Jul 2022 18:59:03 +0200, Lukas Czerner wrote:
> The block range to free is validated in ext4_free_blocks() using
> ext4_inode_block_valid() and then passed to ext4_mb_clear_bb().
> However, in some situations on a bigalloc file system the range might be
> adjusted after the validation in ext4_free_blocks(), which can lead to
> trouble on corrupted file systems such as the one found by syzkaller that
> resulted in the following BUG:
>
> [...]

Applied, thanks!

[1/1] ext4: block range must be validated before use in ext4_mb_clear_bb()
commit: 91e204c46741b198693dd88bd7b03a5b5fe0ce17

Best regards,
--
Theodore Ts'o <[email protected]>