2022-04-21 15:30:51

by Nguyen Dinh Phi

[permalink] [raw]
Subject: [PATCH] ext4: Fix block validation on non-journal fs in __ext4_iget()

Syzbot report following KERNEL BUG:
kernel BUG at fs/ext4/extents_status.c:899!
....
Call Trace:
<TASK>
ext4_cache_extents+0x13e/0x2d0 fs/ext4/extents.c:518
ext4_find_extent+0x8f6/0xd10 fs/ext4/extents.c:916
ext4_ext_map_blocks+0x1e2/0x5f30 fs/ext4/extents.c:4098
ext4_map_blocks+0x9ca/0x18a0 fs/ext4/inode.c:563
ext4_getblk+0x553/0x6b0 fs/ext4/inode.c:849
ext4_bread_batch+0x7c/0x550 fs/ext4/inode.c:923
__ext4_find_entry+0x482/0x1050 fs/ext4/namei.c:1600
ext4_lookup_entry fs/ext4/namei.c:1701 [inline]
ext4_lookup fs/ext4/namei.c:1769 [inline]
ext4_lookup+0x4fc/0x730 fs/ext4/namei.c:1760
__lookup_slow+0x24c/0x480 fs/namei.c:1707
lookup_slow fs/namei.c:1724 [inline]
walk_component+0x40f/0x6a0 fs/namei.c:2020
link_path_walk.part.0+0x7ef/0xf70 fs/namei.c:2347
link_path_walk fs/namei.c:2272 [inline]
path_openat+0x266/0x2940 fs/namei.c:3605
do_filp_open+0x1aa/0x400 fs/namei.c:3636
do_sys_openat2+0x16d/0x4d0 fs/open.c:1214
do_sys_open fs/open.c:1230 [inline]
__do_sys_openat fs/open.c:1246 [inline]
__se_sys_openat fs/open.c:1241 [inline]
__x64_sys_openat+0x13f/0x1f0 fs/open.c:1241
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
</TASK>

The reason is fast commit recovery path will skip block validation in
__ext4_iget(), it allows syzbot be able to mount a corrupted non-journal
filesystem and cause kernel BUG when accessing it.

Fix it by adding a condition checking.

Signed-off-by: Nguyen Dinh Phi <[email protected]>
Reported-by: [email protected]
---
fs/ext4/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 560e56b42829..66c86d85081e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4951,7 +4951,7 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
goto bad_inode;
} else if (!ext4_has_inline_data(inode)) {
/* validate the block references in the inode */
- if (!(EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY) &&
+ if (!(journal && EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY) &&
(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
(S_ISLNK(inode->i_mode) &&
!ext4_inode_is_fast_symlink(inode)))) {
--
2.25.1


2022-05-14 05:01:16

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] ext4: Fix block validation on non-journal fs in __ext4_iget()

On Thu, Apr 21, 2022 at 03:23:12AM +0800, Nguyen Dinh Phi wrote:
> Syzbot report following KERNEL BUG:
> kernel BUG at fs/ext4/extents_status.c:899!
> ....
>
> The reason is fast commit recovery path will skip block validation in
> __ext4_iget(), it allows syzbot be able to mount a corrupted non-journal
> filesystem and cause kernel BUG when accessing it.
>
> Fix it by adding a condition checking.

> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 560e56b42829..66c86d85081e 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4951,7 +4951,7 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
> goto bad_inode;
> } else if (!ext4_has_inline_data(inode)) {
> /* validate the block references in the inode */
> - if (!(EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY) &&
> + if (!(journal && EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY) &&

This isn't the right fix. It papers over the problem and fixes the
specific syzkaller fuzzed image, but there are other corrupted file
system images which will cause problems.

What the syzkaller fuzzed file system image did was to set the
EXT4_FC_REPLAY_BIT bit the on_disk superblock field s_state, which
then gets copied to sbi->s_mount_state:

sbi->s_mount_state = le16_to_cpu(es->s_state);

... and then hilarity ensues.

The root cause is that we are using EXT4_FC_REPLAY bit in
sbi->s_mount_state to indicate whether we are in the middle of a fast
commit replay. This *should* have been done using a bit in
s_mount_flags (e.g., EXT4_MF_FC_REPLAY) via the
ext4_{set,clear,test}_mount_flag() inline functions.

The previous paragraph describes the correct long-term fix, but the
trivial/hacky fix which is easy to backport to LTS stable kernels is
something like this:

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 4b0ea8df1f5c..f7ae53d986f1 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4889,7 +4889,7 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
sbi->s_inodes_per_block;
sbi->s_desc_per_block = blocksize / EXT4_DESC_SIZE(sb);
sbi->s_sbh = bh;
- sbi->s_mount_state = le16_to_cpu(es->s_state);
+ sbi->s_mount_state = le16_to_cpu(es->s_state) & ~EXT4_FC_REPLAY;
sbi->s_addr_per_block_bits = ilog2(EXT4_ADDR_PER_BLOCK(sb));
sbi->s_desc_per_block_bits = ilog2(EXT4_DESC_PER_BLOCK(sb));

@@ -6452,7 +6452,8 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
if (err)
goto restore_opts;
}
- sbi->s_mount_state = le16_to_cpu(es->s_state);
+ sbi->s_mount_state = (le16_to_cpu(es->s_state) &
+ ~EXT4_FC_REPLAY);

err = ext4_setup_super(sb, es, 0);
if (err)

(The first hunk is sufficient to suppress the syzkaller failure, but
for completeness sake we need catch the case where the journal
contains a maliciously modified superblock, which then is copied to
the active superblock, after which hilarity once again ensues.)

- Ted

2022-05-18 04:03:51

by Theodore Ts'o

[permalink] [raw]
Subject: [PATCH] ext4: filter out EXT4_FC_REPLAY from on-disk superblock field s_state

The EXT4_FC_REPLAY bit in sbi->s_mount_state is used to indicate that
we are in the middle of replay the fast commit journal. This was
actually a mistake, since the sbi->s_mount_info is initialized from
es->s_state. Arguably s_mount_state is misleadingly named, but the
name is historical --- s_mount_state and s_state dates back to ext2.

What should have been used is the ext4_{set,clear,test}_mount_flag()
inline functions, which sets EXT4_MF_* bits in sbi->s_mount_flags.

The problem with using EXT4_FC_REPLAY is that a maliciously corrupted
superblock could result in EXT4_FC_REPLAY getting set in
s_mount_state. This bypasses some sanity checks, and this can trigger
a BUG() in ext4_es_cache_extent(). As a easy-to-backport-fix, filter
out the EXT4_FC_REPLAY bit for now. We should eventually transition
away from EXT4_FC_REPLAY to something like EXT4_MF_REPLAY.

Signed-off-by: Theodore Ts'o <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reported-by: [email protected]
---
fs/ext4/super.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 4b0ea8df1f5c..f7ae53d986f1 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4889,7 +4889,7 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
sbi->s_inodes_per_block;
sbi->s_desc_per_block = blocksize / EXT4_DESC_SIZE(sb);
sbi->s_sbh = bh;
- sbi->s_mount_state = le16_to_cpu(es->s_state);
+ sbi->s_mount_state = le16_to_cpu(es->s_state) & ~EXT4_FC_REPLAY;
sbi->s_addr_per_block_bits = ilog2(EXT4_ADDR_PER_BLOCK(sb));
sbi->s_desc_per_block_bits = ilog2(EXT4_DESC_PER_BLOCK(sb));

@@ -6452,7 +6452,8 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
if (err)
goto restore_opts;
}
- sbi->s_mount_state = le16_to_cpu(es->s_state);
+ sbi->s_mount_state = (le16_to_cpu(es->s_state) &
+ ~EXT4_FC_REPLAY);

err = ext4_setup_super(sb, es, 0);
if (err)
--
2.31.0