When an inode is created and written to using direct IO, nothing clears
the EXT4_STATE_MAY_INLINE_DATA flag. Thus when the inode is later
truncated to, say, 1 byte and written using a normal (buffered) write, we
try to store the data as inline data. This confuses the code later
because the inode now has both a normal block and inline data allocated,
and the confusion manifests, for example, as:
kernel BUG at fs/ext4/inode.c:2721!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 359 Comm: repro Not tainted 5.19.0-rc8-00001-g31ba1e3b8305-dirty #15
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
RIP: 0010:ext4_writepages+0x363d/0x3660
RSP: 0018:ffffc90000ccf260 EFLAGS: 00010293
RAX: ffffffff81e1abcd RBX: 0000008000000000 RCX: ffff88810842a180
RDX: 0000000000000000 RSI: 0000008000000000 RDI: 0000000000000000
RBP: ffffc90000ccf650 R08: ffffffff81e17d58 R09: ffffed10222c680b
R10: dfffe910222c680c R11: 1ffff110222c680a R12: ffff888111634128
R13: ffffc90000ccf880 R14: 0000008410000000 R15: 0000000000000001
FS: 00007f72635d2640(0000) GS:ffff88811b000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000565243379180 CR3: 000000010aa74000 CR4: 0000000000150eb0
Call Trace:
<TASK>
do_writepages+0x397/0x640
filemap_fdatawrite_wbc+0x151/0x1b0
file_write_and_wait_range+0x1c9/0x2b0
ext4_sync_file+0x19e/0xa00
vfs_fsync_range+0x17b/0x190
ext4_buffered_write_iter+0x488/0x530
ext4_file_write_iter+0x449/0x1b90
vfs_write+0xbcd/0xf40
ksys_write+0x198/0x2c0
__x64_sys_write+0x7b/0x90
do_syscall_64+0x3d/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
</TASK>
Fix the problem by clearing EXT4_STATE_MAY_INLINE_DATA when doing a
direct IO write to a file.
Reported-by: Tadeusz Struk <[email protected]>
Reported-by: [email protected]
Link: https://syzkaller.appspot.com/bug?id=a1e89d09bbbcbd5c4cb45db230ee28c822953984
Signed-off-by: Jan Kara <[email protected]>
---
fs/ext4/file.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 109d07629f81..cab5dfed1cd6 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -528,6 +528,12 @@ static ssize_t ext4_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		ret = -EAGAIN;
 		goto out;
 	}
+	/*
+	 * Make sure inline data cannot be created anymore since we are going
+	 * to allocate blocks for DIO. We know the inode does not have any
+	 * inline data now because ext4_dio_supported() checked for that.
+	 */
+	ext4_clear_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA);

 	offset = iocb->ki_pos;
 	count = ret;
--
2.35.3
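
The sequence described in the commit message (create, direct IO write,
truncate to 1 byte, buffered write) can be replayed with a toy userspace
model. This is only a sketch for illustration, not kernel code: the struct,
the toy_* helpers, the 4096-byte block size and the 60-byte inline limit are
all hypothetical stand-ins, and the real ext4 paths are far more involved.
The with_fix parameter models the ext4_clear_inode_state() call added by the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code. Every name here is hypothetical. */
#define TOY_BLOCK_SIZE 4096u	/* assumed block size */
#define TOY_INLINE_MAX 60u	/* assumed inline-data capacity, illustrative */

struct toy_inode {
	bool may_inline_data;	/* stands in for EXT4_STATE_MAY_INLINE_DATA */
	bool has_inline_data;	/* data stored inside the inode itself */
	size_t nr_blocks;	/* regular data blocks allocated */
	size_t size;		/* file size in bytes */
};

/* Number of blocks needed to cover the given byte count. */
static size_t toy_blocks_for(size_t bytes)
{
	return (bytes + TOY_BLOCK_SIZE - 1) / TOY_BLOCK_SIZE;
}

/* A freshly created inode is still allowed to use inline data. */
static struct toy_inode toy_create(void)
{
	struct toy_inode inode = { .may_inline_data = true };
	return inode;
}

/* Direct IO write at offset 0: always allocates regular blocks. */
static void toy_dio_write(struct toy_inode *inode, size_t len, bool with_fix)
{
	if (with_fix)
		inode->may_inline_data = false;	/* models the added clear */
	if (toy_blocks_for(len) > inode->nr_blocks)
		inode->nr_blocks = toy_blocks_for(len);
	if (len > inode->size)
		inode->size = len;
}

/* Truncate: keep only the blocks that still cover the new size. */
static void toy_truncate(struct toy_inode *inode, size_t new_size)
{
	inode->size = new_size;
	if (toy_blocks_for(new_size) < inode->nr_blocks)
		inode->nr_blocks = toy_blocks_for(new_size);
}

/* Buffered write at offset 0: small data goes inline if still permitted. */
static void toy_buffered_write(struct toy_inode *inode, size_t len)
{
	if (inode->may_inline_data && len <= TOY_INLINE_MAX) {
		inode->has_inline_data = true;
	} else {
		inode->may_inline_data = false;
		if (toy_blocks_for(len) > inode->nr_blocks)
			inode->nr_blocks = toy_blocks_for(len);
	}
	if (len > inode->size)
		inode->size = len;
}

/* An inode must not hold inline data and regular blocks at once. */
static bool toy_is_consistent(const struct toy_inode *inode)
{
	return !(inode->has_inline_data && inode->nr_blocks > 0);
}

/* Replay: create, DIO write, truncate to 1 byte, buffered write. */
static bool toy_replay(bool with_fix)
{
	struct toy_inode inode = toy_create();

	toy_dio_write(&inode, TOY_BLOCK_SIZE, with_fix);
	toy_truncate(&inode, 1);
	toy_buffered_write(&inode, 1);
	return toy_is_consistent(&inode);
}
```

Under these assumptions, toy_replay(false) ends with an inode that holds
both a regular block and inline data, while toy_replay(true) keeps the
buffered write on the block path and the inode stays consistent.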
On 7/27/22 08:57, Jan Kara wrote:
> When inode is created and written to using direct IO, there is nothing
> to clear the EXT4_STATE_MAY_INLINE_DATA flag. Thus when inode gets
> truncated later to say 1 byte and written using normal write, we will
> try to store the data as inline data. This confuses the code later
> because the inode now has both normal block and inline data allocated
> and the confusion manifests for example as:
>
> [...]
>
> Fix the problem by clearing EXT4_STATE_MAY_INLINE_DATA when we are doing
> direct IO write to a file.
>
> Reported-by: Tadeusz Struk <[email protected]>
> Reported-by: [email protected]
> Link: https://syzkaller.appspot.com/bug?id=a1e89d09bbbcbd5c4cb45db230ee28c822953984
> Signed-off-by: Jan Kara <[email protected]>
That works fine for me. Thanks Honza.
Tested-by: Tadeusz Struk <[email protected]>
It should also be applied to stable v5.15 and v5.10.
I will send a request once this lands in mainline.
--
Thanks,
Tadeusz
On Wed, Jul 27, 2022 at 05:57:53PM +0200, Jan Kara wrote:
> When inode is created and written to using direct IO, there is nothing
> to clear the EXT4_STATE_MAY_INLINE_DATA flag. Thus when inode gets
> truncated later to say 1 byte and written using normal write, we will
> try to store the data as inline data. This confuses the code later
> because the inode now has both normal block and inline data allocated
> and the confusion manifests for example as:
>
> [...]
>
> Fix the problem by clearing EXT4_STATE_MAY_INLINE_DATA when we are doing
> direct IO write to a file.
Looks good, thanks.
Reviewed-by: Lukas Czerner <[email protected]>
>
> Reported-by: Tadeusz Struk <[email protected]>
> Reported-by: [email protected]
> Link: https://syzkaller.appspot.com/bug?id=a1e89d09bbbcbd5c4cb45db230ee28c822953984
> Signed-off-by: Jan Kara <[email protected]>
> [...]
On Wed, 27 Jul 2022 17:57:53 +0200, Jan Kara wrote:
> When inode is created and written to using direct IO, there is nothing
> to clear the EXT4_STATE_MAY_INLINE_DATA flag. Thus when inode gets
> truncated later to say 1 byte and written using normal write, we will
> try to store the data as inline data. This confuses the code later
> because the inode now has both normal block and inline data allocated
> and the confusion manifests for example as:
>
> [...]
Applied, thanks!
[1/1] ext4: Avoid crash when inline data creation follows DIO write
commit: 4331037750fdd4c698facc8a03075f88f15ffbe6
Best regards,
--
Theodore Ts'o <[email protected]>