2024-06-14 13:30:24

by Hillf Danton

[permalink] [raw]
Subject: [RFC PATCH] mm: truncate: flush lru cache for evicted inode

Flush lru cache to avoid folio->mapping uaf in case of inode teardown.

Reported-and-tested-by: [email protected]
Signed-off-by: Hillf Danton <[email protected]>
---
Post for comments because lru_add_drain_all() is too haevy a hammer.

--- x/mm/truncate.c
+++ y/mm/truncate.c
@@ -419,6 +419,9 @@ void truncate_inode_pages_range(struct a
truncate_folio_batch_exceptionals(mapping, &fbatch, indices);
folio_batch_release(&fbatch);
}
+
+ if (mapping_exiting(mapping))
+ lru_add_drain_all();
}
EXPORT_SYMBOL(truncate_inode_pages_range);

--


2024-06-14 14:44:22

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [RFC PATCH] mm: truncate: flush lru cache for evicted inode

On Fri, Jun 14, 2024 at 09:18:56PM +0800, Hillf Danton wrote:
> Flush lru cache to avoid folio->mapping uaf in case of inode teardown.

What? inodes are supposed to have all their folios removed before
being freed. Part of removing a folio sets the folio->mapping to NULL.
Where is the report?

> Reported-and-tested-by: [email protected]
> Signed-off-by: Hillf Danton <[email protected]>
> ---
> Post for comments because lru_add_drain_all() is too haevy a hammer.
>
> --- x/mm/truncate.c
> +++ y/mm/truncate.c
> @@ -419,6 +419,9 @@ void truncate_inode_pages_range(struct a
> truncate_folio_batch_exceptionals(mapping, &fbatch, indices);
> folio_batch_release(&fbatch);
> }
> +
> + if (mapping_exiting(mapping))
> + lru_add_drain_all();
> }
> EXPORT_SYMBOL(truncate_inode_pages_range);
>
> --

2024-06-15 00:01:01

by Hillf Danton

[permalink] [raw]
Subject: Re: [RFC PATCH] mm: truncate: flush lru cache for evicted inode

On Fri, 14 Jun 2024 14:42:20 +0100 Matthew Wilcox wrote:
> On Fri, Jun 14, 2024 at 09:18:56PM +0800, Hillf Danton wrote:
> > Flush lru cache to avoid folio->mapping uaf in case of inode teardown.
>
> What? inodes are supposed to have all their folios removed before
> being freed. Part of removing a folio sets the folio->mapping to NULL.
> Where is the report?
>
Subject: Re: [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn
https://lore.kernel.org/lkml/[email protected]/

2024-06-15 23:13:34

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [RFC PATCH] mm: truncate: flush lru cache for evicted inode

On Sat, Jun 15, 2024 at 07:59:53AM +0800, Hillf Danton wrote:
> On Fri, 14 Jun 2024 14:42:20 +0100 Matthew Wilcox wrote:
> > On Fri, Jun 14, 2024 at 09:18:56PM +0800, Hillf Danton wrote:
> > > Flush lru cache to avoid folio->mapping uaf in case of inode teardown.
> >
> > What? inodes are supposed to have all their folios removed before
> > being freed. Part of removing a folio sets the folio->mapping to NULL.
> > Where is the report?
> >
> Subject: Re: [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn
> https://lore.kernel.org/lkml/[email protected]/

Thanks. This fix is wrong. Of course syzbot says it fixes the problem,
but you're just avoiding putting the folios into the situation where we
have debug that would detect the problem.

I suspect this would trigger:

+++ b/fs/inode.c
@@ -282,6 +282,7 @@ static struct inode *alloc_inode(struct super_block *sb)
void __destroy_inode(struct inode *inode)
{
BUG_ON(inode_has_buffers(inode));
+ BUG_ON(inode->i_data.nrpages);
inode_detach_wb(inode);
security_inode_free(inode);
fsnotify_inode_delete(inode);

and what a real fix would look like would be calling clear_inode()
before calling iput() in nilfs_put_root(). But I'm not an expert
in this layer of the VFS, so I might well be wrong.

2024-06-15 23:53:12

by Hillf Danton

[permalink] [raw]
Subject: Re: [RFC PATCH] mm: truncate: flush lru cache for evicted inode

On Sat, 15 Jun 2024 21:44:54 +0100 Matthew Wilcox wrote:
> On Sat, Jun 15, 2024 at 07:59:53AM +0800, Hillf Danton wrote:
> > On Fri, 14 Jun 2024 14:42:20 +0100 Matthew Wilcox wrote:
> > > On Fri, Jun 14, 2024 at 09:18:56PM +0800, Hillf Danton wrote:
> > > > Flush lru cache to avoid folio->mapping uaf in case of inode teardown.
> > >
> > > What? inodes are supposed to have all their folios removed before
> > > being freed. Part of removing a folio sets the folio->mapping to NULL.
> > > Where is the report?
> > >
> > Subject: Re: [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn
> > https://lore.kernel.org/lkml/[email protected]/
>
> Thanks. This fix is wrong. Of course syzbot says it fixes the problem,
> but you're just avoiding putting the folios into the situation where we
> have debug that would detect the problem.
>
> I suspect this would trigger:
>
Happy to test your idea.

> +++ b/fs/inode.c
> @@ -282,6 +282,7 @@ static struct inode *alloc_inode(struct super_block *sb)
> void __destroy_inode(struct inode *inode)
> {
> BUG_ON(inode_has_buffers(inode));
> + BUG_ON(inode->i_data.nrpages);
> inode_detach_wb(inode);
> security_inode_free(inode);
> fsnotify_inode_delete(inode);
>
> and what a real fix would look like would be calling clear_inode()
> before calling iput() in nilfs_put_root(). But I'm not an expert

Hm...given I_FREEING checked in clear_inode(), fix like this one could be
tried in midle 2026.

> in this layer of the VFS, so I might well be wrong.

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 83a7eefedc9b

--- x/mm/truncate.c
+++ y/mm/truncate.c
@@ -419,6 +419,9 @@ void truncate_inode_pages_range(struct a
truncate_folio_batch_exceptionals(mapping, &fbatch, indices);
folio_batch_release(&fbatch);
}
+
+ if (mapping_exiting(mapping))
+ lru_add_drain_all();
}
EXPORT_SYMBOL(truncate_inode_pages_range);

--- x/fs/inode.c
+++ y/fs/inode.c
@@ -282,6 +282,7 @@ static struct inode *alloc_inode(struct
void __destroy_inode(struct inode *inode)
{
BUG_ON(inode_has_buffers(inode));
+ BUG_ON(inode->i_data.nrpages);
inode_detach_wb(inode);
security_inode_free(inode);
fsnotify_inode_delete(inode);
--

2024-06-16 00:10:15

by syzbot

[permalink] [raw]
Subject: Re: [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
kernel BUG in __destroy_inode

NILFS (loop0): I/O error reading meta-data file (ino=3, block-offset=0)
NILFS (loop0): I/O error reading meta-data file (ino=3, block-offset=0)
NILFS (loop0): disposed unprocessed dirty file(s) when stopping log writer
------------[ cut here ]------------
kernel BUG at fs/inode.c:285!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 2 PID: 5330 Comm: syz-executor Not tainted 6.10.0-rc3-syzkaller-dirty #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:__destroy_inode+0x5e4/0x7a0 fs/inode.c:285
Code: 2a 03 00 00 48 c7 c7 40 78 3d 8b c6 05 aa 6d cc 0d 01 e8 bf d9 69 ff e9 0e fc ff ff e8 a5 8b 8c ff 90 0f 0b e8 9d 8b 8c ff 90 <0f> 0b e8 95 8b 8c ff 90 0f 0b 90 e9 fa fa ff ff e8 87 8b 8c ff 90
RSP: 0018:ffffc900035afaf0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff8880325ba7c8 RCX: ffffffff82015439
RDX: ffff8880222ec880 RSI: ffffffff820159b3 RDI: 0000000000000007
RBP: 0000000000000001 R08: 0000000000000007 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff8880325ba980
R13: 0000000000000024 R14: ffffffff8b706c60 R15: ffff8880325ba8a0
FS: 0000555571e27480(0000) GS:ffff88806b200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f01cb366731 CR3: 0000000034ef4000 CR4: 0000000000350ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
destroy_inode+0x91/0x1b0 fs/inode.c:310
iput_final fs/inode.c:1742 [inline]
iput.part.0+0x5a8/0x7f0 fs/inode.c:1768
iput+0x5c/0x80 fs/inode.c:1758
nilfs_put_root+0xae/0xe0 fs/nilfs2/the_nilfs.c:925
nilfs_segctor_destroy fs/nilfs2/segment.c:2788 [inline]
nilfs_detach_log_writer+0x5ef/0xaa0 fs/nilfs2/segment.c:2850
nilfs_put_super+0x43/0x1b0 fs/nilfs2/super.c:498
generic_shutdown_super+0x159/0x3d0 fs/super.c:642
kill_block_super+0x3b/0x90 fs/super.c:1676
deactivate_locked_super+0xbe/0x1a0 fs/super.c:473
deactivate_super+0xde/0x100 fs/super.c:506
cleanup_mnt+0x222/0x450 fs/namespace.c:1267
task_work_run+0x14e/0x250 kernel/task_work.c:180
resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
syscall_exit_to_user_mode+0x278/0x2a0 kernel/entry/common.c:218
do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fc203a7e217
Code: b0 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 c7 c2 b0 ff ff ff f7 d8 64 89 02 b8
RSP: 002b:00007fffe9265ae8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 0000000000000064 RCX: 00007fc203a7e217
RDX: 0000000000000200 RSI: 0000000000000009 RDI: 00007fffe9266c90
RBP: 00007fc203ac8336 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000100 R11: 0000000000000202 R12: 00007fffe9266c90
R13: 00007fc203ac8336 R14: 0000555571e27430 R15: 0000000000000005
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:__destroy_inode+0x5e4/0x7a0 fs/inode.c:285
Code: 2a 03 00 00 48 c7 c7 40 78 3d 8b c6 05 aa 6d cc 0d 01 e8 bf d9 69 ff e9 0e fc ff ff e8 a5 8b 8c ff 90 0f 0b e8 9d 8b 8c ff 90 <0f> 0b e8 95 8b 8c ff 90 0f 0b 90 e9 fa fa ff ff e8 87 8b 8c ff 90
RSP: 0018:ffffc900035afaf0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff8880325ba7c8 RCX: ffffffff82015439
RDX: ffff8880222ec880 RSI: ffffffff820159b3 RDI: 0000000000000007
RBP: 0000000000000001 R08: 0000000000000007 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff8880325ba980
R13: 0000000000000024 R14: ffffffff8b706c60 R15: ffff8880325ba8a0
FS: 0000555571e27480(0000) GS:ffff88806b300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000c0016fb000 CR3: 0000000034ef4000 CR4: 0000000000350ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


Tested on:

commit: 83a7eefe Linux 6.10-rc3
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=11bb8ada980000
kernel config: https://syzkaller.appspot.com/x/.config?x=b8786f381e62940f
dashboard link: https://syzkaller.appspot.com/bug?extid=d79afb004be235636ee8
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=16642012980000


2024-06-16 02:40:30

by Hillf Danton

[permalink] [raw]
Subject: Re: [RFC PATCH] mm: truncate: flush lru cache for evicted inode

On Sat, 15 Jun 2024 21:44:54 +0100 Matthew Wilcox wrote:
>
> I suspect this would trigger:
>
> +++ b/fs/inode.c
> @@ -282,6 +282,7 @@ static struct inode *alloc_inode(struct super_block *sb)
> void __destroy_inode(struct inode *inode)
> {
> BUG_ON(inode_has_buffers(inode));
> + BUG_ON(inode->i_data.nrpages);
> inode_detach_wb(inode);
> security_inode_free(inode);
> fsnotify_inode_delete(inode);
>
Yes, it was triggered [1]

[1] https://lore.kernel.org/lkml/[email protected]/

and given trigger after nrpages is checked in clear_inode(),

iput(inode)
evict(inode)
truncate_inode_pages_final(&inode->i_data);
clear_inode(inode);
destroy_inode(inode);

why is folio added to exiting mapping?

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 83a7eefedc9b

--- x/mm/filemap.c
+++ y/mm/filemap.c
@@ -870,6 +870,7 @@ noinline int __filemap_add_folio(struct
folio_ref_add(folio, nr);
folio->mapping = mapping;
folio->index = xas.xa_index;
+ BUG_ON(mapping_exiting(mapping));

for (;;) {
int order = -1, split_order = 0;
--

2024-06-16 03:06:17

by syzbot

[permalink] [raw]
Subject: Re: [syzbot] [nilfs?] [mm?] KASAN: slab-use-after-free Read in lru_add_fn

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
kernel BUG in __filemap_add_folio

NILFS (loop0): I/O error reading meta-data file (ino=3, block-offset=0)
NILFS (loop0): I/O error reading meta-data file (ino=3, block-offset=0)
NILFS (loop0): disposed unprocessed dirty file(s) when stopping log writer
------------[ cut here ]------------
kernel BUG at mm/filemap.c:873!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 1 PID: 5321 Comm: syz-executor Not tainted 6.10.0-rc3-syzkaller-dirty #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:__filemap_add_folio+0xd1d/0xe80 mm/filemap.c:873
Code: 37 8b 4c 89 f7 e8 23 68 10 00 90 0f 0b e8 9b 14 ce ff 48 c7 c6 e0 92 37 8b 4c 89 f7 e8 0c 68 10 00 90 0f 0b e8 84 14 ce ff 90 <0f> 0b e8 7c 14 ce ff 90 0f 0b 90 e9 24 fb ff ff e8 6e 14 ce ff 48
RSP: 0018:ffffc900035773f0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff81bfc8cd
RDX: ffff888023052440 RSI: ffffffff81bfd0cc RDI: 0000000000000001
RBP: ffff88803233a9f0 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000003 R12: ffffc90003577468
R13: 0000000000000000 R14: ffffea0000b3f7c0 R15: 0000000000000000
FS: 000055556c846480(0000) GS:ffff88806b100000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffe311b9ff8 CR3: 000000001ae02000 CR4: 0000000000350ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
filemap_add_folio+0x110/0x220 mm/filemap.c:971
__filemap_get_folio+0x455/0xa80 mm/filemap.c:1959
filemap_grab_folio include/linux/pagemap.h:697 [inline]
nilfs_grab_buffer+0xc3/0x370 fs/nilfs2/page.c:57
nilfs_mdt_submit_block+0x9f/0x870 fs/nilfs2/mdt.c:121
nilfs_mdt_read_block+0xa4/0x3b0 fs/nilfs2/mdt.c:176
nilfs_mdt_get_block+0xdb/0xb90 fs/nilfs2/mdt.c:251
nilfs_palloc_get_block+0xb5/0x300 fs/nilfs2/alloc.c:217
nilfs_palloc_get_entry_block+0x165/0x1b0 fs/nilfs2/alloc.c:319
nilfs_ifile_delete_inode+0x1e6/0x260 fs/nilfs2/ifile.c:109
nilfs_evict_inode+0x294/0x550 fs/nilfs2/inode.c:950
evict+0x2ed/0x6c0 fs/inode.c:667
iput_final fs/inode.c:1741 [inline]
iput.part.0+0x5a8/0x7f0 fs/inode.c:1767
iput+0x5c/0x80 fs/inode.c:1757
nilfs_put_root+0xae/0xe0 fs/nilfs2/the_nilfs.c:925
nilfs_segctor_destroy fs/nilfs2/segment.c:2788 [inline]
nilfs_detach_log_writer+0x5ef/0xaa0 fs/nilfs2/segment.c:2850
nilfs_put_super+0x43/0x1b0 fs/nilfs2/super.c:498
generic_shutdown_super+0x159/0x3d0 fs/super.c:642
kill_block_super+0x3b/0x90 fs/super.c:1676
deactivate_locked_super+0xbe/0x1a0 fs/super.c:473
deactivate_super+0xde/0x100 fs/super.c:506
cleanup_mnt+0x222/0x450 fs/namespace.c:1267
task_work_run+0x14e/0x250 kernel/task_work.c:180
resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
syscall_exit_to_user_mode+0x278/0x2a0 kernel/entry/common.c:218
do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f70d447e217
Code: b0 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 c7 c2 b0 ff ff ff f7 d8 64 89 02 b8
RSP: 002b:00007ffe311ba288 EFLAGS: 00000202 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 0000000000000064 RCX: 00007f70d447e217
RDX: 0000000000000200 RSI: 0000000000000009 RDI: 00007ffe311bb430
RBP: 00007f70d44c8336 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000100 R11: 0000000000000202 R12: 00007ffe311bb430
R13: 00007f70d44c8336 R14: 000055556c846430 R15: 0000000000000005
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:__filemap_add_folio+0xd1d/0xe80 mm/filemap.c:873
Code: 37 8b 4c 89 f7 e8 23 68 10 00 90 0f 0b e8 9b 14 ce ff 48 c7 c6 e0 92 37 8b 4c 89 f7 e8 0c 68 10 00 90 0f 0b e8 84 14 ce ff 90 <0f> 0b e8 7c 14 ce ff 90 0f 0b 90 e9 24 fb ff ff e8 6e 14 ce ff 48
RSP: 0018:ffffc900035773f0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff81bfc8cd
RDX: ffff888023052440 RSI: ffffffff81bfd0cc RDI: 0000000000000001
RBP: ffff88803233a9f0 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000003 R12: ffffc90003577468
R13: 0000000000000000 R14: ffffea0000b3f7c0 R15: 0000000000000000
FS: 000055556c846480(0000) GS:ffff88806b000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f70d45a8000 CR3: 000000001ae02000 CR4: 0000000000350ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


Tested on:

commit: 83a7eefe Linux 6.10-rc3
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=15608256980000
kernel config: https://syzkaller.appspot.com/x/.config?x=b8786f381e62940f
dashboard link: https://syzkaller.appspot.com/bug?extid=d79afb004be235636ee8
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=147bb012980000