2022-12-05 10:11:21

by syzbot

Subject: [syzbot] KASAN: use-after-free Read in xfs_qm_dqfree_one

Hello,

syzbot found the following issue on:

HEAD commit: 0ba09b173387 Revert "mm: align larger anonymous mappings o..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1736cf4b880000
kernel config: https://syzkaller.appspot.com/x/.config?x=2325e409a9a893e1
dashboard link: https://syzkaller.appspot.com/bug?extid=912776840162c13db1a3
compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/9758ec2c06f4/disk-0ba09b17.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/06781dbfd581/vmlinux-0ba09b17.xz
kernel image: https://storage.googleapis.com/syzbot-assets/3d44a22d15fa/bzImage-0ba09b17.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

==================================================================
BUG: KASAN: use-after-free in xfs_dquot_type fs/xfs/xfs_dquot.h:136 [inline]
BUG: KASAN: use-after-free in xfs_qm_dqfree_one+0x12f/0x170 fs/xfs/xfs_qm.c:1604
Read of size 1 at addr ffff88807ed63a98 by task syz-executor.2/22148

CPU: 1 PID: 22148 Comm: syz-executor.2 Not tainted 6.1.0-rc7-syzkaller-00211-g0ba09b173387 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106
print_address_description+0x74/0x340 mm/kasan/report.c:284
print_report+0x107/0x1f0 mm/kasan/report.c:395
kasan_report+0xcd/0x100 mm/kasan/report.c:495
xfs_dquot_type fs/xfs/xfs_dquot.h:136 [inline]
xfs_qm_dqfree_one+0x12f/0x170 fs/xfs/xfs_qm.c:1604
xfs_qm_shrink_scan+0x351/0x410 fs/xfs/xfs_qm.c:523
do_shrink_slab+0x4e1/0xa00 mm/vmscan.c:842
shrink_slab+0x1e6/0x340 mm/vmscan.c:1002
drop_slab_node mm/vmscan.c:1037 [inline]
drop_slab+0x185/0x2c0 mm/vmscan.c:1047
drop_caches_sysctl_handler+0xb1/0x160 fs/drop_caches.c:66
proc_sys_call_handler+0x576/0x890 fs/proc/proc_sysctl.c:604
do_iter_write+0x6c2/0xc20 fs/read_write.c:861
iter_file_splice_write+0x7fc/0xfc0 fs/splice.c:686
do_splice_from fs/splice.c:764 [inline]
direct_splice_actor+0xe6/0x1c0 fs/splice.c:931
splice_direct_to_actor+0x4e4/0xc00 fs/splice.c:886
do_splice_direct+0x279/0x3d0 fs/splice.c:974
do_sendfile+0x5fb/0xf80 fs/read_write.c:1255
__do_sys_sendfile64 fs/read_write.c:1317 [inline]
__se_sys_sendfile64+0xd0/0x1b0 fs/read_write.c:1309
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7eff8be8c0d9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007eff8cbb7168 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
RAX: ffffffffffffffda RBX: 00007eff8bfabf80 RCX: 00007eff8be8c0d9
RDX: 0000000020002080 RSI: 0000000000000004 RDI: 0000000000000006
RBP: 00007eff8bee7ae9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000870 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd5ac99a0f R14: 00007eff8cbb7300 R15: 0000000000022000
</TASK>

Allocated by task 22095:
kasan_save_stack mm/kasan/common.c:45 [inline]
kasan_set_track+0x3d/0x60 mm/kasan/common.c:52
__kasan_slab_alloc+0x65/0x70 mm/kasan/common.c:325
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook mm/slab.h:737 [inline]
slab_alloc_node mm/slub.c:3398 [inline]
slab_alloc mm/slub.c:3406 [inline]
__kmem_cache_alloc_lru mm/slub.c:3413 [inline]
kmem_cache_alloc+0x1cc/0x300 mm/slub.c:3422
kmem_cache_zalloc include/linux/slab.h:679 [inline]
xfs_dquot_alloc+0x36/0x600 fs/xfs/xfs_dquot.c:475
xfs_qm_dqread+0x8a/0x1d0 fs/xfs/xfs_dquot.c:659
xfs_qm_dqget+0x27d/0x4f0 fs/xfs/xfs_dquot.c:870
xfs_qm_vop_dqalloc+0x9bf/0xca0 fs/xfs/xfs_qm.c:1704
xfs_setattr_nonsize+0x3c2/0xfd0 fs/xfs/xfs_iops.c:702
xfs_vn_setattr+0x2f5/0x340 fs/xfs/xfs_iops.c:1022
notify_change+0xe38/0x10f0 fs/attr.c:420
chown_common+0x586/0x8f0 fs/open.c:736
do_fchownat+0x165/0x240 fs/open.c:767
__do_sys_lchown fs/open.c:792 [inline]
__se_sys_lchown fs/open.c:790 [inline]
__x64_sys_lchown+0x81/0x90 fs/open.c:790
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 3661:
kasan_save_stack mm/kasan/common.c:45 [inline]
kasan_set_track+0x3d/0x60 mm/kasan/common.c:52
kasan_save_free_info+0x27/0x40 mm/kasan/generic.c:511
____kasan_slab_free+0xd6/0x120 mm/kasan/common.c:236
kasan_slab_free include/linux/kasan.h:177 [inline]
slab_free_hook mm/slub.c:1724 [inline]
slab_free_freelist_hook+0x12e/0x1a0 mm/slub.c:1750
slab_free mm/slub.c:3661 [inline]
kmem_cache_free+0x94/0x1d0 mm/slub.c:3683
xfs_qm_dqpurge+0x4f7/0x660 fs/xfs/xfs_qm.c:177
xfs_qm_dquot_walk+0x249/0x490 fs/xfs/xfs_qm.c:87
xfs_qm_dqpurge_all fs/xfs/xfs_qm.c:193 [inline]
xfs_qm_unmount+0x71/0x100 fs/xfs/xfs_qm.c:205
xfs_unmountfs+0xc5/0x1e0 fs/xfs/xfs_mount.c:1059
xfs_fs_put_super+0x6e/0x2d0 fs/xfs/xfs_super.c:1115
generic_shutdown_super+0x130/0x310 fs/super.c:492
kill_block_super+0x79/0xd0 fs/super.c:1428
deactivate_locked_super+0xa7/0xf0 fs/super.c:332
cleanup_mnt+0x494/0x520 fs/namespace.c:1186
task_work_run+0x243/0x300 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop+0x124/0x150 kernel/entry/common.c:171
exit_to_user_mode_prepare+0xb2/0x140 kernel/entry/common.c:203
__syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
syscall_exit_to_user_mode+0x26/0x60 kernel/entry/common.c:296
do_syscall_64+0x49/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x63/0xcd

The buggy address belongs to the object at ffff88807ed63a80
which belongs to the cache xfs_dquot of size 704
The buggy address is located 24 bytes inside of
704-byte region [ffff88807ed63a80, ffff88807ed63d40)

The buggy address belongs to the physical page:
page:ffffea0001fb5800 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7ed60
head:ffffea0001fb5800 order:2 compound_mapcount:0 compound_pincount:0
flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000010200 ffffea0001a74500 dead000000000003 ffff88801c6f3a00
raw: 0000000000000000 0000000080130013 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 2, migratetype Unmovable, gfp_mask 0x1d20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL), pid 18894, tgid 18893 (syz-executor.0), ts 751185457243, free_ts 748684720140
prep_new_page mm/page_alloc.c:2539 [inline]
get_page_from_freelist+0x742/0x7c0 mm/page_alloc.c:4291
__alloc_pages+0x259/0x560 mm/page_alloc.c:5558
alloc_slab_page+0xbd/0x190 mm/slub.c:1794
allocate_slab+0x5e/0x4b0 mm/slub.c:1939
new_slab mm/slub.c:1992 [inline]
___slab_alloc+0x782/0xe20 mm/slub.c:3180
__slab_alloc mm/slub.c:3279 [inline]
slab_alloc_node mm/slub.c:3364 [inline]
slab_alloc mm/slub.c:3406 [inline]
__kmem_cache_alloc_lru mm/slub.c:3413 [inline]
kmem_cache_alloc+0x24c/0x300 mm/slub.c:3422
kmem_cache_zalloc include/linux/slab.h:679 [inline]
xfs_dquot_alloc+0x36/0x600 fs/xfs/xfs_dquot.c:475
xfs_qm_dqread+0x8a/0x1d0 fs/xfs/xfs_dquot.c:659
xfs_qm_dqget_inode+0x430/0x960 fs/xfs/xfs_dquot.c:973
xfs_qm_dqattach_one+0xe8/0x1c0 fs/xfs/xfs_qm.c:277
xfs_qm_dqattach_locked+0x3ed/0x4a0 fs/xfs/xfs_qm.c:336
xfs_qm_vop_dqalloc+0x3f2/0xca0 fs/xfs/xfs_qm.c:1659
xfs_setattr_nonsize+0x3c2/0xfd0 fs/xfs/xfs_iops.c:702
xfs_vn_setattr+0x2f5/0x340 fs/xfs/xfs_iops.c:1022
notify_change+0xe38/0x10f0 fs/attr.c:420
chown_common+0x586/0x8f0 fs/open.c:736
page last free stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1459 [inline]
free_pcp_prepare+0x80c/0x8f0 mm/page_alloc.c:1509
free_unref_page_prepare mm/page_alloc.c:3387 [inline]
free_unref_page_list+0xb4/0x7b0 mm/page_alloc.c:3529
release_pages+0x232a/0x25c0 mm/swap.c:1055
__pagevec_release+0x7d/0xf0 mm/swap.c:1075
pagevec_release include/linux/pagevec.h:71 [inline]
folio_batch_release include/linux/pagevec.h:135 [inline]
truncate_inode_pages_range+0x472/0x17f0 mm/truncate.c:373
kill_bdev block/bdev.c:76 [inline]
blkdev_flush_mapping+0x153/0x2c0 block/bdev.c:662
blkdev_put_whole block/bdev.c:693 [inline]
blkdev_put+0x4a5/0x730 block/bdev.c:953
deactivate_locked_super+0xa7/0xf0 fs/super.c:332
cleanup_mnt+0x494/0x520 fs/namespace.c:1186
task_work_run+0x243/0x300 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop+0x124/0x150 kernel/entry/common.c:171
exit_to_user_mode_prepare+0xb2/0x140 kernel/entry/common.c:203
__syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
syscall_exit_to_user_mode+0x26/0x60 kernel/entry/common.c:296
do_syscall_64+0x49/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Memory state around the buggy address:
 ffff88807ed63980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88807ed63a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88807ed63a80: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                            ^
 ffff88807ed63b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88807ed63b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
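
The address arithmetic in the report above can be checked by hand. The
following is a minimal user-space sketch, not kernel code: locate_access()
and struct access_location are hypothetical names made up for this
illustration. It assumes generic KASAN, where one shadow byte covers 8 bytes
of memory, and that each "Memory state" row prints 16 shadow bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Generic KASAN maps 8 bytes of memory to one shadow byte. */
#define SHADOW_GRANULE   8u
/* Each "Memory state" row in the report prints 16 shadow bytes. */
#define SHADOW_ROW_BYTES 16u

struct access_location {
	uint64_t offset_in_object;  /* "located N bytes inside of" */
	uint64_t object_end;        /* exclusive end of the region */
	unsigned int shadow_column; /* 0-based column in the '>' row */
};

/* Hypothetical helper: the names here are illustrative, not kernel API. */
struct access_location locate_access(uint64_t access, uint64_t object,
				     uint64_t size, uint64_t row_base)
{
	struct access_location loc;

	/* The access must fall inside the object and inside the marked row. */
	assert(access >= object && access < object + size);
	assert(access >= row_base &&
	       access < row_base + (uint64_t)SHADOW_GRANULE * SHADOW_ROW_BYTES);

	loc.offset_in_object = access - object;
	loc.object_end = object + size;
	loc.shadow_column = (unsigned int)((access - row_base) / SHADOW_GRANULE);
	return loc;
}
```

With the values from the report above (access ffff88807ed63a98, object
ffff88807ed63a80, size 704, row base ffff88807ed63a80) this gives offset 24,
region end ffff88807ed63d40 and column 3, that is, the fourth shadow byte of
the '>' row, which reads fb. As far as I know, generic KASAN uses fb for
freed slab memory, fa for the first granule of a freed object that carries
free-track metadata, and fc for slab redzones.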


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


2022-12-05 10:41:12

by syzbot

Subject: Re: [syzbot] KASAN: use-after-free Read in xfs_qm_dqfree_one

syzbot has found a reproducer for the following issue on:

HEAD commit: 0ba09b173387 Revert "mm: align larger anonymous mappings o..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15550c47880000
kernel config: https://syzkaller.appspot.com/x/.config?x=2325e409a9a893e1
dashboard link: https://syzkaller.appspot.com/bug?extid=912776840162c13db1a3
compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=128c9e23880000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/9758ec2c06f4/disk-0ba09b17.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/06781dbfd581/vmlinux-0ba09b17.xz
kernel image: https://storage.googleapis.com/syzbot-assets/3d44a22d15fa/bzImage-0ba09b17.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/335889b2d730/mount_0.gz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

XFS (loop1): Quotacheck: Done.
syz-executor.1 (4657): drop_caches: 2
==================================================================
BUG: KASAN: use-after-free in xfs_dquot_type fs/xfs/xfs_dquot.h:136 [inline]
BUG: KASAN: use-after-free in xfs_qm_dqfree_one+0x12f/0x170 fs/xfs/xfs_qm.c:1604
Read of size 1 at addr ffff888079a6aa58 by task syz-executor.1/4657

CPU: 1 PID: 4657 Comm: syz-executor.1 Not tainted 6.1.0-rc7-syzkaller-00211-g0ba09b173387 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106
print_address_description+0x74/0x340 mm/kasan/report.c:284
print_report+0x107/0x1f0 mm/kasan/report.c:395
kasan_report+0xcd/0x100 mm/kasan/report.c:495
xfs_dquot_type fs/xfs/xfs_dquot.h:136 [inline]
xfs_qm_dqfree_one+0x12f/0x170 fs/xfs/xfs_qm.c:1604
xfs_qm_shrink_scan+0x351/0x410 fs/xfs/xfs_qm.c:523
do_shrink_slab+0x4e1/0xa00 mm/vmscan.c:842
shrink_slab+0x1e6/0x340 mm/vmscan.c:1002
drop_slab_node mm/vmscan.c:1037 [inline]
drop_slab+0x185/0x2c0 mm/vmscan.c:1047
drop_caches_sysctl_handler+0xb1/0x160 fs/drop_caches.c:66
proc_sys_call_handler+0x576/0x890 fs/proc/proc_sysctl.c:604
do_iter_write+0x6c2/0xc20 fs/read_write.c:861
iter_file_splice_write+0x7fc/0xfc0 fs/splice.c:686
do_splice_from fs/splice.c:764 [inline]
direct_splice_actor+0xe6/0x1c0 fs/splice.c:931
splice_direct_to_actor+0x4e4/0xc00 fs/splice.c:886
do_splice_direct+0x279/0x3d0 fs/splice.c:974
do_sendfile+0x5fb/0xf80 fs/read_write.c:1255
__do_sys_sendfile64 fs/read_write.c:1317 [inline]
__se_sys_sendfile64+0xd0/0x1b0 fs/read_write.c:1309
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fc3e3c8c0d9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fc3e4a9a168 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
RAX: ffffffffffffffda RBX: 00007fc3e3dabf80 RCX: 00007fc3e3c8c0d9
RDX: 0000000020002080 RSI: 0000000000000004 RDI: 0000000000000006
RBP: 00007fc3e3ce7ae9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000870 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fffcb98dc7f R14: 00007fc3e4a9a300 R15: 0000000000022000
</TASK>

Allocated by task 4642:
kasan_save_stack mm/kasan/common.c:45 [inline]
kasan_set_track+0x3d/0x60 mm/kasan/common.c:52
__kasan_slab_alloc+0x65/0x70 mm/kasan/common.c:325
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook mm/slab.h:737 [inline]
slab_alloc_node mm/slub.c:3398 [inline]
slab_alloc mm/slub.c:3406 [inline]
__kmem_cache_alloc_lru mm/slub.c:3413 [inline]
kmem_cache_alloc+0x1cc/0x300 mm/slub.c:3422
kmem_cache_zalloc include/linux/slab.h:679 [inline]
xfs_dquot_alloc+0x36/0x600 fs/xfs/xfs_dquot.c:475
xfs_qm_dqread+0x8a/0x1d0 fs/xfs/xfs_dquot.c:659
xfs_qm_dqget+0x27d/0x4f0 fs/xfs/xfs_dquot.c:870
xfs_qm_vop_dqalloc+0x9bf/0xca0 fs/xfs/xfs_qm.c:1704
xfs_setattr_nonsize+0x3c2/0xfd0 fs/xfs/xfs_iops.c:702
xfs_vn_setattr+0x2f5/0x340 fs/xfs/xfs_iops.c:1022
notify_change+0xe38/0x10f0 fs/attr.c:420
chown_common+0x586/0x8f0 fs/open.c:736
do_fchownat+0x165/0x240 fs/open.c:767
__do_sys_chown fs/open.c:787 [inline]
__se_sys_chown fs/open.c:785 [inline]
__x64_sys_chown+0x7e/0x90 fs/open.c:785
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 3677:
kasan_save_stack mm/kasan/common.c:45 [inline]
kasan_set_track+0x3d/0x60 mm/kasan/common.c:52
kasan_save_free_info+0x27/0x40 mm/kasan/generic.c:511
____kasan_slab_free+0xd6/0x120 mm/kasan/common.c:236
kasan_slab_free include/linux/kasan.h:177 [inline]
slab_free_hook mm/slub.c:1724 [inline]
slab_free_freelist_hook+0x12e/0x1a0 mm/slub.c:1750
slab_free mm/slub.c:3661 [inline]
kmem_cache_free+0x94/0x1d0 mm/slub.c:3683
xfs_qm_dqpurge+0x4f7/0x660 fs/xfs/xfs_qm.c:177
xfs_qm_dquot_walk+0x249/0x490 fs/xfs/xfs_qm.c:87
xfs_qm_dqpurge_all fs/xfs/xfs_qm.c:193 [inline]
xfs_qm_unmount+0x71/0x100 fs/xfs/xfs_qm.c:205
xfs_unmountfs+0xc5/0x1e0 fs/xfs/xfs_mount.c:1059
xfs_fs_put_super+0x6e/0x2d0 fs/xfs/xfs_super.c:1115
generic_shutdown_super+0x130/0x310 fs/super.c:492
kill_block_super+0x79/0xd0 fs/super.c:1428
deactivate_locked_super+0xa7/0xf0 fs/super.c:332
cleanup_mnt+0x494/0x520 fs/namespace.c:1186
task_work_run+0x243/0x300 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop+0x124/0x150 kernel/entry/common.c:171
exit_to_user_mode_prepare+0xb2/0x140 kernel/entry/common.c:203
__syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
syscall_exit_to_user_mode+0x26/0x60 kernel/entry/common.c:296
do_syscall_64+0x49/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x63/0xcd

The buggy address belongs to the object at ffff888079a6aa40
which belongs to the cache xfs_dquot of size 704
The buggy address is located 24 bytes inside of
704-byte region [ffff888079a6aa40, ffff888079a6ad00)

The buggy address belongs to the physical page:
page:ffffea0001e69a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x79a68
head:ffffea0001e69a00 order:2 compound_mapcount:0 compound_pincount:0
flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000010200 0000000000000000 dead000000000122 ffff88814660f000
raw: 0000000000000000 0000000080130013 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 2, migratetype Unmovable, gfp_mask 0x1d20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL), pid 33, tgid 33 (kworker/u4:2), ts 336553992665, free_ts 335702258247
prep_new_page mm/page_alloc.c:2539 [inline]
get_page_from_freelist+0x742/0x7c0 mm/page_alloc.c:4291
__alloc_pages+0x259/0x560 mm/page_alloc.c:5558
alloc_slab_page+0xbd/0x190 mm/slub.c:1794
allocate_slab+0x5e/0x4b0 mm/slub.c:1939
new_slab mm/slub.c:1992 [inline]
___slab_alloc+0x782/0xe20 mm/slub.c:3180
__slab_alloc mm/slub.c:3279 [inline]
slab_alloc_node mm/slub.c:3364 [inline]
slab_alloc mm/slub.c:3406 [inline]
__kmem_cache_alloc_lru mm/slub.c:3413 [inline]
kmem_cache_alloc+0x24c/0x300 mm/slub.c:3422
kmem_cache_zalloc include/linux/slab.h:679 [inline]
xfs_dquot_alloc+0x36/0x600 fs/xfs/xfs_dquot.c:475
xfs_qm_dqread+0x8a/0x1d0 fs/xfs/xfs_dquot.c:659
xfs_qm_dqget+0x27d/0x4f0 fs/xfs/xfs_dquot.c:870
xfs_qm_quotacheck_dqadjust+0xb7/0x380 fs/xfs/xfs_qm.c:1077
xfs_qm_dqusage_adjust+0x4bd/0x630 fs/xfs/xfs_qm.c:1189
xfs_iwalk_ag_recs+0x425/0x620 fs/xfs/xfs_iwalk.c:220
xfs_iwalk_run_callbacks+0x20f/0x410 fs/xfs/xfs_iwalk.c:376
xfs_iwalk_ag+0xaa5/0xb80 fs/xfs/xfs_iwalk.c:482
xfs_iwalk_ag_work+0xf5/0x1a0 fs/xfs/xfs_iwalk.c:624
xfs_pwork_work+0x7f/0x180 fs/xfs/xfs_pwork.c:47
page last free stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1459 [inline]
free_pcp_prepare+0x80c/0x8f0 mm/page_alloc.c:1509
free_unref_page_prepare mm/page_alloc.c:3387 [inline]
free_unref_page+0x7d/0x5f0 mm/page_alloc.c:3483
__stack_depot_save+0x430/0x4a0 lib/stackdepot.c:506
kasan_save_stack mm/kasan/common.c:46 [inline]
kasan_set_track+0x52/0x60 mm/kasan/common.c:52
kasan_save_free_info+0x27/0x40 mm/kasan/generic.c:511
____kasan_slab_free+0xd6/0x120 mm/kasan/common.c:236
kasan_slab_free include/linux/kasan.h:177 [inline]
slab_free_hook mm/slub.c:1724 [inline]
slab_free_freelist_hook+0x12e/0x1a0 mm/slub.c:1750
slab_free mm/slub.c:3661 [inline]
__kmem_cache_free+0x71/0x110 mm/slub.c:3674
memcg_free_slab_cgroups mm/slab.h:456 [inline]
unaccount_slab mm/slab.h:645 [inline]
__free_slab+0xf0/0x320 mm/slub.c:2015
qlist_free_all+0x2b/0x70 mm/kasan/quarantine.c:187
kasan_quarantine_reduce+0x169/0x180 mm/kasan/quarantine.c:294
__kasan_slab_alloc+0x1f/0x70 mm/kasan/common.c:302
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook mm/slab.h:737 [inline]
slab_alloc_node mm/slub.c:3398 [inline]
__kmem_cache_alloc_node+0x1d7/0x310 mm/slub.c:3437
__do_kmalloc_node mm/slab_common.c:954 [inline]
__kmalloc+0x9e/0x1a0 mm/slab_common.c:968
kmalloc include/linux/slab.h:558 [inline]
tomoyo_realpath_from_path+0xcd/0x5f0 security/tomoyo/realpath.c:251
tomoyo_get_realpath security/tomoyo/file.c:151 [inline]
tomoyo_path_perm+0x227/0x670 security/tomoyo/file.c:822

Memory state around the buggy address:
 ffff888079a6a900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888079a6a980: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>ffff888079a6aa00: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
                                                    ^
 ffff888079a6aa80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888079a6ab00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

2022-12-05 17:10:35

by syzbot

Subject: Re: [syzbot] KASAN: use-after-free Read in xfs_qm_dqfree_one

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: rcu detected stall in corrupted

rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P4111 } 2640 jiffies s: 2849 root: 0x0/T
rcu: blocking rcu_node structures (internal RCU debug):


Tested on:

commit: 0ba09b17 Revert "mm: align larger anonymous mappings o..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=135dc11d880000
kernel config: https://syzkaller.appspot.com/x/.config?x=2325e409a9a893e1
dashboard link: https://syzkaller.appspot.com/bug?extid=912776840162c13db1a3
compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=1551f50f880000

2022-12-05 23:18:49

by Dave Chinner

Subject: [PATCH] xfs: dquot shrinker doesn't check for XFS_DQFLAG_FREEING

On Mon, Dec 05, 2022 at 02:35:39AM -0800, syzbot wrote:
> syzbot has found a reproducer for the following issue on:
>
> HEAD commit: 0ba09b173387 Revert "mm: align larger anonymous mappings o..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=15550c47880000
> kernel config: https://syzkaller.appspot.com/x/.config?x=2325e409a9a893e1
> dashboard link: https://syzkaller.appspot.com/bug?extid=912776840162c13db1a3
> compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=128c9e23880000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/9758ec2c06f4/disk-0ba09b17.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/06781dbfd581/vmlinux-0ba09b17.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/3d44a22d15fa/bzImage-0ba09b17.xz
> mounted in repro: https://storage.googleapis.com/syzbot-assets/335889b2d730/mount_0.gz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: [email protected]
>
> XFS (loop1): Quotacheck: Done.
> syz-executor.1 (4657): drop_caches: 2
> ==================================================================
> BUG: KASAN: use-after-free in xfs_dquot_type fs/xfs/xfs_dquot.h:136 [inline]
> BUG: KASAN: use-after-free in xfs_qm_dqfree_one+0x12f/0x170 fs/xfs/xfs_qm.c:1604
> Read of size 1 at addr ffff888079a6aa58 by task syz-executor.1/4657

Looks like we've missed an XFS_DQFLAG_FREEING check in
xfs_qm_shrink_scan(), and the dquot purge run by unmount has raced
with the shrinker. Patch below should fix it.

-Dave.
--
Dave Chinner
[email protected]

xfs: dquot shrinker doesn't check for XFS_DQFLAG_FREEING

From: Dave Chinner <[email protected]>

Resulting in a UAF if the shrinker races with some other dquot
freeing mechanism that sets XFS_DQFLAG_FREEING before the dquot is
removed from the LRU. This can occur if a dquot purge races with
drop_caches.

Reported-by: [email protected]
Signed-off-by: Dave Chinner <[email protected]>
---
fs/xfs/xfs_qm.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 18bb4ec4d7c9..ff53d40a2dae 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -422,6 +422,14 @@ xfs_qm_dquot_isolate(
 	if (!xfs_dqlock_nowait(dqp))
 		goto out_miss_busy;

+	/*
+	 * If something else is freeing this dquot and hasn't yet removed it
+	 * from the LRU, leave it for the freeing task to complete the freeing
+	 * process rather than risk it being free from under us here.
+	 */
+	if (dqp->q_flags & XFS_DQFLAG_FREEING)
+		goto out_miss_unlock;
+
 	/*
 	 * This dquot has acquired a reference in the meantime remove it from
 	 * the freelist and try again.
@@ -441,10 +449,8 @@ xfs_qm_dquot_isolate(
 	 * skip it so there is time for the IO to complete before we try to
 	 * reclaim it again on the next LRU pass.
 	 */
-	if (!xfs_dqflock_nowait(dqp)) {
-		xfs_dqunlock(dqp);
-		goto out_miss_busy;
-	}
+	if (!xfs_dqflock_nowait(dqp))
+		goto out_miss_unlock;

 	if (XFS_DQ_IS_DIRTY(dqp)) {
 		struct xfs_buf *bp = NULL;
@@ -478,6 +484,8 @@ xfs_qm_dquot_isolate(
 	XFS_STATS_INC(dqp->q_mount, xs_qm_dqreclaims);
 	return LRU_REMOVED;

+out_miss_unlock:
+	xfs_dqunlock(dqp);
 out_miss_busy:
 	trace_xfs_dqreclaim_busy(dqp);
 	XFS_STATS_INC(dqp->q_mount, xs_qm_dqreclaim_misses);
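
The control flow of the patch above can be illustrated with a small
user-space model. This is only a sketch under stated assumptions: every
model_* name and MODEL_* constant is hypothetical and stands in for the
kernel type or helper of similar name, locks are modelled as plain booleans,
and the dirty-dquot writeback path is left out. It shows the new FREEING
check and how the out_miss_unlock label falls through to out_miss_busy.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel types; not the real XFS API. */
#define MODEL_DQFLAG_FREEING (1u << 0)

enum model_lru_status {
	MODEL_LRU_SKIP,    /* leave the dquot on the LRU for a later pass */
	MODEL_LRU_REMOVED, /* dquot taken off the LRU by this call */
};

struct model_dquot {
	unsigned int flags;   /* models dqp->q_flags */
	unsigned int nrefs;   /* models the dquot reference count */
	bool locked;          /* models the dquot lock */
	bool flush_locked;    /* models the dquot flush lock */
	bool on_dispose_list; /* true once the shrinker owns the free */
};

static bool model_dqlock_nowait(struct model_dquot *dq)
{
	if (dq->locked)
		return false;
	dq->locked = true;
	return true;
}

static void model_dqunlock(struct model_dquot *dq)
{
	assert(dq->locked);
	dq->locked = false;
}

static bool model_dqflock_nowait(struct model_dquot *dq)
{
	if (dq->flush_locked)
		return false;
	dq->flush_locked = true;
	return true;
}

enum model_lru_status model_dquot_isolate(struct model_dquot *dq)
{
	if (!model_dqlock_nowait(dq))
		goto out_miss_busy;

	/* New check: another task already owns the free, so back off. */
	if (dq->flags & MODEL_DQFLAG_FREEING)
		goto out_miss_unlock;

	/* Re-referenced since it went on the LRU: drop it from the LRU only. */
	if (dq->nrefs) {
		model_dqunlock(dq);
		return MODEL_LRU_REMOVED;
	}

	if (!model_dqflock_nowait(dq))
		goto out_miss_unlock;

	/* Claim the dquot for freeing (dirty writeback omitted in this model). */
	dq->flags |= MODEL_DQFLAG_FREEING;
	dq->on_dispose_list = true;
	dq->flush_locked = false;
	model_dqunlock(dq);
	return MODEL_LRU_REMOVED;

out_miss_unlock:
	model_dqunlock(dq);
out_miss_busy:
	return MODEL_LRU_SKIP;
}
```

In this model a dquot that already has MODEL_DQFLAG_FREEING set is skipped
with its lock released and is never put on the dispose list, so only the
task that set the flag frees it. Without that check, the model would claim
the dquot a second time, which corresponds to the double ownership behind
the reported use-after-free.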

2022-12-06 00:19:22

by Dave Chinner

Subject: Re: [syzbot] KASAN: use-after-free Read in xfs_qm_dqfree_one

On Mon, Dec 05, 2022 at 02:35:39AM -0800, syzbot wrote:
> syzbot has found a reproducer for the following issue on:
>
> HEAD commit: 0ba09b173387 Revert "mm: align larger anonymous mappings o..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=15550c47880000
> kernel config: https://syzkaller.appspot.com/x/.config?x=2325e409a9a893e1
> dashboard link: https://syzkaller.appspot.com/bug?extid=912776840162c13db1a3
> compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=128c9e23880000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/9758ec2c06f4/disk-0ba09b17.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/06781dbfd581/vmlinux-0ba09b17.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/3d44a22d15fa/bzImage-0ba09b17.xz
> mounted in repro: https://storage.googleapis.com/syzbot-assets/335889b2d730/mount_0.gz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: [email protected]

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master


xfs: dquot shrinker doesn't check for XFS_DQFLAG_FREEING

From: Dave Chinner <[email protected]>

Resulting in a UAF if the shrinker races with some other dquot
freeing mechanism that sets XFS_DQFLAG_FREEING before the dquot is
removed from the LRU. This can occur if a dquot purge races with
drop_caches.

Reported-by: [email protected]
Signed-off-by: Dave Chinner <[email protected]>
---
fs/xfs/xfs_qm.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 18bb4ec4d7c9..ff53d40a2dae 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -422,6 +422,14 @@ xfs_qm_dquot_isolate(
 	if (!xfs_dqlock_nowait(dqp))
 		goto out_miss_busy;

+	/*
+	 * If something else is freeing this dquot and hasn't yet removed it
+	 * from the LRU, leave it for the freeing task to complete the freeing
+	 * process rather than risk it being free from under us here.
+	 */
+	if (dqp->q_flags & XFS_DQFLAG_FREEING)
+		goto out_miss_unlock;
+
 	/*
 	 * This dquot has acquired a reference in the meantime remove it from
 	 * the freelist and try again.
@@ -441,10 +449,8 @@ xfs_qm_dquot_isolate(
 	 * skip it so there is time for the IO to complete before we try to
 	 * reclaim it again on the next LRU pass.
 	 */
-	if (!xfs_dqflock_nowait(dqp)) {
-		xfs_dqunlock(dqp);
-		goto out_miss_busy;
-	}
+	if (!xfs_dqflock_nowait(dqp))
+		goto out_miss_unlock;

 	if (XFS_DQ_IS_DIRTY(dqp)) {
 		struct xfs_buf *bp = NULL;
@@ -478,6 +484,8 @@ xfs_qm_dquot_isolate(
 	XFS_STATS_INC(dqp->q_mount, xs_qm_dqreclaims);
 	return LRU_REMOVED;

+out_miss_unlock:
+	xfs_dqunlock(dqp);
 out_miss_busy:
 	trace_xfs_dqreclaim_busy(dqp);
 	XFS_STATS_INC(dqp->q_mount, xs_qm_dqreclaim_misses);

2022-12-06 03:34:10

by syzbot

Subject: Re: [syzbot] KASAN: use-after-free Read in xfs_qm_dqfree_one

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: rcu detected stall in corrupted

rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P4122 } 2641 jiffies s: 2877 root: 0x0/T
rcu: blocking rcu_node structures (internal RCU debug):


Tested on:

commit: bce93322 proc: proc_skip_spaces() shouldn't think it i..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=1566216b880000
kernel config: https://syzkaller.appspot.com/x/.config?x=d58e7fe7f9cf5e24
dashboard link: https://syzkaller.appspot.com/bug?extid=912776840162c13db1a3
compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=164cad83880000

2022-12-06 04:13:17

by Dave Chinner

Subject: Re: [syzbot] KASAN: use-after-free Read in xfs_qm_dqfree_one

On Mon, Dec 05, 2022 at 07:12:15PM -0800, syzbot wrote:
> Hello,
>
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> INFO: rcu detected stall in corrupted
>
> rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P4122 } 2641 jiffies s: 2877 root: 0x0/T
> rcu: blocking rcu_node structures (internal RCU debug):

I'm pretty sure this has nothing to do with the reproducer - the
console log here:

> Tested on:
>
> commit: bce93322 proc: proc_skip_spaces() shouldn't think it i..
> git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> console output: https://syzkaller.appspot.com/x/log.txt?x=1566216b880000

indicates that syzbot is screwing around with bluetooth, HCI,
netdevsim, bridging, bonding, etc.

There's no evidence that it actually ran the reproducer for the bug
reported in this thread - there's no record of a single XFS
filesystem being mounted in the log....

It looks like someone else also tried a private patch to fix this
problem (which was obviously broken) and it failed with exactly the
same RCU warnings. That was run from the same commit id as the
original reproducer, so this looks like either syzbot is broken or
there's some other completely unrelated problem that syzbot is
tripping over here.

Over to the syzbot people to debug the syzbot failure....

-Dave.

--
Dave Chinner
[email protected]

2022-12-06 11:24:06

by Dmitry Vyukov

Subject: Re: [syzbot] KASAN: use-after-free Read in xfs_qm_dqfree_one

On Tue, 6 Dec 2022 at 04:34, Dave Chinner <[email protected]> wrote:
>
> On Mon, Dec 05, 2022 at 07:12:15PM -0800, syzbot wrote:
> > Hello,
> >
> > syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> > INFO: rcu detected stall in corrupted
> >
> > rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P4122 } 2641 jiffies s: 2877 root: 0x0/T
> > rcu: blocking rcu_node structures (internal RCU debug):
>
> I'm pretty sure this has nothing to do with the reproducer - the
> console log here:
>
> > Tested on:
> >
> > commit: bce93322 proc: proc_skip_spaces() shouldn't think it i..
> > git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> > console output: https://syzkaller.appspot.com/x/log.txt?x=1566216b880000
>
> indicates that syzbot is screwing around with bluetooth, HCI,
> netdevsim, bridging, bonding, etc.
>
> There's no evidence that it actually ran the reproducer for the bug
> reported in this thread - there's no record of a single XFS
> filesystem being mounted in the log....
>
> It looks like someone else also tried a private patch to fix this
> problem (which was obviously broken) and it failed with exactly the
> same RCU warnings. That was run from the same commit id as the
> original reproducer, so this looks like either syzbot is broken or
> there's some other completely unrelated problem that syzbot is
> tripping over here.
>
> Over to the syzbot people to debug the syzbot failure....

Hi Dave,

It's not uncommon for a single program to trigger multiple bugs.
That's what happens here. The rcu stall issue is reproducible with
this test program.
In such cases you can either submit more test requests, or test manually.

I think this is the RCU expedited stall detection firing.
For some reason CONFIG_RCU_EXP_CPU_STALL_TIMEOUT is limited to 21
seconds, and that's not enough for reliable flake-free stress testing.
We bump other timeouts to 100+ seconds.
+RCU maintainers, do you mind removing the overly restrictive limit on
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT?
Or do you think there is something to fix in the kernel to not stall?
I see the test writes to /proc/sys/vm/drop_caches, so maybe there is
some issue in that code.

2022-12-06 15:52:43

by Paul E. McKenney

Subject: Re: [syzbot] KASAN: use-after-free Read in xfs_qm_dqfree_one

On Tue, Dec 06, 2022 at 12:06:10PM +0100, Dmitry Vyukov wrote:
> On Tue, 6 Dec 2022 at 04:34, Dave Chinner <[email protected]> wrote:
> >
> > On Mon, Dec 05, 2022 at 07:12:15PM -0800, syzbot wrote:
> > > Hello,
> > >
> > > syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> > > INFO: rcu detected stall in corrupted
> > >
> > > rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P4122 } 2641 jiffies s: 2877 root: 0x0/T
> > > rcu: blocking rcu_node structures (internal RCU debug):
> >
> > I'm pretty sure this has nothing to do with the reproducer - the
> > console log here:
> >
> > > Tested on:
> > >
> > > commit: bce93322 proc: proc_skip_spaces() shouldn't think it i..
> > > git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=1566216b880000
> >
> > indicates that syzbot is screwing around with bluetooth, HCI,
> > netdevsim, bridging, bonding, etc.
> >
> > There's no evidence that it actually ran the reproducer for the bug
> > reported in this thread - there's no record of a single XFS
> > filesystem being mounted in the log....
> >
> > It looks like someone else also tried a private patch to fix this
> > problem (which was obviously broken) and it failed with exactly the
> > same RCU warnings. That was run from the same commit id as the
> > original reproducer, so this looks like either syzbot is broken or
> > there's some other completely unrelated problem that syzbot is
> > tripping over here.
> >
> > Over to the syzbot people to debug the syzbot failure....
>
> Hi Dave,
>
> It's not uncommon for a single program to trigger multiple bugs.
> That's what happens here. The rcu stall issue is reproducible with
> this test program.
> In such cases you can either submit more test requests, or test manually.
>
> I think there is an RCU expedited stall detection.
> For some reason CONFIG_RCU_EXP_CPU_STALL_TIMEOUT is limited to 21
> seconds, and that's not enough for reliable flake-free stress testing.
> We bump other timeouts to 100+ seconds.
> +RCU maintainers, do you mind removing the overly restrictive limit on
> CONFIG_RCU_EXP_CPU_STALL_TIMEOUT?
> Or you think there is something to fix in the kernel to not stall? I
> see the test writes to
> /proc/sys/vm/drop_caches, maybe there is some issue in that code.

Like this?

If so, I don't see why not. And in that case, may I please have
your Tested-by or similar?

At the same time, I am sure that there are things in the kernel that
should be adjusted to avoid stalls, but I recognize that different
developers in different situations will have different issues that they
choose to focus on. ;-)

Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
index 49da904df6aa6..2984de629f749 100644
--- a/kernel/rcu/Kconfig.debug
+++ b/kernel/rcu/Kconfig.debug
@@ -82,7 +82,7 @@ config RCU_CPU_STALL_TIMEOUT
 config RCU_EXP_CPU_STALL_TIMEOUT
 	int "Expedited RCU CPU stall timeout in milliseconds"
 	depends on RCU_STALL_COMMON
-	range 0 21000
+	range 0 300000
 	default 0
 	help
 	  If a given expedited RCU grace period extends more than the

2022-12-06 16:35:12

by Dmitry Vyukov

Subject: Re: [syzbot] KASAN: use-after-free Read in xfs_qm_dqfree_one

On Tue, 6 Dec 2022 at 16:32, Paul E. McKenney <[email protected]> wrote:
>
> On Tue, Dec 06, 2022 at 12:06:10PM +0100, Dmitry Vyukov wrote:
> > On Tue, 6 Dec 2022 at 04:34, Dave Chinner <[email protected]> wrote:
> > >
> > > On Mon, Dec 05, 2022 at 07:12:15PM -0800, syzbot wrote:
> > > > Hello,
> > > >
> > > > syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> > > > INFO: rcu detected stall in corrupted
> > > >
> > > > rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P4122 } 2641 jiffies s: 2877 root: 0x0/T
> > > > rcu: blocking rcu_node structures (internal RCU debug):
> > >
> > > I'm pretty sure this has nothing to do with the reproducer - the
> > > console log here:
> > >
> > > > Tested on:
> > > >
> > > > commit: bce93322 proc: proc_skip_spaces() shouldn't think it i..
> > > > git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1566216b880000
> > >
> > > indicates that syzbot is screwing around with bluetooth, HCI,
> > > netdevsim, bridging, bonding, etc.
> > >
> > > There's no evidence that it actually ran the reproducer for the bug
> > > reported in this thread - there's no record of a single XFS
> > > filesystem being mounted in the log....
> > >
> > > It looks like someone else also tried a private patch to fix this
> > > problem (which was obviously broken) and it failed with exactly the
> > > same RCU warnings. That was run from the same commit id as the
> > > original reproducer, so this looks like either syzbot is broken or
> > > there's some other completely unrelated problem that syzbot is
> > > tripping over here.
> > >
> > > Over to the syzbot people to debug the syzbot failure....
> >
> > Hi Dave,
> >
> > It's not uncommon for a single program to trigger multiple bugs.
> > That's what happens here. The rcu stall issue is reproducible with
> > this test program.
> > In such cases you can either submit more test requests, or test manually.
> >
> > I think there is an RCU expedited stall detection.
> > For some reason CONFIG_RCU_EXP_CPU_STALL_TIMEOUT is limited to 21
> > seconds, and that's not enough for reliable flake-free stress testing.
> > We bump other timeouts to 100+ seconds.
> > +RCU maintainers, do you mind removing the overly restrictive limit on
> > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT?
> > Or you think there is something to fix in the kernel to not stall? I
> > see the test writes to
> > /proc/sys/vm/drop_caches, maybe there is some issue in that code.
>
> Like this?
>
> If so, I don't see why not. And in that case, may I please have
> your Tested-by or similar?

I've tried with this patch and RCU_EXP_CPU_STALL_TIMEOUT=80000.
Running the test program I got some kernel BUG in XFS and no RCU
errors/warnings.

Tested-by: Dmitry Vyukov <[email protected]>

Thanks

> At the same time, I am sure that there are things in the kernel that
> should be adjusted to avoid stalls, but I recognize that different
> developers in different situations will have different issues that they
> choose to focus on. ;-)
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
> index 49da904df6aa6..2984de629f749 100644
> --- a/kernel/rcu/Kconfig.debug
> +++ b/kernel/rcu/Kconfig.debug
> @@ -82,7 +82,7 @@ config RCU_CPU_STALL_TIMEOUT
> config RCU_EXP_CPU_STALL_TIMEOUT
> int "Expedited RCU CPU stall timeout in milliseconds"
> depends on RCU_STALL_COMMON
> - range 0 21000
> + range 0 300000
> default 0
> help
> If a given expedited RCU grace period extends more than the

2022-12-06 18:37:24

by Paul E. McKenney

Subject: Re: [syzbot] KASAN: use-after-free Read in xfs_qm_dqfree_one

On Tue, Dec 06, 2022 at 05:19:10PM +0100, Dmitry Vyukov wrote:
> On Tue, 6 Dec 2022 at 16:32, Paul E. McKenney <[email protected]> wrote:
> >
> > On Tue, Dec 06, 2022 at 12:06:10PM +0100, Dmitry Vyukov wrote:
> > > On Tue, 6 Dec 2022 at 04:34, Dave Chinner <[email protected]> wrote:
> > > >
> > > > On Mon, Dec 05, 2022 at 07:12:15PM -0800, syzbot wrote:
> > > > > Hello,
> > > > >
> > > > > syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> > > > > INFO: rcu detected stall in corrupted
> > > > >
> > > > > rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P4122 } 2641 jiffies s: 2877 root: 0x0/T
> > > > > rcu: blocking rcu_node structures (internal RCU debug):
> > > >
> > > > I'm pretty sure this has nothing to do with the reproducer - the
> > > > console log here:
> > > >
> > > > > Tested on:
> > > > >
> > > > > commit: bce93322 proc: proc_skip_spaces() shouldn't think it i..
> > > > > git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1566216b880000
> > > >
> > > > indicates that syzbot is screwing around with bluetooth, HCI,
> > > > netdevsim, bridging, bonding, etc.
> > > >
> > > > There's no evidence that it actually ran the reproducer for the bug
> > > > reported in this thread - there's no record of a single XFS
> > > > filesystem being mounted in the log....
> > > >
> > > > It looks like someone else also tried a private patch to fix this
> > > > problem (which was obviously broken) and it failed with exactly the
> > > > same RCU warnings. That was run from the same commit id as the
> > > > original reproducer, so this looks like either syzbot is broken or
> > > > there's some other completely unrelated problem that syzbot is
> > > > tripping over here.
> > > >
> > > > Over to the syzbot people to debug the syzbot failure....
> > >
> > > Hi Dave,
> > >
> > > It's not uncommon for a single program to trigger multiple bugs.
> > > That's what happens here. The rcu stall issue is reproducible with
> > > this test program.
> > > In such cases you can either submit more test requests, or test manually.
> > >
> > > I think there is an RCU expedited stall detection.
> > > For some reason CONFIG_RCU_EXP_CPU_STALL_TIMEOUT is limited to 21
> > > seconds, and that's not enough for reliable flake-free stress testing.
> > > We bump other timeouts to 100+ seconds.
> > > +RCU maintainers, do you mind removing the overly restrictive limit on
> > > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT?
> > > Or you think there is something to fix in the kernel to not stall? I
> > > see the test writes to
> > > /proc/sys/vm/drop_caches, maybe there is some issue in that code.
> >
> > Like this?
> >
> > If so, I don't see why not. And in that case, may I please have
> > your Tested-by or similar?
>
> I've tried with this patch and RCU_EXP_CPU_STALL_TIMEOUT=80000.
> Running the test program I got some kernel BUG in XFS and no RCU
> errors/warnings.
>
> Tested-by: Dmitry Vyukov <[email protected]>

Applied, thank you both!

I expect to push this into the v6.3 merge window, that is, not the
one coming up real soon now, but the one after that.

Thanx, Paul

> Thanks
>
> > At the same time, I am sure that there are things in the kernel that
> > should be adjusted to avoid stalls, but I recognize that different
> > developers in different situations will have different issues that they
> > choose to focus on. ;-)
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
> > index 49da904df6aa6..2984de629f749 100644
> > --- a/kernel/rcu/Kconfig.debug
> > +++ b/kernel/rcu/Kconfig.debug
> > @@ -82,7 +82,7 @@ config RCU_CPU_STALL_TIMEOUT
> > config RCU_EXP_CPU_STALL_TIMEOUT
> > int "Expedited RCU CPU stall timeout in milliseconds"
> > depends on RCU_STALL_COMMON
> > - range 0 21000
> > + range 0 300000
> > default 0
> > help
> > If a given expedited RCU grace period extends more than the

2022-12-06 21:08:05

by Dave Chinner

Subject: Re: [syzbot] KASAN: use-after-free Read in xfs_qm_dqfree_one

On Tue, Dec 06, 2022 at 05:19:10PM +0100, Dmitry Vyukov wrote:
> On Tue, 6 Dec 2022 at 16:32, Paul E. McKenney <[email protected]> wrote:
> >
> > On Tue, Dec 06, 2022 at 12:06:10PM +0100, Dmitry Vyukov wrote:
> > > On Tue, 6 Dec 2022 at 04:34, Dave Chinner <[email protected]> wrote:
> > > >
> > > > On Mon, Dec 05, 2022 at 07:12:15PM -0800, syzbot wrote:
> > > > > Hello,
> > > > >
> > > > > syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> > > > > INFO: rcu detected stall in corrupted
> > > > >
> > > > > rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P4122 } 2641 jiffies s: 2877 root: 0x0/T
> > > > > rcu: blocking rcu_node structures (internal RCU debug):
> > > >
> > > > I'm pretty sure this has nothing to do with the reproducer - the
> > > > console log here:
> > > >
> > > > > Tested on:
> > > > >
> > > > > commit: bce93322 proc: proc_skip_spaces() shouldn't think it i..
> > > > > git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1566216b880000
> > > >
> > > > indicates that syzbot is screwing around with bluetooth, HCI,
> > > > netdevsim, bridging, bonding, etc.
> > > >
> > > > There's no evidence that it actually ran the reproducer for the bug
> > > > reported in this thread - there's no record of a single XFS
> > > > filesystem being mounted in the log....
> > > >
> > > > It looks like someone else also tried a private patch to fix this
> > > > problem (which was obviously broken) and it failed with exactly the
> > > > same RCU warnings. That was run from the same commit id as the
> > > > original reproducer, so this looks like either syzbot is broken or
> > > > there's some other completely unrelated problem that syzbot is
> > > > tripping over here.
> > > >
> > > > Over to the syzbot people to debug the syzbot failure....
> > >
> > > Hi Dave,
> > >
> > > It's not uncommon for a single program to trigger multiple bugs.
> > > That's what happens here. The rcu stall issue is reproducible with
> > > this test program.
> > > In such cases you can either submit more test requests, or test manually.
> > >
> > > I think there is an RCU expedited stall detection.
> > > For some reason CONFIG_RCU_EXP_CPU_STALL_TIMEOUT is limited to 21
> > > seconds, and that's not enough for reliable flake-free stress testing.
> > > We bump other timeouts to 100+ seconds.
> > > +RCU maintainers, do you mind removing the overly restrictive limit on
> > > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT?
> > > Or you think there is something to fix in the kernel to not stall? I
> > > see the test writes to
> > > /proc/sys/vm/drop_caches, maybe there is some issue in that code.
> >
> > Like this?
> >
> > If so, I don't see why not. And in that case, may I please have
> > your Tested-by or similar?
>
> I've tried with this patch and RCU_EXP_CPU_STALL_TIMEOUT=80000.
> Running the test program I got some kernel BUG in XFS and no RCU
> errors/warnings.

What BUG did it trigger? Where's the log?

-Dave.
--
Dave Chinner
[email protected]

2022-12-06 22:14:15

by Dave Chinner

Subject: Re: [syzbot] KASAN: use-after-free Read in xfs_qm_dqfree_one

On Tue, Dec 06, 2022 at 12:06:10PM +0100, Dmitry Vyukov wrote:
> On Tue, 6 Dec 2022 at 04:34, Dave Chinner <[email protected]> wrote:
> >
> > On Mon, Dec 05, 2022 at 07:12:15PM -0800, syzbot wrote:
> > > Hello,
> > >
> > > syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> > > INFO: rcu detected stall in corrupted
> > >
> > > rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P4122 } 2641 jiffies s: 2877 root: 0x0/T
> > > rcu: blocking rcu_node structures (internal RCU debug):
> >
> > I'm pretty sure this has nothing to do with the reproducer - the
> > console log here:
> >
> > > Tested on:
> > >
> > > commit: bce93322 proc: proc_skip_spaces() shouldn't think it i..
> > > git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=1566216b880000
> >
> > indicates that syzbot is screwing around with bluetooth, HCI,
> > netdevsim, bridging, bonding, etc.
> >
> > There's no evidence that it actually ran the reproducer for the bug
> > reported in this thread - there's no record of a single XFS
> > filesystem being mounted in the log....
> >
> > It looks like someone else also tried a private patch to fix this
> > problem (which was obviously broken) and it failed with exactly the
> > same RCU warnings. That was run from the same commit id as the
> > original reproducer, so this looks like either syzbot is broken or
> > there's some other completely unrelated problem that syzbot is
> > tripping over here.
> >
> > Over to the syzbot people to debug the syzbot failure....
>
> Hi Dave,
>
> It's not uncommon for a single program to trigger multiple bugs.
> That's what happens here. The rcu stall issue is reproducible with
> this test program.
> In such cases you can either submit more test requests, or test manually.

So you're telling us syzbot reproducers are unreliable and we are
expected to play whack-a-mole with test resubmission until we get
the result we want?

How do I tell syzbot to resubmit the same patch for testing without
having to send the same patch to syzbot via email again? Can I
retrigger a new test run through the web interface?

-Dave.
--
Dave Chinner
[email protected]

2022-12-07 16:25:30

by Darrick J. Wong

Subject: Re: [PATCH] xfs: dquot shrinker doesn't check for XFS_DQFLAG_FREEING

On Tue, Dec 06, 2022 at 09:52:46AM +1100, Dave Chinner wrote:
> On Mon, Dec 05, 2022 at 02:35:39AM -0800, syzbot wrote:
> > syzbot has found a reproducer for the following issue on:
> >
> > HEAD commit: 0ba09b173387 Revert "mm: align larger anonymous mappings o..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=15550c47880000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=2325e409a9a893e1
> > dashboard link: https://syzkaller.appspot.com/bug?extid=912776840162c13db1a3
> > compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=128c9e23880000
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/9758ec2c06f4/disk-0ba09b17.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/06781dbfd581/vmlinux-0ba09b17.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/3d44a22d15fa/bzImage-0ba09b17.xz
> > mounted in repro: https://storage.googleapis.com/syzbot-assets/335889b2d730/mount_0.gz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: [email protected]
> >
> > XFS (loop1): Quotacheck: Done.
> > syz-executor.1 (4657): drop_caches: 2
> > ==================================================================
> > BUG: KASAN: use-after-free in xfs_dquot_type fs/xfs/xfs_dquot.h:136 [inline]
> > BUG: KASAN: use-after-free in xfs_qm_dqfree_one+0x12f/0x170 fs/xfs/xfs_qm.c:1604
> > Read of size 1 at addr ffff888079a6aa58 by task syz-executor.1/4657
>
> Looks like we've missed a XFS_DQFLAG_FREEING check in
> xfs_qm_shrink_scan(), and the dquot purge run by unmount has raced
> with the shrinker. Patch below should fix it.
>
> -Dave.
> --
> Dave Chinner
> [email protected]
>
> xfs: dquot shrinker doesn't check for XFS_DQFLAG_FREEING
>
> From: Dave Chinner <[email protected]>
>
> Resulting in a UAF if the shrinker races with some other dquot
> freeing mechanism that sets XFS_DQFLAG_FREEING before the dquot is
> removed from the LRU. This can occur if a dquot purge races with
> drop_caches.
>
> Reported-by: [email protected]
> Signed-off-by: Dave Chinner <[email protected]>

Please repost this as a toplevel thread so it doesn't get lost in the
depths. Anyway, this looks correct so:

Reviewed-by: Darrick J. Wong <[email protected]>

--D

> ---
> fs/xfs/xfs_qm.c | 16 ++++++++++++----
> 1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
> index 18bb4ec4d7c9..ff53d40a2dae 100644
> --- a/fs/xfs/xfs_qm.c
> +++ b/fs/xfs/xfs_qm.c
> @@ -422,6 +422,14 @@ xfs_qm_dquot_isolate(
>  	if (!xfs_dqlock_nowait(dqp))
>  		goto out_miss_busy;
>
> +	/*
> +	 * If something else is freeing this dquot and hasn't yet removed it
> +	 * from the LRU, leave it for the freeing task to complete the freeing
> +	 * process rather than risk it being free from under us here.
> +	 */
> +	if (dqp->q_flags & XFS_DQFLAG_FREEING)
> +		goto out_miss_unlock;
> +
>  	/*
>  	 * This dquot has acquired a reference in the meantime remove it from
>  	 * the freelist and try again.
> @@ -441,10 +449,8 @@ xfs_qm_dquot_isolate(
>  	 * skip it so there is time for the IO to complete before we try to
>  	 * reclaim it again on the next LRU pass.
>  	 */
> -	if (!xfs_dqflock_nowait(dqp)) {
> -		xfs_dqunlock(dqp);
> -		goto out_miss_busy;
> -	}
> +	if (!xfs_dqflock_nowait(dqp))
> +		goto out_miss_unlock;
>
>  	if (XFS_DQ_IS_DIRTY(dqp)) {
>  		struct xfs_buf *bp = NULL;
> @@ -478,6 +484,8 @@ xfs_qm_dquot_isolate(
>  	XFS_STATS_INC(dqp->q_mount, xs_qm_dqreclaims);
>  	return LRU_REMOVED;
>
> +out_miss_unlock:
> +	xfs_dqunlock(dqp);
>  out_miss_busy:
>  	trace_xfs_dqreclaim_busy(dqp);
>  	XFS_STATS_INC(dqp->q_mount, xs_qm_dqreclaim_misses);

2022-12-09 04:47:34

by Paul E. McKenney

Subject: Re: [syzbot] KASAN: use-after-free Read in xfs_qm_dqfree_one

On Fri, Dec 09, 2022 at 11:46:05AM +0800, Hillf Danton wrote:
> On 6 Dec 2022 07:32:11 -0800 "Paul E. McKenney" <[email protected]>
> > On Tue, Dec 06, 2022 at 12:06:10PM +0100, Dmitry Vyukov wrote:
> > > On Tue, 6 Dec 2022 at 04:34, Dave Chinner <[email protected]> wrote:
> > > >
> > > > On Mon, Dec 05, 2022 at 07:12:15PM -0800, syzbot wrote:
> > > > > Hello,
> > > > >
> > > > > syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> > > > > INFO: rcu detected stall in corrupted
> > > > >
> > > > > rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P4122 } 2641 jiffies s: 2877 root: 0x0/T
> > > > > rcu: blocking rcu_node structures (internal RCU debug):
> > > >
> > > > I'm pretty sure this has nothing to do with the reproducer - the
> > > > console log here:
> > > >
> > > > > Tested on:
> > > > >
> > > > > commit: bce93322 proc: proc_skip_spaces() shouldn't think it i..
> > > > > git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1566216b880000
> > > >
> > > > indicates that syzbot is screwing around with bluetooth, HCI,
> > > > netdevsim, bridging, bonding, etc.
> > > >
> > > > There's no evidence that it actually ran the reproducer for the bug
> > > > reported in this thread - there's no record of a single XFS
> > > > filesystem being mounted in the log....
> > > >
> > > > It looks like someone else also tried a private patch to fix this
> > > > problem (which was obviously broken) and it failed with exactly the
> > > > same RCU warnings. That was run from the same commit id as the
> > > > original reproducer, so this looks like either syzbot is broken or
> > > > there's some other completely unrelated problem that syzbot is
> > > > tripping over here.
> > > >
> > > > Over to the syzbot people to debug the syzbot failure....
> > >
> > > Hi Dave,
> > >
> > > It's not uncommon for a single program to trigger multiple bugs.
> > > That's what happens here. The rcu stall issue is reproducible with
> > > this test program.
> > > In such cases you can either submit more test requests, or test manually.
> > >
> > > I think there is an RCU expedited stall detection.
> > > For some reason CONFIG_RCU_EXP_CPU_STALL_TIMEOUT is limited to 21
> > > seconds, and that's not enough for reliable flake-free stress testing.
> > > We bump other timeouts to 100+ seconds.
> > > +RCU maintainers, do you mind removing the overly restrictive limit on
> > > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT?
> > > Or you think there is something to fix in the kernel to not stall? I
> > > see the test writes to
> > > /proc/sys/vm/drop_caches, maybe there is some issue in that code.
> >
> > Like this?
> >
> > If so, I don't see why not. And in that case, may I please have
> > your Tested-by or similar?
> >
> > At the same time, I am sure that there are things in the kernel that
> > should be adjusted to avoid stalls, but I recognize that different
> > developers in different situations will have different issues that they
> > choose to focus on. ;-)
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
> > index 49da904df6aa6..2984de629f749 100644
> > --- a/kernel/rcu/Kconfig.debug
> > +++ b/kernel/rcu/Kconfig.debug
> > @@ -82,7 +82,7 @@ config RCU_CPU_STALL_TIMEOUT
> > config RCU_EXP_CPU_STALL_TIMEOUT
> > int "Expedited RCU CPU stall timeout in milliseconds"
> > depends on RCU_STALL_COMMON
> > - range 0 21000
> > + range 0 300000
> > default 0
> > help
> > If a given expedited RCU grace period extends more than the
> >
> // Limit check must be consistent with the Kconfig limits for
> // CONFIG_RCU_EXP_CPU_STALL_TIMEOUT, so check the allowed range.
> // The minimum clamped value is "2UL", because at least one full
> // tick has to be guaranteed.
> till_stall_check = clamp(msecs_to_jiffies(cpu_stall_timeout), 2UL, 21UL * HZ);
>
> But with 21UL left behind intact?

Good catch, will fix, thank you!

Thanx, Paul