2023-02-02 06:54:44

by syzbot

Subject: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum

Hello,

syzbot found the following issue on:

HEAD commit: c96618275234 Fix up more non-executable files marked execu..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=14287dc1480000
kernel config: https://syzkaller.appspot.com/x/.config?x=c8d5c2ee6c2bd4b8
dashboard link: https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
compiler: Debian clang version 13.0.1-6~deb11u1, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/a829cd39e940/disk-c9661827.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/abbc86f52a98/vmlinux-c9661827.xz
kernel image: https://storage.googleapis.com/syzbot-assets/ab0970dd4f84/bzImage-c9661827.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

==================================================================
BUG: KASAN: slab-out-of-bounds in crc16+0x206/0x280 lib/crc16.c:58
Read of size 1 at addr ffff888075f5c0a8 by task syz-executor.2/15586

CPU: 1 PID: 15586 Comm: syz-executor.2 Not tainted 6.2.0-rc5-syzkaller-00205-gc96618275234 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1b1/0x290 lib/dump_stack.c:106
print_address_description+0x74/0x340 mm/kasan/report.c:306
print_report+0x107/0x1f0 mm/kasan/report.c:417
kasan_report+0xcd/0x100 mm/kasan/report.c:517
crc16+0x206/0x280 lib/crc16.c:58
ext4_group_desc_csum+0x81b/0xb20 fs/ext4/super.c:3187
ext4_group_desc_csum_set+0x195/0x230 fs/ext4/super.c:3210
ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
ext4_free_blocks+0x191a/0x2810 fs/ext4/mballoc.c:6173
ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
ext4_ext_remove_space+0x24ef/0x46a0 fs/ext4/extents.c:2958
ext4_ext_truncate+0x177/0x220 fs/ext4/extents.c:4416
ext4_truncate+0xa6a/0xea0 fs/ext4/inode.c:4342
ext4_setattr+0x10c8/0x1930 fs/ext4/inode.c:5622
notify_change+0xe50/0x1100 fs/attr.c:482
do_truncate+0x200/0x2f0 fs/open.c:65
handle_truncate fs/namei.c:3216 [inline]
do_open fs/namei.c:3561 [inline]
path_openat+0x272b/0x2dd0 fs/namei.c:3714
do_filp_open+0x264/0x4f0 fs/namei.c:3741
do_sys_openat2+0x124/0x4e0 fs/open.c:1310
do_sys_open fs/open.c:1326 [inline]
__do_sys_creat fs/open.c:1402 [inline]
__se_sys_creat fs/open.c:1396 [inline]
__x64_sys_creat+0x11f/0x160 fs/open.c:1396
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f72f8a8c0c9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f72f97e3168 EFLAGS: 00000246 ORIG_RAX: 0000000000000055
RAX: ffffffffffffffda RBX: 00007f72f8bac050 RCX: 00007f72f8a8c0c9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000280
RBP: 00007f72f8ae7ae9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd165348bf R14: 00007f72f97e3300 R15: 0000000000022000
</TASK>

Allocated by task 5119:
kasan_save_stack mm/kasan/common.c:45 [inline]
kasan_set_track+0x3d/0x60 mm/kasan/common.c:52
__kasan_slab_alloc+0x65/0x70 mm/kasan/common.c:325
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook mm/slab.h:761 [inline]
slab_alloc_node mm/slub.c:3452 [inline]
slab_alloc mm/slub.c:3460 [inline]
__kmem_cache_alloc_lru mm/slub.c:3467 [inline]
kmem_cache_alloc+0x1b3/0x350 mm/slub.c:3476
kmem_cache_zalloc include/linux/slab.h:710 [inline]
__kernfs_new_node+0xdb/0x730 fs/kernfs/dir.c:614
kernfs_new_node+0x95/0x160 fs/kernfs/dir.c:676
__kernfs_create_file+0x45/0x2e0 fs/kernfs/file.c:1047
sysfs_add_file_mode_ns+0x21d/0x330 fs/sysfs/file.c:294
create_files fs/sysfs/group.c:64 [inline]
internal_create_group+0x508/0xde0 fs/sysfs/group.c:148
internal_create_groups fs/sysfs/group.c:188 [inline]
sysfs_create_groups+0x5d/0x130 fs/sysfs/group.c:214
create_dir lib/kobject.c:68 [inline]
kobject_add_internal+0x723/0xd10 lib/kobject.c:223
kobject_add_varg lib/kobject.c:358 [inline]
kobject_init_and_add+0x104/0x160 lib/kobject.c:441
netdev_queue_add_kobject net/core/net-sysfs.c:1666 [inline]
netdev_queue_update_kobjects+0x20c/0x4c0 net/core/net-sysfs.c:1718
register_queue_kobjects net/core/net-sysfs.c:1779 [inline]
netdev_register_kobject+0x263/0x310 net/core/net-sysfs.c:2019
register_netdevice+0x1043/0x17a0 net/core/dev.c:10045
bond_newlink+0x3f/0x90 drivers/net/bonding/bond_netlink.c:560
rtnl_newlink_create net/core/rtnetlink.c:3407 [inline]
__rtnl_newlink net/core/rtnetlink.c:3624 [inline]
rtnl_newlink+0x14b3/0x2020 net/core/rtnetlink.c:3637
rtnetlink_rcv_msg+0x822/0xf10 net/core/rtnetlink.c:6141
netlink_rcv_skb+0x1f0/0x470 net/netlink/af_netlink.c:2574
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x7e7/0x9c0 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x9b3/0xcd0 net/netlink/af_netlink.c:1942
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg net/socket.c:734 [inline]
__sys_sendto+0x46e/0x5f0 net/socket.c:2117
__do_sys_sendto net/socket.c:2129 [inline]
__se_sys_sendto net/socket.c:2125 [inline]
__x64_sys_sendto+0xda/0xf0 net/socket.c:2125
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

The buggy address belongs to the object at ffff888075f5c000
which belongs to the cache kernfs_node_cache of size 168
The buggy address is located 0 bytes to the right of
168-byte region [ffff888075f5c000, ffff888075f5c0a8)

The buggy address belongs to the physical page:
page:ffffea0001d7d700 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x75f5c
flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000000200 ffff8880129ebc80 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000110011 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 5119, tgid 5119 (syz-executor.3), ts 232703738304, free_ts 232703424583
prep_new_page mm/page_alloc.c:2531 [inline]
get_page_from_freelist+0x742/0x7c0 mm/page_alloc.c:4283
__alloc_pages+0x259/0x560 mm/page_alloc.c:5549
alloc_slab_page+0xbd/0x190 mm/slub.c:1851
allocate_slab+0x5e/0x3c0 mm/slub.c:1998
new_slab mm/slub.c:2051 [inline]
___slab_alloc+0x782/0xe20 mm/slub.c:3193
__slab_alloc mm/slub.c:3292 [inline]
__slab_alloc_node mm/slub.c:3345 [inline]
slab_alloc_node mm/slub.c:3442 [inline]
slab_alloc mm/slub.c:3460 [inline]
__kmem_cache_alloc_lru mm/slub.c:3467 [inline]
kmem_cache_alloc+0x268/0x350 mm/slub.c:3476
kmem_cache_zalloc include/linux/slab.h:710 [inline]
__kernfs_new_node+0xdb/0x730 fs/kernfs/dir.c:614
kernfs_new_node+0x95/0x160 fs/kernfs/dir.c:676
__kernfs_create_file+0x45/0x2e0 fs/kernfs/file.c:1047
sysfs_add_file_mode_ns+0x21d/0x330 fs/sysfs/file.c:294
create_files fs/sysfs/group.c:64 [inline]
internal_create_group+0x508/0xde0 fs/sysfs/group.c:148
internal_create_groups fs/sysfs/group.c:188 [inline]
sysfs_create_groups+0x5d/0x130 fs/sysfs/group.c:214
create_dir lib/kobject.c:68 [inline]
kobject_add_internal+0x723/0xd10 lib/kobject.c:223
kobject_add_varg lib/kobject.c:358 [inline]
kobject_init_and_add+0x104/0x160 lib/kobject.c:441
netdev_queue_add_kobject net/core/net-sysfs.c:1666 [inline]
netdev_queue_update_kobjects+0x20c/0x4c0 net/core/net-sysfs.c:1718
register_queue_kobjects net/core/net-sysfs.c:1779 [inline]
netdev_register_kobject+0x263/0x310 net/core/net-sysfs.c:2019
page last free stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1446 [inline]
free_pcp_prepare+0x751/0x780 mm/page_alloc.c:1496
free_unref_page_prepare mm/page_alloc.c:3369 [inline]
free_unref_page+0x19/0x4c0 mm/page_alloc.c:3464
qlist_free_all+0x2b/0x70 mm/kasan/quarantine.c:187
kasan_quarantine_reduce+0x156/0x170 mm/kasan/quarantine.c:294
__kasan_slab_alloc+0x1f/0x70 mm/kasan/common.c:302
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook mm/slab.h:761 [inline]
slab_alloc_node mm/slub.c:3452 [inline]
__kmem_cache_alloc_node+0x1e0/0x340 mm/slub.c:3491
kmalloc_trace+0x26/0x60 mm/slab_common.c:1062
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
ref_tracker_alloc+0x128/0x440 lib/ref_tracker.c:85
__netdev_tracker_alloc include/linux/netdevice.h:4020 [inline]
netdev_hold include/linux/netdevice.h:4049 [inline]
rx_queue_add_kobject net/core/net-sysfs.c:1060 [inline]
net_rx_queue_update_kobjects+0x15d/0x4c0 net/core/net-sysfs.c:1114
register_queue_kobjects net/core/net-sysfs.c:1774 [inline]
netdev_register_kobject+0x222/0x310 net/core/net-sysfs.c:2019
register_netdevice+0x1043/0x17a0 net/core/dev.c:10045
bond_newlink+0x3f/0x90 drivers/net/bonding/bond_netlink.c:560
rtnl_newlink_create net/core/rtnetlink.c:3407 [inline]
__rtnl_newlink net/core/rtnetlink.c:3624 [inline]
rtnl_newlink+0x14b3/0x2020 net/core/rtnetlink.c:3637
rtnetlink_rcv_msg+0x822/0xf10 net/core/rtnetlink.c:6141
netlink_rcv_skb+0x1f0/0x470 net/netlink/af_netlink.c:2574
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x7e7/0x9c0 net/netlink/af_netlink.c:1365

Memory state around the buggy address:
ffff888075f5bf80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888075f5c000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff888075f5c080: 00 00 00 00 00 fc fc fc fc fc fc fc fc 00 00 00
^
ffff888075f5c100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888075f5c180: 00 00 fc fc fc fc fc fc fc fc 00 00 00 00 00 00
==================================================================
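
For readers decoding the report above by hand, here is a minimal sketch of the address arithmetic in C. It assumes generic KASAN's layout of one shadow byte per 8 bytes of memory and 16 shadow bytes per printed row. The macro and function names are hypothetical helpers, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: generic KASAN, one shadow byte describes 8 bytes of memory. */
#define DEMO_KASAN_GRANULE 8u
/* Assumption: each "Memory state" row prints 16 shadow bytes (128 bytes of memory). */
#define DEMO_ROW_SHADOW_BYTES 16u

/* Byte offset of the faulting address from the start of the slab object. */
uint64_t demo_offset_in_object(uint64_t addr, uint64_t obj_start)
{
	assert(addr >= obj_start);
	return addr - obj_start;
}

/* Index of the shadow byte, within one printed row, that covers addr. */
unsigned demo_shadow_index(uint64_t addr, uint64_t row_start)
{
	assert(addr >= row_start);
	assert(addr - row_start < DEMO_ROW_SHADOW_BYTES * DEMO_KASAN_GRANULE);
	return (unsigned)((addr - row_start) / DEMO_KASAN_GRANULE);
}
```

With the values from the report, demo_offset_in_object(0xffff888075f5c0a8, 0xffff888075f5c000) is 168, the first byte past the 168-byte kernfs_node_cache object, and demo_shadow_index(0xffff888075f5c0a8, 0xffff888075f5c080) is 5, which is the first fc (redzone) byte in the marked row.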


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


2023-02-13 15:56:49

by syzbot

Subject: Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum

syzbot has found a reproducer for the following issue on:

HEAD commit: ceaa837f96ad Linux 6.2-rc8
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
kernel config: https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
dashboard link: https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
kernel image: https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

==================================================================
BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339

CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted 6.2.0-rc8-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:306 [inline]
print_report+0x163/0x4f0 mm/kasan/report.c:417
kasan_report+0x13a/0x170 mm/kasan/report.c:517
crc16+0x1fb/0x280 lib/crc16.c:58
ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
evict+0x2a4/0x620 fs/inode.c:664
do_unlinkat+0x4f1/0x930 fs/namei.c:4327
__do_sys_unlink fs/namei.c:4368 [inline]
__se_sys_unlink fs/namei.c:4366 [inline]
__x64_sys_unlink+0x49/0x50 fs/namei.c:4366
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fbc85a8c0f9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
</TASK>

The buggy address belongs to the physical page:
page:ffffea0001f78000 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x7de00
flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08 0000000000000000
raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as freed
page last allocated via order 1, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
prep_new_page mm/page_alloc.c:2531 [inline]
get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
__alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
alloc_slab_page+0x6a/0x160 mm/slub.c:1851
allocate_slab mm/slub.c:1998 [inline]
new_slab+0x84/0x2f0 mm/slub.c:2051
___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
__kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
mt_alloc_bulk lib/maple_tree.c:157 [inline]
mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
mas_node_count_gfp lib/maple_tree.c:1316 [inline]
mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
vma_expand+0x277/0x850 mm/mmap.c:541
mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
do_mmap+0x8c9/0xf70 mm/mmap.c:1411
vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
page last free stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1446 [inline]
free_pcp_prepare mm/page_alloc.c:1496 [inline]
free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
__kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
slab_alloc_node mm/slub.c:3452 [inline]
kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
__alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
alloc_skb include/linux/skbuff.h:1270 [inline]
alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg net/socket.c:734 [inline]
__sys_sendto+0x475/0x5f0 net/socket.c:2117
__do_sys_sendto net/socket.c:2129 [inline]
__se_sys_sendto net/socket.c:2125 [inline]
__x64_sys_sendto+0xde/0xf0 net/socket.c:2125
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Memory state around the buggy address:
ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
^
ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================


2023-03-01 12:13:58

by Tudor Ambarus

Subject: Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum

Hi!

On 2/13/23 15:56, syzbot wrote:
> syzbot has found a reproducer for the following issue on:
>
> HEAD commit: ceaa837f96ad Linux 6.2-rc8
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
> kernel config: https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
> dashboard link: https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
> compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
> mounted in repro: https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: [email protected]
>
> ==================================================================
> BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
> Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339
>
> CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted 6.2.0-rc8-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023
> Call Trace:
> <TASK>
> __dump_stack lib/dump_stack.c:88 [inline]
> dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
> print_address_description mm/kasan/report.c:306 [inline]
> print_report+0x163/0x4f0 mm/kasan/report.c:417
> kasan_report+0x13a/0x170 mm/kasan/report.c:517
> crc16+0x1fb/0x280 lib/crc16.c:58
> ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
> ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
> ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
> ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
> ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
> ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
> ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
> ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
> ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
> ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
> evict+0x2a4/0x620 fs/inode.c:664
> do_unlinkat+0x4f1/0x930 fs/namei.c:4327
> __do_sys_unlink fs/namei.c:4368 [inline]
> __se_sys_unlink fs/namei.c:4366 [inline]
> __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7fbc85a8c0f9
> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
> RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
> RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
> </TASK>
>
> The buggy address belongs to the physical page:
> page:ffffea0001f78000 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x7de00
> flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
> raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08 0000000000000000
> raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000
> page dumped because: kasan: bad access detected
> page_owner tracks the page as freed
> page last allocated via order 1, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
> prep_new_page mm/page_alloc.c:2531 [inline]
> get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
> __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
> alloc_slab_page+0x6a/0x160 mm/slub.c:1851
> allocate_slab mm/slub.c:1998 [inline]
> new_slab+0x84/0x2f0 mm/slub.c:2051
> ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
> __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
> kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
> mt_alloc_bulk lib/maple_tree.c:157 [inline]
> mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
> mas_node_count_gfp lib/maple_tree.c:1316 [inline]
> mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
> vma_expand+0x277/0x850 mm/mmap.c:541
> mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
> do_mmap+0x8c9/0xf70 mm/mmap.c:1411
> vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> page last free stack trace:
> reset_page_owner include/linux/page_owner.h:24 [inline]
> free_pages_prepare mm/page_alloc.c:1446 [inline]
> free_pcp_prepare mm/page_alloc.c:1496 [inline]
> free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
> free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
> qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
> kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
> __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
> kasan_slab_alloc include/linux/kasan.h:201 [inline]
> slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
> slab_alloc_node mm/slub.c:3452 [inline]
> kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
> __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
> alloc_skb include/linux/skbuff.h:1270 [inline]
> alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
> sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
> unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
> sock_sendmsg_nosec net/socket.c:714 [inline]
> sock_sendmsg net/socket.c:734 [inline]
> __sys_sendto+0x475/0x5f0 net/socket.c:2117
> __do_sys_sendto net/socket.c:2129 [inline]
> __se_sys_sendto net/socket.c:2125 [inline]
> __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>
> Memory state around the buggy address:
> ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ^
> ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ==================================================================
>


I think the patch below should fix it.

I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
EXT4_MAX_DESC_SIZE. What I think happens is that the contents of the
superblock in the buffer get corrupted sometime after .get_tree (which
eventually calls __ext4_fill_super()) is called. So instead of relying
on the contents of the buffer, we should rely on the s_desc_size
initialized at __ext4_fill_super() time.
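
To illustrate the idea, here is a minimal user-space sketch, not kernel code. The demo_* structs, the size limits and the function names are all hypothetical stand-ins, and endianness conversion is left out. It only shows why a length taken from a validated mount-time copy stays bounded while one re-read from a mutable buffer does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative limits only; assumed, not copied from the ext4 headers. */
#define DEMO_MIN_DESC_SIZE 32u
#define DEMO_MAX_DESC_SIZE 1024u

/* Stand-in for the on-disk superblock held in a buffer that may change later. */
struct demo_disk_sb {
	uint16_t desc_size;
};

/* Stand-in for the in-memory superblock info. */
struct demo_sb_info {
	struct demo_disk_sb *es;	/* points into the mutable buffer */
	uint16_t desc_size;		/* validated copy taken at mount time */
};

/* Validate the on-disk value once and cache it. Returns 0 on success, -1 otherwise. */
int demo_fill_super(struct demo_sb_info *sbi, struct demo_disk_sb *es)
{
	assert(sbi && es);
	if (es->desc_size < DEMO_MIN_DESC_SIZE || es->desc_size > DEMO_MAX_DESC_SIZE)
		return -1;
	sbi->es = es;
	sbi->desc_size = es->desc_size;
	return 0;
}

/* Bytes a checksum would read after offset, trusting the buffer (unsafe). */
size_t demo_tail_len_from_buffer(const struct demo_sb_info *sbi, size_t offset)
{
	return offset < sbi->es->desc_size ? sbi->es->desc_size - offset : 0;
}

/* Same computation, trusting only the validated cached copy. */
size_t demo_tail_len_from_cache(const struct demo_sb_info *sbi, size_t offset)
{
	return offset < sbi->desc_size ? sbi->desc_size - offset : 0;
}
```

If the buffer's desc_size is overwritten after demo_fill_super() succeeds, demo_tail_len_from_buffer() follows the new value, while demo_tail_len_from_cache() keeps returning a length bounded by the value that was checked.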

If someone finds this good (or bad), or has a more in-depth explanation,
please let me know; it will help me better understand the subsystem. In
the meantime I'll continue to investigate this and prepare a patch for
it.

Cheers,
ta

index 260c1b3e3ef2..91d41e84da32 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3182,11 +3182,9 @@ static __le16 ext4_group_desc_csum(struct super_block *sb, __u32 block_group,
 	crc = crc16(crc, (__u8 *)gdp, offset);
 	offset += sizeof(gdp->bg_checksum); /* skip checksum */
 	/* for checksum of struct ext4_group_desc do the rest...*/
-	if (ext4_has_feature_64bit(sb) &&
-	    offset < le16_to_cpu(sbi->s_es->s_desc_size))
+	if (ext4_has_feature_64bit(sb) && offset < sbi->s_desc_size)
 		crc = crc16(crc, (__u8 *)gdp + offset,
-			    le16_to_cpu(sbi->s_es->s_desc_size) -
-				offset);
+			    sbi->s_desc_size - offset);
 
 out:
 	return cpu_to_le16(crc);
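
The code touched by the hunk above checksums the descriptor in two pieces: everything before the checksum field, then everything after it up to the descriptor size. A minimal user-space sketch of that pattern follows. It assumes a bitwise CRC-16 with the reflected polynomial 0xA001 and a seed of 0xFFFF. The demo_* names, the seed and the field offset passed by the caller are illustrative assumptions, not the kernel's exact inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16 with reflected polynomial 0xA001; crc is the running value. */
uint16_t demo_crc16(uint16_t crc, const uint8_t *buf, size_t len)
{
	while (len--) {
		crc ^= *buf++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0xA001)
					: (uint16_t)(crc >> 1);
	}
	return crc;
}

/*
 * Checksum a record of `size` bytes while skipping the 2-byte checksum
 * field stored at csum_off. The tail length is derived from `size`, so
 * `size` must be a trusted value or the second call reads out of bounds.
 */
uint16_t demo_record_csum(const uint8_t *rec, size_t size, size_t csum_off)
{
	size_t offset = csum_off;
	uint16_t crc;

	assert(csum_off <= size);
	crc = demo_crc16(0xFFFF, rec, offset);	/* bytes before the field */
	offset += sizeof(uint16_t);		/* skip the checksum itself */
	if (offset < size)			/* bytes after the field */
		crc = demo_crc16(crc, rec + offset, size - offset);
	return crc;
}
```

The second demo_crc16() call is where an oversized `size` turns into the out-of-bounds read that KASAN flags in crc16().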

2023-03-03 21:51:28

by syzbot

Subject: Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum

syzbot has found a reproducer for the following issue on:

HEAD commit: 596b6b709632 Merge branch 'for-next/core' into for-kernelci
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
console output: https://syzkaller.appspot.com/x/log.txt?x=1151054cc80000
kernel config: https://syzkaller.appspot.com/x/.config?x=3519974f3f27816d
dashboard link: https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: arm64
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16ce3de4c80000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=16b02598c80000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/06e2210b88a3/disk-596b6b70.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/79e6930ab577/vmlinux-596b6b70.xz
kernel image: https://storage.googleapis.com/syzbot-assets/56b95e6bcb5c/Image-596b6b70.gz.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/a765d6554060/mount_0.gz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

==================================================================
BUG: KASAN: slab-out-of-bounds in crc16+0xc0/0x104 lib/crc16.c:58
Read of size 1 at addr ffff0000d5eff0a8 by task syz-executor175/8245

CPU: 1 PID: 8245 Comm: syz-executor175 Not tainted 6.2.0-syzkaller-18302-g596b6b709632 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023
Call trace:
dump_backtrace+0x1c8/0x1f4 arch/arm64/kernel/stacktrace.c:158
show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:165
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd0/0x124 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:306 [inline]
print_report+0x174/0x4c0 mm/kasan/report.c:417
kasan_report+0xd4/0x130 mm/kasan/report.c:517
__asan_report_load1_noabort+0x2c/0x38 mm/kasan/report_generic.c:348
crc16+0xc0/0x104 lib/crc16.c:58
ext4_group_desc_csum+0x6a8/0x99c fs/ext4/super.c:3187
ext4_group_desc_csum_set+0x17c/0x210 fs/ext4/super.c:3210
__ext4_new_inode+0x20dc/0x3acc fs/ext4/ialloc.c:1227
ext4_create+0x234/0x480 fs/ext4/namei.c:2809
lookup_open fs/namei.c:3413 [inline]
open_last_lookups fs/namei.c:3481 [inline]
path_openat+0xe6c/0x2578 fs/namei.c:3711
do_filp_open+0x1bc/0x3cc fs/namei.c:3741
do_sys_openat2+0x128/0x3d8 fs/open.c:1310
do_sys_open fs/open.c:1326 [inline]
__do_sys_openat fs/open.c:1342 [inline]
__se_sys_openat fs/open.c:1337 [inline]
__arm64_sys_openat+0x1f0/0x240 fs/open.c:1337
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52
el0_svc_common+0x138/0x258 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:193
el0_svc+0x58/0x168 arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591

Allocated by task 5961:
kasan_save_stack mm/kasan/common.c:45 [inline]
kasan_set_track+0x4c/0x80 mm/kasan/common.c:52
kasan_save_alloc_info+0x24/0x30 mm/kasan/generic.c:512
__kasan_slab_alloc+0x74/0x8c mm/kasan/common.c:328
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook+0x80/0x478 mm/slab.h:761
slab_alloc_node mm/slub.c:3452 [inline]
slab_alloc mm/slub.c:3460 [inline]
__kmem_cache_alloc_lru mm/slub.c:3467 [inline]
kmem_cache_alloc+0x288/0x37c mm/slub.c:3476
kmem_cache_zalloc include/linux/slab.h:710 [inline]
__kernfs_new_node+0xe4/0x66c fs/kernfs/dir.c:614
kernfs_new_node+0x98/0x184 fs/kernfs/dir.c:676
__kernfs_create_file+0x60/0x2d4 fs/kernfs/file.c:1047
sysfs_add_file_mode_ns+0x1dc/0x298 fs/sysfs/file.c:294
create_files fs/sysfs/group.c:64 [inline]
internal_create_group+0x428/0xbec fs/sysfs/group.c:148
internal_create_groups fs/sysfs/group.c:188 [inline]
sysfs_create_groups+0x60/0x130 fs/sysfs/group.c:214
create_dir lib/kobject.c:68 [inline]
kobject_add_internal+0x5d4/0xb14 lib/kobject.c:223
kobject_add_varg lib/kobject.c:358 [inline]
kobject_init_and_add+0x130/0x1a0 lib/kobject.c:441
netdev_queue_add_kobject net/core/net-sysfs.c:1666 [inline]
netdev_queue_update_kobjects+0x1d8/0x470 net/core/net-sysfs.c:1718
register_queue_kobjects net/core/net-sysfs.c:1779 [inline]
netdev_register_kobject+0x22c/0x2d8 net/core/net-sysfs.c:2019
register_netdevice+0xcb8/0x1270 net/core/dev.c:10037
bond_newlink+0x50/0xa8 drivers/net/bonding/bond_netlink.c:560
rtnl_newlink_create net/core/rtnetlink.c:3407 [inline]
__rtnl_newlink net/core/rtnetlink.c:3624 [inline]
rtnl_newlink+0x1174/0x1b1c net/core/rtnetlink.c:3637
rtnetlink_rcv_msg+0x6ec/0xc8c net/core/rtnetlink.c:6141
netlink_rcv_skb+0x214/0x3c4 net/netlink/af_netlink.c:2574
rtnetlink_rcv+0x28/0x38 net/core/rtnetlink.c:6159
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x660/0x8d4 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x800/0xae0 net/netlink/af_netlink.c:1942
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg net/socket.c:734 [inline]
__sys_sendto+0x3b4/0x504 net/socket.c:2120
__do_sys_sendto net/socket.c:2132 [inline]
__se_sys_sendto net/socket.c:2128 [inline]
__arm64_sys_sendto+0xd8/0xf8 net/socket.c:2128
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52
el0_svc_common+0x138/0x258 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:193
el0_svc+0x58/0x168 arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591

The buggy address belongs to the object at ffff0000d5eff000
which belongs to the cache kernfs_node_cache of size 168
The buggy address is located 0 bytes to the right of
168-byte region [ffff0000d5eff000, ffff0000d5eff0a8)

The buggy address belongs to the physical page:
page:0000000016584f53 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x115eff
flags: 0x5ffc00000000200(slab|node=0|zone=2|lastcpupid=0x7ff)
raw: 05ffc00000000200 ffff0000c0844c00 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000110011 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff0000d5efef80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff0000d5eff000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff0000d5eff080: 00 00 00 00 00 fc fc fc fc fc fc fc fc 00 00 00
^
ffff0000d5eff100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff0000d5eff180: 00 00 fc fc fc fc fc fc fc fc 00 00 00 00 00 00
==================================================================
EXT4-fs error (device loop3): __ext4_get_inode_loc:4560: comm syz-executor175: Invalid inode table block 4 in block_group 0
EXT4-fs error (device loop3) in ext4_reserve_inode_write:5906: Corrupt filesystem
EXT4-fs error (device loop3): __ext4_get_inode_loc:4560: comm syz-executor175: Invalid inode table block 4 in block_group 0
EXT4-fs error (device loop3) in ext4_reserve_inode_write:5906: Corrupt filesystem
EXT4-fs error (device loop3): ext4_evict_inode:279: inode #18: comm syz-executor175: mark_inode_dirty error
EXT4-fs warning (device loop3): ext4_evict_inode:282: couldn't mark inode dirty (err -117)


2023-03-07 10:40:25

by Jan Kara

Subject: Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum

Hi!

On Wed 01-03-23 12:13:51, Tudor Ambarus wrote:
> On 2/13/23 15:56, syzbot wrote:
> > syzbot has found a reproducer for the following issue on:
> >
> > HEAD commit: ceaa837f96ad Linux 6.2-rc8
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
> > dashboard link: https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
> > compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
> > mounted in repro: https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: [email protected]
> >
> > ==================================================================
> > BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
> > Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339
> >
> > CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted 6.2.0-rc8-syzkaller #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023
> > Call Trace:
> > <TASK>
> > __dump_stack lib/dump_stack.c:88 [inline]
> > dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
> > print_address_description mm/kasan/report.c:306 [inline]
> > print_report+0x163/0x4f0 mm/kasan/report.c:417
> > kasan_report+0x13a/0x170 mm/kasan/report.c:517
> > crc16+0x1fb/0x280 lib/crc16.c:58
> > ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
> > ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
> > ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
> > ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
> > ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
> > ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
> > ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
> > ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
> > ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
> > ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
> > evict+0x2a4/0x620 fs/inode.c:664
> > do_unlinkat+0x4f1/0x930 fs/namei.c:4327
> > __do_sys_unlink fs/namei.c:4368 [inline]
> > __se_sys_unlink fs/namei.c:4366 [inline]
> > __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
> > do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > RIP: 0033:0x7fbc85a8c0f9
> > Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> > RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
> > RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
> > RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
> > </TASK>
> >
> > The buggy address belongs to the physical page:
> > page:ffffea0001f78000 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x7de00
> > flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
> > raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08 0000000000000000
> > raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000
> > page dumped because: kasan: bad access detected
> > page_owner tracks the page as freed
> > page last allocated via order 1, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
> > prep_new_page mm/page_alloc.c:2531 [inline]
> > get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
> > __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
> > alloc_slab_page+0x6a/0x160 mm/slub.c:1851
> > allocate_slab mm/slub.c:1998 [inline]
> > new_slab+0x84/0x2f0 mm/slub.c:2051
> > ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
> > __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
> > kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
> > mt_alloc_bulk lib/maple_tree.c:157 [inline]
> > mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
> > mas_node_count_gfp lib/maple_tree.c:1316 [inline]
> > mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
> > vma_expand+0x277/0x850 mm/mmap.c:541
> > mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
> > do_mmap+0x8c9/0xf70 mm/mmap.c:1411
> > vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
> > do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > page last free stack trace:
> > reset_page_owner include/linux/page_owner.h:24 [inline]
> > free_pages_prepare mm/page_alloc.c:1446 [inline]
> > free_pcp_prepare mm/page_alloc.c:1496 [inline]
> > free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
> > free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
> > qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
> > kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
> > __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
> > kasan_slab_alloc include/linux/kasan.h:201 [inline]
> > slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
> > slab_alloc_node mm/slub.c:3452 [inline]
> > kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
> > __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
> > alloc_skb include/linux/skbuff.h:1270 [inline]
> > alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
> > sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
> > unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
> > sock_sendmsg_nosec net/socket.c:714 [inline]
> > sock_sendmsg net/socket.c:734 [inline]
> > __sys_sendto+0x475/0x5f0 net/socket.c:2117
> > __do_sys_sendto net/socket.c:2129 [inline]
> > __se_sys_sendto net/socket.c:2125 [inline]
> > __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
> > do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > entry_SYSCALL_64_after_hwframe+0x63/0xcd
> >
> > Memory state around the buggy address:
> > ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > ^
> > ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > ==================================================================
> >
>
>
> I think the patch from below should fix it.
>
> I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
> EXT4_MAX_DESC_SIZE. What I think happens is that the contents of the
> super block in the buffer get corrupted sometime after the .get_tree
> (which eventually calls __ext4_fill_super()) is called. So instead of
> relying on the contents of the buffer, we should instead rely on the
> s_desc_size initialized at the __ext4_fill_super() time.
>
> If someone finds this good (or bad), or has a more in depth explanation,
> please let me know, it will help me better understand the subsystem. In
> the meantime I'll continue to investigate this and prepare a patch for
> it.

If there's something corrupting the superblock while the filesystem is
mounted, we need to find what is corrupting the SB and fix *that*. Not try
to paper over the problem by not using the on-disk data... Maybe journal
replay is corrupting the value or something like that?

Honza

> index 260c1b3e3ef2..91d41e84da32 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -3182,11 +3182,9 @@ static __le16 ext4_group_desc_csum(struct super_block
> *sb, __u32 block_group,
> crc = crc16(crc, (__u8 *)gdp, offset);
> offset += sizeof(gdp->bg_checksum); /* skip checksum */
> /* for checksum of struct ext4_group_desc do the rest...*/
> - if (ext4_has_feature_64bit(sb) &&
> - offset < le16_to_cpu(sbi->s_es->s_desc_size))
> + if (ext4_has_feature_64bit(sb) && offset < sbi->s_desc_size)
> crc = crc16(crc, (__u8 *)gdp + offset,
> - le16_to_cpu(sbi->s_es->s_desc_size) -
> - offset);
> + sbi->s_desc_size - offset);
>
> out:
> return cpu_to_le16(crc);
--
Jan Kara <[email protected]>
SUSE Labs, CR

2023-03-07 11:06:15

by Tudor Ambarus

Subject: Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum



On 3/7/23 10:39, Jan Kara wrote:
> Hi!

Hi!

Thanks for taking the time to review the proposal!

>
> On Wed 01-03-23 12:13:51, Tudor Ambarus wrote:
>> On 2/13/23 15:56, syzbot wrote:
>>> syzbot has found a reproducer for the following issue on:
>>>
>>> HEAD commit: ceaa837f96ad Linux 6.2-rc8
>>> git tree: upstream
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
>>> kernel config: https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
>>> compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000
>>>
>>> Downloadable assets:
>>> disk image: https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
>>> vmlinux: https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
>>> kernel image: https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
>>> mounted in repro: https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz
>>>
>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>> Reported-by: [email protected]
>>>
>>> ==================================================================
>>> BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
>>> Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339
>>>
>>> CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted 6.2.0-rc8-syzkaller #0
>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023
>>> Call Trace:
>>> <TASK>
>>> __dump_stack lib/dump_stack.c:88 [inline]
>>> dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
>>> print_address_description mm/kasan/report.c:306 [inline]
>>> print_report+0x163/0x4f0 mm/kasan/report.c:417
>>> kasan_report+0x13a/0x170 mm/kasan/report.c:517
>>> crc16+0x1fb/0x280 lib/crc16.c:58
>>> ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
>>> ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
>>> ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
>>> ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
>>> ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
>>> ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
>>> ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
>>> ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
>>> ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
>>> ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
>>> evict+0x2a4/0x620 fs/inode.c:664
>>> do_unlinkat+0x4f1/0x930 fs/namei.c:4327
>>> __do_sys_unlink fs/namei.c:4368 [inline]
>>> __se_sys_unlink fs/namei.c:4366 [inline]
>>> __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>> do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>> RIP: 0033:0x7fbc85a8c0f9
>>> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
>>> RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
>>> RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
>>> RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
>>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
>>> R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
>>> </TASK>
>>>
>>> The buggy address belongs to the physical page:
>>> page:ffffea0001f78000 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x7de00
>>> flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
>>> raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08 0000000000000000
>>> raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000
>>> page dumped because: kasan: bad access detected
>>> page_owner tracks the page as freed
>>> page last allocated via order 1, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
>>> prep_new_page mm/page_alloc.c:2531 [inline]
>>> get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
>>> __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
>>> alloc_slab_page+0x6a/0x160 mm/slub.c:1851
>>> allocate_slab mm/slub.c:1998 [inline]
>>> new_slab+0x84/0x2f0 mm/slub.c:2051
>>> ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
>>> __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
>>> kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
>>> mt_alloc_bulk lib/maple_tree.c:157 [inline]
>>> mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
>>> mas_node_count_gfp lib/maple_tree.c:1316 [inline]
>>> mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
>>> vma_expand+0x277/0x850 mm/mmap.c:541
>>> mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
>>> do_mmap+0x8c9/0xf70 mm/mmap.c:1411
>>> vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>> do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>> page last free stack trace:
>>> reset_page_owner include/linux/page_owner.h:24 [inline]
>>> free_pages_prepare mm/page_alloc.c:1446 [inline]
>>> free_pcp_prepare mm/page_alloc.c:1496 [inline]
>>> free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
>>> free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
>>> qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
>>> kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
>>> __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
>>> kasan_slab_alloc include/linux/kasan.h:201 [inline]
>>> slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
>>> slab_alloc_node mm/slub.c:3452 [inline]
>>> kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
>>> __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
>>> alloc_skb include/linux/skbuff.h:1270 [inline]
>>> alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
>>> sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
>>> unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
>>> sock_sendmsg_nosec net/socket.c:714 [inline]
>>> sock_sendmsg net/socket.c:734 [inline]
>>> __sys_sendto+0x475/0x5f0 net/socket.c:2117
>>> __do_sys_sendto net/socket.c:2129 [inline]
>>> __se_sys_sendto net/socket.c:2125 [inline]
>>> __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>> do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>
>>> Memory state around the buggy address:
>>> ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>> ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>> ^
>>> ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>> ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>> ==================================================================
>>>
>>
>>
>> I think the patch from below should fix it.
>>
>> I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
>> EXT4_MAX_DESC_SIZE. What I think happens is that the contents of the
>> super block in the buffer get corrupted sometime after the .get_tree
>> (which eventually calls __ext4_fill_super()) is called. So instead of
>> relying on the contents of the buffer, we should instead rely on the
>> s_desc_size initialized at the __ext4_fill_super() time.
>>
>> If someone finds this good (or bad), or has a more in depth explanation,
>> please let me know, it will help me better understand the subsystem. In
>> the meantime I'll continue to investigate this and prepare a patch for
>> it.
>
> If there's something corrupting the superblock while the filesystem is
> mounted, we need to find what is corrupting the SB and fix *that*. Not try
> to paper over the problem by not using the on-disk data... Maybe journal
> replay is corrupting the value or something like that?
>
> Honza
>

Ok, I agree. First thing would be to understand the reproducer and to
simplify it if possible. I haven't yet decoded what the syz repro is
doing at
https://syzkaller.appspot.com/text?tag=ReproSyz&x=16ce3de4c80000
Will reply to this email thread once I understand what's happening. If
you or someone else can decode the syz repro faster than me, shoot.

Cheers,
ta

>> index 260c1b3e3ef2..91d41e84da32 100644
>> --- a/fs/ext4/super.c
>> +++ b/fs/ext4/super.c
>> @@ -3182,11 +3182,9 @@ static __le16 ext4_group_desc_csum(struct super_block
>> *sb, __u32 block_group,
>> crc = crc16(crc, (__u8 *)gdp, offset);
>> offset += sizeof(gdp->bg_checksum); /* skip checksum */
>> /* for checksum of struct ext4_group_desc do the rest...*/
>> - if (ext4_has_feature_64bit(sb) &&
>> - offset < le16_to_cpu(sbi->s_es->s_desc_size))
>> + if (ext4_has_feature_64bit(sb) && offset < sbi->s_desc_size)
>> crc = crc16(crc, (__u8 *)gdp + offset,
>> - le16_to_cpu(sbi->s_es->s_desc_size) -
>> - offset);
>> + sbi->s_desc_size - offset);
>>
>> out:
>> return cpu_to_le16(crc);

2023-03-13 11:12:15

by Tudor Ambarus

Subject: Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum

Hi, Jan,

On 3/7/23 11:02, Tudor Ambarus wrote:
>
>
> On 3/7/23 10:39, Jan Kara wrote:
>> Hi!
>
> Hi!
>
> Thanks for taking the time to review the proposal!
>
>>
>> On Wed 01-03-23 12:13:51, Tudor Ambarus wrote:
>>> On 2/13/23 15:56, syzbot wrote:
>>>> syzbot has found a reproducer for the following issue on:
>>>>
>>>> HEAD commit:    ceaa837f96ad Linux 6.2-rc8
>>>> git tree:       upstream
>>>> console output:
>>>> https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
>>>> kernel config: 
>>>> https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
>>>> dashboard link:
>>>> https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
>>>> compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils
>>>> for Debian) 2.35.2
>>>> syz repro:     
>>>> https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000
>>>>
>>>> Downloadable assets:
>>>> disk image:
>>>> https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
>>>> vmlinux:
>>>> https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
>>>> kernel image:
>>>> https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
>>>> mounted in repro:
>>>> https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz
>>>>
>>>> IMPORTANT: if you fix the issue, please add the following tag to the
>>>> commit:
>>>> Reported-by: [email protected]
>>>>
>>>> ==================================================================
>>>> BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
>>>> Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339
>>>>
>>>> CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted
>>>> 6.2.0-rc8-syzkaller #0
>>>> Hardware name: Google Google Compute Engine/Google Compute Engine,
>>>> BIOS Google 01/21/2023
>>>> Call Trace:
>>>>    <TASK>
>>>>    __dump_stack lib/dump_stack.c:88 [inline]
>>>>    dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
>>>>    print_address_description mm/kasan/report.c:306 [inline]
>>>>    print_report+0x163/0x4f0 mm/kasan/report.c:417
>>>>    kasan_report+0x13a/0x170 mm/kasan/report.c:517
>>>>    crc16+0x1fb/0x280 lib/crc16.c:58
>>>>    ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
>>>>    ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
>>>>    ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
>>>>    ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
>>>>    ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
>>>>    ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
>>>>    ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
>>>>    ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
>>>>    ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
>>>>    ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
>>>>    evict+0x2a4/0x620 fs/inode.c:664
>>>>    do_unlinkat+0x4f1/0x930 fs/namei.c:4327
>>>>    __do_sys_unlink fs/namei.c:4368 [inline]
>>>>    __se_sys_unlink fs/namei.c:4366 [inline]
>>>>    __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
>>>>    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>>    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>>>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>> RIP: 0033:0x7fbc85a8c0f9
>>>> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48
>>>> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
>>>> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
>>>> RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
>>>> RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
>>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
>>>> RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
>>>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
>>>> R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
>>>>    </TASK>
>>>>
>>>> The buggy address belongs to the physical page:
>>>> page:ffffea0001f78000 refcount:0 mapcount:-128
>>>> mapping:0000000000000000 index:0x0 pfn:0x7de00
>>>> flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
>>>> raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08
>>>> 0000000000000000
>>>> raw: 0000000000000000 0000000000000001 00000000ffffff7f
>>>> 0000000000000000
>>>> page dumped because: kasan: bad access detected
>>>> page_owner tracks the page as freed
>>>> page last allocated via order 1, migratetype Unmovable, gfp_mask
>>>> 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
>>>>    prep_new_page mm/page_alloc.c:2531 [inline]
>>>>    get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
>>>>    __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
>>>>    alloc_slab_page+0x6a/0x160 mm/slub.c:1851
>>>>    allocate_slab mm/slub.c:1998 [inline]
>>>>    new_slab+0x84/0x2f0 mm/slub.c:2051
>>>>    ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
>>>>    __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
>>>>    kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
>>>>    mt_alloc_bulk lib/maple_tree.c:157 [inline]
>>>>    mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
>>>>    mas_node_count_gfp lib/maple_tree.c:1316 [inline]
>>>>    mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
>>>>    vma_expand+0x277/0x850 mm/mmap.c:541
>>>>    mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
>>>>    do_mmap+0x8c9/0xf70 mm/mmap.c:1411
>>>>    vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
>>>>    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>>    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>>>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>> page last free stack trace:
>>>>    reset_page_owner include/linux/page_owner.h:24 [inline]
>>>>    free_pages_prepare mm/page_alloc.c:1446 [inline]
>>>>    free_pcp_prepare mm/page_alloc.c:1496 [inline]
>>>>    free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
>>>>    free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
>>>>    qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
>>>>    kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
>>>>    __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
>>>>    kasan_slab_alloc include/linux/kasan.h:201 [inline]
>>>>    slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
>>>>    slab_alloc_node mm/slub.c:3452 [inline]
>>>>    kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
>>>>    __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
>>>>    alloc_skb include/linux/skbuff.h:1270 [inline]
>>>>    alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
>>>>    sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
>>>>    unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
>>>>    sock_sendmsg_nosec net/socket.c:714 [inline]
>>>>    sock_sendmsg net/socket.c:734 [inline]
>>>>    __sys_sendto+0x475/0x5f0 net/socket.c:2117
>>>>    __do_sys_sendto net/socket.c:2129 [inline]
>>>>    __se_sys_sendto net/socket.c:2125 [inline]
>>>>    __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
>>>>    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>>    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>>>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>
>>>> Memory state around the buggy address:
>>>>    ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>>    ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>>> ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>                      ^
>>>>    ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>    ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>> ==================================================================
>>>>
>>>
>>>
>>> I think the patch from below should fix it.
>>>
>>> I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
>>> EXT4_MAX_DESC_SIZE. What I think happens is that the contents of the
>>> super block in the buffer get corrupted sometime after the .get_tree
>>> (which eventually calls __ext4_fill_super()) is called. So instead of
>>> relying on the contents of the buffer, we should instead rely on the
>>> s_desc_size initialized at the __ext4_fill_super() time.
>>>
>>> If someone finds this good (or bad), or has a more in depth explanation,
>>> please let me know, it will help me better understand the subsystem. In
>>> the meantime I'll continue to investigate this and prepare a patch for
>>> it.
>>
>> If there's something corrupting the superblock while the filesystem is
>> mounted, we need to find what is corrupting the SB and fix *that*. Not
>> try
>> to paper over the problem by not using the on-disk data... Maybe journal
>> replay is corrupting the value or something like that?
>>
>>                                 Honza
>>
>
> Ok, I agree. First thing would be to understand the reproducer and to
> simplify it if possible. I haven't yet decoded what the syz repro is
> doing at
> https://syzkaller.appspot.com/text?tag=ReproSyz&x=16ce3de4c80000
> Will reply to this email thread once I understand what's happening. If
> you or someone else can decode the syz repro faster than me, shoot.
>

I can now explain how the contents of the super block in the buffer get
corrupted. After the ext4 fs is mounted to the target ("./bus"), the
reproducer maps 6MB of data starting at offset 0 in the target's file
("./bus"), then it starts overwriting the data with something else, using
memcpy, memset and individual byte stores. Does that mean that we
shouldn't rely on the contents of the super block in the buffer after we
mount the file system? If so, then my patch stands. I'll be happy to
extend it if needed. Below is a step-by-step interpretation of
the reproducer.

We have a strace log for the same bug, but on Android 5.15:
https://syzkaller.appspot.com/text?tag=CrashLog&x=14ecec8cc80000

Look for pid 328. You will notice that the bpf() syscalls return errors,
so I commented them out in the C repro to confirm that they are not the
cause. The bug still reproduced without the bpf() calls. The C repro is
at:
https://syzkaller.appspot.com/text?tag=ReproC&x=17c5fc50c80000

Let's look at these calls, just before the bug was hit:
[pid 328] open("./bus",
O_RDWR|O_CREAT|O_TRUNC|O_NONBLOCK|O_SYNC|O_DIRECT|O_LARGEFILE|O_NOATIME,
000) = 4
[pid 328] mount("/dev/loop0", "./bus", NULL, MS_BIND, NULL) = 0
[pid 328] open("./bus", O_RDWR|O_SYNC|O_NOATIME|0x3c) = 5
[pid 328] mmap(0x20000000, 6291456,
PROT_READ|PROT_WRITE|PROT_EXEC|PROT_SEM|0x47ffff0, MAP_SHARED|MAP_FIXED,
5, 0) = 0x20000000

- ./bus is created (if it does not exist), fd 4 is returned.
- /dev/loop0 is mounted to ./bus
- then it creates a new file descriptor (5) for the same ./bus
- then it creates a shared mapping of ./bus starting at offset zero. The
mapped area is at 0x20000000 and is 0x600000 bytes (6 MiB) long.
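
To make the last step concrete, here is a minimal userspace sketch of the
mechanism, not the syzkaller reproducer itself. It assumes a scratch file
under /tmp standing in for ./bus, and the names
shared_mapping_write_is_visible, MODEL_LEN and MODEL_OFFSET are made up for
the illustration. It only shows that a store through a MAP_SHARED mapping
changes what another reader of the same file sees, with no write() call
involved.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MODEL_LEN    4096   /* size of the scratch file (illustrative) */
#define MODEL_OFFSET 0x400  /* arbitrary offset inside the mapping */

/*
 * Returns 1 if bytes stored through a MAP_SHARED mapping are returned by
 * pread() on a second descriptor for the same file, 0 if they are not,
 * and -1 if any setup step fails.
 */
int shared_mapping_write_is_visible(void)
{
	char path[] = "/tmp/mmap-model-XXXXXX";
	unsigned char seen[2] = { 0, 0 };
	unsigned char *map;
	int result = -1;
	int wfd, rfd;

	assert(MODEL_OFFSET + sizeof(seen) <= MODEL_LEN);

	wfd = mkstemp(path);            /* descriptor used for the mapping */
	if (wfd < 0)
		return -1;
	rfd = open(path, O_RDONLY);     /* independent reader of the same file */
	unlink(path);                   /* the file goes away once both fds close */
	if (rfd < 0 || ftruncate(wfd, MODEL_LEN) != 0)
		goto out;

	map = mmap(NULL, MODEL_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, wfd, 0);
	if (map == MAP_FAILED)
		goto out;

	/* Plain memory store, like the memset/memcpy calls in the C repro. */
	memset(map + MODEL_OFFSET, 0xAB, sizeof(seen));

	if (pread(rfd, seen, sizeof(seen), MODEL_OFFSET) == (ssize_t)sizeof(seen))
		result = (seen[0] == 0xAB && seen[1] == 0xAB);

	munmap(map, MODEL_LEN);
out:
	if (rfd >= 0)
		close(rfd);
	close(wfd);
	return result;
}
```

On Linux I would expect the function to return 1, since the mapping and
pread() are served from the same page cache. In the reproducer the mapped
object is the mount target rather than a scratch file, so the same kind of
store lands in data the mounted filesystem is still reading.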

Now look again at the C reproducer. You'll see that after the mapping,
many bytes are overwritten, starting at 0x20000300. If I comment out
all those byte modifications after the mmap, the bug no longer reproduces.
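
The effect on the checksum length can be sketched in a small userspace
model. This is an illustration under assumptions, not kernel code: the
model_* types and functions are hypothetical stand-ins for sbi, sbi->s_es
and __ext4_fill_super(), the 64-bit feature check and le16_to_cpu() are left
out, and the offset of 32 and descriptor size of 64 in the usage note are
picked only for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the super block as it sits in the buffer. */
struct model_super_block {
	uint16_t s_desc_size;            /* can change after mount */
};

/* Hypothetical stand-in for the in-memory sb info. */
struct model_sb_info {
	struct model_super_block *s_es;  /* points into the mutable buffer */
	uint16_t s_desc_size;            /* snapshot taken once at mount time */
};

/* Record the buffer pointer and take the mount-time snapshot. */
void model_fill_super(struct model_sb_info *sbi, struct model_super_block *es)
{
	assert(sbi != NULL && es != NULL);
	sbi->s_es = es;
	sbi->s_desc_size = es->s_desc_size;
}

/* Length of the second crc16() call when the size is re-read from the buffer. */
size_t model_tail_len_from_buffer(const struct model_sb_info *sbi, size_t offset)
{
	size_t size = sbi->s_es->s_desc_size;

	return offset < size ? size - offset : 0;
}

/* Same length computed from the mount-time snapshot, as in the diff below. */
size_t model_tail_len_from_snapshot(const struct model_sb_info *sbi, size_t offset)
{
	size_t size = sbi->s_desc_size;

	return offset < size ? size - offset : 0;
}
```

With s_desc_size set to 64 at "mount" time and the buffer field later
overwritten with 0xffff, model_tail_len_from_buffer(&sbi, 32) grows to
0xffff - 32 while model_tail_len_from_snapshot(&sbi, 32) stays at 32. That
is the difference the diff makes; it says nothing about whether the buffer
should be writable in the first place.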

Cheers,
ta
>
>>> index 260c1b3e3ef2..91d41e84da32 100644
>>> --- a/fs/ext4/super.c
>>> +++ b/fs/ext4/super.c
>>> @@ -3182,11 +3182,9 @@ static __le16 ext4_group_desc_csum(struct
>>> super_block
>>> *sb, __u32 block_group,
>>>          crc = crc16(crc, (__u8 *)gdp, offset);
>>>          offset += sizeof(gdp->bg_checksum); /* skip checksum */
>>>          /* for checksum of struct ext4_group_desc do the rest...*/
>>> -       if (ext4_has_feature_64bit(sb) &&
>>> -           offset < le16_to_cpu(sbi->s_es->s_desc_size))
>>> +       if (ext4_has_feature_64bit(sb) && offset < sbi->s_desc_size)
>>>                  crc = crc16(crc, (__u8 *)gdp + offset,
>>> -                           le16_to_cpu(sbi->s_es->s_desc_size) -
>>> -                               offset);
>>> +                           sbi->s_desc_size - offset);
>>>
>>>   out:
>>>          return cpu_to_le16(crc);

2023-03-13 11:57:45

by Jan Kara

Subject: Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum

Hi Tudor!

On Mon 13-03-23 11:11:18, Tudor Ambarus wrote:
> On 3/7/23 11:02, Tudor Ambarus wrote:
> > On 3/7/23 10:39, Jan Kara wrote:
> >> On Wed 01-03-23 12:13:51, Tudor Ambarus wrote:
> >>> On 2/13/23 15:56, syzbot wrote:
> >>>> syzbot has found a reproducer for the following issue on:
> >>>>
> >>>> HEAD commit:    ceaa837f96ad Linux 6.2-rc8
> >>>> git tree:       upstream
> >>>> console output:
> >>>> https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
> >>>> kernel config:
> >>>> https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
> >>>> dashboard link:
> >>>> https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
> >>>> compiler:       Debian clang version 15.0.7, GNU ld (GNU Binutils
> >>>> for Debian) 2.35.2
> >>>> syz repro:
> >>>> https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000
> >>>>
> >>>> Downloadable assets:
> >>>> disk image:
> >>>> https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
> >>>> vmlinux:
> >>>> https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
> >>>> kernel image:
> >>>> https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
> >>>> mounted in repro:
> >>>> https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz
> >>>>
> >>>> IMPORTANT: if you fix the issue, please add the following tag to the
> >>>> commit:
> >>>> Reported-by: [email protected]
> >>>>
> >>>> ==================================================================
> >>>> BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
> >>>> Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339
> >>>>
> >>>> CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted
> >>>> 6.2.0-rc8-syzkaller #0
> >>>> Hardware name: Google Google Compute Engine/Google Compute Engine,
> >>>> BIOS Google 01/21/2023
> >>>> Call Trace:
> >>>> <TASK>
> >>>> __dump_stack lib/dump_stack.c:88 [inline]
> >>>> dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
> >>>> print_address_description mm/kasan/report.c:306 [inline]
> >>>> print_report+0x163/0x4f0 mm/kasan/report.c:417
> >>>> kasan_report+0x13a/0x170 mm/kasan/report.c:517
> >>>> crc16+0x1fb/0x280 lib/crc16.c:58
> >>>> ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
> >>>> ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
> >>>> ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
> >>>> ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
> >>>> ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
> >>>> ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
> >>>> ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
> >>>> ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
> >>>> ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
> >>>> ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
> >>>> evict+0x2a4/0x620 fs/inode.c:664
> >>>> do_unlinkat+0x4f1/0x930 fs/namei.c:4327
> >>>> __do_sys_unlink fs/namei.c:4368 [inline]
> >>>> __se_sys_unlink fs/namei.c:4366 [inline]
> >>>> __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
> >>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> >>>> do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> >>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> >>>> RIP: 0033:0x7fbc85a8c0f9
> >>>> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48
> >>>> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
> >>>> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> >>>> RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
> >>>> RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
> >>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
> >>>> RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
> >>>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> >>>> R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
> >>>> </TASK>
> >>>>
> >>>> The buggy address belongs to the physical page:
> >>>> page:ffffea0001f78000 refcount:0 mapcount:-128
> >>>> mapping:0000000000000000 index:0x0 pfn:0x7de00
> >>>> flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
> >>>> raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08
> >>>> 0000000000000000
> >>>> raw: 0000000000000000 0000000000000001 00000000ffffff7f
> >>>> 0000000000000000
> >>>> page dumped because: kasan: bad access detected
> >>>> page_owner tracks the page as freed
> >>>> page last allocated via order 1, migratetype Unmovable, gfp_mask
> >>>> 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
> >>>> prep_new_page mm/page_alloc.c:2531 [inline]
> >>>> get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
> >>>> __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
> >>>> alloc_slab_page+0x6a/0x160 mm/slub.c:1851
> >>>> allocate_slab mm/slub.c:1998 [inline]
> >>>> new_slab+0x84/0x2f0 mm/slub.c:2051
> >>>> ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
> >>>> __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
> >>>> kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
> >>>> mt_alloc_bulk lib/maple_tree.c:157 [inline]
> >>>> mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
> >>>> mas_node_count_gfp lib/maple_tree.c:1316 [inline]
> >>>> mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
> >>>> vma_expand+0x277/0x850 mm/mmap.c:541
> >>>> mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
> >>>> do_mmap+0x8c9/0xf70 mm/mmap.c:1411
> >>>> vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
> >>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> >>>> do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> >>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> >>>> page last free stack trace:
> >>>> reset_page_owner include/linux/page_owner.h:24 [inline]
> >>>> free_pages_prepare mm/page_alloc.c:1446 [inline]
> >>>> free_pcp_prepare mm/page_alloc.c:1496 [inline]
> >>>> free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
> >>>> free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
> >>>> qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
> >>>> kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
> >>>> __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
> >>>> kasan_slab_alloc include/linux/kasan.h:201 [inline]
> >>>> slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
> >>>> slab_alloc_node mm/slub.c:3452 [inline]
> >>>> kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
> >>>> __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
> >>>> alloc_skb include/linux/skbuff.h:1270 [inline]
> >>>> alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
> >>>> sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
> >>>> unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
> >>>> sock_sendmsg_nosec net/socket.c:714 [inline]
> >>>> sock_sendmsg net/socket.c:734 [inline]
> >>>> __sys_sendto+0x475/0x5f0 net/socket.c:2117
> >>>> __do_sys_sendto net/socket.c:2129 [inline]
> >>>> __se_sys_sendto net/socket.c:2125 [inline]
> >>>> __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
> >>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> >>>> do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> >>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> >>>>
> >>>> Memory state around the buggy address:
> >>>> ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>>> ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>>>> ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >>>>                    ^
> >>>> ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >>>> ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >>>> ==================================================================
> >>>>
> >>>
> >>>
> >>> I think the patch from below should fix it.
> >>>
> >>> I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
> >>> EXT4_MAX_DESC_SIZE. What I think it happens is that the contents of the
> >>> super block in the buffer get corrupted sometime after the .get_tree
> >>> (which eventually calls __ext4_fill_super()) is called. So instead of
> >>> relying on the contents of the buffer, we should instead rely on the
> >>> s_desc_size initialized at the __ext4_fill_super() time.
> >>>
> >>> If someone finds this good (or bad), or has a more in depth explanation,
> >>> please let me know, it will help me better understand the subsystem. In
> >>> the meantime I'll continue to investigate this and prepare a patch for
> >>> it.
> >>
> >> If there's something corrupting the superblock while the filesystem is
> >> mounted, we need to find what is corrupting the SB and fix *that*. Not
> >> try
> >> to paper over the problem by not using the on-disk data... Maybe journal
> >> replay is corrupting the value or something like that?
> >>
> >> Honza
> >>
> >
> > Ok, I agree. First thing would be to understand the reproducer and to
> > simplify it if possible. I haven't yet decoded what the syz repro is
> > doing at
> > https://syzkaller.appspot.com/text?tag=ReproSyz&x=16ce3de4c80000
> > Will reply to this email thread once I understand what's happening. If
> > you or someone else can decode the syz repro faster than me, shoot.
> >
>
> I can now explain how the contents of the super block of the buffer get
> corrupted. After the ext4 fs is mounted to the target ("./bus"), the
> reproducer maps 6MB of data starting at offset 0 in the target's file
> ("./bus"), then it starts overriding the data with something else, by
> using memcpy, memset, individual byte inits. Does that mean that we
> shouldn't rely on the contents of the super block in the buffer after we
> mount the file system? If so, then my patch stands. I'll be happy to
> extend it if needed. Below one may find a step by step interpretation of
> the reproducer.
>
> We have a strace log for the same bug, but on Android 5.15:
> https://syzkaller.appspot.com/text?tag=CrashLog&x=14ecec8cc80000
>
> Look for pid 328. You notice that the bpf() syscalls return error, so I
> commented them out in the c repro to confirm that they are not the
> cause. The bug reproduced without the bpf() calls. One can find the c
> repro at:
> https://syzkaller.appspot.com/text?tag=ReproC&x=17c5fc50c80000
>
> Let's look at these calls, just before the bug was hit:
> [pid 328] open("./bus",
> O_RDWR|O_CREAT|O_TRUNC|O_NONBLOCK|O_SYNC|O_DIRECT|O_LARGEFILE|O_NOATIME,
> 000) = 4
> [pid 328] mount("/dev/loop0", "./bus", NULL, MS_BIND, NULL) = 0
> [pid 328] open("./bus", O_RDWR|O_SYNC|O_NOATIME|0x3c) = 5
> [pid 328] mmap(0x20000000, 6291456,
> PROT_READ|PROT_WRITE|PROT_EXEC|PROT_SEM|0x47ffff0, MAP_SHARED|MAP_FIXED,
> 5, 0) = 0x20000000

Yeah, looking at the reproducer, before this it also mounts
/dev/loop0 as an ext4 filesystem.

> - ./bus is created (if it does not exist), fd 4 is returned.
> - /dev/loop0 is mounted to ./bus
> - then it creates a new file descriptor (5) for the same ./bus
> - then it creates a mapping for ./bus starting at offset zero. The
> mapped area is at 0x20000000 and is of 0x600000ul length.
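
The effect of the last step can be sketched in userspace. The example below is only an illustration on an ordinary scratch file, not on a block device and not the syzbot reproducer; the helper names `make_scratch_file` and `overwrite_via_shared_mapping` are made up for this sketch. It shows that stores through a MAP_SHARED mapping change what every other reader of the same file sees, without any write() call being issued.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Create an unlinked scratch file of `len` zero bytes; returns an fd or -1. */
static int make_scratch_file(size_t len)
{
	char path[] = "/tmp/shared-map-XXXXXX";
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);			/* the file lives until fd is closed */
	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Fill the first `len` bytes of the file behind `fd` with `value` through a
 * MAP_SHARED mapping, then read the first byte back with pread().
 * Returns the byte read back, or -1 on error.
 */
static int overwrite_via_shared_mapping(int fd, size_t len, unsigned char value)
{
	unsigned char back;
	void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	if (map == MAP_FAILED)
		return -1;
	memset(map, value, len);	/* plain stores, no write() syscall */
	if (munmap(map, len) != 0)
		return -1;
	if (pread(fd, &back, 1, 0) != 1)
		return -1;
	return back;
}
```

Calling `overwrite_via_shared_mapping(fd, 4096, 0xAB)` on a fresh scratch file is expected to return 0xAB, since the mapping and pread() share the same cached pages.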

So the result is that the reproducer modifies the block device while the
filesystem is mounted on it. We know cases like this can crash the kernel,
and they are inherently difficult to fix. We have to trust the buffer cache
contents as otherwise the performance will be unacceptable. For historical
reasons we also have to allow modifications of the buffer cache while ext4 is
mounted because tune2fs uses this to e.g. update the label of a mounted
filesystem.
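
To make the failure mode concrete, the arithmetic behind the out-of-bounds read can be modelled in userspace. This is only a sketch under assumptions, not the kernel code: the helper names are invented, the limits (64 and 1024 bytes, power of two) are my reading of the mount-time descriptor size check, and the offset of 32 stands for "just past the checksum field".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed limits for a 64-bit group descriptor; illustrative values only. */
#define MODEL_MIN_DESC_SIZE_64BIT   64
#define MODEL_MAX_DESC_SIZE       1024

/* Model of a mount-time sanity check on the descriptor size. */
static int desc_size_valid(uint16_t size)
{
	return size >= MODEL_MIN_DESC_SIZE_64BIT &&
	       size <= MODEL_MAX_DESC_SIZE &&
	       (size & (size - 1)) == 0;	/* power of two */
}

/* Bytes the checksum would read after `offset` for a given size value. */
static size_t csum_tail_len(uint16_t desc_size, size_t offset)
{
	return offset < desc_size ? desc_size - offset : 0;
}

/*
 * Bytes read past the end of a descriptor that is really `real_size` long
 * when the length is derived from `claimed_size` instead.
 */
static size_t csum_overread(uint16_t claimed_size, uint16_t real_size,
			    size_t offset)
{
	size_t end = offset + csum_tail_len(claimed_size, offset);

	return end > real_size ? end - real_size : 0;
}
```

With a size of 64 that passed the check at mount, `csum_overread(64, 64, 32)` is 0. If the value in the shared buffer is later overwritten with, say, 4096, the same computation reads 4032 bytes beyond the descriptor, which is the kind of access KASAN flags in crc16().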

Long-term we are moving ext4 in a direction where we can disallow block
device modifications while the fs is mounted, but we are not there yet. I've
discussed a shorter-term solution to avoid such known problems with the syzbot
developers, and what seems plausible is a kconfig option to disallow
writing to a block device when it is exclusively open by someone else.
So far I haven't gotten to trying whether this would reasonably work. Would
you be interested in having a look into this?
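
As a rough picture of what such an option would decide, here is a toy userspace model. Nothing in it is an existing kernel interface: `struct bdev_model`, `model_open` and the `restrict_writes` switch (standing in for the kconfig option) are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for the open state of one block device. */
struct bdev_model {
	int exclusive_holders;	/* e.g. a mounted filesystem */
	int plain_writers;	/* non-exclusive opens for writing */
};

/*
 * Decide whether an open is allowed. `restrict_writes` models the proposed
 * option: when set, non-exclusive write opens fail while someone else holds
 * the device exclusively. Returns 0 or a negative errno value.
 */
static int model_open(struct bdev_model *bdev, bool exclusive, bool for_write,
		      bool restrict_writes)
{
	if (exclusive) {
		if (bdev->exclusive_holders > 0)
			return -EBUSY;	/* already held exclusively */
		bdev->exclusive_holders++;
		return 0;
	}
	if (for_write && restrict_writes && bdev->exclusive_holders > 0)
		return -EPERM;		/* the new restriction */
	if (for_write)
		bdev->plain_writers++;
	return 0;
}
```

In this model a mount takes the exclusive hold, after which a plain write open is refused only when the switch is on; read-only opens are never affected. A tool that writes to a mounted device, as tune2fs does for the label, would be refused too, so the switch would have to stay optional.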

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2023-03-13 12:27:45

by yebin

Subject: Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum



On 2023/3/13 19:57, Jan Kara wrote:
> Hi Tudor!
>
> On Mon 13-03-23 11:11:18, Tudor Ambarus wrote:
>> On 3/7/23 11:02, Tudor Ambarus wrote:
>>> On 3/7/23 10:39, Jan Kara wrote:
>>>> On Wed 01-03-23 12:13:51, Tudor Ambarus wrote:
>>>>> On 2/13/23 15:56, syzbot wrote:
>>>>>> syzbot has found a reproducer for the following issue on:
>>>>>>
>>>>>> HEAD commit: ceaa837f96ad Linux 6.2-rc8
>>>>>> git tree: upstream
>>>>>> console output:
>>>>>> https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
>>>>>> kernel config:
>>>>>> https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
>>>>>> dashboard link:
>>>>>> https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
>>>>>> compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils
>>>>>> for Debian) 2.35.2
>>>>>> syz repro:
>>>>>> https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000
>>>>>>
>>>>>> Downloadable assets:
>>>>>> disk image:
>>>>>> https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
>>>>>> vmlinux:
>>>>>> https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
>>>>>> kernel image:
>>>>>> https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
>>>>>> mounted in repro:
>>>>>> https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz
>>>>>>
>>>>>> IMPORTANT: if you fix the issue, please add the following tag to the
>>>>>> commit:
>>>>>> Reported-by: [email protected]
>>>>>>
>>>>>> ==================================================================
>>>>>> BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
>>>>>> Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339
>>>>>>
>>>>>> CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted
>>>>>> 6.2.0-rc8-syzkaller #0
>>>>>> Hardware name: Google Google Compute Engine/Google Compute Engine,
>>>>>> BIOS Google 01/21/2023
>>>>>> Call Trace:
>>>>>> <TASK>
>>>>>> __dump_stack lib/dump_stack.c:88 [inline]
>>>>>> dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
>>>>>> print_address_description mm/kasan/report.c:306 [inline]
>>>>>> print_report+0x163/0x4f0 mm/kasan/report.c:417
>>>>>> kasan_report+0x13a/0x170 mm/kasan/report.c:517
>>>>>> crc16+0x1fb/0x280 lib/crc16.c:58
>>>>>> ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
>>>>>> ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
>>>>>> ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
>>>>>> ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
>>>>>> ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
>>>>>> ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
>>>>>> ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
>>>>>> ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
>>>>>> ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
>>>>>> ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
>>>>>> evict+0x2a4/0x620 fs/inode.c:664
>>>>>> do_unlinkat+0x4f1/0x930 fs/namei.c:4327
>>>>>> __do_sys_unlink fs/namei.c:4368 [inline]
>>>>>> __se_sys_unlink fs/namei.c:4366 [inline]
>>>>>> __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
>>>>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>>>> do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>>>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>>> RIP: 0033:0x7fbc85a8c0f9
>>>>>> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48
>>>>>> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
>>>>>> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
>>>>>> RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
>>>>>> RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
>>>>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
>>>>>> RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
>>>>>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
>>>>>> R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
>>>>>> </TASK>
>>>>>>
>>>>>> The buggy address belongs to the physical page:
>>>>>> page:ffffea0001f78000 refcount:0 mapcount:-128
>>>>>> mapping:0000000000000000 index:0x0 pfn:0x7de00
>>>>>> flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
>>>>>> raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08
>>>>>> 0000000000000000
>>>>>> raw: 0000000000000000 0000000000000001 00000000ffffff7f
>>>>>> 0000000000000000
>>>>>> page dumped because: kasan: bad access detected
>>>>>> page_owner tracks the page as freed
>>>>>> page last allocated via order 1, migratetype Unmovable, gfp_mask
>>>>>> 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
>>>>>> prep_new_page mm/page_alloc.c:2531 [inline]
>>>>>> get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
>>>>>> __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
>>>>>> alloc_slab_page+0x6a/0x160 mm/slub.c:1851
>>>>>> allocate_slab mm/slub.c:1998 [inline]
>>>>>> new_slab+0x84/0x2f0 mm/slub.c:2051
>>>>>> ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
>>>>>> __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
>>>>>> kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
>>>>>> mt_alloc_bulk lib/maple_tree.c:157 [inline]
>>>>>> mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
>>>>>> mas_node_count_gfp lib/maple_tree.c:1316 [inline]
>>>>>> mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
>>>>>> vma_expand+0x277/0x850 mm/mmap.c:541
>>>>>> mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
>>>>>> do_mmap+0x8c9/0xf70 mm/mmap.c:1411
>>>>>> vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
>>>>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>>>> do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>>>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>>> page last free stack trace:
>>>>>> reset_page_owner include/linux/page_owner.h:24 [inline]
>>>>>> free_pages_prepare mm/page_alloc.c:1446 [inline]
>>>>>> free_pcp_prepare mm/page_alloc.c:1496 [inline]
>>>>>> free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
>>>>>> free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
>>>>>> qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
>>>>>> kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
>>>>>> __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
>>>>>> kasan_slab_alloc include/linux/kasan.h:201 [inline]
>>>>>> slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
>>>>>> slab_alloc_node mm/slub.c:3452 [inline]
>>>>>> kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
>>>>>> __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
>>>>>> alloc_skb include/linux/skbuff.h:1270 [inline]
>>>>>> alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
>>>>>> sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
>>>>>> unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
>>>>>> sock_sendmsg_nosec net/socket.c:714 [inline]
>>>>>> sock_sendmsg net/socket.c:734 [inline]
>>>>>> __sys_sendto+0x475/0x5f0 net/socket.c:2117
>>>>>> __do_sys_sendto net/socket.c:2129 [inline]
>>>>>> __se_sys_sendto net/socket.c:2125 [inline]
>>>>>> __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
>>>>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>>>> do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>>>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>>>
>>>>>> Memory state around the buggy address:
>>>>>> ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>>>> ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>>>>> ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>>> ^
>>>>>> ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>>> ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>>> ==================================================================
>>>>>>
>>>>>
>>>>> I think the patch from below should fix it.
>>>>>
>>>>> I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
>>>>> EXT4_MAX_DESC_SIZE. What I think it happens is that the contents of the
>>>>> super block in the buffer get corrupted sometime after the .get_tree
>>>>> (which eventually calls __ext4_fill_super()) is called. So instead of
>>>>> relying on the contents of the buffer, we should instead rely on the
>>>>> s_desc_size initialized at the __ext4_fill_super() time.
>>>>>
>>>>> If someone finds this good (or bad), or has a more in depth explanation,
>>>>> please let me know, it will help me better understand the subsystem. In
>>>>> the meantime I'll continue to investigate this and prepare a patch for
>>>>> it.
>>>> If there's something corrupting the superblock while the filesystem is
>>>> mounted, we need to find what is corrupting the SB and fix *that*. Not
>>>> try
>>>> to paper over the problem by not using the on-disk data... Maybe journal
>>>> replay is corrupting the value or something like that?
>>>>
>>>> Honza
>>>>
>>> Ok, I agree. First thing would be to understand the reproducer and to
>>> simplify it if possible. I haven't yet decoded what the syz repro is
>>> doing at
>>> https://syzkaller.appspot.com/text?tag=ReproSyz&x=16ce3de4c80000
>>> Will reply to this email thread once I understand what's happening. If
>>> you or someone else can decode the syz repro faster than me, shoot.
>>>
>> I can now explain how the contents of the super block of the buffer get
>> corrupted. After the ext4 fs is mounted to the target ("./bus"), the
>> reproducer maps 6MB of data starting at offset 0 in the target's file
>> ("./bus"), then it starts overriding the data with something else, by
>> using memcpy, memset, individual byte inits. Does that mean that we
>> shouldn't rely on the contents of the super block in the buffer after we
>> mount the file system? If so, then my patch stands. I'll be happy to
>> extend it if needed. Below one may find a step by step interpretation of
>> the reproducer.
>>
>> We have a strace log for the same bug, but on Android 5.15:
>> https://syzkaller.appspot.com/text?tag=CrashLog&x=14ecec8cc80000
>>
>> Look for pid 328. You notice that the bpf() syscalls return error, so I
>> commented them out in the c repro to confirm that they are not the
>> cause. The bug reproduced without the bpf() calls. One can find the c
>> repro at:
>> https://syzkaller.appspot.com/text?tag=ReproC&x=17c5fc50c80000
>>
>> Let's look at these calls, just before the bug was hit:
>> [pid 328] open("./bus",
>> O_RDWR|O_CREAT|O_TRUNC|O_NONBLOCK|O_SYNC|O_DIRECT|O_LARGEFILE|O_NOATIME,
>> 000) = 4
>> [pid 328] mount("/dev/loop0", "./bus", NULL, MS_BIND, NULL) = 0
>> [pid 328] open("./bus", O_RDWR|O_SYNC|O_NOATIME|0x3c) = 5
>> [pid 328] mmap(0x20000000, 6291456,
>> PROT_READ|PROT_WRITE|PROT_EXEC|PROT_SEM|0x47ffff0, MAP_SHARED|MAP_FIXED,
>> 5, 0) = 0x20000000
> Yeah, looking at the reproducer, before this the reproducer also mounts
> /dev/loop0 as ext4 filesystem.
>
>> - ./bus is created (if it does not exist), fd 4 is returned.
>> - /dev/loop0 is mounted to ./bus
>> - then it creates a new file descriptor (5) for the same ./bus
>> - then it creates a mapping for ./bus starting at offset zero. The
>> mapped area is at 0x20000000 and is of 0x600000ul length.
> So the result is that the reproducer modified the block device while it is
> mounted by the filesystem. We know cases like this can crash the kernel and
> it is inherently difficult to fix. We have to trust the buffer cache
> contents as otherwise the performance will be unacceptable. For historical
> reasons we also have to allow modifications of buffer cache while ext4 is
> mounted because tune2fs uses this to e.g. update the label of a mounted
> filesystem.
>
> Long-term we are moving ext4 in a direction where we can disallow block
> device modifications while the fs is mounted but we are not there yet. I've
> discussed some shorter-term solution to avoid such known problems with syzbot
> developers and what seems plausible would be a kconfig option to disallow
> writing to a block device when it is exclusively open by someone else.
> But so far I didn't get to trying whether this would reasonably work. Would
> you be interested in having a look into this?
I am interested in this work. File systems are often damaged by writes to
the underlying block device, which is a headache, and I have always wanted
to eradicate this kind of problem. A few months ago I tried to add a mount
parameter that prohibits modification of the block device once it is
mounted, but I ran into several problems that made me abandon the attempt.
First, the 32-bit super block flags have been used up and need to be
extended. Second, I don't know how to handle the read-only flag when there
are multiple mount points.

"disallow writing to a block device when it is exclusively open by
someone else."
-> Perhaps we can add a new ioctl command to control whether write
operations are allowed after the block device has been exclusively opened.
I don't know whether this is feasible. Do you have any good suggestions?
> Honza


2023-03-13 13:02:07

by Jan Kara

Subject: Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum

On Mon 13-03-23 20:27:34, yebin wrote:
> On 2023/3/13 19:57, Jan Kara wrote:
> > On Mon 13-03-23 11:11:18, Tudor Ambarus wrote:
> > > On 3/7/23 11:02, Tudor Ambarus wrote:
> > > > On 3/7/23 10:39, Jan Kara wrote:
> > > > > On Wed 01-03-23 12:13:51, Tudor Ambarus wrote:
> > > > > > On 2/13/23 15:56, syzbot wrote:
> > > > > > > syzbot has found a reproducer for the following issue on:
> > > > > > >
> > > > > > > HEAD commit: ceaa837f96ad Linux 6.2-rc8
> > > > > > > git tree: upstream
> > > > > > > console output:
> > > > > > > https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
> > > > > > > kernel config:
> > > > > > > https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
> > > > > > > dashboard link:
> > > > > > > https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
> > > > > > > compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils
> > > > > > > for Debian) 2.35.2
> > > > > > > syz repro:
> > > > > > > https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000
> > > > > > >
> > > > > > > Downloadable assets:
> > > > > > > disk image:
> > > > > > > https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
> > > > > > > vmlinux:
> > > > > > > https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
> > > > > > > kernel image:
> > > > > > > https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
> > > > > > > mounted in repro:
> > > > > > > https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz
> > > > > > >
> > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the
> > > > > > > commit:
> > > > > > > Reported-by: [email protected]
> > > > > > >
> > > > > > > ==================================================================
> > > > > > > BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
> > > > > > > Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339
> > > > > > >
> > > > > > > CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted
> > > > > > > 6.2.0-rc8-syzkaller #0
> > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > > > > > > BIOS Google 01/21/2023
> > > > > > > Call Trace:
> > > > > > > <TASK>
> > > > > > > __dump_stack lib/dump_stack.c:88 [inline]
> > > > > > > dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
> > > > > > > print_address_description mm/kasan/report.c:306 [inline]
> > > > > > > print_report+0x163/0x4f0 mm/kasan/report.c:417
> > > > > > > kasan_report+0x13a/0x170 mm/kasan/report.c:517
> > > > > > > crc16+0x1fb/0x280 lib/crc16.c:58
> > > > > > > ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
> > > > > > > ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
> > > > > > > ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
> > > > > > > ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
> > > > > > > ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
> > > > > > > ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
> > > > > > > ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
> > > > > > > ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
> > > > > > > ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
> > > > > > > ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
> > > > > > > evict+0x2a4/0x620 fs/inode.c:664
> > > > > > > do_unlinkat+0x4f1/0x930 fs/namei.c:4327
> > > > > > > __do_sys_unlink fs/namei.c:4368 [inline]
> > > > > > > __se_sys_unlink fs/namei.c:4366 [inline]
> > > > > > > __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
> > > > > > > do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > > > > > do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > > > > > > entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > > RIP: 0033:0x7fbc85a8c0f9
> > > > > > > Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48
> > > > > > > 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
> > > > > > > 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> > > > > > > RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
> > > > > > > RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
> > > > > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
> > > > > > > RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
> > > > > > > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > > > > > > R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
> > > > > > > </TASK>
> > > > > > >
> > > > > > > The buggy address belongs to the physical page:
> > > > > > > page:ffffea0001f78000 refcount:0 mapcount:-128
> > > > > > > mapping:0000000000000000 index:0x0 pfn:0x7de00
> > > > > > > flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
> > > > > > > raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08
> > > > > > > 0000000000000000
> > > > > > > raw: 0000000000000000 0000000000000001 00000000ffffff7f
> > > > > > > 0000000000000000
> > > > > > > page dumped because: kasan: bad access detected
> > > > > > > page_owner tracks the page as freed
> > > > > > > page last allocated via order 1, migratetype Unmovable, gfp_mask
> > > > > > > 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
> > > > > > > prep_new_page mm/page_alloc.c:2531 [inline]
> > > > > > > get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
> > > > > > > __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
> > > > > > > alloc_slab_page+0x6a/0x160 mm/slub.c:1851
> > > > > > > allocate_slab mm/slub.c:1998 [inline]
> > > > > > > new_slab+0x84/0x2f0 mm/slub.c:2051
> > > > > > > ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
> > > > > > > __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
> > > > > > > kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
> > > > > > > mt_alloc_bulk lib/maple_tree.c:157 [inline]
> > > > > > > mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
> > > > > > > mas_node_count_gfp lib/maple_tree.c:1316 [inline]
> > > > > > > mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
> > > > > > > vma_expand+0x277/0x850 mm/mmap.c:541
> > > > > > > mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
> > > > > > > do_mmap+0x8c9/0xf70 mm/mmap.c:1411
> > > > > > > vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
> > > > > > > do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > > > > > do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > > > > > > entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > > page last free stack trace:
> > > > > > > reset_page_owner include/linux/page_owner.h:24 [inline]
> > > > > > > free_pages_prepare mm/page_alloc.c:1446 [inline]
> > > > > > > free_pcp_prepare mm/page_alloc.c:1496 [inline]
> > > > > > > free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
> > > > > > > free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
> > > > > > > qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
> > > > > > > kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
> > > > > > > __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
> > > > > > > kasan_slab_alloc include/linux/kasan.h:201 [inline]
> > > > > > > slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
> > > > > > > slab_alloc_node mm/slub.c:3452 [inline]
> > > > > > > kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
> > > > > > > __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
> > > > > > > alloc_skb include/linux/skbuff.h:1270 [inline]
> > > > > > > alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
> > > > > > > sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
> > > > > > > unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
> > > > > > > sock_sendmsg_nosec net/socket.c:714 [inline]
> > > > > > > sock_sendmsg net/socket.c:734 [inline]
> > > > > > > __sys_sendto+0x475/0x5f0 net/socket.c:2117
> > > > > > > __do_sys_sendto net/socket.c:2129 [inline]
> > > > > > > __se_sys_sendto net/socket.c:2125 [inline]
> > > > > > > __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
> > > > > > > do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > > > > > do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > > > > > > entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > >
> > > > > > > Memory state around the buggy address:
> > > > > > > ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > > > ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > > > > ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > > ^
> > > > > > > ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > > ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > > ==================================================================
> > > > > > >
> > > > > >
> > > > > > I think the patch from below should fix it.
> > > > > >
> > > > > > I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
> > > > > > EXT4_MAX_DESC_SIZE. What I think it happens is that the contents of the
> > > > > > super block in the buffer get corrupted sometime after the .get_tree
> > > > > > (which eventually calls __ext4_fill_super()) is called. So instead of
> > > > > > relying on the contents of the buffer, we should instead rely on the
> > > > > > s_desc_size initialized at the __ext4_fill_super() time.
> > > > > >
> > > > > > If someone finds this good (or bad), or has a more in depth explanation,
> > > > > > please let me know, it will help me better understand the subsystem. In
> > > > > > the meantime I'll continue to investigate this and prepare a patch for
> > > > > > it.
> > > > > If there's something corrupting the superblock while the filesystem is
> > > > > > mounted, we need to find what is corrupting the SB and fix *that*. Not try
> > > > > > to paper over the problem by not using the on-disk data... Maybe journal
> > > > > replay is corrupting the value or something like that?
> > > > >
> > > > > Honza
> > > > >
> > > > Ok, I agree. First thing would be to understand the reproducer and to
> > > > simplify it if possible. I haven't yet decoded what the syz repro is
> > > > doing at
> > > > https://syzkaller.appspot.com/text?tag=ReproSyz&x=16ce3de4c80000
> > > > Will reply to this email thread once I understand what's happening. If
> > > > you or someone else can decode the syz repro faster than me, shoot.
> > > >
> > > I can now explain how the contents of the super block of the buffer get
> > > corrupted. After the ext4 fs is mounted to the target ("./bus"), the
> > > reproducer maps 6MB of data starting at offset 0 in the target's file
> > > ("./bus"), then it starts overriding the data with something else, by
> > > using memcpy, memset, individual byte inits. Does that mean that we
> > > shouldn't rely on the contents of the super block in the buffer after we
> > > mount the file system? If so, then my patch stands. I'll be happy to
> > > extend it if needed. Below one may find a step by step interpretation of
> > > the reproducer.
> > >
> > > We have a strace log for the same bug, but on Android 5.15:
> > > https://syzkaller.appspot.com/text?tag=CrashLog&x=14ecec8cc80000
> > >
> > > Look for pid 328. You notice that the bpf() syscalls return error, so I
> > > commented them out in the c repro to confirm that they are not the
> > > cause. The bug reproduced without the bpf() calls. One can find the c
> > > repro at:
> > > https://syzkaller.appspot.com/text?tag=ReproC&x=17c5fc50c80000
> > >
> > > Let's look at these calls, just before the bug was hit:
> > > [pid 328] open("./bus",
> > > O_RDWR|O_CREAT|O_TRUNC|O_NONBLOCK|O_SYNC|O_DIRECT|O_LARGEFILE|O_NOATIME,
> > > 000) = 4
> > > [pid 328] mount("/dev/loop0", "./bus", NULL, MS_BIND, NULL) = 0
> > > [pid 328] open("./bus", O_RDWR|O_SYNC|O_NOATIME|0x3c) = 5
> > > [pid 328] mmap(0x20000000, 6291456,
> > > PROT_READ|PROT_WRITE|PROT_EXEC|PROT_SEM|0x47ffff0, MAP_SHARED|MAP_FIXED,
> > > 5, 0) = 0x20000000
> > Yeah, looking at the reproducer, before this the reproducer also mounts
> > /dev/loop0 as ext4 filesystem.
> >
> > > - ./bus is created (if it does not exist), fd 4 is returned.
> > > - /dev/loop0 is mounted to ./bus
> > > - then it creates a new file descriptor (5) for the same ./bus
> > > - then it creates a mapping for ./bus starting at offset zero. The
> > > mapped area is at 0x20000000 and is of 0x600000ul length.
> > So the result is that the reproducer modified the block device while it is
> > mounted by the filesystem. We know cases like this can crash the kernel and
> > it is inherently difficult to fix. We have to trust the buffer cache
> > contents as otherwise the performance will be unacceptable. For historical
> > reasons we also have to allow modifications of buffer cache while ext4 is
> > mounted because tune2fs uses this to e.g. update the label of a mounted
> > filesystem.
> >
> > Long-term we are moving ext4 in a direction where we can disallow block
> > device modifications while the fs is mounted but we are not there yet. I've
> > discussed some shorter-term solution to avoid such known problems with syzbot
> > developers and what seems plausible would be a kconfig option to disallow
> > writing to a block device when it is exclusively open by someone else.
> > But so far I didn't get to trying whether this would reasonably work. Would
> > you be interested in having a look into this?
>
> I am interested in this job. The file system is often damaged by writing
> block devices, which is a headache. I have always wanted to eradicate
> this kind of problem. A few months ago, I tried to add a mount parameter
> to prohibit modification after the block device is mounted. But I
> encountered several problems that led to the termination of my attempt.
> First of all, the 32-bit super block flags have been used up and need to
> be extended. Secondly, I don't know how to handle read-only flag in the
> case of multiple mount points.
> "disallow writing to a block device when it is exclusively open by someone
> else. "
> -> Perhaps we can add a new IOCTL command to control whether write
> operations are allowed after the block device has been exclusively
> opened. I don't know if this is feasible? Do you have any good
> suggestions?

Well, an ioctl() for syzbot would be possible as well, but for a start I'd try
whether the idea with the kconfig option will work. Then it will be enough to
just make sure all kernels used for fuzzing are built with this option set.
Thanks for having a look into this!

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
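
For illustration only, here is a toy userspace model of the policy discussed
above: with a build-time switch enabled, a new opener is refused write access
while someone else (for example a mounted filesystem) holds the device
exclusively. All names here (TOY_DENY_WRITE_WHEN_EXCL, struct toy_bdev,
toy_may_open) are hypothetical and are not kernel APIs; this is a sketch of
the idea, not of any actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical build-time switch standing in for the proposed kconfig option. */
#ifndef TOY_DENY_WRITE_WHEN_EXCL
#define TOY_DENY_WRITE_WHEN_EXCL 1
#endif

/* Toy model of a block device: only the state this check needs. */
struct toy_bdev {
    int excl_holders; /* number of exclusive openers, e.g. a mounted fs */
};

/*
 * Decide whether a new opener, who is not one of the exclusive holders,
 * may open the device. Reads are always allowed in this model.
 */
static bool toy_may_open(const struct toy_bdev *bdev, bool want_write)
{
    assert(bdev != NULL);
#if TOY_DENY_WRITE_WHEN_EXCL
    /* Someone else holds the device exclusively: refuse new writers. */
    if (want_write && bdev->excl_holders > 0)
        return false;
#endif
    return true;
}
```

With the switch on, toy_may_open() on a device with one exclusive holder
refuses a writer but still admits a reader. Building with
-DTOY_DENY_WRITE_WHEN_EXCL=0 restores the current permissive behaviour, which
is why it would be enough to set the option only on kernels used for fuzzing.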

2023-03-13 13:18:09

by yebin (H)

Subject: Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum



On 2023/3/13 21:01, Jan Kara wrote:
> On Mon 13-03-23 20:27:34, yebin wrote:
>> On 2023/3/13 19:57, Jan Kara wrote:
>>> On Mon 13-03-23 11:11:18, Tudor Ambarus wrote:
>>>> On 3/7/23 11:02, Tudor Ambarus wrote:
>>>>> On 3/7/23 10:39, Jan Kara wrote:
>>>>>> On Wed 01-03-23 12:13:51, Tudor Ambarus wrote:
>>>>>>> On 2/13/23 15:56, syzbot wrote:
>>>>>>>> syzbot has found a reproducer for the following issue on:
>>>>>>>>
>>>>>>>> HEAD commit: ceaa837f96ad Linux 6.2-rc8
>>>>>>>> git tree: upstream
>>>>>>>> console output:
>>>>>>>> https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
>>>>>>>> kernel config:
>>>>>>>> https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
>>>>>>>> dashboard link:
>>>>>>>> https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
>>>>>>>> compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils
>>>>>>>> for Debian) 2.35.2
>>>>>>>> syz repro:
>>>>>>>> https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000
>>>>>>>>
>>>>>>>> Downloadable assets:
>>>>>>>> disk image:
>>>>>>>> https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
>>>>>>>> vmlinux:
>>>>>>>> https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
>>>>>>>> kernel image:
>>>>>>>> https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
>>>>>>>> mounted in repro:
>>>>>>>> https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz
>>>>>>>>
>>>>>>>> IMPORTANT: if you fix the issue, please add the following tag to the
>>>>>>>> commit:
>>>>>>>> Reported-by: [email protected]
>>>>>>>>
>>>>>>>> ==================================================================
>>>>>>>> BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
>>>>>>>> Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339
>>>>>>>>
>>>>>>>> CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted
>>>>>>>> 6.2.0-rc8-syzkaller #0
>>>>>>>> Hardware name: Google Google Compute Engine/Google Compute Engine,
>>>>>>>> BIOS Google 01/21/2023
>>>>>>>> Call Trace:
>>>>>>>> <TASK>
>>>>>>>> __dump_stack lib/dump_stack.c:88 [inline]
>>>>>>>> dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
>>>>>>>> print_address_description mm/kasan/report.c:306 [inline]
>>>>>>>> print_report+0x163/0x4f0 mm/kasan/report.c:417
>>>>>>>> kasan_report+0x13a/0x170 mm/kasan/report.c:517
>>>>>>>> crc16+0x1fb/0x280 lib/crc16.c:58
>>>>>>>> ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
>>>>>>>> ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
>>>>>>>> ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
>>>>>>>> ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
>>>>>>>> ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
>>>>>>>> ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
>>>>>>>> ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
>>>>>>>> ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
>>>>>>>> ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
>>>>>>>> ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
>>>>>>>> evict+0x2a4/0x620 fs/inode.c:664
>>>>>>>> do_unlinkat+0x4f1/0x930 fs/namei.c:4327
>>>>>>>> __do_sys_unlink fs/namei.c:4368 [inline]
>>>>>>>> __se_sys_unlink fs/namei.c:4366 [inline]
>>>>>>>> __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
>>>>>>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>>>>>> do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>>>>>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>>>>> RIP: 0033:0x7fbc85a8c0f9
>>>>>>>> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48
>>>>>>>> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
>>>>>>>> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
>>>>>>>> RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
>>>>>>>> RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
>>>>>>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
>>>>>>>> RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
>>>>>>>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
>>>>>>>> R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
>>>>>>>> </TASK>
>>>>>>>>
>>>>>>>> The buggy address belongs to the physical page:
>>>>>>>> page:ffffea0001f78000 refcount:0 mapcount:-128
>>>>>>>> mapping:0000000000000000 index:0x0 pfn:0x7de00
>>>>>>>> flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
>>>>>>>> raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08
>>>>>>>> 0000000000000000
>>>>>>>> raw: 0000000000000000 0000000000000001 00000000ffffff7f
>>>>>>>> 0000000000000000
>>>>>>>> page dumped because: kasan: bad access detected
>>>>>>>> page_owner tracks the page as freed
>>>>>>>> page last allocated via order 1, migratetype Unmovable, gfp_mask
>>>>>>>> 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
>>>>>>>> prep_new_page mm/page_alloc.c:2531 [inline]
>>>>>>>> get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
>>>>>>>> __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
>>>>>>>> alloc_slab_page+0x6a/0x160 mm/slub.c:1851
>>>>>>>> allocate_slab mm/slub.c:1998 [inline]
>>>>>>>> new_slab+0x84/0x2f0 mm/slub.c:2051
>>>>>>>> ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
>>>>>>>> __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
>>>>>>>> kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
>>>>>>>> mt_alloc_bulk lib/maple_tree.c:157 [inline]
>>>>>>>> mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
>>>>>>>> mas_node_count_gfp lib/maple_tree.c:1316 [inline]
>>>>>>>> mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
>>>>>>>> vma_expand+0x277/0x850 mm/mmap.c:541
>>>>>>>> mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
>>>>>>>> do_mmap+0x8c9/0xf70 mm/mmap.c:1411
>>>>>>>> vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
>>>>>>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>>>>>> do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>>>>>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>>>>> page last free stack trace:
>>>>>>>> reset_page_owner include/linux/page_owner.h:24 [inline]
>>>>>>>> free_pages_prepare mm/page_alloc.c:1446 [inline]
>>>>>>>> free_pcp_prepare mm/page_alloc.c:1496 [inline]
>>>>>>>> free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
>>>>>>>> free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
>>>>>>>> qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
>>>>>>>> kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
>>>>>>>> __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
>>>>>>>> kasan_slab_alloc include/linux/kasan.h:201 [inline]
>>>>>>>> slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
>>>>>>>> slab_alloc_node mm/slub.c:3452 [inline]
>>>>>>>> kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
>>>>>>>> __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
>>>>>>>> alloc_skb include/linux/skbuff.h:1270 [inline]
>>>>>>>> alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
>>>>>>>> sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
>>>>>>>> unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
>>>>>>>> sock_sendmsg_nosec net/socket.c:714 [inline]
>>>>>>>> sock_sendmsg net/socket.c:734 [inline]
>>>>>>>> __sys_sendto+0x475/0x5f0 net/socket.c:2117
>>>>>>>> __do_sys_sendto net/socket.c:2129 [inline]
>>>>>>>> __se_sys_sendto net/socket.c:2125 [inline]
>>>>>>>> __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
>>>>>>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>>>>>>>> do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>>>>>>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>>>>>
>>>>>>>> Memory state around the buggy address:
>>>>>>>> ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>>>>>> ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>>>>>>> ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>>>>> ^
>>>>>>>> ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>>>>> ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>>>>> ==================================================================
>>>>>>>>
>>>>>>> I think the patch from below should fix it.
>>>>>>>
>>>>>>> I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
>>>>>>> EXT4_MAX_DESC_SIZE. What I think it happens is that the contents of the
>>>>>>> super block in the buffer get corrupted sometime after the .get_tree
>>>>>>> (which eventually calls __ext4_fill_super()) is called. So instead of
>>>>>>> relying on the contents of the buffer, we should instead rely on the
>>>>>>> s_desc_size initialized at the __ext4_fill_super() time.
>>>>>>>
>>>>>>> If someone finds this good (or bad), or has a more in depth explanation,
>>>>>>> please let me know, it will help me better understand the subsystem. In
>>>>>>> the meantime I'll continue to investigate this and prepare a patch for
>>>>>>> it.
>>>>>> If there's something corrupting the superblock while the filesystem is
>>>>>> mounted, we need to find what is corrupting the SB and fix *that*. Not try
>>>>>> to paper over the problem by not using the on-disk data... Maybe journal
>>>>>> replay is corrupting the value or something like that?
>>>>>>
>>>>>> Honza
>>>>>>
>>>>> Ok, I agree. First thing would be to understand the reproducer and to
>>>>> simplify it if possible. I haven't yet decoded what the syz repro is
>>>>> doing at
>>>>> https://syzkaller.appspot.com/text?tag=ReproSyz&x=16ce3de4c80000
>>>>> Will reply to this email thread once I understand what's happening. If
>>>>> you or someone else can decode the syz repro faster than me, shoot.
>>>>>
>>>> I can now explain how the contents of the super block of the buffer get
>>>> corrupted. After the ext4 fs is mounted to the target ("./bus"), the
>>>> reproducer maps 6MB of data starting at offset 0 in the target's file
>>>> ("./bus"), then it starts overriding the data with something else, by
>>>> using memcpy, memset, individual byte inits. Does that mean that we
>>>> shouldn't rely on the contents of the super block in the buffer after we
>>>> mount the file system? If so, then my patch stands. I'll be happy to
>>>> extend it if needed. Below one may find a step by step interpretation of
>>>> the reproducer.
>>>>
>>>> We have a strace log for the same bug, but on Android 5.15:
>>>> https://syzkaller.appspot.com/text?tag=CrashLog&x=14ecec8cc80000
>>>>
>>>> Look for pid 328. You notice that the bpf() syscalls return error, so I
>>>> commented them out in the c repro to confirm that they are not the
>>>> cause. The bug reproduced without the bpf() calls. One can find the c
>>>> repro at:
>>>> https://syzkaller.appspot.com/text?tag=ReproC&x=17c5fc50c80000
>>>>
>>>> Let's look at these calls, just before the bug was hit:
>>>> [pid 328] open("./bus",
>>>> O_RDWR|O_CREAT|O_TRUNC|O_NONBLOCK|O_SYNC|O_DIRECT|O_LARGEFILE|O_NOATIME,
>>>> 000) = 4
>>>> [pid 328] mount("/dev/loop0", "./bus", NULL, MS_BIND, NULL) = 0
>>>> [pid 328] open("./bus", O_RDWR|O_SYNC|O_NOATIME|0x3c) = 5
>>>> [pid 328] mmap(0x20000000, 6291456,
>>>> PROT_READ|PROT_WRITE|PROT_EXEC|PROT_SEM|0x47ffff0, MAP_SHARED|MAP_FIXED,
>>>> 5, 0) = 0x20000000
>>> Yeah, looking at the reproducer, before this the reproducer also mounts
>>> /dev/loop0 as ext4 filesystem.
>>>
>>>> - ./bus is created (if it does not exist), fd 4 is returned.
>>>> - /dev/loop0 is mounted to ./bus
>>>> - then it creates a new file descriptor (5) for the same ./bus
>>>> - then it creates a mapping for ./bus starting at offset zero. The
>>>> mapped area is at 0x20000000 and is of 0x600000ul length.
>>> So the result is that the reproducer modified the block device while it is
>>> mounted by the filesystem. We know cases like this can crash the kernel and
>>> it is inherently difficult to fix. We have to trust the buffer cache
>>> contents as otherwise the performance will be unacceptable. For historical
>>> reasons we also have to allow modifications of buffer cache while ext4 is
>>> mounted because tune2fs uses this to e.g. update the label of a mounted
>>> filesystem.
>>>
>>> Long-term we are moving ext4 in a direction where we can disallow block
>>> device modifications while the fs is mounted but we are not there yet. I've
>>> discussed some shorter-term solution to avoid such known problems with syzbot
>>> developers and what seems plausible would be a kconfig option to disallow
>>> writing to a block device when it is exclusively open by someone else.
>>> But so far I didn't get to trying whether this would reasonably work. Would
>>> you be interested in having a look into this?
>> I am interested in this job. The file system is often damaged by writing
>> block devices, which is a headache. I have always wanted to eradicate
>> this kind of problem. A few months ago, I tried to add a mount parameter
>> to prohibit modification after the block device is mounted. But I
>> encountered several problems that led to the termination of my attempt.
>> First of all, the 32-bit super block flags have been used up and need to
>> be extended. Secondly, I don't know how to handle read-only flag in the
>> case of multiple mount points.
>> "disallow writing to a block device when it is exclusively open by someone
>> else. "
>> -> Perhaps we can add a new IOCTL command to control whether write
>> operations are allowed after the block device has been exclusively
>> opened. I don't know if this is feasible? Do you have any good
>> suggestions?
> Well, ioctl() for syzbot would be possible as well but for start I'd try
> whether the idea with kconfig option will work. Then it will be enough to
> just make sure all kernels used for fuzzing are built with this option set.
> Thanks for having a look into this!
In fact, I also want to solve the problem of file system damage caused by
writing to raw disks in production environments. Controlling this directly
with a kconfig option loses flexibility in production.
>
> Honza


2023-03-13 14:44:35

by Tudor Ambarus

Subject: Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum

Hi, Jan, Ye,

On 3/13/23 13:01, Jan Kara wrote:
> On Mon 13-03-23 20:27:34, yebin wrote:
>> On 2023/3/13 19:57, Jan Kara wrote:
>>> On Mon 13-03-23 11:11:18, Tudor Ambarus wrote:
>>>> On 3/7/23 11:02, Tudor Ambarus wrote:
>>>>> On 3/7/23 10:39, Jan Kara wrote:
>>>>>> On Wed 01-03-23 12:13:51, Tudor Ambarus wrote:
>>>>>>> On 2/13/23 15:56, syzbot wrote:
>>>>>>>> syzbot has found a reproducer for the following issue on:
>>>>>>>
>>>>>>> I think the patch from below should fix it.
>>>>>>>
>>>>>>> I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
>>>>>>> EXT4_MAX_DESC_SIZE. What I think it happens is that the contents of the
>>>>>>> super block in the buffer get corrupted sometime after the .get_tree
>>>>>>> (which eventually calls __ext4_fill_super()) is called. So instead of
>>>>>>> relying on the contents of the buffer, we should instead rely on the
>>>>>>> s_desc_size initialized at the __ext4_fill_super() time.
>>>>>>>
>>>>>>> If someone finds this good (or bad), or has a more in depth explanation,
>>>>>>> please let me know, it will help me better understand the subsystem. In
>>>>>>> the meantime I'll continue to investigate this and prepare a patch for
>>>>>>> it.
>>>>>> If there's something corrupting the superblock while the filesystem is
>>>>>> mounted, we need to find what is corrupting the SB and fix *that*. Not try
>>>>>> to paper over the problem by not using the on-disk data... Maybe journal
>>>>>> replay is corrupting the value or something like that?
>>>>>>
>>>>>> Honza
>>>>>>
>>>>> Ok, I agree. First thing would be to understand the reproducer and to
>>>>> simplify it if possible. I haven't yet decoded what the syz repro is
>>>>> doing at
>>>>> https://syzkaller.appspot.com/text?tag=ReproSyz&x=16ce3de4c80000
>>>>> Will reply to this email thread once I understand what's happening. If
>>>>> you or someone else can decode the syz repro faster than me, shoot.
>>>>>
>>>> I can now explain how the contents of the super block of the buffer get
>>>> corrupted. After the ext4 fs is mounted to the target ("./bus"), the
>>>> reproducer maps 6MB of data starting at offset 0 in the target's file
>>>> ("./bus"), then it starts overriding the data with something else, by
>>>> using memcpy, memset, individual byte inits. Does that mean that we
>>>> shouldn't rely on the contents of the super block in the buffer after we
>>>> mount the file system? If so, then my patch stands. I'll be happy to
>>>> extend it if needed. Below one may find a step by step interpretation of
>>>> the reproducer.
>>>>
>>>> We have a strace log for the same bug, but on Android 5.15:
>>>> https://syzkaller.appspot.com/text?tag=CrashLog&x=14ecec8cc80000
>>>>
>>>> Look for pid 328. You notice that the bpf() syscalls return error, so I
>>>> commented them out in the c repro to confirm that they are not the
>>>> cause. The bug reproduced without the bpf() calls. One can find the c
>>>> repro at:
>>>> https://syzkaller.appspot.com/text?tag=ReproC&x=17c5fc50c80000
>>>>
>>>> Let's look at these calls, just before the bug was hit:
>>>> [pid 328] open("./bus",
>>>> O_RDWR|O_CREAT|O_TRUNC|O_NONBLOCK|O_SYNC|O_DIRECT|O_LARGEFILE|O_NOATIME,
>>>> 000) = 4
>>>> [pid 328] mount("/dev/loop0", "./bus", NULL, MS_BIND, NULL) = 0
>>>> [pid 328] open("./bus", O_RDWR|O_SYNC|O_NOATIME|0x3c) = 5
>>>> [pid 328] mmap(0x20000000, 6291456,
>>>> PROT_READ|PROT_WRITE|PROT_EXEC|PROT_SEM|0x47ffff0, MAP_SHARED|MAP_FIXED,
>>>> 5, 0) = 0x20000000
>>> Yeah, looking at the reproducer, before this the reproducer also mounts
>>> /dev/loop0 as ext4 filesystem.
>>>
>>>> - ./bus is created (if it does not exist), fd 4 is returned.
>>>> - /dev/loop0 is mounted to ./bus
>>>> - then it creates a new file descriptor (5) for the same ./bus
>>>> - then it creates a mapping for ./bus starting at offset zero. The
>>>> mapped area is at 0x20000000 and is of 0x600000ul length.
>>> So the result is that the reproducer modified the block device while it is
>>> mounted by the filesystem. We know cases like this can crash the kernel and
>>> it is inherently difficult to fix. We have to trust the buffer cache
>>> contents as otherwise the performance will be unacceptable. For historical
>>> reasons we also have to allow modifications of buffer cache while ext4 is
>>> mounted because tune2fs uses this to e.g. update the label of a mounted
>>> filesystem.
>>>
>>> Long-term we are moving ext4 in a direction where we can disallow block
>>> device modifications while the fs is mounted but we are not there yet. I've

Sounds good.

>>> discussed some shorter-term solution to avoid such known problems with syzbot
>>> developers and what seems plausible would be a kconfig option to disallow
>>> writing to a block device when it is exclusively open by someone else.

How do we determine when a block device is exclusively open by someone else?

>>> But so far I didn't get to trying whether this would reasonably work. Would
>>> you be interested in having a look into this?
>>
>> I am interested in this job. The file system is often damaged by writing

I'm fine with Ye handling this. If that's not the case I can take a look
too, but I would need more pointers than the ones already provided, as I've
only recently started looking at ext4.

>> block devices, which is a headache. I have always wanted to eradicate
>> this kind of problem. A few months ago, I tried to add a mount parameter
>> to prohibit modification after the block device is mounted. But I
>> encountered several problems that led to the termination of my attempt.
>> First of all, the 32-bit super block flags have been used up and need to
>> be extended. Secondly, I don't know how to handle read-only flag in the
>> case of multiple mount points.
>> "disallow writing to a block device when it is exclusively open by someone
>> else. "
>> -> Perhaps we can add a new IOCTL command to control whether write
>> operations are allowed after the block device has been exclusively
>> opened. I don't know if this is feasible? Do you have any good
>> suggestions?
>
> Well, ioctl() for syzbot would be possible as well but for start I'd try
> whether the idea with kconfig option will work. Then it will be enough to
> just make sure all kernels used for fuzzing are built with this option set.

How should we treat such bugs until the kconfig option is introduced? Do
we leave them open, or do we mark them as won't fix? The kconfig solution
feels a bit like a workaround; the bugs will still be hit by anyone not
selecting that config option.

Cheers,
ta

2023-03-13 14:55:19

by Dmitry Vyukov

[permalink] [raw]
Subject: Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum

On Mon, 13 Mar 2023 at 12:57, Jan Kara <[email protected]> wrote:
>
> Hi Tudor!
>
> On Mon 13-03-23 11:11:18, Tudor Ambarus wrote:
> > On 3/7/23 11:02, Tudor Ambarus wrote:
> > > On 3/7/23 10:39, Jan Kara wrote:
> > >> On Wed 01-03-23 12:13:51, Tudor Ambarus wrote:
> > >>> On 2/13/23 15:56, syzbot wrote:
> > >>>> syzbot has found a reproducer for the following issue on:
> > >>>>
> > >>>> HEAD commit: ceaa837f96ad Linux 6.2-rc8
> > >>>> git tree: upstream
> > >>>> console output:
> > >>>> https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
> > >>>> kernel config:
> > >>>> https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
> > >>>> dashboard link:
> > >>>> https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
> > >>>> compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils
> > >>>> for Debian) 2.35.2
> > >>>> syz repro:
> > >>>> https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000
> > >>>>
> > >>>> Downloadable assets:
> > >>>> disk image:
> > >>>> https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
> > >>>> vmlinux:
> > >>>> https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
> > >>>> kernel image:
> > >>>> https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
> > >>>> mounted in repro:
> > >>>> https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz
> > >>>>
> > >>>> IMPORTANT: if you fix the issue, please add the following tag to the
> > >>>> commit:
> > >>>> Reported-by: [email protected]
> > >>>>
> > >>>> ==================================================================
> > >>>> BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
> > >>>> Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339
> > >>>>
> > >>>> CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted
> > >>>> 6.2.0-rc8-syzkaller #0
> > >>>> Hardware name: Google Google Compute Engine/Google Compute Engine,
> > >>>> BIOS Google 01/21/2023
> > >>>> Call Trace:
> > >>>> <TASK>
> > >>>> __dump_stack lib/dump_stack.c:88 [inline]
> > >>>> dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
> > >>>> print_address_description mm/kasan/report.c:306 [inline]
> > >>>> print_report+0x163/0x4f0 mm/kasan/report.c:417
> > >>>> kasan_report+0x13a/0x170 mm/kasan/report.c:517
> > >>>> crc16+0x1fb/0x280 lib/crc16.c:58
> > >>>> ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
> > >>>> ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
> > >>>> ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
> > >>>> ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
> > >>>> ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
> > >>>> ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
> > >>>> ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
> > >>>> ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
> > >>>> ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
> > >>>> ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
> > >>>> evict+0x2a4/0x620 fs/inode.c:664
> > >>>> do_unlinkat+0x4f1/0x930 fs/namei.c:4327
> > >>>> __do_sys_unlink fs/namei.c:4368 [inline]
> > >>>> __se_sys_unlink fs/namei.c:4366 [inline]
> > >>>> __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
> > >>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > >>>> do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > >>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > >>>> RIP: 0033:0x7fbc85a8c0f9
> > >>>> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48
> > >>>> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
> > >>>> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> > >>>> RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
> > >>>> RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
> > >>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
> > >>>> RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
> > >>>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > >>>> R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
> > >>>> </TASK>
> > >>>>
> > >>>> The buggy address belongs to the physical page:
> > >>>> page:ffffea0001f78000 refcount:0 mapcount:-128
> > >>>> mapping:0000000000000000 index:0x0 pfn:0x7de00
> > >>>> flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
> > >>>> raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08
> > >>>> 0000000000000000
> > >>>> raw: 0000000000000000 0000000000000001 00000000ffffff7f
> > >>>> 0000000000000000
> > >>>> page dumped because: kasan: bad access detected
> > >>>> page_owner tracks the page as freed
> > >>>> page last allocated via order 1, migratetype Unmovable, gfp_mask
> > >>>> 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
> > >>>> prep_new_page mm/page_alloc.c:2531 [inline]
> > >>>> get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
> > >>>> __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
> > >>>> alloc_slab_page+0x6a/0x160 mm/slub.c:1851
> > >>>> allocate_slab mm/slub.c:1998 [inline]
> > >>>> new_slab+0x84/0x2f0 mm/slub.c:2051
> > >>>> ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
> > >>>> __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
> > >>>> kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
> > >>>> mt_alloc_bulk lib/maple_tree.c:157 [inline]
> > >>>> mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
> > >>>> mas_node_count_gfp lib/maple_tree.c:1316 [inline]
> > >>>> mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
> > >>>> vma_expand+0x277/0x850 mm/mmap.c:541
> > >>>> mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
> > >>>> do_mmap+0x8c9/0xf70 mm/mmap.c:1411
> > >>>> vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
> > >>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > >>>> do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > >>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > >>>> page last free stack trace:
> > >>>> reset_page_owner include/linux/page_owner.h:24 [inline]
> > >>>> free_pages_prepare mm/page_alloc.c:1446 [inline]
> > >>>> free_pcp_prepare mm/page_alloc.c:1496 [inline]
> > >>>> free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
> > >>>> free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
> > >>>> qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
> > >>>> kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
> > >>>> __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
> > >>>> kasan_slab_alloc include/linux/kasan.h:201 [inline]
> > >>>> slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
> > >>>> slab_alloc_node mm/slub.c:3452 [inline]
> > >>>> kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
> > >>>> __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
> > >>>> alloc_skb include/linux/skbuff.h:1270 [inline]
> > >>>> alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
> > >>>> sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
> > >>>> unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
> > >>>> sock_sendmsg_nosec net/socket.c:714 [inline]
> > >>>> sock_sendmsg net/socket.c:734 [inline]
> > >>>> __sys_sendto+0x475/0x5f0 net/socket.c:2117
> > >>>> __do_sys_sendto net/socket.c:2129 [inline]
> > >>>> __se_sys_sendto net/socket.c:2125 [inline]
> > >>>> __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
> > >>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > >>>> do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > >>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > >>>>
> > >>>> Memory state around the buggy address:
> > >>>> ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >>>> ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >>>>> ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > >>>> ^
> > >>>> ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > >>>> ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > >>>> ==================================================================
> > >>>>
> > >>>
> > >>>
> > >>> I think the patch from below should fix it.
> > >>>
> > >>> I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
> > >>> EXT4_MAX_DESC_SIZE. What I think it happens is that the contents of the
> > >>> super block in the buffer get corrupted sometime after the .get_tree
> > >>> (which eventually calls __ext4_fill_super()) is called. So instead of
> > >>> relying on the contents of the buffer, we should instead rely on the
> > >>> s_desc_size initialized at the __ext4_fill_super() time.
> > >>>
> > >>> If someone finds this good (or bad), or has a more in depth explanation,
> > >>> please let me know, it will help me better understand the subsystem. In
> > >>> the meantime I'll continue to investigate this and prepare a patch for
> > >>> it.
> > >>
> > >> If there's something corrupting the superblock while the filesystem is
> > >> mounted, we need to find what is corrupting the SB and fix *that*. Not
> > >> try
> > >> to paper over the problem by not using the on-disk data... Maybe journal
> > >> replay is corrupting the value or something like that?
> > >>
> > >> Honza
> > >>
> > >
> > > Ok, I agree. First thing would be to understand the reproducer and to
> > > simplify it if possible. I haven't yet decoded what the syz repro is
> > > doing at
> > > https://syzkaller.appspot.com/text?tag=ReproSyz&x=16ce3de4c80000
> > > Will reply to this email thread once I understand what's happening. If
> > > you or someone else can decode the syz repro faster than me, shoot.
> > >
> >
> > I can now explain how the contents of the super block of the buffer get
> > corrupted. After the ext4 fs is mounted to the target ("./bus"), the
> > reproducer maps 6MB of data starting at offset 0 in the target's file
> > ("./bus"), then it starts overriding the data with something else, by
> > using memcpy, memset, individual byte inits. Does that mean that we
> > shouldn't rely on the contents of the super block in the buffer after we
> > mount the file system? If so, then my patch stands. I'll be happy to
> > extend it if needed. Below one may find a step by step interpretation of
> > the reproducer.
> >
> > We have a strace log for the same bug, but on Android 5.15:
> > https://syzkaller.appspot.com/text?tag=CrashLog&x=14ecec8cc80000
> >
> > Look for pid 328. You notice that the bpf() syscalls return error, so I
> > commented them out in the c repro to confirm that they are not the
> > cause. The bug reproduced without the bpf() calls. One can find the c
> > repro at:
> > https://syzkaller.appspot.com/text?tag=ReproC&x=17c5fc50c80000
> >
> > Let's look at these calls, just before the bug was hit:
> > [pid 328] open("./bus",
> > O_RDWR|O_CREAT|O_TRUNC|O_NONBLOCK|O_SYNC|O_DIRECT|O_LARGEFILE|O_NOATIME,
> > 000) = 4
> > [pid 328] mount("/dev/loop0", "./bus", NULL, MS_BIND, NULL) = 0
> > [pid 328] open("./bus", O_RDWR|O_SYNC|O_NOATIME|0x3c) = 5
> > [pid 328] mmap(0x20000000, 6291456,
> > PROT_READ|PROT_WRITE|PROT_EXEC|PROT_SEM|0x47ffff0, MAP_SHARED|MAP_FIXED,
> > 5, 0) = 0x20000000
>
> Yeah, looking at the reproducer, before this the reproducer also mounts
> /dev/loop0 as ext4 filesystem.
>
> > - ./bus is created (if it does not exist), fd 4 is returned.
> > - /dev/loop0 is mounted to ./bus
> > - then it creates a new file descriptor (5) for the same ./bus
> > - then it creates a mapping for ./bus starting at offset zero. The
> > mapped area is at 0x20000000 and is of 0x600000ul length.
>
> So the result is that the reproducer modified the block device while it is
> mounted by the filesystem. We know cases like this can crash the kernel and
> it is inherently difficult to fix. We have to trust the buffer cache
> contents as otherwise the performance will be unacceptable. For historical
> reasons we also have to allow modifications of buffer cache while ext4 is
> mounted because tune2fs uses this to e.g. update the label of a mounted
> filesystem.
>
> Long-term we are moving ext4 in a direction where we can disallow block
> device modifications while the fs is mounted but we are not there yet. I've
> discussed some shorter-term solution to avoid such known problems with syzbot
> developers and what seems plausible would be a kconfig option to disallow
> writing to a block device when it is exclusively open by someone else.
> But so far I didn't get to trying whether this would reasonably work. Would
> you be interested in having a look into this?

Hi Jan,

Does this affect only the loop device or also USB storage devices?
Say, if the USB device returns different contents during mount and on
subsequent reads?

2023-03-14 02:27:30

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum

On Mon, Mar 13, 2023 at 03:53:57PM +0100, Dmitry Vyukov wrote:
> > Long-term we are moving ext4 in a direction where we can disallow block
> > device modifications while the fs is mounted but we are not there yet. I've
> > discussed some shorter-term solution to avoid such known problems with syzbot
> > developers and what seems plausible would be a kconfig option to disallow
> > writing to a block device when it is exclusively open by someone else.
> > But so far I didn't get to trying whether this would reasonably work. Would
> > you be interested in having a look into this?
>
> Does this affect only the loop device or also USB storage devices?
> Say, if the USB device returns different contents during mount and on
> subsequent reads?

Modifying the block device while the file system is mounted is
something that we have to allow for now because tune2fs uses it to
modify the superblock. It has historically also been used (rarely) by
people who know what they are doing to do surgery on a mounted file
system. If we create a way for tune2fs to be able to update the
superblock via some kind of ioctl, we could disallow modifying the
block device while the file system is mounted. Of course, it would
require waiting at least 5-6 years since sometimes people will update
the kernel without updating userspace. We'd also need to check to
make sure there aren't boot loader installers (such as grub-install)
that depend on being able to modify the block device while the root
file system is mounted, at least in some rare cases.

The "how" to exclude mounted file systems is relatively easy. The
kernel already knows when the file system is mounted, and it is
already a supported feature that a userspace application that wants to
be careful can open a block device with O_EXCL, and if it is in use by
the kernel --- mounted by a file system, being used by dm-thin, et al.
--- the open(2) system call will fail. From the open(2) man page:

In general, the behavior of O_EXCL is undefined if it is used without
O_CREAT. There is one exception: on Linux 2.6 and later, O_EXCL can
be used without O_CREAT if pathname refers to a block device. If the
block device is in use by the system (e.g., mounted), open() fails
with the error EBUSY.

Something syzbot could do today is simply to use O_EXCL
whenever trying to open a block device. This would avoid a class of
syzbot false positives, since normally it requires root privileges
and/or an experienced sysadmin to try to modify a block device while
it is mounted and/or in use by LVM.
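
For illustration, the O_EXCL behavior quoted from open(2) above could be
used roughly as follows. This is only a sketch: the helper name
open_blockdev_exclusive() is hypothetical and is not taken from syzkaller
or e2fsprogs, and the extra S_ISBLK check is an assumption added here
because O_EXCL without O_CREAT is only defined for block devices.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a block device read-write, but only if the
 * kernel does not already hold it exclusively (for example because a
 * file system is mounted on it).
 * Returns a file descriptor on success, or a negative errno value.
 */
static int open_blockdev_exclusive(const char *path)
{
	struct stat st;
	int fd, err;

	assert(path != NULL);

	fd = open(path, O_RDWR | O_EXCL);
	if (fd < 0)
		return -errno;	/* -EBUSY if the device is in use by the system */

	if (fstat(fd, &st) < 0) {
		err = errno;
		close(fd);
		return -err;
	}

	/* Refuse anything that is not a block device. */
	if (!S_ISBLK(st.st_mode)) {
		close(fd);
		return -ENOTBLK;
	}

	return fd;
}
```

A caller would treat a return value of -EBUSY as "device is mounted or
otherwise claimed, do not write to it" and skip the write.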

- Ted

P.S. Trivia note: Approximately a month after I started work at VA Linux
Systems, a sysadmin intern who was given the root password to
sourceforge.net, while trying to fix a disk-to-disk backup, ran
mkfs.ext3 on /dev/hdXX, which was also being used as one-half of a
RAID 0 setup on which open source code critical to the community
(including, for example, OpenGL) was mounted and serving. The intern
got about 50% of the way through zeroing the inode table on /dev/hdXX
before the file system noticed and threw an error, at which point
wiser heads stopped what the intern was doing and tried to clean up
the mess. Of course, there were no backups, since that was what the
intern was trying to fix!

There are a couple of things that we could learn from this incident.
One was that giving the root password to an untrained intern not
familiar with the setup on the serving system was... an unfortunate
choice. Another was that adding the above-mentioned O_EXCL feature
and teaching mkfs to use it was an obvious post-mortem action item to
prevent this kind of problem in the future...

2023-03-14 08:50:26

by Jan Kara

[permalink] [raw]
Subject: Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum

On Mon 13-03-23 15:53:57, Dmitry Vyukov wrote:
> On Mon, 13 Mar 2023 at 12:57, Jan Kara <[email protected]> wrote:
> >
> > Hi Tudor!
> >
> > On Mon 13-03-23 11:11:18, Tudor Ambarus wrote:
> > > On 3/7/23 11:02, Tudor Ambarus wrote:
> > > > On 3/7/23 10:39, Jan Kara wrote:
> > > >> On Wed 01-03-23 12:13:51, Tudor Ambarus wrote:
> > > >>> On 2/13/23 15:56, syzbot wrote:
> > > >>>> syzbot has found a reproducer for the following issue on:
> > > >>>>
> > > >>>> HEAD commit: ceaa837f96ad Linux 6.2-rc8
> > > >>>> git tree: upstream
> > > >>>> console output:
> > > >>>> https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
> > > >>>> kernel config:
> > > >>>> https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
> > > >>>> dashboard link:
> > > >>>> https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
> > > >>>> compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils
> > > >>>> for Debian) 2.35.2
> > > >>>> syz repro:
> > > >>>> https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000
> > > >>>>
> > > >>>> Downloadable assets:
> > > >>>> disk image:
> > > >>>> https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
> > > >>>> vmlinux:
> > > >>>> https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
> > > >>>> kernel image:
> > > >>>> https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
> > > >>>> mounted in repro:
> > > >>>> https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz
> > > >>>>
> > > >>>> IMPORTANT: if you fix the issue, please add the following tag to the
> > > >>>> commit:
> > > >>>> Reported-by: [email protected]
> > > >>>>
> > > >>>> ==================================================================
> > > >>>> BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
> > > >>>> Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339
> > > >>>>
> > > >>>> CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted
> > > >>>> 6.2.0-rc8-syzkaller #0
> > > >>>> Hardware name: Google Google Compute Engine/Google Compute Engine,
> > > >>>> BIOS Google 01/21/2023
> > > >>>> Call Trace:
> > > >>>> <TASK>
> > > >>>> __dump_stack lib/dump_stack.c:88 [inline]
> > > >>>> dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
> > > >>>> print_address_description mm/kasan/report.c:306 [inline]
> > > >>>> print_report+0x163/0x4f0 mm/kasan/report.c:417
> > > >>>> kasan_report+0x13a/0x170 mm/kasan/report.c:517
> > > >>>> crc16+0x1fb/0x280 lib/crc16.c:58
> > > >>>> ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
> > > >>>> ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
> > > >>>> ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
> > > >>>> ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
> > > >>>> ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
> > > >>>> ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
> > > >>>> ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
> > > >>>> ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
> > > >>>> ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
> > > >>>> ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
> > > >>>> evict+0x2a4/0x620 fs/inode.c:664
> > > >>>> do_unlinkat+0x4f1/0x930 fs/namei.c:4327
> > > >>>> __do_sys_unlink fs/namei.c:4368 [inline]
> > > >>>> __se_sys_unlink fs/namei.c:4366 [inline]
> > > >>>> __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
> > > >>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > >>>> do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > > >>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > >>>> RIP: 0033:0x7fbc85a8c0f9
> > > >>>> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48
> > > >>>> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
> > > >>>> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> > > >>>> RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
> > > >>>> RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
> > > >>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
> > > >>>> RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
> > > >>>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > > >>>> R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
> > > >>>> </TASK>
> > > >>>>
> > > >>>> The buggy address belongs to the physical page:
> > > >>>> page:ffffea0001f78000 refcount:0 mapcount:-128
> > > >>>> mapping:0000000000000000 index:0x0 pfn:0x7de00
> > > >>>> flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
> > > >>>> raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08
> > > >>>> 0000000000000000
> > > >>>> raw: 0000000000000000 0000000000000001 00000000ffffff7f
> > > >>>> 0000000000000000
> > > >>>> page dumped because: kasan: bad access detected
> > > >>>> page_owner tracks the page as freed
> > > >>>> page last allocated via order 1, migratetype Unmovable, gfp_mask
> > > >>>> 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
> > > >>>> prep_new_page mm/page_alloc.c:2531 [inline]
> > > >>>> get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
> > > >>>> __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
> > > >>>> alloc_slab_page+0x6a/0x160 mm/slub.c:1851
> > > >>>> allocate_slab mm/slub.c:1998 [inline]
> > > >>>> new_slab+0x84/0x2f0 mm/slub.c:2051
> > > >>>> ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
> > > >>>> __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
> > > >>>> kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
> > > >>>> mt_alloc_bulk lib/maple_tree.c:157 [inline]
> > > >>>> mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
> > > >>>> mas_node_count_gfp lib/maple_tree.c:1316 [inline]
> > > >>>> mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
> > > >>>> vma_expand+0x277/0x850 mm/mmap.c:541
> > > >>>> mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
> > > >>>> do_mmap+0x8c9/0xf70 mm/mmap.c:1411
> > > >>>> vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
> > > >>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > >>>> do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > > >>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > >>>> page last free stack trace:
> > > >>>> reset_page_owner include/linux/page_owner.h:24 [inline]
> > > >>>> free_pages_prepare mm/page_alloc.c:1446 [inline]
> > > >>>> free_pcp_prepare mm/page_alloc.c:1496 [inline]
> > > >>>> free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
> > > >>>> free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
> > > >>>> qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
> > > >>>> kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
> > > >>>> __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
> > > >>>> kasan_slab_alloc include/linux/kasan.h:201 [inline]
> > > >>>> slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
> > > >>>> slab_alloc_node mm/slub.c:3452 [inline]
> > > >>>> kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
> > > >>>> __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
> > > >>>> alloc_skb include/linux/skbuff.h:1270 [inline]
> > > >>>> alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
> > > >>>> sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
> > > >>>> unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
> > > >>>> sock_sendmsg_nosec net/socket.c:714 [inline]
> > > >>>> sock_sendmsg net/socket.c:734 [inline]
> > > >>>> __sys_sendto+0x475/0x5f0 net/socket.c:2117
> > > >>>> __do_sys_sendto net/socket.c:2129 [inline]
> > > >>>> __se_sys_sendto net/socket.c:2125 [inline]
> > > >>>> __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
> > > >>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > >>>> do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > > >>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > >>>>
> > > >>>> Memory state around the buggy address:
> > > >>>> ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > >>>> ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > >>>>> ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > >>>> ^
> > > >>>> ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > >>>> ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > >>>> ==================================================================
> > > >>>>
> > > >>>
> > > >>>
> > > >>> I think the patch from below should fix it.
> > > >>>
> > > >>> I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
> > > >>> EXT4_MAX_DESC_SIZE. What I think it happens is that the contents of the
> > > >>> super block in the buffer get corrupted sometime after the .get_tree
> > > >>> (which eventually calls __ext4_fill_super()) is called. So instead of
> > > >>> relying on the contents of the buffer, we should instead rely on the
> > > >>> s_desc_size initialized at the __ext4_fill_super() time.
> > > >>>
> > > >>> If someone finds this good (or bad), or has a more in depth explanation,
> > > >>> please let me know, it will help me better understand the subsystem. In
> > > >>> the meantime I'll continue to investigate this and prepare a patch for
> > > >>> it.
> > > >>
> > > >> If there's something corrupting the superblock while the filesystem is
> > > >> mounted, we need to find what is corrupting the SB and fix *that*. Not
> > > >> try
> > > >> to paper over the problem by not using the on-disk data... Maybe journal
> > > >> replay is corrupting the value or something like that?
> > > >>
> > > >> Honza
> > > >>
> > > >
> > > > Ok, I agree. First thing would be to understand the reproducer and to
> > > > simplify it if possible. I haven't yet decoded what the syz repro is
> > > > doing at
> > > > https://syzkaller.appspot.com/text?tag=ReproSyz&x=16ce3de4c80000
> > > > Will reply to this email thread once I understand what's happening. If
> > > > you or someone else can decode the syz repro faster than me, shoot.
> > > >
> > >
> > > I can now explain how the contents of the super block of the buffer get
> > > corrupted. After the ext4 fs is mounted to the target ("./bus"), the
> > > reproducer maps 6MB of data starting at offset 0 in the target's file
> > > ("./bus"), then it starts overriding the data with something else, by
> > > using memcpy, memset, individual byte inits. Does that mean that we
> > > shouldn't rely on the contents of the super block in the buffer after we
> > > mount the file system? If so, then my patch stands. I'll be happy to
> > > extend it if needed. Below one may find a step by step interpretation of
> > > the reproducer.
> > >
> > > We have a strace log for the same bug, but on Android 5.15:
> > > https://syzkaller.appspot.com/text?tag=CrashLog&x=14ecec8cc80000
> > >
> > > Look for pid 328. You notice that the bpf() syscalls return error, so I
> > > commented them out in the c repro to confirm that they are not the
> > > cause. The bug reproduced without the bpf() calls. One can find the c
> > > repro at:
> > > https://syzkaller.appspot.com/text?tag=ReproC&x=17c5fc50c80000
> > >
> > > Let's look at these calls, just before the bug was hit:
> > > [pid 328] open("./bus",
> > > O_RDWR|O_CREAT|O_TRUNC|O_NONBLOCK|O_SYNC|O_DIRECT|O_LARGEFILE|O_NOATIME,
> > > 000) = 4
> > > [pid 328] mount("/dev/loop0", "./bus", NULL, MS_BIND, NULL) = 0
> > > [pid 328] open("./bus", O_RDWR|O_SYNC|O_NOATIME|0x3c) = 5
> > > [pid 328] mmap(0x20000000, 6291456,
> > > PROT_READ|PROT_WRITE|PROT_EXEC|PROT_SEM|0x47ffff0, MAP_SHARED|MAP_FIXED,
> > > 5, 0) = 0x20000000
> >
> > Yeah, looking at the reproducer, before this the reproducer also mounts
> > /dev/loop0 as ext4 filesystem.
> >
> > > - ./bus is created (if it does not exist), fd 4 is returned.
> > > - /dev/loop0 is mounted to ./bus
> > > - then it creates a new file descriptor (5) for the same ./bus
> > > - then it creates a mapping for ./bus starting at offset zero. The
> > > mapped area is at 0x20000000 and is of 0x600000ul length.
> >
> > So the result is that the reproducer modified the block device while it is
> > mounted by the filesystem. We know cases like this can crash the kernel and
> > it is inherently difficult to fix. We have to trust the buffer cache
> > contents as otherwise the performance will be unacceptable. For historical
> > reasons we also have to allow modifications of buffer cache while ext4 is
> > mounted because tune2fs uses this to e.g. update the label of a mounted
> > filesystem.
> >
> > Long-term we are moving ext4 in a direction where we can disallow block
> > device modifications while the fs is mounted but we are not there yet. I've
> > discussed some shorter-term solution to avoid such known problems with syzbot
> > developers and what seems plausible would be a kconfig option to disallow
> > writing to a block device when it is exclusively open by someone else.
> > But so far I didn't get to trying whether this would reasonably work. Would
> > you be interested in having a look into this?
>
> Does this affect only the loop device or also USB storage devices?
> Say, if the USB device returns different contents during mount and on
> subsequent reads?

So if the USB device returns different content, we are fine because we
verify the content each time we load it into the buffer cache. But if some
software opens the block device and modifies it, it modifies the buffer
cache directly and thus bypasses any checks we do when loading data from
storage.
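
To make the distinction concrete, here is a rough userspace sketch (not
kernel code). Every name in it is made up for illustration, and the size
limits are only stand-ins for the real ext4 constants. A field is checked
once when the buffer is "loaded"; a later write through a second alias of
the same buffer is never checked, so code that re-reads the live buffer sees
the corrupted value, while code that reads a copy taken at load time does not.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative limits only; stand-ins for the real descriptor size bounds. */
#define FAKE_MIN_DESC_SIZE 32u
#define FAKE_MAX_DESC_SIZE 1024u

/* Hypothetical stand-in for the superblock as it sits in the buffer cache. */
struct fake_sb_buffer {
    uint16_t desc_size;
};

/* Hypothetical in-memory filesystem state, filled once at "mount" time. */
struct fake_fs_info {
    struct fake_sb_buffer *buf; /* live, shared buffer */
    uint16_t desc_size;         /* copy validated at load time */
};

/* Validate the field when the buffer is loaded, then remember it. */
int fake_mount(struct fake_fs_info *fs, struct fake_sb_buffer *buf)
{
    if (buf->desc_size < FAKE_MIN_DESC_SIZE ||
        buf->desc_size > FAKE_MAX_DESC_SIZE)
        return -1;
    fs->buf = buf;
    fs->desc_size = buf->desc_size;
    return 0;
}

/* Length a checksum would cover if it trusts the live buffer. */
size_t csum_len_from_buffer(const struct fake_fs_info *fs)
{
    return fs->buf->desc_size;
}

/* Length a checksum would cover if it uses the load-time copy. */
size_t csum_len_from_copy(const struct fake_fs_info *fs)
{
    return fs->desc_size;
}
```

If something writes 0xffff into buf->desc_size after fake_mount() has
succeeded, csum_len_from_buffer() returns 0xffff while csum_len_from_copy()
still returns the validated value. This only illustrates why the load-time
check is bypassed; it is not a claim about how the problem should be fixed.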

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2023-03-14 09:33:49

by Dmitry Vyukov

Subject: Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum

On Tue, 14 Mar 2023 at 09:49, Jan Kara <[email protected]> wrote:
>
> On Mon 13-03-23 15:53:57, Dmitry Vyukov wrote:
> > On Mon, 13 Mar 2023 at 12:57, Jan Kara <[email protected]> wrote:
> > >
> > > Hi Tudor!
> > >
> > > On Mon 13-03-23 11:11:18, Tudor Ambarus wrote:
> > > > On 3/7/23 11:02, Tudor Ambarus wrote:
> > > > > On 3/7/23 10:39, Jan Kara wrote:
> > > > >> On Wed 01-03-23 12:13:51, Tudor Ambarus wrote:
> > > > >>> On 2/13/23 15:56, syzbot wrote:
> > > > >>>> syzbot has found a reproducer for the following issue on:
> > > > >>>>
> > > > >>>> HEAD commit: ceaa837f96ad Linux 6.2-rc8
> > > > >>>> git tree: upstream
> > > > >>>> console output:
> > > > >>>> https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
> > > > >>>> kernel config:
> > > > >>>> https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
> > > > >>>> dashboard link:
> > > > >>>> https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
> > > > >>>> compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils
> > > > >>>> for Debian) 2.35.2
> > > > >>>> syz repro:
> > > > >>>> https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000
> > > > >>>>
> > > > >>>> Downloadable assets:
> > > > >>>> disk image:
> > > > >>>> https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
> > > > >>>> vmlinux:
> > > > >>>> https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
> > > > >>>> kernel image:
> > > > >>>> https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
> > > > >>>> mounted in repro:
> > > > >>>> https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz
> > > > >>>>
> > > > >>>> IMPORTANT: if you fix the issue, please add the following tag to the
> > > > >>>> commit:
> > > > >>>> Reported-by: [email protected]
> > > > >>>>
> > > > >>>> ==================================================================
> > > > >>>> BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
> > > > >>>> Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339
> > > > >>>>
> > > > >>>> CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted
> > > > >>>> 6.2.0-rc8-syzkaller #0
> > > > >>>> Hardware name: Google Google Compute Engine/Google Compute Engine,
> > > > >>>> BIOS Google 01/21/2023
> > > > >>>> Call Trace:
> > > > >>>> <TASK>
> > > > >>>> __dump_stack lib/dump_stack.c:88 [inline]
> > > > >>>> dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
> > > > >>>> print_address_description mm/kasan/report.c:306 [inline]
> > > > >>>> print_report+0x163/0x4f0 mm/kasan/report.c:417
> > > > >>>> kasan_report+0x13a/0x170 mm/kasan/report.c:517
> > > > >>>> crc16+0x1fb/0x280 lib/crc16.c:58
> > > > >>>> ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
> > > > >>>> ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
> > > > >>>> ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
> > > > >>>> ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
> > > > >>>> ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
> > > > >>>> ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
> > > > >>>> ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
> > > > >>>> ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
> > > > >>>> ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
> > > > >>>> ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
> > > > >>>> evict+0x2a4/0x620 fs/inode.c:664
> > > > >>>> do_unlinkat+0x4f1/0x930 fs/namei.c:4327
> > > > >>>> __do_sys_unlink fs/namei.c:4368 [inline]
> > > > >>>> __se_sys_unlink fs/namei.c:4366 [inline]
> > > > >>>> __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
> > > > >>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > > >>>> do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > > > >>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > >>>> RIP: 0033:0x7fbc85a8c0f9
> > > > >>>> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48
> > > > >>>> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
> > > > >>>> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> > > > >>>> RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
> > > > >>>> RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
> > > > >>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
> > > > >>>> RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
> > > > >>>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > > > >>>> R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
> > > > >>>> </TASK>
> > > > >>>>
> > > > >>>> The buggy address belongs to the physical page:
> > > > >>>> page:ffffea0001f78000 refcount:0 mapcount:-128
> > > > >>>> mapping:0000000000000000 index:0x0 pfn:0x7de00
> > > > >>>> flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
> > > > >>>> raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08
> > > > >>>> 0000000000000000
> > > > >>>> raw: 0000000000000000 0000000000000001 00000000ffffff7f
> > > > >>>> 0000000000000000
> > > > >>>> page dumped because: kasan: bad access detected
> > > > >>>> page_owner tracks the page as freed
> > > > >>>> page last allocated via order 1, migratetype Unmovable, gfp_mask
> > > > >>>> 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
> > > > >>>> prep_new_page mm/page_alloc.c:2531 [inline]
> > > > >>>> get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
> > > > >>>> __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
> > > > >>>> alloc_slab_page+0x6a/0x160 mm/slub.c:1851
> > > > >>>> allocate_slab mm/slub.c:1998 [inline]
> > > > >>>> new_slab+0x84/0x2f0 mm/slub.c:2051
> > > > >>>> ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
> > > > >>>> __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
> > > > >>>> kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
> > > > >>>> mt_alloc_bulk lib/maple_tree.c:157 [inline]
> > > > >>>> mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
> > > > >>>> mas_node_count_gfp lib/maple_tree.c:1316 [inline]
> > > > >>>> mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
> > > > >>>> vma_expand+0x277/0x850 mm/mmap.c:541
> > > > >>>> mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
> > > > >>>> do_mmap+0x8c9/0xf70 mm/mmap.c:1411
> > > > >>>> vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
> > > > >>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > > >>>> do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > > > >>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > >>>> page last free stack trace:
> > > > >>>> reset_page_owner include/linux/page_owner.h:24 [inline]
> > > > >>>> free_pages_prepare mm/page_alloc.c:1446 [inline]
> > > > >>>> free_pcp_prepare mm/page_alloc.c:1496 [inline]
> > > > >>>> free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
> > > > >>>> free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
> > > > >>>> qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
> > > > >>>> kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
> > > > >>>> __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
> > > > >>>> kasan_slab_alloc include/linux/kasan.h:201 [inline]
> > > > >>>> slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
> > > > >>>> slab_alloc_node mm/slub.c:3452 [inline]
> > > > >>>> kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
> > > > >>>> __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
> > > > >>>> alloc_skb include/linux/skbuff.h:1270 [inline]
> > > > >>>> alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
> > > > >>>> sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
> > > > >>>> unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
> > > > >>>> sock_sendmsg_nosec net/socket.c:714 [inline]
> > > > >>>> sock_sendmsg net/socket.c:734 [inline]
> > > > >>>> __sys_sendto+0x475/0x5f0 net/socket.c:2117
> > > > >>>> __do_sys_sendto net/socket.c:2129 [inline]
> > > > >>>> __se_sys_sendto net/socket.c:2125 [inline]
> > > > >>>> __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
> > > > >>>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > > >>>> do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > > > >>>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > >>>>
> > > > >>>> Memory state around the buggy address:
> > > > >>>> ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > >>>> ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > >>>>> ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > >>>> ^
> > > > >>>> ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > >>>> ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > >>>> ==================================================================
> > > > >>>>
> > > > >>>
> > > > >>>
> > > > >>> I think the patch from below should fix it.
> > > > >>>
> > > > >>> I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
> > > > >>> EXT4_MAX_DESC_SIZE. What I think it happens is that the contents of the
> > > > >>> super block in the buffer get corrupted sometime after the .get_tree
> > > > >>> (which eventually calls __ext4_fill_super()) is called. So instead of
> > > > >>> relying on the contents of the buffer, we should instead rely on the
> > > > >>> s_desc_size initialized at the __ext4_fill_super() time.
> > > > >>>
> > > > >>> If someone finds this good (or bad), or has a more in depth explanation,
> > > > >>> please let me know, it will help me better understand the subsystem. In
> > > > >>> the meantime I'll continue to investigate this and prepare a patch for
> > > > >>> it.
> > > > >>
> > > > >> If there's something corrupting the superblock while the filesystem is
> > > > >> mounted, we need to find what is corrupting the SB and fix *that*. Not
> > > > >> try
> > > > >> to paper over the problem by not using the on-disk data... Maybe journal
> > > > >> replay is corrupting the value or something like that?
> > > > >>
> > > > >> Honza
> > > > >>
> > > > >
> > > > > Ok, I agree. First thing would be to understand the reproducer and to
> > > > > simplify it if possible. I haven't yet decoded what the syz repro is
> > > > > doing at
> > > > > https://syzkaller.appspot.com/text?tag=ReproSyz&x=16ce3de4c80000
> > > > > Will reply to this email thread once I understand what's happening. If
> > > > > you or someone else can decode the syz repro faster than me, shoot.
> > > > >
> > > >
> > > > I can now explain how the contents of the super block of the buffer get
> > > > corrupted. After the ext4 fs is mounted to the target ("./bus"), the
> > > > reproducer maps 6MB of data starting at offset 0 in the target's file
> > > > ("./bus"), then it starts overriding the data with something else, by
> > > > using memcpy, memset, individual byte inits. Does that mean that we
> > > > shouldn't rely on the contents of the super block in the buffer after we
> > > > mount the file system? If so, then my patch stands. I'll be happy to
> > > > extend it if needed. Below one may find a step by step interpretation of
> > > > the reproducer.
> > > >
> > > > We have a strace log for the same bug, but on Android 5.15:
> > > > https://syzkaller.appspot.com/text?tag=CrashLog&x=14ecec8cc80000
> > > >
> > > > Look for pid 328. You notice that the bpf() syscalls return error, so I
> > > > commented them out in the c repro to confirm that they are not the
> > > > cause. The bug reproduced without the bpf() calls. One can find the c
> > > > repro at:
> > > > https://syzkaller.appspot.com/text?tag=ReproC&x=17c5fc50c80000
> > > >
> > > > Let's look at these calls, just before the bug was hit:
> > > > [pid 328] open("./bus",
> > > > O_RDWR|O_CREAT|O_TRUNC|O_NONBLOCK|O_SYNC|O_DIRECT|O_LARGEFILE|O_NOATIME,
> > > > 000) = 4
> > > > [pid 328] mount("/dev/loop0", "./bus", NULL, MS_BIND, NULL) = 0
> > > > [pid 328] open("./bus", O_RDWR|O_SYNC|O_NOATIME|0x3c) = 5
> > > > [pid 328] mmap(0x20000000, 6291456,
> > > > PROT_READ|PROT_WRITE|PROT_EXEC|PROT_SEM|0x47ffff0, MAP_SHARED|MAP_FIXED,
> > > > 5, 0) = 0x20000000
> > >
> > > Yeah, looking at the reproducer, before this the reproducer also mounts
> > > /dev/loop0 as ext4 filesystem.
> > >
> > > > - ./bus is created (if it does not exist), fd 4 is returned.
> > > > - /dev/loop0 is mounted to ./bus
> > > > - then it creates a new file descriptor (5) for the same ./bus
> > > > - then it creates a mapping for ./bus starting at offset zero. The
> > > > mapped area is at 0x20000000 and is of 0x600000ul length.
> > >
> > > So the result is that the reproducer modified the block device while it is
> > > mounted by the filesystem. We know cases like this can crash the kernel and
> > > it is inherently difficult to fix. We have to trust the buffer cache
> > > contents as otherwise the performance will be unacceptable. For historical
> > > reasons we also have to allow modifications of buffer cache while ext4 is
> > > mounted because tune2fs uses this to e.g. update the label of a mounted
> > > filesystem.
> > >
> > > Long-term we are moving ext4 in a direction where we can disallow block
> > > device modifications while the fs is mounted but we are not there yet. I've
> > > discussed some shorter-term solution to avoid such known problems with syzbot
> > > developers and what seems plausible would be a kconfig option to disallow
> > > writing to a block device when it is exclusively open by someone else.
> > > But so far I didn't get to trying whether this would reasonably work. Would
> > > you be interested in having a look into this?
> >
> > Does this affect only the loop device or also USB storage devices?
> > Say, if the USB device returns different contents during mount and on
> > subsequent reads?
>
> So if the USB device returns different content, we are fine because we
> verify the content each time we load it into the buffer cache. But if some
> software opens the block device and modifies it, it modifies the buffer
> cache directly and thus bypasses any checks we do when loading data from
> storage.

Thanks, I see. This is good.

2023-03-14 09:46:09

by Dmitry Vyukov

Subject: Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum

On Tue, 14 Mar 2023 at 03:26, Theodore Ts'o <[email protected]> wrote:
>
> On Mon, Mar 13, 2023 at 03:53:57PM +0100, Dmitry Vyukov wrote:
> > > Long-term we are moving ext4 in a direction where we can disallow block
> > > device modifications while the fs is mounted but we are not there yet. I've
> > > discussed some shorter-term solution to avoid such known problems with syzbot
> > > developers and what seems plausible would be a kconfig option to disallow
> > > writing to a block device when it is exclusively open by someone else.
> > > But so far I didn't get to trying whether this would reasonably work. Would
> > > you be interested in having a look into this?
> >
> > Does this affect only the loop device or also USB storage devices?
> > Say, if the USB device returns different contents during mount and on
> > subsequent reads?
>
> Modifying the block device while the file system is mounted is
> something that we have to allow for now because tune2fs uses it to
> modify the superblock. It has historically also been used (rarely) by
> people who know what they are doing to do surgery on a mounted file
> system. If we create a way for tune2fs to be able to update the
> superblock via some kind of ioctl, we could disallow modifying the
> block device while the file system is mounted. Of course, it would
> require waiting at least 5-6 years since sometimes people will update
> the kernel without updating userspace. We'd also need to check to
> make sure there aren't boot loader installers (such as grub-install)
> that depend on being able to modify the block device while the root
> file system is mounted, at least in some rare cases.
>
> The "how" to exclude mounted file systems is relatively easy. The
> kernel already knows when the file system is mounted, and it is
> already a supported feature that a userspace application that wants to
> be careful can open a block device with O_EXCL, and if it is in use by
> the kernel --- mounted by a file system, being used by dm-thin, et al.
> --- the open(2) system call will fail. From the open(2) man page:
>
> In general, the behavior of O_EXCL is undefined if it is used without
> O_CREAT. There is one exception: on Linux 2.6 and later, O_EXCL can
> be used without O_CREAT if pathname refers to a block device. If the
> block device is in use by the system (e.g., mounted), open() fails
> with the error EBUSY.
>
> Something which syzbot could do today is to simply use O_EXCL
> whenever trying to open a block device. This would avoid a class of
> syzbot false positives, since normally it requires root privileges
> and/or an experienced sysadmin to try to modify a block device while
> it is mounted and/or in use by LVM.
>
> - Ted
>
> P.S. Trivia note: Approximately a month after I started work at VA Linux
> Systems, a sysadmin intern who was given the root password to
> sourceforge.net, while trying to fix a disk-to-disk backup, ran
> mkfs.ext3 on /dev/hdXX, which was also being used as one-half of a
> RAID 0 setup on which open source code critical to the community
> (including, for example, OpenGL) was mounted and serving. The intern
> got about 50% of the way through zeroing the inode table on /dev/hdXX
> before the file system noticed and threw an error, at which point
> wiser heads stopped what the intern was doing and tried to clean up
> the mess. Of course, there were no backups, since that was what the
> intern was trying to fix!
>
> There are a couple of things that we could learn from this incident.
> One was that giving the root password to an untrained intern not
> familiar with the setup on the serving system was... an unfortunate
> choice. Another was that adding the above-mentioned O_EXCL feature
> and teaching mkfs to use it was an obvious post-mortem action item to
> prevent this kind of problem in the future...

I am struggling to make up my mind about how to think about this case.

"root" is very overloaded, but generally it does not mean "randomly
corrupting memory". Normally it gives access to system-wide changes
but with the same protection/consistency guarantees as for
unprivileged system calls.

There are, of course, things like /dev/{mem,kmem}. But at the same
time there is also lockdown LSM and more distros today enable it.

Btw, should this "prohibit writes to mounted device" be part of
LOCKDOWN_INTEGRITY? It looks like it gives capabilities similar to
/dev/{mem,kmem}.

Disabling in testing something that's enabled in production is
generally not very useful.
So one option is to do nothing about this for now.
If it's a truly recognized issue that is in the process of being fixed,
syzbot will just show that it's still present. One of the goals of
syzbot is to show the current state of things in an objective manner.
If some kernel developers are aware of an issue, it does not mean that
most distros/users are aware.

It makes sense to disable in testing things that are also recommended
to be disabled in production settings.
And LOCKDOWN_INTEGRITY may play such a role: we include this
restriction into LOCKDOWN_INTEGRITY and enable it on syzbot.
Though, unfortunately, we still don't enable it because it prohibits
access to debugfs, which is required for fuzzing. Need to ask lockdown
maintainers what they think about
LOCKDOWN_TEST_ONLY_DONT_ENABLE_IN_PROD_INTEGRITY which would whitelist
debugfs.
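
For reference, the O_EXCL behaviour Ted quotes from the open(2) man page can
be sketched from userspace roughly as follows. This is a minimal sketch: the
helper name is hypothetical, and the caller is assumed to pass the path of a
block device, since the man page text above says O_EXCL without O_CREAT is
only defined for block devices.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to open a block device for writing only if the kernel is not
 * already using it. Returns a file descriptor on success, or a negative
 * errno value on failure (-EBUSY if the device is mounted or otherwise
 * in use by the system, per open(2)).
 *
 * The function name is hypothetical; it is not an existing API.
 */
int open_blockdev_exclusive(const char *path)
{
    int fd = open(path, O_RDWR | O_EXCL);

    if (fd < 0)
        return -errno;
    return fd;
}
```

A careful tool would treat -EBUSY from this helper as "device in use, do not
write" and stop there, and would close() the descriptor when done otherwise.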

2023-03-14 10:05:41

by Dmitry Vyukov

Subject: Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum

On Tue, 14 Mar 2023 at 10:45, Dmitry Vyukov <[email protected]> wrote:
>
> On Tue, 14 Mar 2023 at 03:26, Theodore Ts'o <[email protected]> wrote:
> >
> > On Mon, Mar 13, 2023 at 03:53:57PM +0100, Dmitry Vyukov wrote:
> > > > Long-term we are moving ext4 in a direction where we can disallow block
> > > > device modifications while the fs is mounted but we are not there yet. I've
> > > > discussed some shorter-term solution to avoid such known problems with syzbot
> > > > developers and what seems plausible would be a kconfig option to disallow
> > > > writing to a block device when it is exclusively open by someone else.
> > > > But so far I didn't get to trying whether this would reasonably work. Would
> > > > you be interested in having a look into this?
> > >
> > > Does this affect only the loop device or also USB storage devices?
> > > Say, if the USB device returns different contents during mount and on
> > > subsequent reads?
> >
> > Modifying the block device while the file system is mounted is
> > something that we have to allow for now because tune2fs uses it to
> > modify the superblock. It has historically also been used (rarely) by
> > people who know what they are doing to do surgery on a mounted file
> > system. If we create a way for tune2fs to be able to update the
> > superblock via some kind of ioctl, we could disallow modifying the
> > block device while the file system is mounted. Of course, it would
> > require waiting at least 5-6 years since sometimes people will update
> > the kernel without updating userspace. We'd also need to check to
> > make sure there aren't boot loader installers (such as grub-install)
> > that depend on being able to modify the block device while the root
> > file system is mounted, at least in some rare cases.
> >
> > The "how" to exclude mounted file systems is relatively easy. The
> > kernel already knows when the file system is mounted, and it is
> > already a supported feature that a userspace application that wants to
> > be careful can open a block device with O_EXCL, and if it is in use by
> > the kernel --- mounted by a file system, being used by dm-thin, et al.
> > --- the open(2) system call will fail. From the open(2) man page:
> >
> > In general, the behavior of O_EXCL is undefined if it is used without
> > O_CREAT. There is one exception: on Linux 2.6 and later, O_EXCL can
> > be used without O_CREAT if pathname refers to a block device. If the
> > block device is in use by the system (e.g., mounted), open() fails
> > with the error EBUSY.
> >
> > Something which syzbot could do today is to simply use O_EXCL
> > whenever trying to open a block device. This would avoid a class of
> > syzbot false positives, since normally it requires root privileges
> > and/or an experienced sysadmin to try to modify a block device while
> > it is mounted and/or in use by LVM.
> >
> > - Ted
> >
> > P.S. Trivia note: Approximately a month after I started work at VA Linux
> > Systems, a sysadmin intern who was given the root password to
> > sourceforge.net, while trying to fix a disk-to-disk backup, ran
> > mkfs.ext3 on /dev/hdXX, which was also being used as one-half of a
> > RAID 0 setup on which open source code critical to the community
> > (including, for example, OpenGL) was mounted and serving. The intern
> > got about 50% of the way through zeroing the inode table on /dev/hdXX
> > before the file system noticed and threw an error, at which point
> > wiser heads stopped what the intern was doing and tried to clean up
> > the mess. Of course, there were no backups, since that was what the
> > intern was trying to fix!
> >
> > There are a couple of things that we could learn from this incident.
> > One was that giving the root password to an untrained intern not
> > familiar with the setup on the serving system was... an unfortunate
> > choice. Another was that adding the above-mentioned O_EXCL feature
> > and teaching mkfs to use it was an obvious post-mortem action item to
> > prevent this kind of problem in the future...
>
> I am struggling to make up my mind about how to think about this case.
>
> "root" is very overloaded, but generally it does not mean "randomly
> corrupting memory". Normally it gives access to system-wide changes
> but with the same protection/consistency guarantees as for
> unprivileged system calls.
>
> There are, of course, things like /dev/{mem,kmem}. But at the same
> time there is also lockdown LSM and more distros today enable it.
>
> Btw, should this "prohibit writes to mounted device" be part of
> LOCKDOWN_INTEGRITY? It looks like it gives capabilities similar to
> /dev/{mem,kmem}.
>
> Disabling in testing something that's enabled in production is
> generally not very useful.
> So one option is to do nothing about this for now.
> If it's a truly recognized issue that is in the process of being fixed,
> syzbot will just show that it's still present. One of the goals of
> syzbot is to show the current state of things in an objective manner.
> If some kernel developers are aware of an issue, it does not mean that
> most distros/users are aware.
>
> It makes sense to disable in testing things that are also recommended
> to be disabled in production settings.
> And LOCKDOWN_INTEGRITY may play such a role: we include this
> restriction into LOCKDOWN_INTEGRITY and enable it on syzbot.
> Though, unfortunately, we still don't enable it because it prohibits
> access to debugfs, which is required for fuzzing. Need to ask lockdown
> maintainers what they think about
> LOCKDOWN_TEST_ONLY_DONT_ENABLE_IN_PROD_INTEGRITY which would whitelist
> debugfs.

I asked the lockdown maintainers about adding this to lockdown and adding
a special mode for fuzzing:
https://lore.kernel.org/all/CACT4Y+Z-9KCgKwkktvdJwNJZxxeA1f74zkP7KD6c=OmKXxXfjw@mail.gmail.com/

2023-03-14 11:20:43

by Jan Kara

Subject: Re: [syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc_csum

On Mon 13-03-23 21:17:57, yebin (H) wrote:
> On 2023/3/13 21:01, Jan Kara wrote:
> > On Mon 13-03-23 20:27:34, yebin wrote:
> > > On 2023/3/13 19:57, Jan Kara wrote:
> > > > On Mon 13-03-23 11:11:18, Tudor Ambarus wrote:
> > > > > On 3/7/23 11:02, Tudor Ambarus wrote:
> > > > > > On 3/7/23 10:39, Jan Kara wrote:
> > > > > > > On Wed 01-03-23 12:13:51, Tudor Ambarus wrote:
> > > > > > > > On 2/13/23 15:56, syzbot wrote:
> > > > > > > > > syzbot has found a reproducer for the following issue on:
> > > > > > > > >
> > > > > > > > > HEAD commit: ceaa837f96ad Linux 6.2-rc8
> > > > > > > > > git tree: upstream
> > > > > > > > > console output:
> > > > > > > > > https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
> > > > > > > > > kernel config:
> > > > > > > > > https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
> > > > > > > > > dashboard link:
> > > > > > > > > https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
> > > > > > > > > compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils
> > > > > > > > > for Debian) 2.35.2
> > > > > > > > > syz repro:
> > > > > > > > > https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000
> > > > > > > > >
> > > > > > > > > Downloadable assets:
> > > > > > > > > disk image:
> > > > > > > > > https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
> > > > > > > > > vmlinux:
> > > > > > > > > https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
> > > > > > > > > kernel image:
> > > > > > > > > https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
> > > > > > > > > mounted in repro:
> > > > > > > > > https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz
> > > > > > > > >
> > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the
> > > > > > > > > commit:
> > > > > > > > > Reported-by: [email protected]
> > > > > > > > >
> > > > > > > > > ==================================================================
> > > > > > > > > BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
> > > > > > > > > Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339
> > > > > > > > >
> > > > > > > > > CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted
> > > > > > > > > 6.2.0-rc8-syzkaller #0
> > > > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > > > > > > > > BIOS Google 01/21/2023
> > > > > > > > > Call Trace:
> > > > > > > > > <TASK>
> > > > > > > > > __dump_stack lib/dump_stack.c:88 [inline]
> > > > > > > > > dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
> > > > > > > > > print_address_description mm/kasan/report.c:306 [inline]
> > > > > > > > > print_report+0x163/0x4f0 mm/kasan/report.c:417
> > > > > > > > > kasan_report+0x13a/0x170 mm/kasan/report.c:517
> > > > > > > > > crc16+0x1fb/0x280 lib/crc16.c:58
> > > > > > > > > ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
> > > > > > > > > ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
> > > > > > > > > ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
> > > > > > > > > ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
> > > > > > > > > ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
> > > > > > > > > ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
> > > > > > > > > ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
> > > > > > > > > ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
> > > > > > > > > ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
> > > > > > > > > ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
> > > > > > > > > evict+0x2a4/0x620 fs/inode.c:664
> > > > > > > > > do_unlinkat+0x4f1/0x930 fs/namei.c:4327
> > > > > > > > > __do_sys_unlink fs/namei.c:4368 [inline]
> > > > > > > > > __se_sys_unlink fs/namei.c:4366 [inline]
> > > > > > > > > __x64_sys_unlink+0x49/0x50 fs/namei.c:4366
> > > > > > > > > do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > > > > > > > do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > > > > > > > > entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > > > > RIP: 0033:0x7fbc85a8c0f9
> > > > > > > > > Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48
> > > > > > > > > 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
> > > > > > > > > 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> > > > > > > > > RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
> > > > > > > > > RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
> > > > > > > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
> > > > > > > > > RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
> > > > > > > > > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > > > > > > > > R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
> > > > > > > > > </TASK>
> > > > > > > > >
> > > > > > > > > The buggy address belongs to the physical page:
> > > > > > > > > page:ffffea0001f78000 refcount:0 mapcount:-128
> > > > > > > > > mapping:0000000000000000 index:0x0 pfn:0x7de00
> > > > > > > > > flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
> > > > > > > > > raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08
> > > > > > > > > 0000000000000000
> > > > > > > > > raw: 0000000000000000 0000000000000001 00000000ffffff7f
> > > > > > > > > 0000000000000000
> > > > > > > > > page dumped because: kasan: bad access detected
> > > > > > > > > page_owner tracks the page as freed
> > > > > > > > > page last allocated via order 1, migratetype Unmovable, gfp_mask
> > > > > > > > > 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
> > > > > > > > > prep_new_page mm/page_alloc.c:2531 [inline]
> > > > > > > > > get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
> > > > > > > > > __alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
> > > > > > > > > alloc_slab_page+0x6a/0x160 mm/slub.c:1851
> > > > > > > > > allocate_slab mm/slub.c:1998 [inline]
> > > > > > > > > new_slab+0x84/0x2f0 mm/slub.c:2051
> > > > > > > > > ___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
> > > > > > > > > __kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
> > > > > > > > > kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
> > > > > > > > > mt_alloc_bulk lib/maple_tree.c:157 [inline]
> > > > > > > > > mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
> > > > > > > > > mas_node_count_gfp lib/maple_tree.c:1316 [inline]
> > > > > > > > > mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
> > > > > > > > > vma_expand+0x277/0x850 mm/mmap.c:541
> > > > > > > > > mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
> > > > > > > > > do_mmap+0x8c9/0xf70 mm/mmap.c:1411
> > > > > > > > > vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
> > > > > > > > > do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > > > > > > > do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > > > > > > > > entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > > > > page last free stack trace:
> > > > > > > > > reset_page_owner include/linux/page_owner.h:24 [inline]
> > > > > > > > > free_pages_prepare mm/page_alloc.c:1446 [inline]
> > > > > > > > > free_pcp_prepare mm/page_alloc.c:1496 [inline]
> > > > > > > > > free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
> > > > > > > > > free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
> > > > > > > > > qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
> > > > > > > > > kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
> > > > > > > > > __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
> > > > > > > > > kasan_slab_alloc include/linux/kasan.h:201 [inline]
> > > > > > > > > slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
> > > > > > > > > slab_alloc_node mm/slub.c:3452 [inline]
> > > > > > > > > kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
> > > > > > > > > __alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
> > > > > > > > > alloc_skb include/linux/skbuff.h:1270 [inline]
> > > > > > > > > alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
> > > > > > > > > sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
> > > > > > > > > unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
> > > > > > > > > sock_sendmsg_nosec net/socket.c:714 [inline]
> > > > > > > > > sock_sendmsg net/socket.c:734 [inline]
> > > > > > > > > __sys_sendto+0x475/0x5f0 net/socket.c:2117
> > > > > > > > > __do_sys_sendto net/socket.c:2129 [inline]
> > > > > > > > > __se_sys_sendto net/socket.c:2125 [inline]
> > > > > > > > > __x64_sys_sendto+0xde/0xf0 net/socket.c:2125
> > > > > > > > > do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > > > > > > > > do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> > > > > > > > > entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > > > > > > >
> > > > > > > > > Memory state around the buggy address:
> > > > > > > > > ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > > > > > ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > > > > > > ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > > > > ^
> > > > > > > > > ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > > > > ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > > > > ==================================================================
> > > > > > > > >
> > > > > > > > I think the patch from below should fix it.
> > > > > > > >
> > > > > > > > I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
> > > > > > > > EXT4_MAX_DESC_SIZE. What I think happens is that the contents of the
> > > > > > > > super block in the buffer get corrupted sometime after the .get_tree
> > > > > > > > (which eventually calls __ext4_fill_super()) is called. So instead of
> > > > > > > > relying on the contents of the buffer, we should instead rely on the
> > > > > > > > s_desc_size initialized at the __ext4_fill_super() time.
> > > > > > > >
> > > > > > > > If someone finds this good (or bad), or has a more in depth explanation,
> > > > > > > > please let me know, it will help me better understand the subsystem. In
> > > > > > > > the meantime I'll continue to investigate this and prepare a patch for
> > > > > > > > it.
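The idea quoted above (validate the descriptor size once at mount time and use the cached value afterwards) can be sketched in user space roughly as follows. This is only an illustration, not the proposed kernel patch: the struct, the `sketch_*` names and the two limit macros are hypothetical stand-ins, and byte-order conversion (le16_to_cpu) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed limits for this sketch, modelled on ext4's descriptor size bounds. */
#define SKETCH_MIN_DESC_SIZE 32u
#define SKETCH_MAX_DESC_SIZE 1024u

/* Hypothetical stand-in for the in-memory superblock info. */
struct sketch_sb_info {
	uint16_t cached_desc_size;     /* validated once at mount time */
	const uint16_t *raw_desc_size; /* points into the mutable buffer */
};

/* A size is accepted if it is in range and a power of two. */
static bool sketch_desc_size_valid(uint16_t size)
{
	return size >= SKETCH_MIN_DESC_SIZE &&
	       size <= SKETCH_MAX_DESC_SIZE &&
	       (size & (size - 1)) == 0;
}

/* Mount-time step: check the raw value once and remember it. */
static bool sketch_fill_super(struct sketch_sb_info *sbi, const uint16_t *raw)
{
	assert(sbi != NULL && raw != NULL);
	if (!sketch_desc_size_valid(*raw))
		return false;
	sbi->cached_desc_size = *raw;
	sbi->raw_desc_size = raw;
	return true;
}

/* Checksum length: taken from the cached copy, not from the buffer. */
static uint16_t sketch_csum_len(const struct sketch_sb_info *sbi)
{
	return sbi->cached_desc_size;
}
```

With this shape, a later overwrite of the raw field no longer changes the length passed to the checksum routine. It does not address the objection raised in the reply, namely that whatever corrupts the buffer still needs to be found.
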
> > > > > > > If there's something corrupting the superblock while the filesystem is
> > > > > > > mounted, we need to find what is corrupting the SB and fix *that*. Not try
> > > > > > > to paper over the problem by not using the on-disk data... Maybe journal
> > > > > > > replay is corrupting the value or something like that?
> > > > > > >
> > > > > > > Honza
> > > > > > >
> > > > > > Ok, I agree. First thing would be to understand the reproducer and to
> > > > > > simplify it if possible. I haven't yet decoded what the syz repro is doing at
> > > > > > https://syzkaller.appspot.com/text?tag=ReproSyz&x=16ce3de4c80000
> > > > > > Will reply to this email thread once I understand what's happening. If
> > > > > > you or someone else can decode the syz repro faster than me, shoot.
> > > > > >
> > > > > I can now explain how the contents of the super block in the buffer get
> > > > > corrupted. After the ext4 fs is mounted to the target ("./bus"), the
> > > > > reproducer maps 6MB of data starting at offset 0 in the target's file
> > > > > ("./bus"), then it starts overwriting the data with something else,
> > > > > using memcpy, memset, and individual byte writes. Does that mean that we
> > > > > shouldn't rely on the contents of the super block in the buffer after we
> > > > > mount the file system? If so, then my patch stands. I'll be happy to
> > > > > extend it if needed. Below one may find a step by step interpretation of
> > > > > the reproducer.
> > > > >
> > > > > We have a strace log for the same bug, but on Android 5.15:
> > > > > https://syzkaller.appspot.com/text?tag=CrashLog&x=14ecec8cc80000
> > > > >
> > > > > Look for pid 328. You will notice that the bpf() syscalls return errors, so I
> > > > > commented them out in the C repro to confirm that they are not the
> > > > > cause. The bug reproduced without the bpf() calls. One can find the C
> > > > > repro at:
> > > > > https://syzkaller.appspot.com/text?tag=ReproC&x=17c5fc50c80000
> > > > >
> > > > > Let's look at these calls, just before the bug was hit:
> > > > > [pid 328] open("./bus",
> > > > > O_RDWR|O_CREAT|O_TRUNC|O_NONBLOCK|O_SYNC|O_DIRECT|O_LARGEFILE|O_NOATIME,
> > > > > 000) = 4
> > > > > [pid 328] mount("/dev/loop0", "./bus", NULL, MS_BIND, NULL) = 0
> > > > > [pid 328] open("./bus", O_RDWR|O_SYNC|O_NOATIME|0x3c) = 5
> > > > > [pid 328] mmap(0x20000000, 6291456,
> > > > > PROT_READ|PROT_WRITE|PROT_EXEC|PROT_SEM|0x47ffff0, MAP_SHARED|MAP_FIXED,
> > > > > 5, 0) = 0x20000000
> > > > Yeah, looking at the reproducer, before this the reproducer also mounts
> > > > /dev/loop0 as an ext4 filesystem.
> > > >
> > > > > - ./bus is created (if it does not exist), fd 4 is returned.
> > > > > - /dev/loop0 is mounted to ./bus
> > > > > - then it creates a new file descriptor (5) for the same ./bus
> > > > > - then it creates a mapping for ./bus starting at offset zero. The
> > > > > mapped area is at 0x20000000 and is of 0x600000ul length.
> > > > So the result is that the reproducer modified the block device while it is
> > > > mounted by the filesystem. We know cases like this can crash the kernel and
> > > > it is inherently difficult to fix. We have to trust the buffer cache
> > > > contents as otherwise the performance will be unacceptable. For historical
> > > > reasons we also have to allow modifications of buffer cache while ext4 is
> > > > mounted because tune2fs uses this to e.g. update the label of a mounted
> > > > filesystem.
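The mechanism described above, stores through a shared mapping reaching the backing object without the filesystem being involved, can be shown with a small user-space sketch. It assumes a regular file as a stand-in for the block device, and the helper name `sketch_scribble_via_mmap` is made up for illustration; it is not taken from the syzbot reproducer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Overwrite the first `len` bytes behind `fd` with `byte` through a
 * MAP_SHARED mapping. Returns 0 on success, -1 on failure.
 */
static int sketch_scribble_via_mmap(int fd, size_t len, unsigned char byte)
{
	void *p;

	assert(len > 0);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Plain memory stores; no write(2) call is made. */
	memset(p, byte, len);

	/* Push the dirty pages to the backing object before unmapping. */
	if (msync(p, len, MS_SYNC) != 0) {
		munmap(p, len);
		return -1;
	}
	return munmap(p, len);
}
```

A later read(2) on the same file returns the scribbled bytes. When the mapped object is a block device that a filesystem has mounted, the same stores land in the cached blocks the filesystem is relying on.
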
> > > >
> > > > Long-term we are moving ext4 in a direction where we can disallow block
> > > > device modifications while the fs is mounted but we are not there yet. I've
> > > > discussed a shorter-term solution to avoid such known problems with syzbot
> > > > developers and what seems plausible would be a kconfig option to disallow
> > > > writing to a block device when it is exclusively open by someone else.
> > > > But so far I haven't gotten around to trying whether this would reasonably
> > > > work. Would you be interested in having a look into this?
> > > I am interested in this job. The file system is often damaged by writing
> > > block devices, which is a headache. I have always wanted to eradicate
> > > this kind of problem. A few months ago, I tried to add a mount parameter
> > > to prohibit modification after the block device is mounted. But I
> > > encountered several problems that led to the termination of my attempt.
> > > First of all, the 32-bit super block flags have been used up and need to
> > > be extended. Secondly, I don't know how to handle the read-only flag in the
> > > case of multiple mount points.
> > > "disallow writing to a block device when it is exclusively open by someone
> > > else."
> > > -> Perhaps we can add a new IOCTL command to control whether write
> > > operations are allowed after the block device has been exclusively
> > > opened. Is this feasible? Do you have any good
> > > suggestions?
> > Well, ioctl() for syzbot would be possible as well but for a start I'd try
> > whether the idea with the kconfig option will work. Then it will be enough to
> > just make sure all kernels used for fuzzing are built with this option set.
> > Thanks for having a look into this!
>
> In fact, I also want to solve the problem of file system damage caused by
> writing to raw disks in the production environment. Using kconfig directly to
> control this loses flexibility in the production environment.

I see. But which protections exactly do you want in production? Since you
need to add the call to ioctl(2) somewhere to write-protect the device, you
could as well just "chmod ugo-w <device>" instead, couldn't you? And the
level of protection would be similar.
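
For reference, a rough C equivalent of "chmod ugo-w <device>" might look like the sketch below. The `sketch_*` helper names are made up for illustration. As with the shell command, a process holding CAP_DAC_OVERRIDE (typically root) is not stopped by the permission bits.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Compute the mode that "chmod ugo-w" would leave behind. */
static mode_t sketch_strip_write_bits(mode_t mode)
{
	return mode & ~(mode_t)(S_IWUSR | S_IWGRP | S_IWOTH);
}

/*
 * Drop all write permission bits on `path`.
 * Returns 0 on success, -1 on error with errno set by stat() or chmod().
 */
static int sketch_write_protect(const char *path)
{
	struct stat st;

	assert(path != NULL);
	if (stat(path, &st) != 0)
		return -1;
	/* Pass only the permission bits to chmod(), not the file type. */
	return chmod(path, sketch_strip_write_bits(st.st_mode) & 07777);
}
```
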

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR