2023-04-24 07:22:38

by syzbot

[permalink] [raw]
Subject: [syzbot] [io-uring?] KCSAN: data-race in __io_fill_cqe_req / io_timeout

Hello,

syzbot found the following issue on:

HEAD commit: 3a93e40326c8 Merge tag 'for-linus' of git://git.kernel.org..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1280071ec80000
kernel config: https://syzkaller.appspot.com/x/.config?x=f7350c77b8056a38
dashboard link: https://syzkaller.appspot.com/bug?extid=cb265db2f3f3468ef436
compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/2122926bc9fe/disk-3a93e403.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/8392992358bc/vmlinux-3a93e403.xz
kernel image: https://storage.googleapis.com/syzbot-assets/6398a2d19a7e/bzImage-3a93e403.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

==================================================================
BUG: KCSAN: data-race in __io_fill_cqe_req / io_timeout

read-write to 0xffff888108bf8310 of 4 bytes by task 20447 on cpu 0:
io_get_cqe_overflow io_uring/io_uring.h:112 [inline]
io_get_cqe io_uring/io_uring.h:124 [inline]
__io_fill_cqe_req+0x6c/0x4d0 io_uring/io_uring.h:137
io_fill_cqe_req io_uring/io_uring.h:165 [inline]
__io_req_complete_post+0x67/0x790 io_uring/io_uring.c:969
io_req_complete_post io_uring/io_uring.c:1006 [inline]
io_req_task_complete+0xb9/0x110 io_uring/io_uring.c:1654
handle_tw_list io_uring/io_uring.c:1184 [inline]
tctx_task_work+0x1fe/0x4d0 io_uring/io_uring.c:1246
task_work_run+0x123/0x160 kernel/task_work.c:179
get_signal+0xe5c/0xfe0 kernel/signal.c:2635
arch_do_signal_or_restart+0x89/0x2b0 arch/x86/kernel/signal.c:306
exit_to_user_mode_loop+0x6d/0xe0 kernel/entry/common.c:168
exit_to_user_mode_prepare+0x6a/0xa0 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x26/0x140 kernel/entry/common.c:297
do_syscall_64+0x4d/0xc0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x63/0xcd

read to 0xffff888108bf8310 of 4 bytes by task 20448 on cpu 1:
io_timeout+0x88/0x270 io_uring/timeout.c:546
io_issue_sqe+0x147/0x660 io_uring/io_uring.c:1907
io_queue_sqe io_uring/io_uring.c:2079 [inline]
io_submit_sqe io_uring/io_uring.c:2340 [inline]
io_submit_sqes+0x689/0xfe0 io_uring/io_uring.c:2450
__do_sys_io_uring_enter io_uring/io_uring.c:3458 [inline]
__se_sys_io_uring_enter+0x1e5/0x1b70 io_uring/io_uring.c:3392
__x64_sys_io_uring_enter+0x78/0x90 io_uring/io_uring.c:3392
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

value changed: 0x00000c75 -> 0x00000c76

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 20448 Comm: syz-executor.2 Not tainted 6.3.0-rc4-syzkaller-00025-g3a93e40326c8 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
==================================================================


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


2023-04-24 07:40:35

by Dmitry Vyukov

[permalink] [raw]
Subject: Re: [syzbot] [io-uring?] KCSAN: data-race in __io_fill_cqe_req / io_timeout

On Mon, 24 Apr 2023 at 09:19, syzbot
<[email protected]> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 3a93e40326c8 Merge tag 'for-linus' of git://git.kernel.org..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1280071ec80000
> kernel config: https://syzkaller.appspot.com/x/.config?x=f7350c77b8056a38
> dashboard link: https://syzkaller.appspot.com/bug?extid=cb265db2f3f3468ef436
> compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/2122926bc9fe/disk-3a93e403.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/8392992358bc/vmlinux-3a93e403.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/6398a2d19a7e/bzImage-3a93e403.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: [email protected]

I did not fully grasp what happens here, but it looks suspicious.
The comment in io_timeout() says that it computes "events that need to
occur for this timeout event to be satisfied". But if the range is
racing, it marks a random set of events?


> ==================================================================
> BUG: KCSAN: data-race in __io_fill_cqe_req / io_timeout
>
> read-write to 0xffff888108bf8310 of 4 bytes by task 20447 on cpu 0:
> io_get_cqe_overflow io_uring/io_uring.h:112 [inline]
> io_get_cqe io_uring/io_uring.h:124 [inline]
> __io_fill_cqe_req+0x6c/0x4d0 io_uring/io_uring.h:137
> io_fill_cqe_req io_uring/io_uring.h:165 [inline]
> __io_req_complete_post+0x67/0x790 io_uring/io_uring.c:969
> io_req_complete_post io_uring/io_uring.c:1006 [inline]
> io_req_task_complete+0xb9/0x110 io_uring/io_uring.c:1654
> handle_tw_list io_uring/io_uring.c:1184 [inline]
> tctx_task_work+0x1fe/0x4d0 io_uring/io_uring.c:1246
> task_work_run+0x123/0x160 kernel/task_work.c:179
> get_signal+0xe5c/0xfe0 kernel/signal.c:2635
> arch_do_signal_or_restart+0x89/0x2b0 arch/x86/kernel/signal.c:306
> exit_to_user_mode_loop+0x6d/0xe0 kernel/entry/common.c:168
> exit_to_user_mode_prepare+0x6a/0xa0 kernel/entry/common.c:204
> __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
> syscall_exit_to_user_mode+0x26/0x140 kernel/entry/common.c:297
> do_syscall_64+0x4d/0xc0 arch/x86/entry/common.c:86
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>
> read to 0xffff888108bf8310 of 4 bytes by task 20448 on cpu 1:
> io_timeout+0x88/0x270 io_uring/timeout.c:546
> io_issue_sqe+0x147/0x660 io_uring/io_uring.c:1907
> io_queue_sqe io_uring/io_uring.c:2079 [inline]
> io_submit_sqe io_uring/io_uring.c:2340 [inline]
> io_submit_sqes+0x689/0xfe0 io_uring/io_uring.c:2450
> __do_sys_io_uring_enter io_uring/io_uring.c:3458 [inline]
> __se_sys_io_uring_enter+0x1e5/0x1b70 io_uring/io_uring.c:3392
> __x64_sys_io_uring_enter+0x78/0x90 io_uring/io_uring.c:3392
> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>
> value changed: 0x00000c75 -> 0x00000c76
>
> Reported by Kernel Concurrency Sanitizer on:
> CPU: 1 PID: 20448 Comm: syz-executor.2 Not tainted 6.3.0-rc4-syzkaller-00025-g3a93e40326c8 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
> ==================================================================
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at [email protected].
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/000000000000d7848305fa0fd413%40google.com.

2023-04-24 13:13:04

by Pavel Begunkov

[permalink] [raw]
Subject: Re: [syzbot] [io-uring?] KCSAN: data-race in __io_fill_cqe_req / io_timeout

On 4/24/23 08:37, Dmitry Vyukov wrote:
> On Mon, 24 Apr 2023 at 09:19, syzbot
> <[email protected]> wrote:
>>
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit: 3a93e40326c8 Merge tag 'for-linus' of git://git.kernel.org..
>> git tree: upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=1280071ec80000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=f7350c77b8056a38
>> dashboard link: https://syzkaller.appspot.com/bug?extid=cb265db2f3f3468ef436
>> compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
>>
>> Unfortunately, I don't have any reproducer for this issue yet.
>>
>> Downloadable assets:
>> disk image: https://storage.googleapis.com/syzbot-assets/2122926bc9fe/disk-3a93e403.raw.xz
>> vmlinux: https://storage.googleapis.com/syzbot-assets/8392992358bc/vmlinux-3a93e403.xz
>> kernel image: https://storage.googleapis.com/syzbot-assets/6398a2d19a7e/bzImage-3a93e403.xz
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: [email protected]
>
> I did not fully grasp what happens here, but it looks suspicious.
> The comment in io_timeout() says that it computes "events that need to
> occur for this timeout event to be satisfied". But if the range is
> racing, it marks a random set of events?

Right, it's random in a good bunch of cases. There has never been
a fixed semantics for how it's supposed to work, i.e. cqe caching
infra should have changed it back in the day.

The good news, I don't think anyone uses it, and the worst case
scenario is userspace hangs on waiting. I'll drop data_race() /
spinlocks to make syzbot happy and would also say that we need
to deprecate and eventually to root it out.


>> ==================================================================
>> BUG: KCSAN: data-race in __io_fill_cqe_req / io_timeout
>>
>> read-write to 0xffff888108bf8310 of 4 bytes by task 20447 on cpu 0:
>> io_get_cqe_overflow io_uring/io_uring.h:112 [inline]
>> io_get_cqe io_uring/io_uring.h:124 [inline]
>> __io_fill_cqe_req+0x6c/0x4d0 io_uring/io_uring.h:137
>> io_fill_cqe_req io_uring/io_uring.h:165 [inline]
>> __io_req_complete_post+0x67/0x790 io_uring/io_uring.c:969
>> io_req_complete_post io_uring/io_uring.c:1006 [inline]
>> io_req_task_complete+0xb9/0x110 io_uring/io_uring.c:1654
>> handle_tw_list io_uring/io_uring.c:1184 [inline]
>> tctx_task_work+0x1fe/0x4d0 io_uring/io_uring.c:1246
>> task_work_run+0x123/0x160 kernel/task_work.c:179
>> get_signal+0xe5c/0xfe0 kernel/signal.c:2635
>> arch_do_signal_or_restart+0x89/0x2b0 arch/x86/kernel/signal.c:306
>> exit_to_user_mode_loop+0x6d/0xe0 kernel/entry/common.c:168
>> exit_to_user_mode_prepare+0x6a/0xa0 kernel/entry/common.c:204
>> __syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
>> syscall_exit_to_user_mode+0x26/0x140 kernel/entry/common.c:297
>> do_syscall_64+0x4d/0xc0 arch/x86/entry/common.c:86
>> entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>
>> read to 0xffff888108bf8310 of 4 bytes by task 20448 on cpu 1:
>> io_timeout+0x88/0x270 io_uring/timeout.c:546
>> io_issue_sqe+0x147/0x660 io_uring/io_uring.c:1907
>> io_queue_sqe io_uring/io_uring.c:2079 [inline]
>> io_submit_sqe io_uring/io_uring.c:2340 [inline]
>> io_submit_sqes+0x689/0xfe0 io_uring/io_uring.c:2450
>> __do_sys_io_uring_enter io_uring/io_uring.c:3458 [inline]
>> __se_sys_io_uring_enter+0x1e5/0x1b70 io_uring/io_uring.c:3392
>> __x64_sys_io_uring_enter+0x78/0x90 io_uring/io_uring.c:3392
>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>> do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
>> entry_SYSCALL_64_after_hwframe+0x63/0xcd

--
Pavel Begunkov