2021-03-11 11:00:56

by syzbot

[permalink] [raw]
Subject: [syzbot] KCSAN: data-race in start_this_handle / start_this_handle

Hello,

syzbot found the following issue on:

HEAD commit: a74e6a01 Merge tag 's390-5.12-3' of git://git.kernel.org/p..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=159f69ecd00000
kernel config: https://syzkaller.appspot.com/x/.config?x=de394bbaade74fb7
dashboard link: https://syzkaller.appspot.com/bug?extid=30774a6acf6a2cf6d535
compiler: Debian clang version 11.0.1-2

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

==================================================================
BUG: KCSAN: data-race in start_this_handle / start_this_handle

write to 0xffff888103880870 of 8 bytes by task 29956 on cpu 1:
jbd2_get_transaction fs/jbd2/transaction.c:125 [inline]
start_this_handle+0xceb/0x1010 fs/jbd2/transaction.c:400
jbd2__journal_start+0x1fc/0x3f0 fs/jbd2/transaction.c:503
__ext4_journal_start_sb+0x159/0x310 fs/ext4/ext4_jbd2.c:105
__ext4_journal_start fs/ext4/ext4_jbd2.h:320 [inline]
ext4_da_write_begin+0x460/0xaf0 fs/ext4/inode.c:2998
generic_perform_write+0x196/0x3a0 mm/filemap.c:3575
ext4_buffered_write_iter+0x2e5/0x3e0 fs/ext4/file.c:269
ext4_file_write_iter+0x48a/0x10b0 fs/ext4/file.c:502
call_write_iter include/linux/fs.h:1977 [inline]
do_iter_readv_writev+0x2cb/0x360 fs/read_write.c:740
do_iter_write+0x112/0x4c0 fs/read_write.c:866
vfs_iter_write+0x4c/0x70 fs/read_write.c:907
iter_file_splice_write+0x40a/0x750 fs/splice.c:689
do_splice_from fs/splice.c:767 [inline]
direct_splice_actor+0x80/0xa0 fs/splice.c:936
splice_direct_to_actor+0x345/0x650 fs/splice.c:891
do_splice_direct+0xf5/0x170 fs/splice.c:979
do_sendfile+0x7a6/0xe20 fs/read_write.c:1260
__do_sys_sendfile64 fs/read_write.c:1319 [inline]
__se_sys_sendfile64 fs/read_write.c:1311 [inline]
__x64_sys_sendfile64+0xa9/0x130 fs/read_write.c:1311
do_syscall_64+0x39/0x80 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff888103880870 of 8 bytes by task 29936 on cpu 0:
start_this_handle+0x1c1/0x1010 fs/jbd2/transaction.c:352
jbd2__journal_start+0x1fc/0x3f0 fs/jbd2/transaction.c:503
__ext4_journal_start_sb+0x159/0x310 fs/ext4/ext4_jbd2.c:105
__ext4_journal_start fs/ext4/ext4_jbd2.h:320 [inline]
ext4_da_write_begin+0x460/0xaf0 fs/ext4/inode.c:2998
generic_perform_write+0x196/0x3a0 mm/filemap.c:3575
ext4_buffered_write_iter+0x2e5/0x3e0 fs/ext4/file.c:269
ext4_file_write_iter+0x48a/0x10b0 fs/ext4/file.c:502
call_write_iter include/linux/fs.h:1977 [inline]
do_iter_readv_writev+0x2cb/0x360 fs/read_write.c:740
do_iter_write+0x112/0x4c0 fs/read_write.c:866
vfs_iter_write+0x4c/0x70 fs/read_write.c:907
iter_file_splice_write+0x40a/0x750 fs/splice.c:689
do_splice_from fs/splice.c:767 [inline]
direct_splice_actor+0x80/0xa0 fs/splice.c:936
splice_direct_to_actor+0x345/0x650 fs/splice.c:891
do_splice_direct+0xf5/0x170 fs/splice.c:979
do_sendfile+0x7a6/0xe20 fs/read_write.c:1260
__do_sys_sendfile64 fs/read_write.c:1319 [inline]
__se_sys_sendfile64 fs/read_write.c:1311 [inline]
__x64_sys_sendfile64+0xa9/0x130 fs/read_write.c:1311
do_syscall_64+0x39/0x80 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 29936 Comm: syz-executor.5 Not tainted 5.12.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
==================================================================


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


2021-03-11 14:26:54

by Jan Kara

[permalink] [raw]
Subject: Re: [syzbot] KCSAN: data-race in start_this_handle / start_this_handle

On Thu 11-03-21 02:59:14, syzbot wrote:
> HEAD commit: a74e6a01 Merge tag 's390-5.12-3' of git://git.kernel.org/p..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=159f69ecd00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=de394bbaade74fb7
> dashboard link: https://syzkaller.appspot.com/bug?extid=30774a6acf6a2cf6d535
> compiler: Debian clang version 11.0.1-2
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: [email protected]
>
> ==================================================================
> BUG: KCSAN: data-race in start_this_handle / start_this_handle
>
> write to 0xffff888103880870 of 8 bytes by task 29956 on cpu 1:
> jbd2_get_transaction fs/jbd2/transaction.c:125 [inline]
> start_this_handle+0xceb/0x1010 fs/jbd2/transaction.c:400
> jbd2__journal_start+0x1fc/0x3f0 fs/jbd2/transaction.c:503
> __ext4_journal_start_sb+0x159/0x310 fs/ext4/ext4_jbd2.c:105
> __ext4_journal_start fs/ext4/ext4_jbd2.h:320 [inline]
> ext4_da_write_begin+0x460/0xaf0 fs/ext4/inode.c:2998
> generic_perform_write+0x196/0x3a0 mm/filemap.c:3575
> ext4_buffered_write_iter+0x2e5/0x3e0 fs/ext4/file.c:269
> ext4_file_write_iter+0x48a/0x10b0 fs/ext4/file.c:502
> call_write_iter include/linux/fs.h:1977 [inline]
> do_iter_readv_writev+0x2cb/0x360 fs/read_write.c:740
> do_iter_write+0x112/0x4c0 fs/read_write.c:866
> vfs_iter_write+0x4c/0x70 fs/read_write.c:907
> iter_file_splice_write+0x40a/0x750 fs/splice.c:689
> do_splice_from fs/splice.c:767 [inline]
> direct_splice_actor+0x80/0xa0 fs/splice.c:936
> splice_direct_to_actor+0x345/0x650 fs/splice.c:891
> do_splice_direct+0xf5/0x170 fs/splice.c:979
> do_sendfile+0x7a6/0xe20 fs/read_write.c:1260
> __do_sys_sendfile64 fs/read_write.c:1319 [inline]
> __se_sys_sendfile64 fs/read_write.c:1311 [inline]
> __x64_sys_sendfile64+0xa9/0x130 fs/read_write.c:1311
> do_syscall_64+0x39/0x80 arch/x86/entry/common.c:46
> entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> read to 0xffff888103880870 of 8 bytes by task 29936 on cpu 0:
> start_this_handle+0x1c1/0x1010 fs/jbd2/transaction.c:352
> jbd2__journal_start+0x1fc/0x3f0 fs/jbd2/transaction.c:503
> __ext4_journal_start_sb+0x159/0x310 fs/ext4/ext4_jbd2.c:105
> __ext4_journal_start fs/ext4/ext4_jbd2.h:320 [inline]
> ext4_da_write_begin+0x460/0xaf0 fs/ext4/inode.c:2998
> generic_perform_write+0x196/0x3a0 mm/filemap.c:3575
> ext4_buffered_write_iter+0x2e5/0x3e0 fs/ext4/file.c:269
> ext4_file_write_iter+0x48a/0x10b0 fs/ext4/file.c:502
> call_write_iter include/linux/fs.h:1977 [inline]
> do_iter_readv_writev+0x2cb/0x360 fs/read_write.c:740
> do_iter_write+0x112/0x4c0 fs/read_write.c:866
> vfs_iter_write+0x4c/0x70 fs/read_write.c:907
> iter_file_splice_write+0x40a/0x750 fs/splice.c:689
> do_splice_from fs/splice.c:767 [inline]
> direct_splice_actor+0x80/0xa0 fs/splice.c:936
> splice_direct_to_actor+0x345/0x650 fs/splice.c:891
> do_splice_direct+0xf5/0x170 fs/splice.c:979
> do_sendfile+0x7a6/0xe20 fs/read_write.c:1260
> __do_sys_sendfile64 fs/read_write.c:1319 [inline]
> __se_sys_sendfile64 fs/read_write.c:1311 [inline]
> __x64_sys_sendfile64+0xa9/0x130 fs/read_write.c:1311
> do_syscall_64+0x39/0x80 arch/x86/entry/common.c:46
> entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> Reported by Kernel Concurrency Sanitizer on:
> CPU: 0 PID: 29936 Comm: syz-executor.5 Not tainted 5.12.0-rc2-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> ==================================================================

So this case is harmless. start_this_handle() does indeed check
journal->j_running_transaction without any protection and this is only a
racy check to opportunistically preallocate a transaction if we are likely
to need it. There was some macro to instruct KCSAN that the read is
actually fine, wasn't there?

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-03-11 14:54:31

by Dmitry Vyukov

[permalink] [raw]
Subject: Re: [syzbot] KCSAN: data-race in start_this_handle / start_this_handle

On Thu, Mar 11, 2021 at 3:25 PM Jan Kara <[email protected]> wrote:
>
> On Thu 11-03-21 02:59:14, syzbot wrote:
> > HEAD commit: a74e6a01 Merge tag 's390-5.12-3' of git://git.kernel.org/p..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=159f69ecd00000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=de394bbaade74fb7
> > dashboard link: https://syzkaller.appspot.com/bug?extid=30774a6acf6a2cf6d535
> > compiler: Debian clang version 11.0.1-2
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: [email protected]
> >
> > ==================================================================
> > BUG: KCSAN: data-race in start_this_handle / start_this_handle
> >
> > write to 0xffff888103880870 of 8 bytes by task 29956 on cpu 1:
> > jbd2_get_transaction fs/jbd2/transaction.c:125 [inline]
> > start_this_handle+0xceb/0x1010 fs/jbd2/transaction.c:400
> > jbd2__journal_start+0x1fc/0x3f0 fs/jbd2/transaction.c:503
> > __ext4_journal_start_sb+0x159/0x310 fs/ext4/ext4_jbd2.c:105
> > __ext4_journal_start fs/ext4/ext4_jbd2.h:320 [inline]
> > ext4_da_write_begin+0x460/0xaf0 fs/ext4/inode.c:2998
> > generic_perform_write+0x196/0x3a0 mm/filemap.c:3575
> > ext4_buffered_write_iter+0x2e5/0x3e0 fs/ext4/file.c:269
> > ext4_file_write_iter+0x48a/0x10b0 fs/ext4/file.c:502
> > call_write_iter include/linux/fs.h:1977 [inline]
> > do_iter_readv_writev+0x2cb/0x360 fs/read_write.c:740
> > do_iter_write+0x112/0x4c0 fs/read_write.c:866
> > vfs_iter_write+0x4c/0x70 fs/read_write.c:907
> > iter_file_splice_write+0x40a/0x750 fs/splice.c:689
> > do_splice_from fs/splice.c:767 [inline]
> > direct_splice_actor+0x80/0xa0 fs/splice.c:936
> > splice_direct_to_actor+0x345/0x650 fs/splice.c:891
> > do_splice_direct+0xf5/0x170 fs/splice.c:979
> > do_sendfile+0x7a6/0xe20 fs/read_write.c:1260
> > __do_sys_sendfile64 fs/read_write.c:1319 [inline]
> > __se_sys_sendfile64 fs/read_write.c:1311 [inline]
> > __x64_sys_sendfile64+0xa9/0x130 fs/read_write.c:1311
> > do_syscall_64+0x39/0x80 arch/x86/entry/common.c:46
> > entry_SYSCALL_64_after_hwframe+0x44/0xae
> >
> > read to 0xffff888103880870 of 8 bytes by task 29936 on cpu 0:
> > start_this_handle+0x1c1/0x1010 fs/jbd2/transaction.c:352
> > jbd2__journal_start+0x1fc/0x3f0 fs/jbd2/transaction.c:503
> > __ext4_journal_start_sb+0x159/0x310 fs/ext4/ext4_jbd2.c:105
> > __ext4_journal_start fs/ext4/ext4_jbd2.h:320 [inline]
> > ext4_da_write_begin+0x460/0xaf0 fs/ext4/inode.c:2998
> > generic_perform_write+0x196/0x3a0 mm/filemap.c:3575
> > ext4_buffered_write_iter+0x2e5/0x3e0 fs/ext4/file.c:269
> > ext4_file_write_iter+0x48a/0x10b0 fs/ext4/file.c:502
> > call_write_iter include/linux/fs.h:1977 [inline]
> > do_iter_readv_writev+0x2cb/0x360 fs/read_write.c:740
> > do_iter_write+0x112/0x4c0 fs/read_write.c:866
> > vfs_iter_write+0x4c/0x70 fs/read_write.c:907
> > iter_file_splice_write+0x40a/0x750 fs/splice.c:689
> > do_splice_from fs/splice.c:767 [inline]
> > direct_splice_actor+0x80/0xa0 fs/splice.c:936
> > splice_direct_to_actor+0x345/0x650 fs/splice.c:891
> > do_splice_direct+0xf5/0x170 fs/splice.c:979
> > do_sendfile+0x7a6/0xe20 fs/read_write.c:1260
> > __do_sys_sendfile64 fs/read_write.c:1319 [inline]
> > __se_sys_sendfile64 fs/read_write.c:1311 [inline]
> > __x64_sys_sendfile64+0xa9/0x130 fs/read_write.c:1311
> > do_syscall_64+0x39/0x80 arch/x86/entry/common.c:46
> > entry_SYSCALL_64_after_hwframe+0x44/0xae
> >
> > Reported by Kernel Concurrency Sanitizer on:
> > CPU: 0 PID: 29936 Comm: syz-executor.5 Not tainted 5.12.0-rc2-syzkaller #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > ==================================================================
>
> So this case is harmless. start_this_handle() does indeed check
> journal->j_running_transaction without any protection and this is only a
> racy check to opportunistically preallocate a transaction if we are likely
> to need it. There was some macro to instruct KCSAN that the read is
> actually fine, wasn't there?

+Marco
+Tetsuo, did you want to fix it?

2021-03-11 15:09:54

by Marco Elver

[permalink] [raw]
Subject: Re: [syzbot] KCSAN: data-race in start_this_handle / start_this_handle

On Thu, Mar 11, 2021 at 03:53PM +0100, Dmitry Vyukov wrote:
> On Thu, Mar 11, 2021 at 3:25 PM Jan Kara <[email protected]> wrote:
> >
> > On Thu 11-03-21 02:59:14, syzbot wrote:
> > > HEAD commit: a74e6a01 Merge tag 's390-5.12-3' of git://git.kernel.org/p..
> > > git tree: upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=159f69ecd00000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=de394bbaade74fb7
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=30774a6acf6a2cf6d535
> > > compiler: Debian clang version 11.0.1-2
> > >
> > > Unfortunately, I don't have any reproducer for this issue yet.
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: [email protected]
> > >
> > > ==================================================================
> > > BUG: KCSAN: data-race in start_this_handle / start_this_handle
> > >
> > > write to 0xffff888103880870 of 8 bytes by task 29956 on cpu 1:
> > > jbd2_get_transaction fs/jbd2/transaction.c:125 [inline]
> > > start_this_handle+0xceb/0x1010 fs/jbd2/transaction.c:400
> > > jbd2__journal_start+0x1fc/0x3f0 fs/jbd2/transaction.c:503
> > > __ext4_journal_start_sb+0x159/0x310 fs/ext4/ext4_jbd2.c:105
> > > __ext4_journal_start fs/ext4/ext4_jbd2.h:320 [inline]
> > > ext4_da_write_begin+0x460/0xaf0 fs/ext4/inode.c:2998
> > > generic_perform_write+0x196/0x3a0 mm/filemap.c:3575
> > > ext4_buffered_write_iter+0x2e5/0x3e0 fs/ext4/file.c:269
> > > ext4_file_write_iter+0x48a/0x10b0 fs/ext4/file.c:502
> > > call_write_iter include/linux/fs.h:1977 [inline]
> > > do_iter_readv_writev+0x2cb/0x360 fs/read_write.c:740
> > > do_iter_write+0x112/0x4c0 fs/read_write.c:866
> > > vfs_iter_write+0x4c/0x70 fs/read_write.c:907
> > > iter_file_splice_write+0x40a/0x750 fs/splice.c:689
> > > do_splice_from fs/splice.c:767 [inline]
> > > direct_splice_actor+0x80/0xa0 fs/splice.c:936
> > > splice_direct_to_actor+0x345/0x650 fs/splice.c:891
> > > do_splice_direct+0xf5/0x170 fs/splice.c:979
> > > do_sendfile+0x7a6/0xe20 fs/read_write.c:1260
> > > __do_sys_sendfile64 fs/read_write.c:1319 [inline]
> > > __se_sys_sendfile64 fs/read_write.c:1311 [inline]
> > > __x64_sys_sendfile64+0xa9/0x130 fs/read_write.c:1311
> > > do_syscall_64+0x39/0x80 arch/x86/entry/common.c:46
> > > entry_SYSCALL_64_after_hwframe+0x44/0xae
> > >
> > > read to 0xffff888103880870 of 8 bytes by task 29936 on cpu 0:
> > > start_this_handle+0x1c1/0x1010 fs/jbd2/transaction.c:352
> > > jbd2__journal_start+0x1fc/0x3f0 fs/jbd2/transaction.c:503
> > > __ext4_journal_start_sb+0x159/0x310 fs/ext4/ext4_jbd2.c:105
> > > __ext4_journal_start fs/ext4/ext4_jbd2.h:320 [inline]
> > > ext4_da_write_begin+0x460/0xaf0 fs/ext4/inode.c:2998
> > > generic_perform_write+0x196/0x3a0 mm/filemap.c:3575
> > > ext4_buffered_write_iter+0x2e5/0x3e0 fs/ext4/file.c:269
> > > ext4_file_write_iter+0x48a/0x10b0 fs/ext4/file.c:502
> > > call_write_iter include/linux/fs.h:1977 [inline]
> > > do_iter_readv_writev+0x2cb/0x360 fs/read_write.c:740
> > > do_iter_write+0x112/0x4c0 fs/read_write.c:866
> > > vfs_iter_write+0x4c/0x70 fs/read_write.c:907
> > > iter_file_splice_write+0x40a/0x750 fs/splice.c:689
> > > do_splice_from fs/splice.c:767 [inline]
> > > direct_splice_actor+0x80/0xa0 fs/splice.c:936
> > > splice_direct_to_actor+0x345/0x650 fs/splice.c:891
> > > do_splice_direct+0xf5/0x170 fs/splice.c:979
> > > do_sendfile+0x7a6/0xe20 fs/read_write.c:1260
> > > __do_sys_sendfile64 fs/read_write.c:1319 [inline]
> > > __se_sys_sendfile64 fs/read_write.c:1311 [inline]
> > > __x64_sys_sendfile64+0xa9/0x130 fs/read_write.c:1311
> > > do_syscall_64+0x39/0x80 arch/x86/entry/common.c:46
> > > entry_SYSCALL_64_after_hwframe+0x44/0xae
> > >
> > > Reported by Kernel Concurrency Sanitizer on:
> > > CPU: 0 PID: 29936 Comm: syz-executor.5 Not tainted 5.12.0-rc2-syzkaller #0
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > > ==================================================================
> >
> > So this case is harmless. start_this_handle() does indeed check
> > journal->j_running_transaction without any protection and this is only a
> > racy check to opportunistically preallocate a transaction if we are likely
> > to need it. There was some macro to instruct KCSAN that the read is
> > actually fine, wasn't there?

If the outcome of the check does not affect correctness and the code is
entirely fault tolerant to the precise value being read, then a
data_race(!journal->j_running_transaction) marking here would be fine.

If in doubt, this is the latest summary for concurrent access markings:
https://lkml.kernel.org/r/20210220051000.GA457@paulmck-ThinkPad-P72

> +Marco
> +Tetsuo, did you want to fix it?

Thanks,
-- Marco

2021-03-11 15:31:40

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [syzbot] KCSAN: data-race in start_this_handle / start_this_handle

On Thu, Mar 11, 2021 at 04:08:30PM +0100, Marco Elver wrote:
> If the outcome of the check does not affect correctness and the code is
> entirely fault tolerant to the precise value being read, then a
> data_race(!journal->j_running_transaction) marking here would be fine.

So a very common coding pattern is to check a value w/o the lock, and
if it looks like we might need to check *with* a lock, we'll grab the
lock and recheck. Does KCSAN understand that this sort of thing is
safe automatically?

In thie particular case, it's a bit more complicated than that; we're
checking a value, and then allocating memory, grabbing the spin lock,
and then re-checking the value, so we don't have to drop the spinlock,
allocate the memory, grab the lock again, and then rechecking the
value. So even if KCSAN catches the simpler case as described above,
we still might need to explicitly mark the data_race explicitly.

But the more we could have the compiler automatically figure out
things without needing an explicit tag, it would seem to me that this
would be better, since manual tagging is going to be more error-prone.

Cheers,

- Ted

2021-03-11 15:56:21

by Marco Elver

[permalink] [raw]
Subject: Re: [syzbot] KCSAN: data-race in start_this_handle / start_this_handle

+Cc Paul

On Thu, 11 Mar 2021 at 16:30, Theodore Ts'o <[email protected]> wrote:
>
> On Thu, Mar 11, 2021 at 04:08:30PM +0100, Marco Elver wrote:
> > If the outcome of the check does not affect correctness and the code is
> > entirely fault tolerant to the precise value being read, then a
> > data_race(!journal->j_running_transaction) marking here would be fine.
>
> So a very common coding pattern is to check a value w/o the lock, and
> if it looks like we might need to check *with* a lock, we'll grab the
> lock and recheck. Does KCSAN understand that this sort of thing is
> safe automatically?

It doesn't -- these are concurrency patterns that go way beyond the
scope of a data race detector. The main problem, as always with such
patterns, is that in one case it seems fine, yet in the next it isn't.

> In thie particular case, it's a bit more complicated than that; we're
> checking a value, and then allocating memory, grabbing the spin lock,
> and then re-checking the value, so we don't have to drop the spinlock,
> allocate the memory, grab the lock again, and then rechecking the
> value. So even if KCSAN catches the simpler case as described above,
> we still might need to explicitly mark the data_race explicitly.

This seems like a variation of double-checked locking.

> But the more we could have the compiler automatically figure out
> things without needing an explicit tag, it would seem to me that this
> would be better, since manual tagging is going to be more error-prone.

What you're alluding to here would go much further than a data race
detector ("data race" is still just defined by the memory model). The
wish that there was a static analysis tool that would automatically
understand the "concurrency semantics as intended by the developer" is
something that'd be nice to have, but just doesn't seem realistic.
Because how can a tool tell what the developer intended, without input
from that developer?

If there's worry marking accesses is error-prone, then that might be a
signal that the concurrency design is too complex (or the developer
hasn't considered all cases).

For that reason, we need to mark accesses to tell the compiler and
tooling where to expect concurrency, so that 1) the compiler generates
correct code, and 2) tooling such as KCSAN can double-check what the
developer intended is actually what's happening.

Thanks,
-- Marco

2021-03-19 17:24:51

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [syzbot] KCSAN: data-race in start_this_handle / start_this_handle

On Fri, Mar 19, 2021 at 11:15:42PM +0900, Tetsuo Handa wrote:
> On 2021/03/12 0:54, Marco Elver wrote:
> >> But the more we could have the compiler automatically figure out
> >> things without needing an explicit tag, it would seem to me that this
> >> would be better, since manual tagging is going to be more error-prone.
> >
> > What you're alluding to here would go much further than a data race
> > detector ("data race" is still just defined by the memory model). The
> > wish that there was a static analysis tool that would automatically
> > understand the "concurrency semantics as intended by the developer" is
> > something that'd be nice to have, but just doesn't seem realistic.
> > Because how can a tool tell what the developer intended, without input
> > from that developer?
>
> Input from developers is very important for not only compilers and tools
> but also allowing bug-explorers to understand what is happening.
> ext4 currently has
>
> possible deadlock in start_this_handle (2)
> https://syzkaller.appspot.com/bug?id=38c060d5757cbc13fdffd46e80557c645fbe79ba
>
> which even maintainers cannot understand what is happening.
> How can bug-explorers know implicit logic which maintainers believe safe and correct?
> It is possible that some oversight in implicit logic is the cause of
> "possible deadlock in start_this_handle (2)".
> Making implicit assumptions clear helps understanding.

Just to be clear, the above diagnostic is from lockdep rather than KCSAN.

According to the sample crash result, different code paths acquire
jdb2_handle and the __fs_reclaim_map in different orders. It looks
to me that __fs_reclaim_map isn't really a lock, but rather a mode
indicator. If so, lockdep should set it up accordingly, perhaps
in a manner similar to rcu_lock_map.

> Will "KCSAN: data-race in start_this_handle / start_this_handle" be addressed by marking?
> syzbot is already waiting for
> "KCSAN: data-race in jbd2_journal_dirty_metadata / jbd2_journal_dirty_metadata" at
> https://syzkaller.appspot.com/bug?id=5eb10023f53097f003e72c6a7c1a6f14b7c22929 .

The first thing is to work out what the code should be doing. What KCSAN
is saying is that a variable is being locklessly updated. Is it really
OK for that variable to be locklessly updated? If not, a larger fix
is required.

For more information, please see Marco's LWN series:
https://lwn.net/Articles/816850/ and https://lwn.net/Articles/816854/

Alternatively, you can refer to the documentation being proposed for
the Linux kernel tree:

https://lore.kernel.org/lkml/[email protected]/

> > If there's worry marking accesses is error-prone, then that might be a
> > signal that the concurrency design is too complex (or the developer
> > hasn't considered all cases).
> >
> > For that reason, we need to mark accesses to tell the compiler and
> > tooling where to expect concurrency, so that 1) the compiler generates
> > correct code, and 2) tooling such as KCSAN can double-check what the
> > developer intended is actually what's happening.
>
> and 3) bug-explorers can understand what the developers are assuming/missing.

If the above information doesn't help the bug explorers, please let me
know.

Thanx, Paul