2021-04-04 09:42:00

by Hao Sun

[permalink] [raw]
Subject: KCSAN: data-race in __jbd2_journal_file_buffer / jbd2_journal_dirty_metadata

Hi:

When using Healer(https://github.com/SunHao-0/healer/tree/dev) to fuzz
the Linux kernel, I found a data-race vulnerability in
__jbd2_journal_file_buffer / jbd2_journal_dirty_metadata.
Sorry, data-race is usually difficult to reproduce. I cannot provide
you with a reproducing program.
I hope that the call stack information in the crash log can help you
locate the problem.
Kernel config and full log can be found in the attachment.

Here is the detailed information:
commit: 3b9cdafb5358eb9f3790de2f728f765fef100731
version: linux 5.11
git tree: upstream
report:
==================================================================
BUG: KCSAN: data-race in __jbd2_journal_file_buffer /
jbd2_journal_dirty_metadata
write to 0xffff88800af6da38 of 8 bytes by task 4822 on cpu 1:
__jbd2_journal_file_buffer+0x18d/0x370 linux/fs/jbd2/transaction.c:2518
__jbd2_journal_refile_buffer+0x155/0x230 linux/fs/jbd2/transaction.c:2612
jbd2_journal_commit_transaction+0x24c6/0x3200 linux/fs/jbd2/commit.c:1084
kjournald2+0x253/0x470 linux/fs/jbd2/journal.c:213
kthread+0x1f0/0x220 linux/kernel/kthread.c:292
ret_from_fork+0x1f/0x30 linux/arch/x86/entry/entry_64.S:294


Attachments:
log0 (5.67 kB)
config (215.69 kB)
Download all attachments

2021-04-06 21:39:08

by Jan Kara

[permalink] [raw]
Subject: Re: KCSAN: data-race in __jbd2_journal_file_buffer / jbd2_journal_dirty_metadata

Hello!

On Sun 04-04-21 17:40:44, Hao Sun wrote:
> When using Healer(https://github.com/SunHao-0/healer/tree/dev) to fuzz
> the Linux kernel, I found a data-race vulnerability in
> __jbd2_journal_file_buffer / jbd2_journal_dirty_metadata.
> Sorry, data-race is usually difficult to reproduce. I cannot provide
> you with a reproducing program.
> I hope that the call stack information in the crash log can help you
> locate the problem.
> Kernel config and full log can be found in the attachment.
>
> Here is the detailed information:
> commit: 3b9cdafb5358eb9f3790de2f728f765fef100731
> version: linux 5.11
> git tree: upstream
> report:
> ==================================================================
> BUG: KCSAN: data-race in __jbd2_journal_file_buffer /
> jbd2_journal_dirty_metadata
> write to 0xffff88800af6da38 of 8 bytes by task 4822 on cpu 1:
> __jbd2_journal_file_buffer+0x18d/0x370 linux/fs/jbd2/transaction.c:2518
> __jbd2_journal_refile_buffer+0x155/0x230 linux/fs/jbd2/transaction.c:2612
> jbd2_journal_commit_transaction+0x24c6/0x3200 linux/fs/jbd2/commit.c:1084
> kjournald2+0x253/0x470 linux/fs/jbd2/journal.c:213
> kthread+0x1f0/0x220 linux/kernel/kthread.c:292
> ret_from_fork+0x1f/0x30 linux/arch/x86/entry/entry_64.S:294

Thanks for report but I'm not sure what KCSAN is complaining about - isn't
the report truncated? I'm missing 'read' part of the report... The complaint
is on line:

jh->b_transaction = transaction;

I would guess the complaint is because of the check:

/*
* This and the following assertions are unreliable since we may see jh
* in inconsistent state unless we grab bh_state lock. But this is
* crucial to catch bugs so let's do a reliable check until the
* lockless handling is fully proven.
*/
if (jh->b_transaction != transaction &&
jh->b_next_transaction != transaction) {

And the comment explains, why we do this unreliable check. Again, if we
wanted to silence KCSAN, we could use data_race() macro but AFAIU Ted isn't
very fond of that annotation.

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-04-06 22:16:40

by Hao Sun

[permalink] [raw]
Subject: Re: KCSAN: data-race in __jbd2_journal_file_buffer / jbd2_journal_dirty_metadata

> Thanks for report but I'm not sure what KCSAN is complaining about - isn't the report truncated?

Yes, the full KCSAN report is available in the attached log file.
Here is the report :
==================================================================
BUG: KCSAN: data-race in __jbd2_journal_file_buffer /
jbd2_journal_dirty_metadata

write to 0xffff88800af6da38 of 8 bytes by task 4822 on cpu 1:
__jbd2_journal_file_buffer+0x18d/0x370
__jbd2_journal_refile_buffer+0x155/0x230
jbd2_journal_commit_transaction+0x24c6/0x3200
kjournald2+0x253/0x470
kthread+0x1f0/0x220
ret_from_fork+0x1f/0x30

read to 0xffff88800af6da38 of 8 bytes by task 1955 on cpu 0:
jbd2_journal_dirty_metadata+0x17f/0x670
__ext4_handle_dirty_metadata+0xc6/0x590
ext4_mark_iloc_dirty+0x12dd/0x16e0
__ext4_mark_inode_dirty+0x4d2/0x5d0
ext4_writepages+0x1262/0x1e50
do_writepages+0x7b/0x150
__writeback_single_inode+0x84/0x4e0
writeback_sb_inodes+0x69f/0x1020
__writeback_inodes_wb+0xb0/0x2a0
wb_writeback+0x290/0x650
wb_do_writeback+0x582/0x5d0
wb_workfn+0xb8/0x410
process_one_work+0x3e1/0x940
worker_thread+0x64a/0xaa0
kthread+0x1f0/0x220
ret_from_fork+0x1f/0x30

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 1955 Comm: kworker/u5:2 Not tainted 5.11.0+ #5


Sorry, I couldn't symbolize it because the original Linux binary was lost.
Do you think this is an actual bug?

Hao

Jan Kara <[email protected]> 于2021年4月6日周二 下午8:32写道:
>
> Hello!
>
> On Sun 04-04-21 17:40:44, Hao Sun wrote:
> > When using Healer(https://github.com/SunHao-0/healer/tree/dev) to fuzz
> > the Linux kernel, I found a data-race vulnerability in
> > __jbd2_journal_file_buffer / jbd2_journal_dirty_metadata.
> > Sorry, data-race is usually difficult to reproduce. I cannot provide
> > you with a reproducing program.
> > I hope that the call stack information in the crash log can help you
> > locate the problem.
> > Kernel config and full log can be found in the attachment.
> >
> > Here is the detailed information:
> > commit: 3b9cdafb5358eb9f3790de2f728f765fef100731
> > version: linux 5.11
> > git tree: upstream
> > report:
> > ==================================================================
> > BUG: KCSAN: data-race in __jbd2_journal_file_buffer /
> > jbd2_journal_dirty_metadata
> > write to 0xffff88800af6da38 of 8 bytes by task 4822 on cpu 1:
> > __jbd2_journal_file_buffer+0x18d/0x370 linux/fs/jbd2/transaction.c:2518
> > __jbd2_journal_refile_buffer+0x155/0x230 linux/fs/jbd2/transaction.c:2612
> > jbd2_journal_commit_transaction+0x24c6/0x3200 linux/fs/jbd2/commit.c:1084
> > kjournald2+0x253/0x470 linux/fs/jbd2/journal.c:213
> > kthread+0x1f0/0x220 linux/kernel/kthread.c:292
> > ret_from_fork+0x1f/0x30 linux/arch/x86/entry/entry_64.S:294
>
> Thanks for report but I'm not sure what KCSAN is complaining about - isn't
> the report truncated? I'm missing 'read' part of the report... The complaint
> is on line:
>
> jh->b_transaction = transaction;
>
> I would guess the complaint is because of the check:
>
> /*
> * This and the following assertions are unreliable since we may see jh
> * in inconsistent state unless we grab bh_state lock. But this is
> * crucial to catch bugs so let's do a reliable check until the
> * lockless handling is fully proven.
> */
> if (jh->b_transaction != transaction &&
> jh->b_next_transaction != transaction) {
>
> And the comment explains, why we do this unreliable check. Again, if we
> wanted to silence KCSAN, we could use data_race() macro but AFAIU Ted isn't
> very fond of that annotation.
>
> Honza
>
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR

2021-04-06 22:57:17

by Jan Kara

[permalink] [raw]
Subject: Re: KCSAN: data-race in __jbd2_journal_file_buffer / jbd2_journal_dirty_metadata

On Tue 06-04-21 21:27:48, Hao Sun wrote:
> > Thanks for report but I'm not sure what KCSAN is complaining about - isn't the report truncated?
>
> Yes, the full KCSAN report is available in the attached log file.

Sorry, I missed that in your original email.

> Here is the report :
> ==================================================================
> BUG: KCSAN: data-race in __jbd2_journal_file_buffer /
> jbd2_journal_dirty_metadata
>
> write to 0xffff88800af6da38 of 8 bytes by task 4822 on cpu 1:
> __jbd2_journal_file_buffer+0x18d/0x370
> __jbd2_journal_refile_buffer+0x155/0x230
> jbd2_journal_commit_transaction+0x24c6/0x3200
> kjournald2+0x253/0x470
> kthread+0x1f0/0x220
> ret_from_fork+0x1f/0x30
>
> read to 0xffff88800af6da38 of 8 bytes by task 1955 on cpu 0:
> jbd2_journal_dirty_metadata+0x17f/0x670
> __ext4_handle_dirty_metadata+0xc6/0x590
> ext4_mark_iloc_dirty+0x12dd/0x16e0
> __ext4_mark_inode_dirty+0x4d2/0x5d0
> ext4_writepages+0x1262/0x1e50
> do_writepages+0x7b/0x150
> __writeback_single_inode+0x84/0x4e0
> writeback_sb_inodes+0x69f/0x1020
> __writeback_inodes_wb+0xb0/0x2a0
> wb_writeback+0x290/0x650
> wb_do_writeback+0x582/0x5d0
> wb_workfn+0xb8/0x410
> process_one_work+0x3e1/0x940
> worker_thread+0x64a/0xaa0
> kthread+0x1f0/0x220
> ret_from_fork+0x1f/0x30
>
> Reported by Kernel Concurrency Sanitizer on:
> CPU: 0 PID: 1955 Comm: kworker/u5:2 Not tainted 5.11.0+ #5
>
>
> Sorry, I couldn't symbolize it because the original Linux binary was lost.
> Do you think this is an actual bug?

So it is difficult to be 100% sure without knowing which particular access
caused the KCSAN warning but I'm quite confident it was caused by one of
unlocked accesses to jh->b_transaction in jbd2_journal_dirty_metadata().
And as the comments explain, these are only assertions which we redo under
proper lock if they look like they would fail. So the code is in fact
correct.

Honza

> Jan Kara <[email protected]> 于2021年4月6日周二 下午8:32写道:
> >
> > Hello!
> >
> > On Sun 04-04-21 17:40:44, Hao Sun wrote:
> > > When using Healer(https://github.com/SunHao-0/healer/tree/dev) to fuzz
> > > the Linux kernel, I found a data-race vulnerability in
> > > __jbd2_journal_file_buffer / jbd2_journal_dirty_metadata.
> > > Sorry, data-race is usually difficult to reproduce. I cannot provide
> > > you with a reproducing program.
> > > I hope that the call stack information in the crash log can help you
> > > locate the problem.
> > > Kernel config and full log can be found in the attachment.
> > >
> > > Here is the detailed information:
> > > commit: 3b9cdafb5358eb9f3790de2f728f765fef100731
> > > version: linux 5.11
> > > git tree: upstream
> > > report:
> > > ==================================================================
> > > BUG: KCSAN: data-race in __jbd2_journal_file_buffer /
> > > jbd2_journal_dirty_metadata
> > > write to 0xffff88800af6da38 of 8 bytes by task 4822 on cpu 1:
> > > __jbd2_journal_file_buffer+0x18d/0x370 linux/fs/jbd2/transaction.c:2518
> > > __jbd2_journal_refile_buffer+0x155/0x230 linux/fs/jbd2/transaction.c:2612
> > > jbd2_journal_commit_transaction+0x24c6/0x3200 linux/fs/jbd2/commit.c:1084
> > > kjournald2+0x253/0x470 linux/fs/jbd2/journal.c:213
> > > kthread+0x1f0/0x220 linux/kernel/kthread.c:292
> > > ret_from_fork+0x1f/0x30 linux/arch/x86/entry/entry_64.S:294
> >
> > Thanks for report but I'm not sure what KCSAN is complaining about - isn't
> > the report truncated? I'm missing 'read' part of the report... The complaint
> > is on line:
> >
> > jh->b_transaction = transaction;
> >
> > I would guess the complaint is because of the check:
> >
> > /*
> > * This and the following assertions are unreliable since we may see jh
> > * in inconsistent state unless we grab bh_state lock. But this is
> > * crucial to catch bugs so let's do a reliable check until the
> > * lockless handling is fully proven.
> > */
> > if (jh->b_transaction != transaction &&
> > jh->b_next_transaction != transaction) {
> >
> > And the comment explains, why we do this unreliable check. Again, if we
> > wanted to silence KCSAN, we could use data_race() macro but AFAIU Ted isn't
> > very fond of that annotation.
> >
> > Honza
> >
> > --
> > Jan Kara <[email protected]>
> > SUSE Labs, CR
--
Jan Kara <[email protected]>
SUSE Labs, CR

2021-04-07 05:35:50

by Theodore Ts'o

[permalink] [raw]
Subject: Re: KCSAN: data-race in __jbd2_journal_file_buffer / jbd2_journal_dirty_metadata

On Tue, Apr 06, 2021 at 02:32:33PM +0200, Jan Kara wrote:
> And the comment explains, why we do this unreliable check. Again, if we
> wanted to silence KCSAN, we could use data_race() macro but AFAIU Ted isn't
> very fond of that annotation.

I'm not fond of the data_race macro, but I like bogus KCSAN reports
even less. My main complaint is if we're going to have to put the
data_race() macro in place, we're going to need to annotate each
location with an explanation of why it's there (suppress a KCSAN false
positive), and why's it's safe. If it's only one or two places, it'll
probably be fine. If it's dozens, then I would say that KCSAN is
becoming a net negative in terms of making the Linux kernel code
maintainable.

- Ted

2021-04-13 06:23:39

by Marco Elver

[permalink] [raw]
Subject: Re: KCSAN: data-race in __jbd2_journal_file_buffer / jbd2_journal_dirty_metadata

On Tue, Apr 06, 2021 at 11:01AM -0400, Theodore Ts'o wrote:
> On Tue, Apr 06, 2021 at 02:32:33PM +0200, Jan Kara wrote:
> > And the comment explains, why we do this unreliable check. Again, if we
> > wanted to silence KCSAN, we could use data_race() macro but AFAIU Ted isn't
> > very fond of that annotation.
>
> I'm not fond of the data_race macro, but I like bogus KCSAN reports
> even less. My main complaint is if we're going to have to put the
> data_race() macro in place, we're going to need to annotate each
> location with an explanation of why it's there (suppress a KCSAN false
> positive), and why's it's safe. If it's only one or two places, it'll
> probably be fine. If it's dozens, then I would say that KCSAN is
> becoming a net negative in terms of making the Linux kernel code
> maintainable.

I've just seen the latest reports on these data races [1], but it seems
the more relevant context is here.
[1] https://lore.kernel.org/linux-ext4/[email protected]/

Let me try to put things in perspective.

No, we do not want maintainability to suffer. Whether or not documenting
the concurrency design via data_race() and a few comments is a negative
or positive is up to you. To me, it'd be a positive because I don't have
to guess what the code is trying to do because concurrent code rarely is
obvious. (In fairness, if you don't like to add comments, just a
data_race() without comment tells a reader more than now; perhaps they'd
then rummage in the git logs.)

Yes, there are currently lots of data-racy accesses in the kernel that
are mostly benign. Yet, they are data races in the memory model's eyes,
and every optimizing compiler is free to screw them up! For example a
lot of those plain read-modify-write bitops ("... |= ...").

Unfortunately tooling cannot determine without hints (like data_race())
whether or not those are safe, since the programmer's intent is unclear.
Crucially, the programmer's intent is also unclear to the compiler!
Which means the compiler _is_ free to screw up those operations.

If we could somehow precisely determine which plain accesses can race,
we'd solve a decades-old problem: optimizing compilers and concurrent
code do not get along. Therefore, C needed a memory model to sort out
this mess, which we have since C11. The Linux kernel, however, doesn't
play by those rules. The Linux Kernel Memory Model (LKMM) tries to
specify the rules the kernel can safely play by.

But since we have KCSAN, which initially tried to follow the LKMM
strictly, various feedback has resulted in taming KCSAN to a subset of
the LKMM. A lot of the data races that are left, yet appear benign,
simply have no obvious rules or patterns (otherwise we wouldn't have the
problem we have with optimizing compilers). I couldn't, in good
conscience, tame KCSAN based on poorly thought-out rules. Because we
know they're data races, and the compiler _is_ free to subject them to
concurrency-unsafe optimizations.

Because we knew that different codes will want different KCSAN exposure
until there is a de-facto LKMM that is to be followed everywhere (one
can dream), KCSAN has lots of knobs. They are described in detail here:
https://lwn.net/Articles/816854/

> I'm not fond of the data_race macro, but I like bogus KCSAN reports
> even less.

While the data_race() macro was meant to be exactly for this case, to
tell tooling "this data race is fine, even if the compiler messes it
up", if there are too many data races for you right now feel free to add
'KCSAN_SANITIZE_file.o := n' to the files you don't want checked. Or
even 'KCSAN_SANITIZE := n' to ignore all files in a directory. It would
avoid the robots sending you reports. Not ideal, but it'd give some time
to see how things evolve elsewhere if you'd rather avoid all this for
now.

Thanks,
-- Marco