2019-11-04 11:32:22

by syzbot

[permalink] [raw]
Subject: KCSAN: data-race in process_srcu / synchronize_srcu

Hello,

syzbot found the following crash on:

HEAD commit: 05f22368 x86, kcsan: Enable KCSAN for x86
git tree: https://github.com/google/ktsan.git kcsan
console output: https://syzkaller.appspot.com/x/log.txt?x=17ade7ef600000
kernel config: https://syzkaller.appspot.com/x/.config?x=87d111955f40591f
dashboard link: https://syzkaller.appspot.com/bug?extid=08f3e9d26e5541e1ecf2
compiler: gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: [email protected]

==================================================================
BUG: KCSAN: data-race in process_srcu / synchronize_srcu

write to 0xffffffff8604e8a0 of 8 bytes by task 17 on cpu 1:
srcu_gp_end kernel/rcu/srcutree.c:533 [inline]
srcu_advance_state kernel/rcu/srcutree.c:1146 [inline]
process_srcu+0x207/0x780 kernel/rcu/srcutree.c:1237
process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
worker_thread+0xa0/0x800 kernel/workqueue.c:2415
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

read to 0xffffffff8604e8a0 of 8 bytes by task 12515 on cpu 0:
srcu_might_be_idle kernel/rcu/srcutree.c:784 [inline]
synchronize_srcu+0x107/0x214 kernel/rcu/srcutree.c:996
fsnotify_connector_destroy_workfn+0x63/0xb0 fs/notify/mark.c:164
process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
worker_thread+0xa0/0x800 kernel/workqueue.c:2415
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 12515 Comm: kworker/u4:8 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Workqueue: events_unbound fsnotify_connector_destroy_workfn
==================================================================
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 12515 Comm: kworker/u4:8 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Workqueue: events_unbound fsnotify_connector_destroy_workfn
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xf5/0x159 lib/dump_stack.c:113
panic+0x210/0x640 kernel/panic.c:221
kcsan_report.cold+0xc/0x10 kernel/kcsan/report.c:302
__kcsan_setup_watchpoint+0x32e/0x4a0 kernel/kcsan/core.c:411
__tsan_read8 kernel/kcsan/kcsan.c:36 [inline]
__tsan_read8+0x2c/0x30 kernel/kcsan/kcsan.c:36
srcu_might_be_idle kernel/rcu/srcutree.c:784 [inline]
synchronize_srcu+0x107/0x214 kernel/rcu/srcutree.c:996
fsnotify_connector_destroy_workfn+0x63/0xb0 fs/notify/mark.c:164
process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
worker_thread+0xa0/0x800 kernel/workqueue.c:2415
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


2019-11-04 11:33:51

by Marco Elver

[permalink] [raw]
Subject: Re: KCSAN: data-race in process_srcu / synchronize_srcu

+RCU folks, since this is probably a data race in RCU.

On Mon, 4 Nov 2019 at 12:29, syzbot
<[email protected]> wrote:
>
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit: 05f22368 x86, kcsan: Enable KCSAN for x86
> git tree: https://github.com/google/ktsan.git kcsan
> console output: https://syzkaller.appspot.com/x/log.txt?x=17ade7ef600000
> kernel config: https://syzkaller.appspot.com/x/.config?x=87d111955f40591f
> dashboard link: https://syzkaller.appspot.com/bug?extid=08f3e9d26e5541e1ecf2
> compiler: gcc (GCC) 9.0.0 20181231 (experimental)
>
> Unfortunately, I don't have any reproducer for this crash yet.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: [email protected]
>
> ==================================================================
> BUG: KCSAN: data-race in process_srcu / synchronize_srcu
>
> write to 0xffffffff8604e8a0 of 8 bytes by task 17 on cpu 1:
> srcu_gp_end kernel/rcu/srcutree.c:533 [inline]
> srcu_advance_state kernel/rcu/srcutree.c:1146 [inline]
> process_srcu+0x207/0x780 kernel/rcu/srcutree.c:1237
> process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
> worker_thread+0xa0/0x800 kernel/workqueue.c:2415
> kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
>
> read to 0xffffffff8604e8a0 of 8 bytes by task 12515 on cpu 0:
> srcu_might_be_idle kernel/rcu/srcutree.c:784 [inline]
> synchronize_srcu+0x107/0x214 kernel/rcu/srcutree.c:996
> fsnotify_connector_destroy_workfn+0x63/0xb0 fs/notify/mark.c:164
> process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
> worker_thread+0xa0/0x800 kernel/workqueue.c:2415
> kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
>
> Reported by Kernel Concurrency Sanitizer on:
> CPU: 0 PID: 12515 Comm: kworker/u4:8 Not tainted 5.4.0-rc3+ #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Workqueue: events_unbound fsnotify_connector_destroy_workfn
> ==================================================================
> Kernel panic - not syncing: panic_on_warn set ...
> CPU: 0 PID: 12515 Comm: kworker/u4:8 Not tainted 5.4.0-rc3+ #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Workqueue: events_unbound fsnotify_connector_destroy_workfn
> Call Trace:
> __dump_stack lib/dump_stack.c:77 [inline]
> dump_stack+0xf5/0x159 lib/dump_stack.c:113
> panic+0x210/0x640 kernel/panic.c:221
> kcsan_report.cold+0xc/0x10 kernel/kcsan/report.c:302
> __kcsan_setup_watchpoint+0x32e/0x4a0 kernel/kcsan/core.c:411
> __tsan_read8 kernel/kcsan/kcsan.c:36 [inline]
> __tsan_read8+0x2c/0x30 kernel/kcsan/kcsan.c:36
> srcu_might_be_idle kernel/rcu/srcutree.c:784 [inline]
> synchronize_srcu+0x107/0x214 kernel/rcu/srcutree.c:996
> fsnotify_connector_destroy_workfn+0x63/0xb0 fs/notify/mark.c:164
> process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
> worker_thread+0xa0/0x800 kernel/workqueue.c:2415
> kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
>
>
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at [email protected].
>
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

2019-11-04 16:13:22

by Paul E. McKenney

[permalink] [raw]
Subject: Re: KCSAN: data-race in process_srcu / synchronize_srcu

On Mon, Nov 04, 2019 at 12:31:56PM +0100, Marco Elver wrote:
> +RCU folks, since this is probably a data race in RCU.
>
> On Mon, 4 Nov 2019 at 12:29, syzbot
> <[email protected]> wrote:
> >
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit: 05f22368 x86, kcsan: Enable KCSAN for x86
> > git tree: https://github.com/google/ktsan.git kcsan
> > console output: https://syzkaller.appspot.com/x/log.txt?x=17ade7ef600000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=87d111955f40591f
> > dashboard link: https://syzkaller.appspot.com/bug?extid=08f3e9d26e5541e1ecf2
> > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> >
> > Unfortunately, I don't have any reproducer for this crash yet.
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: [email protected]
> >
> > ==================================================================
> > BUG: KCSAN: data-race in process_srcu / synchronize_srcu
> >
> > write to 0xffffffff8604e8a0 of 8 bytes by task 17 on cpu 1:
> > srcu_gp_end kernel/rcu/srcutree.c:533 [inline]
> > srcu_advance_state kernel/rcu/srcutree.c:1146 [inline]
> > process_srcu+0x207/0x780 kernel/rcu/srcutree.c:1237
> > process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
> > worker_thread+0xa0/0x800 kernel/workqueue.c:2415
> > kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
> > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
> >
> > read to 0xffffffff8604e8a0 of 8 bytes by task 12515 on cpu 0:
> > srcu_might_be_idle kernel/rcu/srcutree.c:784 [inline]
> > synchronize_srcu+0x107/0x214 kernel/rcu/srcutree.c:996
> > fsnotify_connector_destroy_workfn+0x63/0xb0 fs/notify/mark.c:164
> > process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
> > worker_thread+0xa0/0x800 kernel/workqueue.c:2415
> > kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
> > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
> >
> > Reported by Kernel Concurrency Sanitizer on:
> > CPU: 0 PID: 12515 Comm: kworker/u4:8 Not tainted 5.4.0-rc3+ #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> > Workqueue: events_unbound fsnotify_connector_destroy_workfn
> > ==================================================================
> > Kernel panic - not syncing: panic_on_warn set ...
> > CPU: 0 PID: 12515 Comm: kworker/u4:8 Not tainted 5.4.0-rc3+ #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> > Workqueue: events_unbound fsnotify_connector_destroy_workfn
> > Call Trace:
> > __dump_stack lib/dump_stack.c:77 [inline]
> > dump_stack+0xf5/0x159 lib/dump_stack.c:113
> > panic+0x210/0x640 kernel/panic.c:221
> > kcsan_report.cold+0xc/0x10 kernel/kcsan/report.c:302
> > __kcsan_setup_watchpoint+0x32e/0x4a0 kernel/kcsan/core.c:411
> > __tsan_read8 kernel/kcsan/kcsan.c:36 [inline]
> > __tsan_read8+0x2c/0x30 kernel/kcsan/kcsan.c:36
> > srcu_might_be_idle kernel/rcu/srcutree.c:784 [inline]
> > synchronize_srcu+0x107/0x214 kernel/rcu/srcutree.c:996
> > fsnotify_connector_destroy_workfn+0x63/0xb0 fs/notify/mark.c:164
> > process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
> > worker_thread+0xa0/0x800 kernel/workqueue.c:2415
> > kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
> > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
> > Kernel Offset: disabled
> > Rebooting in 86400 seconds..
> >
> >
> > ---
> > This bug is generated by a bot. It may contain errors.
> > See https://goo.gl/tpsmEJ for more information about syzbot.
> > syzbot engineers can be reached at [email protected].
> >
> > syzbot will keep track of this bug report. See:
> > https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

Good catch! Does the patch below help?

Thanx, Paul

------------------------------------------------------------------------

commit d2ab457602a59353b8853352735dfd094d223480
Author: Paul E. McKenney <[email protected]>
Date: Mon Nov 4 08:08:30 2019 -0800

srcu: Apply *_ONCE() to ->srcu_last_gp_end

The ->srcu_last_gp_end field is accessed from any CPU at any time
by synchronize_srcu(), so non-initialization references need to use
READ_ONCE() and WRITE_ONCE(). This commit therefore makes that change.

Reported-by: [email protected]
Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index d0a9d5b..657e6a7 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -530,7 +530,7 @@ static void srcu_gp_end(struct srcu_struct *ssp)
idx = rcu_seq_state(ssp->srcu_gp_seq);
WARN_ON_ONCE(idx != SRCU_STATE_SCAN2);
cbdelay = srcu_get_delay(ssp);
- ssp->srcu_last_gp_end = ktime_get_mono_fast_ns();
+ WRITE_ONCE(ssp->srcu_last_gp_end, ktime_get_mono_fast_ns());
rcu_seq_end(&ssp->srcu_gp_seq);
gpseq = rcu_seq_current(&ssp->srcu_gp_seq);
if (ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, gpseq))
@@ -762,6 +762,7 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp)
unsigned long flags;
struct srcu_data *sdp;
unsigned long t;
+ unsigned long tlast;

/* If the local srcu_data structure has callbacks, not idle. */
local_irq_save(flags);
@@ -780,9 +781,9 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp)

/* First, see if enough time has passed since the last GP. */
t = ktime_get_mono_fast_ns();
+ tlast = READ_ONCE(ssp->srcu_last_gp_end);
if (exp_holdoff == 0 ||
- time_in_range_open(t, ssp->srcu_last_gp_end,
- ssp->srcu_last_gp_end + exp_holdoff))
+ time_in_range_open(t, tlast, tlast + exp_holdoff))
return false; /* Too soon after last GP. */

/* Next, check for probable idleness. */

2019-11-04 16:16:31

by Marco Elver

[permalink] [raw]
Subject: Re: KCSAN: data-race in process_srcu / synchronize_srcu

On Mon, 4 Nov 2019 at 17:11, Paul E. McKenney <[email protected]> wrote:
>
> On Mon, Nov 04, 2019 at 12:31:56PM +0100, Marco Elver wrote:
> > +RCU folks, since this is probably a data race in RCU.
> >
> > On Mon, 4 Nov 2019 at 12:29, syzbot
> > <[email protected]> wrote:
> > >
> > > Hello,
> > >
> > > syzbot found the following crash on:
> > >
> > > HEAD commit: 05f22368 x86, kcsan: Enable KCSAN for x86
> > > git tree: https://github.com/google/ktsan.git kcsan
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=17ade7ef600000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=87d111955f40591f
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=08f3e9d26e5541e1ecf2
> > > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> > >
> > > Unfortunately, I don't have any reproducer for this crash yet.
> > >
> > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > Reported-by: [email protected]
> > >
> > > ==================================================================
> > > BUG: KCSAN: data-race in process_srcu / synchronize_srcu
> > >
> > > write to 0xffffffff8604e8a0 of 8 bytes by task 17 on cpu 1:
> > > srcu_gp_end kernel/rcu/srcutree.c:533 [inline]
> > > srcu_advance_state kernel/rcu/srcutree.c:1146 [inline]
> > > process_srcu+0x207/0x780 kernel/rcu/srcutree.c:1237
> > > process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
> > > worker_thread+0xa0/0x800 kernel/workqueue.c:2415
> > > kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
> > > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
> > >
> > > read to 0xffffffff8604e8a0 of 8 bytes by task 12515 on cpu 0:
> > > srcu_might_be_idle kernel/rcu/srcutree.c:784 [inline]
> > > synchronize_srcu+0x107/0x214 kernel/rcu/srcutree.c:996
> > > fsnotify_connector_destroy_workfn+0x63/0xb0 fs/notify/mark.c:164
> > > process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
> > > worker_thread+0xa0/0x800 kernel/workqueue.c:2415
> > > kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
> > > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
> > >
> > > Reported by Kernel Concurrency Sanitizer on:
> > > CPU: 0 PID: 12515 Comm: kworker/u4:8 Not tainted 5.4.0-rc3+ #0
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > > Google 01/01/2011
> > > Workqueue: events_unbound fsnotify_connector_destroy_workfn
> > > ==================================================================
> > > Kernel panic - not syncing: panic_on_warn set ...
> > > CPU: 0 PID: 12515 Comm: kworker/u4:8 Not tainted 5.4.0-rc3+ #0
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > > Google 01/01/2011
> > > Workqueue: events_unbound fsnotify_connector_destroy_workfn
> > > Call Trace:
> > > __dump_stack lib/dump_stack.c:77 [inline]
> > > dump_stack+0xf5/0x159 lib/dump_stack.c:113
> > > panic+0x210/0x640 kernel/panic.c:221
> > > kcsan_report.cold+0xc/0x10 kernel/kcsan/report.c:302
> > > __kcsan_setup_watchpoint+0x32e/0x4a0 kernel/kcsan/core.c:411
> > > __tsan_read8 kernel/kcsan/kcsan.c:36 [inline]
> > > __tsan_read8+0x2c/0x30 kernel/kcsan/kcsan.c:36
> > > srcu_might_be_idle kernel/rcu/srcutree.c:784 [inline]
> > > synchronize_srcu+0x107/0x214 kernel/rcu/srcutree.c:996
> > > fsnotify_connector_destroy_workfn+0x63/0xb0 fs/notify/mark.c:164
> > > process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
> > > worker_thread+0xa0/0x800 kernel/workqueue.c:2415
> > > kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
> > > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
> > > Kernel Offset: disabled
> > > Rebooting in 86400 seconds..
> > >
> > >
> > > ---
> > > This bug is generated by a bot. It may contain errors.
> > > See https://goo.gl/tpsmEJ for more information about syzbot.
> > > syzbot engineers can be reached at [email protected].
> > >
> > > syzbot will keep track of this bug report. See:
> > > https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
> Good catch! Does the patch below help?

Looks good to me, thanks for the quick fix.

Acked-by: Marco Elver <[email protected]>

Thanks,
-- Marco

> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit d2ab457602a59353b8853352735dfd094d223480
> Author: Paul E. McKenney <[email protected]>
> Date: Mon Nov 4 08:08:30 2019 -0800
>
> srcu: Apply *_ONCE() to ->srcu_last_gp_end
>
> The ->srcu_last_gp_end field is accessed from any CPU at any time
> by synchronize_srcu(), so non-initialization references need to use
> READ_ONCE() and WRITE_ONCE(). This commit therefore makes that change.
>
> Reported-by: [email protected]
> Signed-off-by: Paul E. McKenney <[email protected]>
>
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index d0a9d5b..657e6a7 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -530,7 +530,7 @@ static void srcu_gp_end(struct srcu_struct *ssp)
> idx = rcu_seq_state(ssp->srcu_gp_seq);
> WARN_ON_ONCE(idx != SRCU_STATE_SCAN2);
> cbdelay = srcu_get_delay(ssp);
> - ssp->srcu_last_gp_end = ktime_get_mono_fast_ns();
> + WRITE_ONCE(ssp->srcu_last_gp_end, ktime_get_mono_fast_ns());
> rcu_seq_end(&ssp->srcu_gp_seq);
> gpseq = rcu_seq_current(&ssp->srcu_gp_seq);
> if (ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, gpseq))
> @@ -762,6 +762,7 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp)
> unsigned long flags;
> struct srcu_data *sdp;
> unsigned long t;
> + unsigned long tlast;
>
> /* If the local srcu_data structure has callbacks, not idle. */
> local_irq_save(flags);
> @@ -780,9 +781,9 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp)
>
> /* First, see if enough time has passed since the last GP. */
> t = ktime_get_mono_fast_ns();
> + tlast = READ_ONCE(ssp->srcu_last_gp_end);
> if (exp_holdoff == 0 ||
> - time_in_range_open(t, ssp->srcu_last_gp_end,
> - ssp->srcu_last_gp_end + exp_holdoff))
> + time_in_range_open(t, tlast, tlast + exp_holdoff))
> return false; /* Too soon after last GP. */
>
> /* Next, check for probable idleness. */

2019-11-04 16:57:00

by Paul E. McKenney

[permalink] [raw]
Subject: Re: KCSAN: data-race in process_srcu / synchronize_srcu

On Mon, Nov 04, 2019 at 05:14:50PM +0100, Marco Elver wrote:
> On Mon, 4 Nov 2019 at 17:11, Paul E. McKenney <[email protected]> wrote:
> >
> > On Mon, Nov 04, 2019 at 12:31:56PM +0100, Marco Elver wrote:
> > > +RCU folks, since this is probably a data race in RCU.
> > >
> > > On Mon, 4 Nov 2019 at 12:29, syzbot
> > > <[email protected]> wrote:
> > > >
> > > > Hello,
> > > >
> > > > syzbot found the following crash on:
> > > >
> > > > HEAD commit: 05f22368 x86, kcsan: Enable KCSAN for x86
> > > > git tree: https://github.com/google/ktsan.git kcsan
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=17ade7ef600000
> > > > kernel config: https://syzkaller.appspot.com/x/.config?x=87d111955f40591f
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=08f3e9d26e5541e1ecf2
> > > > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> > > >
> > > > Unfortunately, I don't have any reproducer for this crash yet.
> > > >
> > > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > > Reported-by: [email protected]
> > > >
> > > > ==================================================================
> > > > BUG: KCSAN: data-race in process_srcu / synchronize_srcu
> > > >
> > > > write to 0xffffffff8604e8a0 of 8 bytes by task 17 on cpu 1:
> > > > srcu_gp_end kernel/rcu/srcutree.c:533 [inline]
> > > > srcu_advance_state kernel/rcu/srcutree.c:1146 [inline]
> > > > process_srcu+0x207/0x780 kernel/rcu/srcutree.c:1237
> > > > process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
> > > > worker_thread+0xa0/0x800 kernel/workqueue.c:2415
> > > > kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
> > > > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
> > > >
> > > > read to 0xffffffff8604e8a0 of 8 bytes by task 12515 on cpu 0:
> > > > srcu_might_be_idle kernel/rcu/srcutree.c:784 [inline]
> > > > synchronize_srcu+0x107/0x214 kernel/rcu/srcutree.c:996
> > > > fsnotify_connector_destroy_workfn+0x63/0xb0 fs/notify/mark.c:164
> > > > process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
> > > > worker_thread+0xa0/0x800 kernel/workqueue.c:2415
> > > > kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
> > > > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
> > > >
> > > > Reported by Kernel Concurrency Sanitizer on:
> > > > CPU: 0 PID: 12515 Comm: kworker/u4:8 Not tainted 5.4.0-rc3+ #0
> > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > > > Google 01/01/2011
> > > > Workqueue: events_unbound fsnotify_connector_destroy_workfn
> > > > ==================================================================
> > > > Kernel panic - not syncing: panic_on_warn set ...
> > > > CPU: 0 PID: 12515 Comm: kworker/u4:8 Not tainted 5.4.0-rc3+ #0
> > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > > > Google 01/01/2011
> > > > Workqueue: events_unbound fsnotify_connector_destroy_workfn
> > > > Call Trace:
> > > > __dump_stack lib/dump_stack.c:77 [inline]
> > > > dump_stack+0xf5/0x159 lib/dump_stack.c:113
> > > > panic+0x210/0x640 kernel/panic.c:221
> > > > kcsan_report.cold+0xc/0x10 kernel/kcsan/report.c:302
> > > > __kcsan_setup_watchpoint+0x32e/0x4a0 kernel/kcsan/core.c:411
> > > > __tsan_read8 kernel/kcsan/kcsan.c:36 [inline]
> > > > __tsan_read8+0x2c/0x30 kernel/kcsan/kcsan.c:36
> > > > srcu_might_be_idle kernel/rcu/srcutree.c:784 [inline]
> > > > synchronize_srcu+0x107/0x214 kernel/rcu/srcutree.c:996
> > > > fsnotify_connector_destroy_workfn+0x63/0xb0 fs/notify/mark.c:164
> > > > process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
> > > > worker_thread+0xa0/0x800 kernel/workqueue.c:2415
> > > > kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
> > > > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
> > > > Kernel Offset: disabled
> > > > Rebooting in 86400 seconds..
> > > >
> > > >
> > > > ---
> > > > This bug is generated by a bot. It may contain errors.
> > > > See https://goo.gl/tpsmEJ for more information about syzbot.
> > > > syzbot engineers can be reached at [email protected].
> > > >
> > > > syzbot will keep track of this bug report. See:
> > > > https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> >
> > Good catch! Does the patch below help?
>
> Looks good to me, thanks for the quick fix.
>
> Acked-by: Marco Elver <[email protected]>

Applied, thank you!

Thanx, Paul

> Thanks,
> -- Marco
>
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > commit d2ab457602a59353b8853352735dfd094d223480
> > Author: Paul E. McKenney <[email protected]>
> > Date: Mon Nov 4 08:08:30 2019 -0800
> >
> > srcu: Apply *_ONCE() to ->srcu_last_gp_end
> >
> > The ->srcu_last_gp_end field is accessed from any CPU at any time
> > by synchronize_srcu(), so non-initialization references need to use
> > READ_ONCE() and WRITE_ONCE(). This commit therefore makes that change.
> >
> > Reported-by: [email protected]
> > Signed-off-by: Paul E. McKenney <[email protected]>
> >
> > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > index d0a9d5b..657e6a7 100644
> > --- a/kernel/rcu/srcutree.c
> > +++ b/kernel/rcu/srcutree.c
> > @@ -530,7 +530,7 @@ static void srcu_gp_end(struct srcu_struct *ssp)
> > idx = rcu_seq_state(ssp->srcu_gp_seq);
> > WARN_ON_ONCE(idx != SRCU_STATE_SCAN2);
> > cbdelay = srcu_get_delay(ssp);
> > - ssp->srcu_last_gp_end = ktime_get_mono_fast_ns();
> > + WRITE_ONCE(ssp->srcu_last_gp_end, ktime_get_mono_fast_ns());
> > rcu_seq_end(&ssp->srcu_gp_seq);
> > gpseq = rcu_seq_current(&ssp->srcu_gp_seq);
> > if (ULONG_CMP_LT(ssp->srcu_gp_seq_needed_exp, gpseq))
> > @@ -762,6 +762,7 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp)
> > unsigned long flags;
> > struct srcu_data *sdp;
> > unsigned long t;
> > + unsigned long tlast;
> >
> > /* If the local srcu_data structure has callbacks, not idle. */
> > local_irq_save(flags);
> > @@ -780,9 +781,9 @@ static bool srcu_might_be_idle(struct srcu_struct *ssp)
> >
> > /* First, see if enough time has passed since the last GP. */
> > t = ktime_get_mono_fast_ns();
> > + tlast = READ_ONCE(ssp->srcu_last_gp_end);
> > if (exp_holdoff == 0 ||
> > - time_in_range_open(t, ssp->srcu_last_gp_end,
> > - ssp->srcu_last_gp_end + exp_holdoff))
> > + time_in_range_open(t, tlast, tlast + exp_holdoff))
> > return false; /* Too soon after last GP. */
> >
> > /* Next, check for probable idleness. */

2019-11-19 02:50:09

by Eric Biggers

[permalink] [raw]
Subject: Re: KCSAN: data-race in process_srcu / synchronize_srcu

On Mon, Nov 04, 2019 at 08:53:03AM -0800, Paul E. McKenney wrote:
> On Mon, Nov 04, 2019 at 05:14:50PM +0100, Marco Elver wrote:
> > On Mon, 4 Nov 2019 at 17:11, Paul E. McKenney <[email protected]> wrote:
> > >
> > > On Mon, Nov 04, 2019 at 12:31:56PM +0100, Marco Elver wrote:
> > > > +RCU folks, since this is probably a data race in RCU.
> > > >
> > > > On Mon, 4 Nov 2019 at 12:29, syzbot
> > > > <[email protected]> wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > syzbot found the following crash on:
> > > > >
> > > > > HEAD commit: 05f22368 x86, kcsan: Enable KCSAN for x86
> > > > > git tree: https://github.com/google/ktsan.git kcsan
> > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=17ade7ef600000
> > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=87d111955f40591f
> > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=08f3e9d26e5541e1ecf2
> > > > > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> > > > >
> > > > > Unfortunately, I don't have any reproducer for this crash yet.
> > > > >
> > > > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > > > Reported-by: [email protected]
> > > > >
> > > > > ==================================================================
> > > > > BUG: KCSAN: data-race in process_srcu / synchronize_srcu
> > > > >
> > > > > write to 0xffffffff8604e8a0 of 8 bytes by task 17 on cpu 1:
> > > > > srcu_gp_end kernel/rcu/srcutree.c:533 [inline]
> > > > > srcu_advance_state kernel/rcu/srcutree.c:1146 [inline]
> > > > > process_srcu+0x207/0x780 kernel/rcu/srcutree.c:1237
> > > > > process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
> > > > > worker_thread+0xa0/0x800 kernel/workqueue.c:2415
> > > > > kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
> > > > > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
> > > > >
> > > > > read to 0xffffffff8604e8a0 of 8 bytes by task 12515 on cpu 0:
> > > > > srcu_might_be_idle kernel/rcu/srcutree.c:784 [inline]
> > > > > synchronize_srcu+0x107/0x214 kernel/rcu/srcutree.c:996
> > > > > fsnotify_connector_destroy_workfn+0x63/0xb0 fs/notify/mark.c:164
> > > > > process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
> > > > > worker_thread+0xa0/0x800 kernel/workqueue.c:2415
> > > > > kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
> > > > > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
> > > > >
> > > > > Reported by Kernel Concurrency Sanitizer on:
> > > > > CPU: 0 PID: 12515 Comm: kworker/u4:8 Not tainted 5.4.0-rc3+ #0
> > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > > > > Google 01/01/2011
> > > > > Workqueue: events_unbound fsnotify_connector_destroy_workfn
> > > > > ==================================================================
> > > > > Kernel panic - not syncing: panic_on_warn set ...
> > > > > CPU: 0 PID: 12515 Comm: kworker/u4:8 Not tainted 5.4.0-rc3+ #0
> > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > > > > Google 01/01/2011
> > > > > Workqueue: events_unbound fsnotify_connector_destroy_workfn
> > > > > Call Trace:
> > > > > __dump_stack lib/dump_stack.c:77 [inline]
> > > > > dump_stack+0xf5/0x159 lib/dump_stack.c:113
> > > > > panic+0x210/0x640 kernel/panic.c:221
> > > > > kcsan_report.cold+0xc/0x10 kernel/kcsan/report.c:302
> > > > > __kcsan_setup_watchpoint+0x32e/0x4a0 kernel/kcsan/core.c:411
> > > > > __tsan_read8 kernel/kcsan/kcsan.c:36 [inline]
> > > > > __tsan_read8+0x2c/0x30 kernel/kcsan/kcsan.c:36
> > > > > srcu_might_be_idle kernel/rcu/srcutree.c:784 [inline]
> > > > > synchronize_srcu+0x107/0x214 kernel/rcu/srcutree.c:996
> > > > > fsnotify_connector_destroy_workfn+0x63/0xb0 fs/notify/mark.c:164
> > > > > process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
> > > > > worker_thread+0xa0/0x800 kernel/workqueue.c:2415
> > > > > kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
> > > > > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
> > > > > Kernel Offset: disabled
> > > > > Rebooting in 86400 seconds..
> > > > >
> > > > >
> > > > > ---
> > > > > This bug is generated by a bot. It may contain errors.
> > > > > See https://goo.gl/tpsmEJ for more information about syzbot.
> > > > > syzbot engineers can be reached at [email protected].
> > > > >
> > > > > syzbot will keep track of this bug report. See:
> > > > > https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> > >
> > > Good catch! Does the patch below help?
> >
> > Looks good to me, thanks for the quick fix.
> >
> > Acked-by: Marco Elver <[email protected]>
>
> Applied, thank you!
>
> Thanx, Paul

This patch is not in linux-next. What happened to it?

- Eric

2019-11-19 03:57:48

by Paul E. McKenney

[permalink] [raw]
Subject: Re: KCSAN: data-race in process_srcu / synchronize_srcu

On Mon, Nov 18, 2019 at 06:48:05PM -0800, Eric Biggers wrote:
> On Mon, Nov 04, 2019 at 08:53:03AM -0800, Paul E. McKenney wrote:
> > On Mon, Nov 04, 2019 at 05:14:50PM +0100, Marco Elver wrote:
> > > On Mon, 4 Nov 2019 at 17:11, Paul E. McKenney <[email protected]> wrote:
> > > >
> > > > On Mon, Nov 04, 2019 at 12:31:56PM +0100, Marco Elver wrote:
> > > > > +RCU folks, since this is probably a data race in RCU.
> > > > >
> > > > > On Mon, 4 Nov 2019 at 12:29, syzbot
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > syzbot found the following crash on:
> > > > > >
> > > > > > HEAD commit: 05f22368 x86, kcsan: Enable KCSAN for x86
> > > > > > git tree: https://github.com/google/ktsan.git kcsan
> > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=17ade7ef600000
> > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=87d111955f40591f
> > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=08f3e9d26e5541e1ecf2
> > > > > > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> > > > > >
> > > > > > Unfortunately, I don't have any reproducer for this crash yet.
> > > > > >
> > > > > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > > > > Reported-by: [email protected]
> > > > > >
> > > > > > ==================================================================
> > > > > > BUG: KCSAN: data-race in process_srcu / synchronize_srcu
> > > > > >
> > > > > > write to 0xffffffff8604e8a0 of 8 bytes by task 17 on cpu 1:
> > > > > > srcu_gp_end kernel/rcu/srcutree.c:533 [inline]
> > > > > > srcu_advance_state kernel/rcu/srcutree.c:1146 [inline]
> > > > > > process_srcu+0x207/0x780 kernel/rcu/srcutree.c:1237
> > > > > > process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
> > > > > > worker_thread+0xa0/0x800 kernel/workqueue.c:2415
> > > > > > kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
> > > > > > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
> > > > > >
> > > > > > read to 0xffffffff8604e8a0 of 8 bytes by task 12515 on cpu 0:
> > > > > > srcu_might_be_idle kernel/rcu/srcutree.c:784 [inline]
> > > > > > synchronize_srcu+0x107/0x214 kernel/rcu/srcutree.c:996
> > > > > > fsnotify_connector_destroy_workfn+0x63/0xb0 fs/notify/mark.c:164
> > > > > > process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
> > > > > > worker_thread+0xa0/0x800 kernel/workqueue.c:2415
> > > > > > kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
> > > > > > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
> > > > > >
> > > > > > Reported by Kernel Concurrency Sanitizer on:
> > > > > > CPU: 0 PID: 12515 Comm: kworker/u4:8 Not tainted 5.4.0-rc3+ #0
> > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > > > > > Google 01/01/2011
> > > > > > Workqueue: events_unbound fsnotify_connector_destroy_workfn
> > > > > > ==================================================================
> > > > > > Kernel panic - not syncing: panic_on_warn set ...
> > > > > > CPU: 0 PID: 12515 Comm: kworker/u4:8 Not tainted 5.4.0-rc3+ #0
> > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > > > > > Google 01/01/2011
> > > > > > Workqueue: events_unbound fsnotify_connector_destroy_workfn
> > > > > > Call Trace:
> > > > > > __dump_stack lib/dump_stack.c:77 [inline]
> > > > > > dump_stack+0xf5/0x159 lib/dump_stack.c:113
> > > > > > panic+0x210/0x640 kernel/panic.c:221
> > > > > > kcsan_report.cold+0xc/0x10 kernel/kcsan/report.c:302
> > > > > > __kcsan_setup_watchpoint+0x32e/0x4a0 kernel/kcsan/core.c:411
> > > > > > __tsan_read8 kernel/kcsan/kcsan.c:36 [inline]
> > > > > > __tsan_read8+0x2c/0x30 kernel/kcsan/kcsan.c:36
> > > > > > srcu_might_be_idle kernel/rcu/srcutree.c:784 [inline]
> > > > > > synchronize_srcu+0x107/0x214 kernel/rcu/srcutree.c:996
> > > > > > fsnotify_connector_destroy_workfn+0x63/0xb0 fs/notify/mark.c:164
> > > > > > process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
> > > > > > worker_thread+0xa0/0x800 kernel/workqueue.c:2415
> > > > > > kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
> > > > > > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
> > > > > > Kernel Offset: disabled
> > > > > > Rebooting in 86400 seconds..
> > > > > >
> > > > > >
> > > > > > ---
> > > > > > This bug is generated by a bot. It may contain errors.
> > > > > > See https://goo.gl/tpsmEJ for more information about syzbot.
> > > > > > syzbot engineers can be reached at [email protected].
> > > > > >
> > > > > > syzbot will keep track of this bug report. See:
> > > > > > https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> > > >
> > > > Good catch! Does the patch below help?
> > >
> > > Looks good to me, thanks for the quick fix.
> > >
> > > Acked-by: Marco Elver <[email protected]>
> >
> > Applied, thank you!
> >
> > Thanx, Paul
>
> This patch is not in linux-next. What happened to it?

It is in -rcu, but arrived too late to make my pull request for
v5.5. It will hit -next shortly after v5.5-rc1.

Thanx, Paul