2020-08-07 19:37:06

by Peter Zijlstra

[permalink] [raw]
Subject: [RFC][PATCH 0/3] Tracing, Idle, RCU and such goodness

Hi,

Syzbot report:

https://lkml.kernel.org/r/[email protected]

triggered a whole bunch of fallout.

These 3 patches are in reverse order of discovery. With them applied the
reproducer no longer triggers and the machine boots without errors.

I also spend half the day trying to rewrite TRACE_IRQFLAGS, but so far haven't
had success. I have the feeling that there's something slightly simpler hiding
in there, but so far it doesn't want to come out.

Please, look carefully, this is all tricky code, and it's been a long and hot
day here, mistakes are almost guaranteed :/

I also want to remove the _rcuidle() from trace_preemptirq.c, but given
this was quite enough, I didn't touch those.


2020-08-10 09:57:42

by Marco Elver

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/3] Tracing, Idle, RCU and such goodness

On Fri, Aug 07, 2020 at 09:23PM +0200, Peter Zijlstra wrote:
> Hi,
>
> Syzbot report:
>
> https://lkml.kernel.org/r/[email protected]
>
> triggered a whole bunch of fallout.
>
> These 3 patches are in reverse order of discovery. With them applied the
> reproducer no longer triggers and the machine boots without errors.
>
> I also spend half the day trying to rewrite TRACE_IRQFLAGS, but so far haven't
> had success. I have the feeling that there's something slightly simpler hiding
> in there, but so far it doesn't want to come out.
>
> Please, look carefully, this is all tricky code, and it's been a long and hot
> day here, mistakes are almost guaranteed :/
>
> I also want to remove the _rcuidle() from trace_preemptirq.c, but given
> this was quite enough, I didn't touch those.

Unfortunately I get LOCKDEP_DEBUG warnings, when testing with one of
syzbot's configs. This appears at some point during boot (no other
test):

DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled())
WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:4875 check_flags+0x52/0x200 kernel/locking/lockdep.c:4875
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.0+ #8
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
RIP: 0010:check_flags+0x52/0x200 kernel/locking/lockdep.c:4875
Code: c0 74 48 e8 40 f7 a5 03 85 c0 74 20 83 3d bd 33 53 05 00 75 17 48 c7 c7 ce 12 c9 85 48 c7 c6 04 8c ca 85 31 c0 e8 4e c1 f7 ff <0f> 0b 48 c7 c7 db 8b ca 85 e9 8c 01 00 00 85 c0 75 0f 65 8b 05 c5
RSP: 0000:ffffffff86203bf0 EFLAGS: 00010046
RAX: 1f9fa70771052a00 RBX: 0000000000000000 RCX: ffffffff8627e980
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffffffff813d8494 R09: 0000000000000000
R10: 0000000000000012 R11: ffffffff8627e980 R12: ffffffff8627e980
R13: ffffffff862e3420 R14: 00000000ffffffff R15: 0000000000000046
FS: 0000000000000000(0000) GS:ffff88881fa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88883ffff000 CR3: 0000000006275001 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 00000000
Call Trace:
lock_is_held_type+0x4d/0x110 kernel/locking/lockdep.c:5043
lock_is_held include/linux/lockdep.h:266 [inline]
rcu_read_lock_sched_held+0x41/0x90 kernel/rcu/update.c:136
trace_lock_acquire+0x49/0x120 include/trace/events/lock.h:13
lock_acquire+0x8d/0x2c0 kernel/locking/lockdep.c:5003
mutex_trylock+0xdd/0x120 kernel/locking/mutex.c:1420
exp_funnel_lock kernel/rcu/tree_exp.h:290 [inline]
synchronize_rcu_expedited+0x19d/0x400 kernel/rcu/tree_exp.h:838
synchronize_rcu+0xa4/0xe0 kernel/rcu/tree.c:3574
rcu_test_sync_prims+0x5/0x10 kernel/rcu/update.c:234
rest_init+0xc/0x270 init/main.c:665
start_kernel+0x445/0x4bd init/main.c:1045
secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
irq event stamp: 6599
hardirqs last enabled at (6599): [<ffffffff812f6a2a>] console_unlock+0x6fa/0x7d0 kernel/printk/printk.c:2530
hardirqs last disabled at (6598): [<ffffffff812f6420>] console_unlock+0xf0/0x7d0 kernel/printk/printk.c:2441
softirqs last enabled at (5810): [<ffffffff84d2e66d>] common_interrupt+0x1d/0x240 arch/x86/kernel/irq.c:239
softirqs last disabled at (5809): [<ffffffff812598b0>] local_bh_disable+0x0/0x20 net/mptcp/protocol.c:1301
---[ end trace f3de5b6e614e349e ]---
possible reason: unannotated irqs-off.
irq event stamp: 6599
hardirqs last enabled at (6599): [<ffffffff812f6a2a>] console_unlock+0x6fa/0x7d0 kernel/printk/printk.c:2530
hardirqs last disabled at (6598): [<ffffffff812f6420>] console_unlock+0xf0/0x7d0 kernel/printk/printk.c:2441
softirqs last enabled at (5810): [<ffffffff84d2e66d>] common_interrupt+0x1d/0x240 arch/x86/kernel/irq.c:239
softirqs last disabled at (5809): [<ffffffff812598b0>] local_bh_disable+0x0/0x20 net/mptcp/protocol.c:1301

The .config is attached.

Thanks,
-- Marco


Attachments:
(No filename) (4.02 kB)
.config (211.98 kB)
Download all attachments

2020-08-10 12:12:52

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/3] Tracing, Idle, RCU and such goodness

On Mon, Aug 10, 2020 at 11:55:35AM +0200, Marco Elver wrote:

> Unfortunately I get LOCKDEP_DEBUG warnings, when testing with one of
> syzbot's configs. This appears at some point during boot (no other
> test):
>
> DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled())
> WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:4875 check_flags+0x52/0x200 kernel/locking/lockdep.c:4875
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.0+ #8
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> RIP: 0010:check_flags+0x52/0x200 kernel/locking/lockdep.c:4875
> Code: c0 74 48 e8 40 f7 a5 03 85 c0 74 20 83 3d bd 33 53 05 00 75 17 48 c7 c7 ce 12 c9 85 48 c7 c6 04 8c ca 85 31 c0 e8 4e c1 f7 ff <0f> 0b 48 c7 c7 db 8b ca 85 e9 8c 01 00 00 85 c0 75 0f 65 8b 05 c5
> RSP: 0000:ffffffff86203bf0 EFLAGS: 00010046
> RAX: 1f9fa70771052a00 RBX: 0000000000000000 RCX: ffffffff8627e980
> RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
> RBP: 0000000000000000 R08: ffffffff813d8494 R09: 0000000000000000
> R10: 0000000000000012 R11: ffffffff8627e980 R12: ffffffff8627e980
> R13: ffffffff862e3420 R14: 00000000ffffffff R15: 0000000000000046
> FS: 0000000000000000(0000) GS:ffff88881fa00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffff88883ffff000 CR3: 0000000006275001 CR4: 0000000000770ef0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 00000000
> Call Trace:
> lock_is_held_type+0x4d/0x110 kernel/locking/lockdep.c:5043
> lock_is_held include/linux/lockdep.h:266 [inline]
> rcu_read_lock_sched_held+0x41/0x90 kernel/rcu/update.c:136
> trace_lock_acquire+0x49/0x120 include/trace/events/lock.h:13
> lock_acquire+0x8d/0x2c0 kernel/locking/lockdep.c:5003
> mutex_trylock+0xdd/0x120 kernel/locking/mutex.c:1420

*groan*, indeed! Which is the direct result of lockdep recursion because
we pulled that tracepoint out from under the recursion thing.

Let me go prod this with a sharp stick...

2020-08-11 16:11:40

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/3] Tracing, Idle, RCU and such goodness

On Mon, Aug 10, 2020 at 02:11:38PM +0200, [email protected] wrote:
> On Mon, Aug 10, 2020 at 11:55:35AM +0200, Marco Elver wrote:
>
> > Unfortunately I get LOCKDEP_DEBUG warnings, when testing with one of
> > syzbot's configs. This appears at some point during boot (no other
> > test):
> >
> > DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled())
> > WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:4875 check_flags+0x52/0x200 kernel/locking/lockdep.c:4875
> > Modules linked in:
> > CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.0+ #8
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> > RIP: 0010:check_flags+0x52/0x200 kernel/locking/lockdep.c:4875
> > Code: c0 74 48 e8 40 f7 a5 03 85 c0 74 20 83 3d bd 33 53 05 00 75 17 48 c7 c7 ce 12 c9 85 48 c7 c6 04 8c ca 85 31 c0 e8 4e c1 f7 ff <0f> 0b 48 c7 c7 db 8b ca 85 e9 8c 01 00 00 85 c0 75 0f 65 8b 05 c5
> > RSP: 0000:ffffffff86203bf0 EFLAGS: 00010046
> > RAX: 1f9fa70771052a00 RBX: 0000000000000000 RCX: ffffffff8627e980
> > RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
> > RBP: 0000000000000000 R08: ffffffff813d8494 R09: 0000000000000000
> > R10: 0000000000000012 R11: ffffffff8627e980 R12: ffffffff8627e980
> > R13: ffffffff862e3420 R14: 00000000ffffffff R15: 0000000000000046
> > FS: 0000000000000000(0000) GS:ffff88881fa00000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: ffff88883ffff000 CR3: 0000000006275001 CR4: 0000000000770ef0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > PKRU: 00000000
> > Call Trace:
> > lock_is_held_type+0x4d/0x110 kernel/locking/lockdep.c:5043
> > lock_is_held include/linux/lockdep.h:266 [inline]
> > rcu_read_lock_sched_held+0x41/0x90 kernel/rcu/update.c:136
> > trace_lock_acquire+0x49/0x120 include/trace/events/lock.h:13
> > lock_acquire+0x8d/0x2c0 kernel/locking/lockdep.c:5003
> > mutex_trylock+0xdd/0x120 kernel/locking/mutex.c:1420
>
> *groan*, indeed! Which is the direct result of lockdep recursion because
> we pulled that tracepoint out from under the recursion thing.
>
> Let me go prod this with a sharp stick...

On the other hand, the first two commits in this series did pass
weekend rcutorture testing.

I clearly need to make rcutorture more viscious. ;-)

But in the meantime, for the first two:

Tested-by: Paul E. McKenney <[email protected]>

Thanx, Paul