2013-07-09 22:36:19

by Sasha Levin

Subject: timer: lockup in run_timer_softirq()

Hi all,

While fuzzing with trinity inside a KVM tools guest running latest -next, I've
stumbled on the following spew:

[ 2536.440007] BUG: soft lockup - CPU#0 stuck for 23s! [trinity-child86:12368]
[ 2536.440007] Modules linked in:
[ 2536.440007] irq event stamp: 1572135
[ 2536.440007] hardirqs last enabled at (1572134): [<ffffffff84198ef7>] restore_args+0x0/0x30
[ 2536.440007] hardirqs last disabled at (1572135): [<ffffffff841a2d2d>] apic_timer_interrupt+0x6d/0x80
[ 2536.440007] softirqs last enabled at (431824): [<ffffffff81132d07>] __do_softirq+0x447/0x4d0
[ 2536.440007] softirqs last disabled at (431969): [<ffffffff81132ed6>] irq_exit+0x86/0x120
[ 2536.440007] CPU: 0 PID: 12368 Comm: trinity-child86 Tainted: G W
3.10.0-next-20130709-sasha #3953
[ 2536.440007] task: ffff8807dea03000 ti: ffff8807d7c14000 task.ti: ffff8807d7c14000
[ 2536.440007] RIP: 0010:[<ffffffff8419871c>] [<ffffffff8419871c>] _raw_spin_unlock_irq+0x4c/0x80
[ 2536.440007] RSP: 0000:ffff880807603e08 EFLAGS: 00000286
[ 2536.440007] RAX: ffff8807dea03000 RBX: ffffffff84198ef7 RCX: 0000000000000000
[ 2536.440007] RDX: ffff8807dea03000 RSI: 0000000000000000 RDI: ffff8807dea03000
[ 2536.440007] RBP: ffff880807603e18 R08: 0000000000000000 R09: 0000000000000000
[ 2536.440007] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880807603d78
[ 2536.440007] R13: ffffffff841a2d32 R14: ffff880807603e18 R15: ffffffff86c81980
[ 2536.440007] FS: 00007f33dd6e5700(0000) GS:ffff880807600000(0000) knlGS:0000000000000000
[ 2536.440007] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2536.440007] CR2: 00000000024523c8 CR3: 00000007d255a000 CR4: 00000000000006f0
[ 2536.440007] Stack:
[ 2536.440007] ffffffff86c829e8 ffffffff86c81980 ffff880807603eb8 ffffffff8113d180
[ 2536.440007] ffff8807dea03000 ffff880807603e38 ffffffff86c835e8 ffffffff86c831e8
[ 2536.440007] ffff8807d7c14010 ffffffff86c82de8 ffff8807d7c14000 ffff8807d7c14000
[ 2536.440007] Call Trace:
[ 2536.440007] <IRQ>
[ 2536.440007] [<ffffffff8113d180>] run_timer_softirq+0x2d0/0x330
[ 2536.440007] [<ffffffff8113b400>] ? lock_timer_base+0x70/0x70
[ 2536.440007] [<ffffffff81132b21>] __do_softirq+0x261/0x4d0
[ 2536.440007] [<ffffffff81132ed6>] irq_exit+0x86/0x120
[ 2536.440007] [<ffffffff841a43ea>] smp_apic_timer_interrupt+0x4a/0x60
[ 2536.440007] [<ffffffff841a2d32>] apic_timer_interrupt+0x72/0x80
[ 2536.440007] <EOI>
[ 2536.440007] [<ffffffff84198ef7>] ? retint_restore_args+0x13/0x13
[ 2536.440007] [<ffffffff812355e5>] ? user_enter+0x135/0x150
[ 2536.440007] [<ffffffff81077d6d>] syscall_trace_leave+0x12d/0x160
[ 2536.440007] [<ffffffff841a2337>] int_check_syscall_exit_work+0x34/0x3d
[ 2536.440007] Code: 01 fd 48 89 df e8 45 5b 8f fd e8 b0 f1 00 fd 48 83 3d 30 e6 8a 01 00 75 0e 0f
0b 0f 1f 40 00 eb fe 66 0f 1f 44 00 00 fb 66 66 90 <66> 66 90 bf 01 00 00 00 e8 17 4f 00 00 65 48 8b
04 25 88 d9 00

While going through the NMI dump, I noticed that it's very incomplete, and full of:

[ 2536.500130] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run:
697182.008 msecs


Thanks,
Sasha


2013-07-09 22:47:50

by Dave Jones

Subject: Re: timer: lockup in run_timer_softirq()

On Tue, Jul 09, 2013 at 06:35:27PM -0400, Sasha Levin wrote:
> Hi all,
>
> While fuzzing with trinity inside a KVM tools guest running latest -next, I've
> stumbled on the following spew:
>
> [ 2536.440007] BUG: soft lockup - CPU#0 stuck for 23s! [trinity-child86:12368]
> [ 2536.440007] Call Trace:
> [ 2536.440007] <IRQ>
> [ 2536.440007] [<ffffffff8113d180>] run_timer_softirq+0x2d0/0x330
> [ 2536.440007] [<ffffffff8113b400>] ? lock_timer_base+0x70/0x70
> [ 2536.440007] [<ffffffff81132b21>] __do_softirq+0x261/0x4d0
> [ 2536.440007] [<ffffffff81132ed6>] irq_exit+0x86/0x120
> [ 2536.440007] [<ffffffff841a43ea>] smp_apic_timer_interrupt+0x4a/0x60
> [ 2536.440007] [<ffffffff841a2d32>] apic_timer_interrupt+0x72/0x80
> [ 2536.440007] <EOI>
> [ 2536.440007] [<ffffffff84198ef7>] ? retint_restore_args+0x13/0x13
> [ 2536.440007] [<ffffffff812355e5>] ? user_enter+0x135/0x150
> [ 2536.440007] [<ffffffff81077d6d>] syscall_trace_leave+0x12d/0x160
> [ 2536.440007] [<ffffffff841a2337>] int_check_syscall_exit_work+0x34/0x3d
> [ 2536.440007] Code: 01 fd 48 89 df e8 45 5b 8f fd e8 b0 f1 00 fd 48 83 3d 30 e6 8a 01 00 75 0e 0f
> 0b 0f 1f 40 00 eb fe 66 0f 1f 44 00 00 fb 66 66 90 <66> 66 90 bf 01 00 00 00 e8 17 4f 00 00 65 48 8b
> 04 25 88 d9 00
>
> While going through the NMI dump, I noticed that it's very incomplete, and full of:
>
> [ 2536.500130] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run:
> 697182.008 msecs

I've been reporting these (and other traces) on bare metal for a while:
https://lkml.org/lkml/2013/7/5/185 [1]

I've not managed to find time yet to try and narrow down the exact combination
of trinity syscalls that causes this. Sometimes it happens quickly, sometimes
after a few hours (even when run with the same seeds, it seems variable).

Dave

[1] lkml.org seems to be down right now. again. Sigh.
See
Subject: Yet more softlockups.
and
Subject: scheduling while atomic & hang

2013-07-09 22:50:15

by Sasha Levin

Subject: Re: timer: lockup in run_timer_softirq()

On 07/09/2013 06:47 PM, Dave Jones wrote:
> On Tue, Jul 09, 2013 at 06:35:27PM -0400, Sasha Levin wrote:
> > Hi all,
> >
> > While fuzzing with trinity inside a KVM tools guest running latest -next, I've
> > stumbled on the following spew:
> >
> > [ 2536.440007] BUG: soft lockup - CPU#0 stuck for 23s! [trinity-child86:12368]
> > [ 2536.440007] Call Trace:
> > [ 2536.440007] <IRQ>
> > [ 2536.440007] [<ffffffff8113d180>] run_timer_softirq+0x2d0/0x330
> > [ 2536.440007] [<ffffffff8113b400>] ? lock_timer_base+0x70/0x70
> > [ 2536.440007] [<ffffffff81132b21>] __do_softirq+0x261/0x4d0
> > [ 2536.440007] [<ffffffff81132ed6>] irq_exit+0x86/0x120
> > [ 2536.440007] [<ffffffff841a43ea>] smp_apic_timer_interrupt+0x4a/0x60
> > [ 2536.440007] [<ffffffff841a2d32>] apic_timer_interrupt+0x72/0x80
> > [ 2536.440007] <EOI>
> > [ 2536.440007] [<ffffffff84198ef7>] ? retint_restore_args+0x13/0x13
> > [ 2536.440007] [<ffffffff812355e5>] ? user_enter+0x135/0x150
> > [ 2536.440007] [<ffffffff81077d6d>] syscall_trace_leave+0x12d/0x160
> > [ 2536.440007] [<ffffffff841a2337>] int_check_syscall_exit_work+0x34/0x3d
> > [ 2536.440007] Code: 01 fd 48 89 df e8 45 5b 8f fd e8 b0 f1 00 fd 48 83 3d 30 e6 8a 01 00 75 0e 0f
> > 0b 0f 1f 40 00 eb fe 66 0f 1f 44 00 00 fb 66 66 90 <66> 66 90 bf 01 00 00 00 e8 17 4f 00 00 65 48 8b
> > 04 25 88 d9 00
> >
> > While going through the NMI dump, I noticed that it's very incomplete, and full of:
> >
> > [ 2536.500130] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run:
> > 697182.008 msecs
>
> I've been reporting these (and other traces) on bare metal for a while:
> https://lkml.org/lkml/2013/7/5/185 [1]
>
> I've not managed to find time yet to try and narrow down the exact combination
> of trinity syscalls that causes this. Sometimes it happens quickly, sometimes
> after a few hours (even when run with the same seeds, it seems variable).

Interesting. It's the first time I'm seeing these, but I haven't really changed
anything in my configuration.

Are you also seeing the "NMI handler took too long" messages?


Thanks,
Sasha

2013-07-09 23:03:54

by Dave Jones

Subject: Re: timer: lockup in run_timer_softirq()

On Tue, Jul 09, 2013 at 06:49:29PM -0400, Sasha Levin wrote:
> On 07/09/2013 06:47 PM, Dave Jones wrote:

> > > [ 2536.500130] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run:
> > > 697182.008 msecs
> >
> > I've been reporting these (and other traces) on bare metal for a while:
> > https://lkml.org/lkml/2013/7/5/185 [1]
>
> Interesting. It's the first time I'm seeing these, but I haven't really changed
> anything in my configuration.
>
> Are you also seeing the "NMI handler took too long" messages?

Now that I look closer, one difference is you're seeing arch_trigger_all_cpu_backtrace_handler
in that message. I'm seeing them from perf_event_nmi_handler.

Dave

2013-07-09 23:10:33

by Sasha Levin

Subject: Re: timer: lockup in run_timer_softirq()

On 07/09/2013 07:03 PM, Dave Jones wrote:
> On Tue, Jul 09, 2013 at 06:49:29PM -0400, Sasha Levin wrote:
> > On 07/09/2013 06:47 PM, Dave Jones wrote:
>
> > > > [ 2536.500130] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run:
> > > > 697182.008 msecs
> > >
> > > I've been reporting these (and other traces) on bare metal for a while:
> > > https://lkml.org/lkml/2013/7/5/185 [1]
> >
> > Interesting. It's the first time I'm seeing these, but I haven't really changed
> > anything in my configuration.
> >
> > Are you also seeing the "NMI handler took too long" messages?
>
> Now that I look closer, one difference is you're seeing arch_trigger_all_cpu_backtrace_handler
> in that message. I'm seeing them from perf_event_nmi_handler.

perf huh? I also have this spew I'm currently working on, it seems related:

[ 1443.380407] ------------[ cut here ]------------
[ 1443.381713] WARNING: CPU: 20 PID: 49263 at kernel/lockdep.c:3552 check_flags+0x16b/0x220()
[ 1443.383988] DEBUG_LOCKS_WARN_ON(current->softirqs_enabled)
[ 1443.385452] Modules linked in:
[ 1443.386459] CPU: 20 PID: 49263 Comm: trinity-child50 Tainted: G W
3.10.0-next-20130709-sasha #3953
[ 1443.388735] 0000000000000de0 ffff880805e03ab8 ffffffff84191329 ffffffff8519f386
[ 1443.390082] ffff880805e03b08 ffff880805e03af8 ffffffff811279cc ffffffff8519f386
[ 1443.390082] 0000000000000000 ffff8807cf898000 ffffffff85a66940 0000000000000082
[ 1443.390082] Call Trace:
[ 1443.390082] <IRQ> [<ffffffff84191329>] dump_stack+0x52/0x87
[ 1443.390082] [<ffffffff811279cc>] warn_slowpath_common+0x8c/0xc0
[ 1443.390082] [<ffffffff81127ab6>] warn_slowpath_fmt+0x46/0x50
[ 1443.390082] [<ffffffff8419d649>] ? sub_preempt_count+0x9/0xf0
[ 1443.390082] [<ffffffff811a192b>] check_flags+0x16b/0x220
[ 1443.390082] [<ffffffff811a2ee7>] lock_is_held+0x77/0x110
[ 1443.390082] [<ffffffff8419d644>] ? sub_preempt_count+0x4/0xf0
[ 1443.390082] [<ffffffff8122dafe>] perf_tp_event+0xbe/0x450
[ 1443.390082] [<ffffffff8122de62>] ? perf_tp_event+0x422/0x450
[ 1443.390082] [<ffffffff8419d644>] ? sub_preempt_count+0x4/0xf0
[ 1443.390082] [<ffffffff812185f2>] perf_ftrace_function_call+0xc2/0xe0
[ 1443.390082] [<ffffffff811f8c78>] ? ftrace_ops_control_func+0xc8/0x140
[ 1443.427495] [<ffffffff8419d644>] ? sub_preempt_count+0x4/0xf0
[ 1443.427495] [<ffffffff81131917>] ? __local_bh_enable+0xc7/0xd0
[ 1443.427495] [<ffffffff811f8c78>] ftrace_ops_control_func+0xc8/0x140
[ 1443.427495] [<ffffffff841a1c3c>] ftrace_call+0x5/0x2f
[ 1443.427495] [<ffffffff841a1c3c>] ? ftrace_call+0x5/0x2f
[ 1443.427495] [<ffffffff81175b87>] ? vtime_account_irq_exit+0x67/0x80
[ 1443.427495] [<ffffffff8419d649>] ? sub_preempt_count+0x9/0xf0
[ 1443.427495] [<ffffffff8419d649>] ? sub_preempt_count+0x9/0xf0
[ 1443.427495] [<ffffffff8113185e>] ? __local_bh_enable+0xe/0xd0
[ 1443.427495] [<ffffffff8419d649>] ? sub_preempt_count+0x9/0xf0
[ 1443.427495] [<ffffffff81131917>] __local_bh_enable+0xc7/0xd0
[ 1443.427495] [<ffffffff81132d07>] __do_softirq+0x447/0x4d0
[ 1443.427495] [<ffffffff8419d649>] ? sub_preempt_count+0x9/0xf0
[ 1443.427495] [<ffffffff81132ed6>] irq_exit+0x86/0x120
[ 1443.427495] [<ffffffff841a43ea>] smp_apic_timer_interrupt+0x4a/0x60
[ 1443.427495] [<ffffffff841a2d32>] apic_timer_interrupt+0x72/0x80
[ 1443.427495] <EOI> [<ffffffff81130542>] ? do_setitimer+0x242/0x2a0
[ 1443.427495] [<ffffffff8419871c>] ? _raw_spin_unlock_irq+0x4c/0x80
[ 1443.427495] [<ffffffff84198700>] ? _raw_spin_unlock_irq+0x30/0x80
[ 1443.427495] [<ffffffff81130542>] do_setitimer+0x242/0x2a0
[ 1443.427495] [<ffffffff811306fa>] alarm_setitimer+0x3a/0x70
[ 1443.427495] [<ffffffff8113b41e>] SyS_alarm+0xe/0x20
[ 1443.427495] [<ffffffff841a2240>] tracesys+0xdd/0xe2
[ 1443.427495] ---[ end trace e3b9a6b9c7462a51 ]---

2013-07-10 09:52:23

by Peter Zijlstra

Subject: Re: timer: lockup in run_timer_softirq()

On Tue, Jul 09, 2013 at 07:09:51PM -0400, Sasha Levin wrote:
> perf huh? I also have this spew I'm currently working on, it seems related:
>
> [ 1443.380407] ------------[ cut here ]------------
> [ 1443.381713] WARNING: CPU: 20 PID: 49263 at kernel/lockdep.c:3552 check_flags+0x16b/0x220()
> [ 1443.383988] DEBUG_LOCKS_WARN_ON(current->softirqs_enabled)
> [ 1443.385452] Modules linked in:
> [ 1443.386459] CPU: 20 PID: 49263 Comm: trinity-child50 Tainted: G W
> 3.10.0-next-20130709-sasha #3953
> [ 1443.388735] 0000000000000de0 ffff880805e03ab8 ffffffff84191329 ffffffff8519f386
> [ 1443.390082] ffff880805e03b08 ffff880805e03af8 ffffffff811279cc ffffffff8519f386
> [ 1443.390082] 0000000000000000 ffff8807cf898000 ffffffff85a66940 0000000000000082
> [ 1443.390082] Call Trace:
> [ 1443.390082] <IRQ> [<ffffffff84191329>] dump_stack+0x52/0x87
> [ 1443.390082] [<ffffffff811279cc>] warn_slowpath_common+0x8c/0xc0
> [ 1443.390082] [<ffffffff81127ab6>] warn_slowpath_fmt+0x46/0x50
> [ 1443.390082] [<ffffffff8419d649>] ? sub_preempt_count+0x9/0xf0
> [ 1443.390082] [<ffffffff811a192b>] check_flags+0x16b/0x220
> [ 1443.390082] [<ffffffff811a2ee7>] lock_is_held+0x77/0x110
> [ 1443.390082] [<ffffffff8419d644>] ? sub_preempt_count+0x4/0xf0
> [ 1443.390082] [<ffffffff8122dafe>] perf_tp_event+0xbe/0x450
> [ 1443.390082] [<ffffffff8122de62>] ? perf_tp_event+0x422/0x450
> [ 1443.390082] [<ffffffff8419d644>] ? sub_preempt_count+0x4/0xf0
> [ 1443.390082] [<ffffffff812185f2>] perf_ftrace_function_call+0xc2/0xe0
> [ 1443.390082] [<ffffffff811f8c78>] ? ftrace_ops_control_func+0xc8/0x140
> [ 1443.427495] [<ffffffff8419d644>] ? sub_preempt_count+0x4/0xf0
> [ 1443.427495] [<ffffffff81131917>] ? __local_bh_enable+0xc7/0xd0
> [ 1443.427495] [<ffffffff811f8c78>] ftrace_ops_control_func+0xc8/0x140
> [ 1443.427495] [<ffffffff841a1c3c>] ftrace_call+0x5/0x2f
> [ 1443.427495] [<ffffffff841a1c3c>] ? ftrace_call+0x5/0x2f
> [ 1443.427495] [<ffffffff81175b87>] ? vtime_account_irq_exit+0x67/0x80
> [ 1443.427495] [<ffffffff8419d649>] ? sub_preempt_count+0x9/0xf0
> [ 1443.427495] [<ffffffff8419d649>] ? sub_preempt_count+0x9/0xf0
> [ 1443.427495] [<ffffffff8113185e>] ? __local_bh_enable+0xe/0xd0
> [ 1443.427495] [<ffffffff8419d649>] ? sub_preempt_count+0x9/0xf0
> [ 1443.427495] [<ffffffff81131917>] __local_bh_enable+0xc7/0xd0
> [ 1443.427495] [<ffffffff81132d07>] __do_softirq+0x447/0x4d0
> [ 1443.427495] [<ffffffff8419d649>] ? sub_preempt_count+0x9/0xf0
> [ 1443.427495] [<ffffffff81132ed6>] irq_exit+0x86/0x120
> [ 1443.427495] [<ffffffff841a43ea>] smp_apic_timer_interrupt+0x4a/0x60
> [ 1443.427495] [<ffffffff841a2d32>] apic_timer_interrupt+0x72/0x80
> [ 1443.427495] <EOI> [<ffffffff81130542>] ? do_setitimer+0x242/0x2a0
> [ 1443.427495] [<ffffffff8419871c>] ? _raw_spin_unlock_irq+0x4c/0x80
> [ 1443.427495] [<ffffffff84198700>] ? _raw_spin_unlock_irq+0x30/0x80
> [ 1443.427495] [<ffffffff81130542>] do_setitimer+0x242/0x2a0
> [ 1443.427495] [<ffffffff811306fa>] alarm_setitimer+0x3a/0x70
> [ 1443.427495] [<ffffffff8113b41e>] SyS_alarm+0xe/0x20
> [ 1443.427495] [<ffffffff841a2240>] tracesys+0xdd/0xe2
> [ 1443.427495] ---[ end trace e3b9a6b9c7462a51 ]---

Fun.. :-) we trace __local_bh_enable() and hit a ftrace callback between
telling lockdep we enabled softirqs and actually doing so.

I'm just a tad confused by the trace; it says we go:
lock_is_held()
check_flags()

Looking at perf_tp_event() this would most likely be from:

ctx = rcu_dereference(task->perf_event_ctxp[perf_sw_context]);

Where the lock_is_held() would be from rcu_dereference_check()'s
rcu_read_lock_sched_held(). However, by there we've already passed
rcu_read_lock() which includes rcu_lock_acquire() -> lock_acquire() ->
check_flags(). So it should've triggered there.

Ideally we'd not trace __local_bh_enable() at all, seeing as how any RCU usage
in there would be bound to trigger this.

Steven?

2013-07-10 09:55:09

by Peter Zijlstra

Subject: Re: timer: lockup in run_timer_softirq()

On Tue, Jul 09, 2013 at 06:35:27PM -0400, Sasha Levin wrote:
> While going through the NMI dump, I noticed that it's very incomplete, and full of:
>
> [ 2536.500130] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler)
> took too long to run: 697182.008 msecs

Now I would actually expect arch_trigger_all_cpu_backtrace_handler() to
issue this warning, that said, ~700 seconds is a bit long, I'm fairly
sure it didn't actually take that long, you'd have noticed your vm being
'away' for 10+ minutes I'd think, no?


There's something entirely fishy with that stuff, let me stare at it.

2013-07-10 12:27:38

by Steven Rostedt

Subject: Re: timer: lockup in run_timer_softirq()

On Wed, 2013-07-10 at 11:52 +0200, Peter Zijlstra wrote:

> Fun.. :-) we trace __local_bh_enable() and hit a ftrace callback between
> telling lockdep we enabled softirqs and actually doing so.
>
> I'm just a tad confused by the trace; it says we go:
> lock_is_held()
> check_flags()
>
> Looking at perf_tp_event() this would most likely be from:
>
> ctx = rcu_dereference(task->perf_event_ctxp[perf_sw_context]);

Function tracing should not be treated as a normal trace point. It is
much more invasive, and there are things that one should be wary of
when using it.

>
> Where the lock_is_held() would be from rcu_dereference_check()'s
> rcu_read_lock_sched_held(). However, by there we've already passed
> rcu_read_lock() which includes rcu_lock_acquire() -> lock_acquire() ->
> check_flags(). So it should've triggered there.
>
> Ideally we'd not trace __local_bh_enable() at all, seeing as how any RCU usage
> in there would be bound to trigger this.

I find it very useful to trace __local_bh_enable(). I also trace RCU
calls.

When using function tracing, you need to use
rcu_dereference_raw_notrace().

Also, function tracing callbacks should avoid rcu_read_lock(), as that's
traced as well. You can use preempt_disable_notrace() for rcu usage.

-- Steve

2013-07-10 12:43:38

by Peter Zijlstra

Subject: Re: timer: lockup in run_timer_softirq()

On Wed, Jul 10, 2013 at 08:27:34AM -0400, Steven Rostedt wrote:
> On Wed, 2013-07-10 at 11:52 +0200, Peter Zijlstra wrote:
>
> > Fun.. :-) we trace __local_bh_enable() and hit a ftrace callback between
> > telling lockdep we enabled softirqs and actually doing so.
> >
> > I'm just a tad confused by the trace; it says we go:
> > lock_is_held()
> > check_flags()
> >
> > Looking at perf_tp_event() this would most likely be from:
> >
> > ctx = rcu_dereference(task->perf_event_ctxp[perf_sw_context]);
>
> Function tracing should not be treated as a normal trace point. It is
> much more invasive, and there are things that one should be wary of
> when using it.

Clearly.. yet someone forgot this when merging ced39002 :/

> >
> > Where the lock_is_held() would be from rcu_dereference_check()'s
> > rcu_read_lock_sched_held(). However, by there we've already passed
> > rcu_read_lock() which includes rcu_lock_acquire() -> lock_acquire() ->
> > check_flags(). So it should've triggered there.
> >
> > Ideally we'd not trace __local_bh_enable() at all, seeing as how any RCU usage
> > in there would be bound to trigger this.
>
> I find it very useful to trace __local_bh_enable(). I also trace RCU
> calls.
>
> When using function tracing, you need to use
> rcu_dereference_raw_notrace().
>
> Also, function tracing callbacks should avoid rcu_read_lock(), as that's
> traced as well. You can use preempt_disable_notrace() for rcu usage.

but but but preempt_disable_notrace() isn't an rcu_read_lock().. You can only
do that for rcu_sched.

Anyway, I don't see a nice way out of this mess :/ the entire perf core uses
regular RCU and converting all that is going to be a nasty big patch.

2013-07-10 12:58:14

by Steven Rostedt

Subject: Re: timer: lockup in run_timer_softirq()

On Wed, 2013-07-10 at 14:42 +0200, Peter Zijlstra wrote:

> but but but preempt_disable_notrace() isn't an rcu_read_lock().. You can only
> do that for rcu_sched.

Right. And not even rcu_sched is safe, as function tracing can trace
functions out of rcu scope. That's why I had to add that code to do a
schedule_on_each_cpu() in ftrace.

>
> Anyway, I don't see a nice way out of this mess :/ the entire perf core uses
> regular RCU and converting all that is going to be a nasty big patch.

There is an easier way. We can add a way to have perf not trace specific
functions. Now there's already infrastructure there to pick and choose
what to trace and what not to for individual function tracing users like
perf. The trick will be how to annotate them.

If there's a way to mark a function without moving it to a section, this
would be possible. Perhaps similar to EXPORT_SYMBOL(). We could add a
PERF_NOTRACE().

static void __local_bh_enable(void)
{
[...]
}
PERF_NOTRACE(__local_bh_enable);

And add these to a black list of functions that perf should not trace.

How's that sound?

-- Steve

2013-07-10 14:48:20

by Sasha Levin

Subject: Re: timer: lockup in run_timer_softirq()

On 07/10/2013 05:54 AM, Peter Zijlstra wrote:
> On Tue, Jul 09, 2013 at 06:35:27PM -0400, Sasha Levin wrote:
>> While going through the NMI dump, I noticed that it's very incomplete, and full of:
>>
>> [ 2536.500130] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler)
>> took too long to run: 697182.008 msecs
>
> Now I would actually expect arch_trigger_all_cpu_backtrace_handler() to
> issue this warning, that said, ~700 seconds is a bit long, I'm fairly
> sure it didn't actually take that long, you'd have noticed your vm being
> 'away' for 10+ minutes I'd think, no?
>
>
> There's something entirely fishy with that stuff, let me stare at it.

Yeah, it didn't actually hang that long - that number is bogus.


Thanks,
Sasha

2013-07-11 10:03:29

by Peter Zijlstra

Subject: Re: timer: lockup in run_timer_softirq()

On Wed, Jul 10, 2013 at 10:47:40AM -0400, Sasha Levin wrote:
> On 07/10/2013 05:54 AM, Peter Zijlstra wrote:
> >On Tue, Jul 09, 2013 at 06:35:27PM -0400, Sasha Levin wrote:
> >>While going through the NMI dump, I noticed that it's very incomplete, and full of:
> >>
> >>[ 2536.500130] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler)
> >>took too long to run: 697182.008 msecs
> >
> >Now I would actually expect arch_trigger_all_cpu_backtrace_handler() to
> >issue this warning, that said, ~700 seconds is a bit long, I'm fairly
> >sure it didn't actually take that long, you'd have noticed your vm being
> >'away' for 10+ minutes I'd think, no?
> >
> >
> >There's something entirely fishy with that stuff, let me stare at it.
>
> Yeah, it didn't actually hang that long - that number is bogus.

This might help..

---
arch/x86/kernel/nmi.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 0920212..4e54911 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -123,8 +123,8 @@ static int __kprobes nmi_handle(unsigned int type, struct pt_regs *regs, bool b2
 			continue;
 
 		nmi_longest_ns = delta;
-		whole_msecs = do_div(delta, (1000 * 1000));
-		decimal_msecs = do_div(delta, 1000) % 1000;
+		whole_msecs = div_u64(delta, (1000 * 1000));
+		decimal_msecs = div_u64(delta, 1000) % 1000;
 		printk_ratelimited(KERN_INFO
 			"INFO: NMI handler (%ps) took too long to run: "
 			"%lld.%03d msecs\n", a->handler, whole_msecs,

2013-07-11 16:43:03

by Peter Zijlstra

Subject: Re: timer: lockup in run_timer_softirq()

On Wed, Jul 10, 2013 at 08:58:11AM -0400, Steven Rostedt wrote:
> On Wed, 2013-07-10 at 14:42 +0200, Peter Zijlstra wrote:
>
> > but but but preempt_disable_notrace() isn't an rcu_read_lock().. You can only
> > do that for rcu_sched.
>
> Right. And not even rcu_sched is safe, as function tracing can trace
> functions out of rcu scope. That's why I had to add that code to do a
> schedule_on_each_cpu() in ftrace.
>
> >
> > Anyway, I don't see a nice way out of this mess :/ the entire perf core uses
> > regular RCU and converting all that is going to be a nasty big patch.
>
> There is an easier way. We can add a way to have perf not trace specific
> functions. Now there's already infrastructure there to pick and choose
> what to trace and what not to for individual function tracing users like
> perf. The trick will be how to annotate them.
>
> If there's a way to mark a function without moving it to a section, this
> would be possible. Perhaps similar to EXPORT_SYMBOL(). We could add a
> PERF_NOTRACE().
>
> static void __local_bh_enable()
> {
> [...]
> }
> PERF_NOTRACE(__local_bh_enable);
>
> And add these to a black list of functions that perf should not trace.
>
> How's that sound?

I'm afraid this is going to be hard to create and hard to keep correct :/

Other than that, a function tracer environment that is safer to use might be
useful for other people as well.

2013-07-11 16:55:30

by Steven Rostedt

Subject: Re: timer: lockup in run_timer_softirq()

On Thu, 2013-07-11 at 18:42 +0200, Peter Zijlstra wrote:

> I'm afraid this is going to be hard to create and hard to keep correct :/

That's the nature of tracing functions ;-)

>
> Other than that, a function tracer environment that is safer to use might be
> useful for other people as well.

Not sure how to make the environment safe, as the main purpose of the
function trace is to debug those hard to debug locations, like NMIs,
RCU, dynamic ticks, etc. To ensure a "safe" environment, it would
cripple the tracer.

Hmm, what would you state as a safe environment? How can we detect if
the environment is safe to trace or not?

-- Steve

2013-07-11 16:59:37

by Steven Rostedt

Subject: Re: timer: lockup in run_timer_softirq()

On Thu, 2013-07-11 at 12:55 -0400, Steven Rostedt wrote:

> >
> > Other than that, a function tracer environment that is safer to use might be
> > useful for other people as well.
>
> Not sure how to make the environment safe, as the main purpose of the
> function trace is to debug those hard to debug locations, like NMIs,
> RCU, dynamic ticks, etc. To ensure a "safe" environment, it would
> cripple the tracer.
>
> Hmm, what would you state as a safe environment? How can we detect if
> the environment is safe to trace or not?

Maybe I misunderstood you. You mean to have this environment be
something for not just perf, and have the macro be:

NONSAFE_TRACE(__local_bh_enable);

?

Then, any ftrace user could set a flag in the registering of its ops to
'safe_only_functions'. And it will ignore all of these locations.
There's really not many of them, so it may not be too hard to weed out.

-- Steve

2013-07-11 17:12:48

by Peter Zijlstra

Subject: Re: timer: lockup in run_timer_softirq()

On Thu, Jul 11, 2013 at 12:59:32PM -0400, Steven Rostedt wrote:
> On Thu, 2013-07-11 at 12:55 -0400, Steven Rostedt wrote:
>
> > >
> > > Other than that, a function tracer environment that is safer to use might be
> > > useful for other people as well.
> >
> > Not sure how to make the environment safe, as the main purpose of the
> > function trace is to debug those hard to debug locations, like NMIs,
> > RCU, dynamic ticks, etc. To ensure a "safe" environment, it would
> > cripple the tracer.
> >
> > Hmm, what would you state as a safe environment? How can we detect if
> > the environment is safe to trace or not?
>
> Maybe I misunderstood you. You mean to have this environment be
> something for not just perf, and have the macro be:
>
> NONSAFE_TRACE(__local_bh_enable);
>
> ?
>
> Then, any ftrace user could set a flag in the registering of its ops to
> 'safe_only_functions'. And it will ignore all of these locations.
> There's really not many of them, so it may not be too hard to weed out.

Yah, like that. But that doesn't invalidate your question as to what 'safe'
would encompass. I think RCU/lockdep would be the big thing for perf, not
sure it should be wider than that.

2014-04-03 20:45:07

by Sasha Levin

Subject: Re: timer: lockup in run_timer_softirq()

On 07/11/2013 01:11 PM, Peter Zijlstra wrote:
> On Thu, Jul 11, 2013 at 12:59:32PM -0400, Steven Rostedt wrote:
>> On Thu, 2013-07-11 at 12:55 -0400, Steven Rostedt wrote:
>>
>>>>
>>>> Other than that, a function tracer environment that is safer to use might be
>>>> useful for other people as well.
>>>
>>> Not sure how to make the environment safe, as the main purpose of the
>>> function trace is to debug those hard to debug locations, like NMIs,
>>> RCU, dynamic ticks, etc. To ensure a "safe" environment, it would
>>> cripple the tracer.
>>>
>>> Hmm, what would you state as a safe environment? How can we detect if
>>> the environment is safe to trace or not?
>>
>> Maybe I misunderstood you. You mean to have this environment be
>> something for not just perf, and have the macro be:
>>
>> NONSAFE_TRACE(__local_bh_enable);
>>
>> ?
>>
>> Then, any ftrace user could set a flag in the registering of its ops to
>> 'safe_only_functions'. And it will ignore all of these locations.
>> There's really not many of them, so it may not be too hard to weed out.
>
> Yah, like that. But that doesn't invalidate your question as to what 'safe'
> would encompass. I think RCU/lockdep would be the big thing for perf, not
> sure it should be wider than that.

Ping? There was no conclusion here and this issue is still ongoing in -next.


Thanks,
Sasha