Hello
after a month of trying I finally got a small test-case out of my
perf_fuzzer suite that triggers a system lockup with just one syscall.
Attached is the code that triggers it.
I tried tracking through the ftrace code to find out what's going on, but
I quickly get lost. The perf/ftrace code does do questionable things like
copy a 64-bit config value into an "int".
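Just to illustrate what that truncation does (my own toy example, not the actual
kernel code; the value is the config my test case passes in):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t config = 0x7fffffff00000001ULL;  /* config value from the test case */
	int as_int = (int)config;                 /* only the low 32 bits survive    */

	printf("config as an int: %d\n", as_int); /* prints 1 */
	return 0;
}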
You may recall this bug triggers messages like this before total system
lockup:
[ 47.813436] ------------[ cut here ]------------
[ 47.813436] WARNING: CPU: 0 PID: 2781 at kernel/watchdog.c:245 watchdog_over)
[ 47.813436] Watchdog detected hard LOCKUP on cpu 0
[ 47.813436] Modules linked in: cpufreq_userspace cpufreq_stats cpufreq_powern
[ 47.813436] CPU: 0 PID: 2781 Comm: out Not tainted 3.12.0-rc6+ #79
[ 47.813436] Hardware name: AOpen DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BI2
[ 47.813436] 00000000000000f5 ffff88011fc07c48 ffffffff8151a736 ffff88011fc08
[ 47.813436] ffffffff8103cd68 ffffffff817fac46 ffffffff810a0fe5 ffff88011b310
[ 47.813436] ffff88011fc07ef8 ffff88011fc07d58 0000000000000000 ffff88011fc08
[ 47.813436] Call Trace:
[ 47.813436] <NMI> [<ffffffff8151a736>] ? dump_stack+0x41/0x53
[ 47.813436] [<ffffffff8103cd68>] ? warn_slowpath_common+0x81/0x9c
[ 47.813436] [<ffffffff810a0fe5>] ? watchdog_overflow_callback+0x9b/0xa6
[ 47.813436] [<ffffffff8103ce25>] ? warn_slowpath_fmt+0x46/0x48
[ 47.813436] [<ffffffff810a0f53>] ? watchdog_overflow_callback+0x9/0xa6
[ 47.813436] [<ffffffff810a0fe5>] ? watchdog_overflow_callback+0x9b/0xa6
[ 47.813436] [<ffffffff810ca46b>] ? __perf_event_overflow+0x12e/0x1ab
[ 47.813436] [<ffffffff81018047>] ? intel_pmu_save_and_restart+0xe/0x22
[ 47.813436] [<ffffffff81018c95>] ? intel_pmu_handle_irq+0x2b8/0x34d
[ 47.813436] [<ffffffff8152457c>] ? ftrace_call+0x5/0x2f
[ 47.813436] [<ffffffff810c4aa4>] ? irq_work_queue+0x71/0x72
[ 47.813436] [<ffffffff8151f0b3>] ? perf_event_nmi_handler+0x24/0x3d
[ 47.813436] [<ffffffff8151e9f2>] ? nmi_handle+0x5e/0x13a
[ 47.813436] [<ffffffff810a17fa>] ? rcu_nmi_enter+0x9/0x5b
[ 47.813436] [<ffffffff8151eb75>] ? do_nmi+0xa7/0x2bd
[ 47.813436] [<ffffffff8151e177>] ? end_repeat_nmi+0x1e/0x2e
[ 47.813436] [<ffffffff810c4aa4>] ? irq_work_queue+0x71/0x72
[ 47.813436] [<ffffffff810c4aa4>] ? irq_work_queue+0x71/0x72
[ 47.813436] [<ffffffff810c4aa4>] ? irq_work_queue+0x71/0x72
[ 47.813436] <<EOE>> <IRQ> [<ffffffff810ca4d9>] ? __perf_event_overflow+0xb
[ 47.813436] [<ffffffff810a8f18>] ? ftrace_ops_control_func+0x9a/0xc8
[ 47.813436] [<ffffffff810ca654>] ? perf_swevent_overflow+0x3f/0x58
[ 47.813436] [<ffffffff810ca86e>] ? perf_tp_event+0xe9/0x1f6
[ 47.813436] [<ffffffff810a8f18>] ? ftrace_ops_control_func+0x9a/0xc8
[ 47.813436] [<ffffffff8151d65c>] ? _raw_spin_unlock_irqrestore+0x4/0x11
[ 47.813436] [<ffffffff810be8db>] ? perf_ftrace_function_call+0xc7/0xd1
[ 47.813436] [<ffffffff81525d9a>] ? irq_work_interrupt+0x6a/0x70
[ 47.813436] [<ffffffff81040cdb>] ? irq_exit+0x4/0xa1
[ 47.813436] [<ffffffff81525d9a>] ? irq_work_interrupt+0x6a/0x70
[ 47.813436] [<ffffffff810be8db>] ? perf_ftrace_function_call+0xc7/0xd1
[ 47.813436] [<ffffffff81040cdb>] ? irq_exit+0x4/0xa1
[ 47.813436] [<ffffffff810a8f18>] ? ftrace_ops_control_func+0x9a/0xc8
[ 47.813436] [<ffffffff81040cdb>] ? irq_exit+0x4/0xa1
[ 47.813436] [<ffffffff810a8f18>] ? ftrace_ops_control_func+0x9a/0xc8
[ 47.813436] [<ffffffff8151d661>] ? _raw_spin_unlock_irqrestore+0x9/0x11
[ 47.813436] [<ffffffff8152457c>] ? ftrace_call+0x5/0x2f
[ 47.813436] [<ffffffff8111f7a4>] ? kill_fasync+0x74/0x86
[ 47.813436] [<ffffffff810cabc2>] ? perf_event_wakeup+0x8c/0xa1
[ 47.813436] [<ffffffff81005f1f>] ? smp_irq_work_interrupt+0x9/0x21
[ 47.813436] [<ffffffff81005f1f>] ? smp_irq_work_interrupt+0x9/0x21
[ 47.813436] [<ffffffff810c4916>] ? __irq_work_run+0x5f/0x7b
[ 47.813436] [<ffffffff81040ce0>] ? irq_exit+0x9/0xa1
[ 47.813436] [<ffffffff81040ce0>] ? irq_exit+0x9/0xa1
[ 47.813436] [<ffffffff81525d9a>] ? irq_work_interrupt+0x6a/0x70
[ 47.813436] <EOI> [<ffffffff81520b5d>] ? __do_page_fault+0x10b/0x3f5
[ 47.813436] [<ffffffff81520a6b>] ? __do_page_fault+0x19/0x3f5
[ 47.813436] [<ffffffff8152457c>] ? ftrace_call+0x5/0x2f
[ 47.813436] [<ffffffff811281ce>] ? fget_raw_light+0x9/0x8a
[ 47.813436] [<ffffffff81520e50>] ? do_page_fault+0x9/0xf
[ 47.813436] [<ffffffff8151de72>] ? page_fault+0x22/0x30
[ 47.813436] ---[ end trace 9a2f13d704fa7b22 ]---
On Thu, 24 Oct 2013, Vince Weaver wrote:
> after a month of trying I finally got a small test-case out of my
> perf_fuzzer suite that triggers a system lockup with just one syscall.
>
> Attached is the code that triggers it.
And it turns out you can only trigger this specific problem if advanced
ftrace options are enabled.
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_STACK_TRACER=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_FUNCTION_PROFILER=y
CONFIG_FTRACE_MCOUNT_RECORD=y
Urgh, I had turned those on to try to debug something and forgot to
disable. I feel like I saw this problem before I had those enabled so I
guess I have to start from scratch fuzzing to see if I can get a more
generally reproducible trace.
Vince
On Thu, 2013-10-24 at 14:25 -0400, Vince Weaver wrote:
> On Thu, 24 Oct 2013, Vince Weaver wrote:
> > after a month of trying I finally got a small test-case out of my
> > perf_fuzzer suite that triggers a system lockup with just one syscall.
> >
> > Attached is the code that triggers it.
>
> And it turns out you can only trigger this specific problem if advanced
> ftrace options are enabled.
>
> CONFIG_KPROBES_ON_FTRACE=y
> CONFIG_FUNCTION_TRACER=y
> CONFIG_FUNCTION_GRAPH_TRACER=y
> CONFIG_STACK_TRACER=y
> CONFIG_DYNAMIC_FTRACE=y
> CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
> CONFIG_FUNCTION_PROFILER=y
> CONFIG_FTRACE_MCOUNT_RECORD=y
The above STACK_TRACER, FTRACE_WITH_REGS and FUNCTION_PROFILER probably
don't need to be set, as they are pretty much stand alone, and don't
look to be involved in the stack traces that you (and Dave) posted.
>
> Urgh, I had turned those on to try to debug something and forgot to
> disable. I feel like I saw this problem before I had those enabled so I
> guess I have to start from scratch fuzzing to see if I can get a more
> generally reproducible trace.
Looks like something is incorrectly enabling function tracer within
perf. Peter told me that there's some ref count bug that may use data
after being freed on exit.
I tried the program that you attached in your previous email, and was not
able to hit the bug. Are you able to hit the bug with that code each
time?
-- Steve
* Steven Rostedt <[email protected]> wrote:
> > Urgh, I had turned those on to try to debug something and forgot
> > to disable. I feel like I saw this problem before I had those
> > enabled so I guess I have to start from scratch fuzzing to see
> > if I can get a more generally reproducible trace.
Btw., even if various config options are enabled, we don't want the
kernel to lock up. So it might not be _the_ bug you are looking for, but
it certainly looks like _a_ bug.
> Looks like something is incorrectly enabling function tracer
> within perf. Peter told me that there's some ref count bug that
> may use data after being freed on exit.
>
> I tried the program that you attached in your previous email, and
> was not able to hit the bug. Are you able to hit the bug with that
> code each time?
A full .config and a SHA1 of the tested tree would definitely be
nice!
Thanks,
Ingo
On Fri, 25 Oct 2013, Ingo Molnar wrote:
>
> Btw., even if various config options are enabled, we don't want the
> kernel to lock up. So it might not be _the_ bug you are looking for, but
> it certainly looks like _a_ bug.
yes, I agree, but it means I was chasing a different bug than I thought I
was.
I can still get the same (or similar) bug with the advanced ftrace options
turned off; it just takes a lot longer and is harder to trigger.
I have a 30,000-syscall trace that triggers it, but only maybe 1 time in 5,
so it's harder to narrow down the cause.
> A full .config and a SHA1 of the tested tree would definitely be
> nice!
I was testing e6036c0b88962df82a8853971b86a55f09faef40,
but I can crash machines from 3.10 through current with that chunk of code
I posted. I am sure it affects kernels before 3.10 as well; that's just when
I first enabled the extra ftrace options.
Attached is the full config. This is running on a core2 system. I've
seen similar bugs on other systems, but I'm not sure if the trigger is the
same or not.
Vince
On Fri, 25 Oct 2013, Steven Rostedt wrote:
> On Thu, 2013-10-24 at 14:25 -0400, Vince Weaver wrote:
> > On Thu, 24 Oct 2013, Vince Weaver wrote:
> > > after a month of trying I finally got a small test-case out of my
> > > perf_fuzzer suite that triggers a system lockup with just one syscall.
> > >
> > > Attached is the code that triggers it.
> >
> > And it turns out you can only trigger this specific problem if advanced
> > ftrace options are enabled.
> >
> > CONFIG_KPROBES_ON_FTRACE=y
> > CONFIG_FUNCTION_TRACER=y
> > CONFIG_FUNCTION_GRAPH_TRACER=y
> > CONFIG_STACK_TRACER=y
> > CONFIG_DYNAMIC_FTRACE=y
> > CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
> > CONFIG_FUNCTION_PROFILER=y
> > CONFIG_FTRACE_MCOUNT_RECORD=y
>
> The above STACK_TRACER, FTRACE_WITH_REGS and FUNCTION_PROFILER probably
> don't need to be set, as they are pretty much stand alone, and don't
> look to be involved in the stack traces that you (and Dave) posted.
>
> >
> > Urgh, I had turned those on to try to debug something and forgot to
> > disable. I feel like I saw this problem before I had those enabled so I
> > guess I have to start from scratch fuzzing to see if I can get a more
> > generally reproducible trace.
>
> Looks like something is incorrectly enabling function tracer within
> perf. Peter told me that there's some ref count bug that may use data
> after being freed on exit.
>
> I tried the program that you attached in your previous email, and was not
> able to hit the bug. Are you able to hit the bug with that code each
> time?
Yes. My poor core2 machine has been hard-reset (it's locked up so hard I
have to hold down the power button) about 200 times in the past month while
trying to track down this problem.
I'm not sure how tracepoints work exactly, but the problem code is setting
pe[5].type=PERF_TYPE_TRACEPOINT;
pe[5].config=0x7fffffff00000001;
The config is being truncated to 32 bits by the perf/ftrace code, so I
think this means the tracepoint being enabled is
tracing/events/ftrace/function/id:1
The sample period is set to
pe[5].sample_period=0xffffffffff000000;
and the fd is set to generate a signal on overflow (the crash doesn't
happen unless a signal handler is set up).
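The attached code isn't repeated here, but based on the values above the setup
boils down to something like this (a hand-written sketch, not the literal test
case; the signal number and the remaining attr fields are guesses on my part):

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static void overflow_handler(int signum, siginfo_t *info, void *uc)
{
	/* the real test case presumably does more here; an empty handler is
	   enough to satisfy the "signal handler is set up" condition */
}

int main(void)
{
	struct perf_event_attr pe;
	struct sigaction sa;
	int fd;

	/* install the handler before overflows can start arriving */
	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = overflow_handler;
	sa.sa_flags = SA_SIGINFO;
	sigaction(SIGRTMIN + 2, &sa, NULL);

	memset(&pe, 0, sizeof(pe));
	pe.size = sizeof(pe);
	pe.type = PERF_TYPE_TRACEPOINT;
	pe.config = 0x7fffffff00000001ULL;        /* low 32 bits pick tracepoint id 1 */
	pe.sample_period = 0xffffffffff000000ULL;

	fd = syscall(__NR_perf_event_open, &pe, 0, -1, -1, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	/* route counter overflows to the signal handler */
	fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_ASYNC);
	fcntl(fd, F_SETSIG, SIGRTMIN + 2);
	fcntl(fd, F_SETOWN, getpid());

	pause();
	return 0;
}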
If I must I can probably start sprinkling printks around the code to try to
track things down in more detail, but I'd rather not if I can avoid that.
Vince
On Fri, 25 Oct 2013, Vince Weaver wrote:
>
> I'm not sure how tracepoints work exactly, but the problem code is setting
> pe[5].type=PERF_TYPE_TRACEPOINT;
> pe[5].config=0x7fffffff00000001;
>
> The config is being truncated to 32-bits by the perf/ftrace code so I
> think this means the tracepoint being enabled is
>
> tracing/events/ftrace/function/id:1
>
I've wasted much of the day playing with this and adding printks, etc.
The key things that cause the problem are:
tracepoint event
config is 1 (tracing/events/ftrace/function)
PERF_SAMPLE_PERIOD set in the sample type
no user mmap buffer
period (not frequency) enabled
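In perf_event_attr terms that boils down to roughly the following (my own
distillation of the list above, not the literal fuzzer output; the period
value is the one from the test case):

#include <string.h>
#include <linux/perf_event.h>

static void setup_trigger_attr(struct perf_event_attr *pe)
{
	memset(pe, 0, sizeof(*pe));
	pe->size          = sizeof(*pe);
	pe->type          = PERF_TYPE_TRACEPOINT;
	pe->config        = 1;                      /* tracing/events/ftrace/function */
	pe->sample_type   = PERF_SAMPLE_PERIOD;
	pe->sample_period = 0xffffffffff000000ULL;  /* period mode: pe->freq stays 0  */
	/* ...and the resulting fd is never mmap()ed, so no user buffer exists */
}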
What this means is that there is a pretty high number of kernel calls happening.
Every kernel function entry ends up calling the
perf_swevent_overflow() overflow handler, which calls perf_event_output(),
which attempts to dump the buffer, but it can't because no user buffer is
mmap'd.
This causes some sort of storm and eventually the system just stops
responding and the watchdog kicks in, although the traces it gives back
are different each time.
It's possible the kernel is making forward progress (though very slowly)
and this is just some sort of throttling issue.
I don't know if there are any better ways to try to debug things than the
printk route. Though that has its own problems as the printks themselves
start showing up in the ftrace traces.
Vince
* Vince Weaver <[email protected]> wrote:
> I don't know if there are any better ways to try to debug things
> than the printk route. Though that has its own problems as the
> printks themselves start showing up in the ftrace traces.
One additional brute-force method is earlyprintk=...,keep and
marking the (handful of) earlyprintk functions (and snprintf)
notrace, plus turning off printk itself completely.
That gives a completely instrumentation-free and locking-free
console output function of last resort :-/
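Something along these lines (just a sketch; the console device and baud rate
on the command line, and exactly which helpers need the annotation, depend on
the box and the tree): boot with e.g. earlyprintk=serial,ttyS0,115200,keep and
tag the output path like

#include <linux/compiler.h>
#include <linux/printk.h>

/* sketch: keep the last-resort output helper out of ftrace */
static void notrace debug_out(const char *msg)
{
	early_printk("%s", msg);  /* early_printk() and its callees need notrace too */
}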
Thanks,
Ingo