2009-04-20 22:29:19

by Steven Rostedt

[permalink] [raw]
Subject: [PATCH 0/2] [GIT PULL] tracing: various bug fixes


Ingo,

Please pull the latest tip/tracing/ftrace tree, which can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
tip/tracing/ftrace


Steven Rostedt (2):
tracing: use recursive counter over irq level
tracing: use nowakeup version of commit for function event trace tests

----
kernel/trace/ring_buffer.c | 45 +++++++++++++++---------------------------
kernel/trace/trace_events.c | 2 +-
2 files changed, 17 insertions(+), 30 deletions(-)
--


2009-04-21 08:24:22

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes


* Steven Rostedt <[email protected]> wrote:

>
> Ingo,
>
> Please pull the latest tip/tracing/ftrace tree, which can be found at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
> tip/tracing/ftrace
>
>
> Steven Rostedt (2):
> tracing: use recursive counter over irq level
> tracing: use nowakeup version of commit for function event trace tests
>
> ----
> kernel/trace/ring_buffer.c | 45 +++++++++++++++---------------------------
> kernel/trace/trace_events.c | 2 +-
> 2 files changed, 17 insertions(+), 30 deletions(-)

Pulled, thanks guys! I've reactivated the self-tests in tip:master -
lets see how it goes today.

Ingo

2009-04-21 09:47:30

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes


* Ingo Molnar <[email protected]> wrote:

> Pulled, thanks guys! I've reactivated the self-tests in tip:master
> - lets see how it goes today.

i'm still getting a ring-buffer warning - see below. Config
attached.

Ingo

[ 7.671145] calling clear_boot_tracer+0x0/0x52 @ 1
[ 7.672145] initcall clear_boot_tracer+0x0/0x52 returned 0 after 0 usecs
[ 7.673145] calling event_trace_self_tests_init+0x0/0x5f @ 1
[ 7.674206] Running tests on trace events:
[ 7.675143] Testing event kfree_skb: OK
[ 7.680145] Testing event kmalloc: OK
[ 7.683819] Testing event kmem_cache_alloc: OK
[ 7.686755] Testing event kmalloc_node: OK
[ 7.689744] Testing event kmem_cache_alloc_node: OK
[ 7.692711] Testing event kfree: OK
[ 7.695712] Testing event kmem_cache_free: OK
[ 7.698744] Testing event irq_handler_exit: OK
[ 7.701724] Testing event irq_handler_entry: OK
[ 7.704710] Testing event softirq_entry: OK
[ 7.707743] Testing event softirq_exit: OK
[ 7.710768] Testing event lock_acquired: OK
[ 7.713738] Testing event lock_acquire: OK
[ 7.716782] Testing event lock_release: OK
[ 7.719771] Testing event lock_contended: OK
[ 7.722740] Testing event sched_kthread_stop: OK
[ 7.726741] Testing event sched_kthread_stop_ret: OK
[ 7.729710] Testing event sched_wait_task: OK
[ 7.732706] Testing event sched_wakeup: OK
[ 7.735713] Testing event sched_wakeup_new: OK
[ 7.738727] Testing event sched_switch: OK
[ 7.741724] Testing event sched_migrate_task: OK
[ 7.745711] Testing event sched_process_free: OK
[ 7.749704] Testing event sched_process_exit: OK
[ 7.753699] Testing event sched_process_wait: OK
[ 7.757692] Testing event sched_process_fork: OK
[ 7.761711] Testing event sched_signal_send: OK
[ 7.764709] Running tests on trace event systems:
[ 7.765144] Testing event system skb: OK
[ 7.768776] Testing event system kmem: OK
[ 7.771817] Testing event system irq: OK
[ 7.774778] Testing event system lockdep: OK
[ 7.777921] Testing event system sched: OK
[ 7.780902] Running tests on all trace events:
[ 7.781148] Testing all events: OK
[ 7.787765] Running tests again, along with the function tracer
[ 7.788156] Running tests on trace events:
[ 7.789154] Testing event kfree_skb: OK
[ 7.792858] Testing event kmalloc: OK
[ 7.795853] Testing event kmem_cache_alloc: OK
[ 7.799848] Testing event kmalloc_node: OK
[ 7.802966] Testing event kmem_cache_alloc_node: OK
[ 7.805970] Testing event kfree: OK
[ 7.808942] Testing event kmem_cache_free: OK
[ 7.812935] Testing event irq_handler_exit: OK
[ 7.815850] Testing event irq_handler_entry: OK
[ 7.818877] Testing event softirq_entry: OK
[ 7.822846] Testing event softirq_exit: OK
[ 7.825895] Testing event lock_acquired: OK
[ 7.830045] Testing event lock_acquire: OK
[ 7.834025] Testing event lock_release: OK
[ 7.837920] Testing event lock_contended: OK
[ 7.841854] Testing event sched_kthread_stop: OK
[ 7.844883] Testing event sched_kthread_stop_ret: OK
[ 7.847850] Testing event sched_wait_task: OK
[ 7.851938] Testing event sched_wakeup: OK
[ 7.855960] Testing event sched_wakeup_new: OK
[ 7.858884] Testing event sched_switch: OK
[ 7.861941] Testing event sched_migrate_task: OK
[ 7.864972] Testing event sched_process_free: OK
[ 7.867853] Testing event sched_process_exit: OK
[ 7.870890] Testing event sched_process_wait: OK
[ 7.873856] Testing event sched_process_fork: OK
[ 7.876930] Testing event sched_signal_send: OK
[ 7.879853] Running tests on trace event systems:
[ 7.880154] Testing event system skb: OK
[ 7.883940] Testing event system kmem: OK
[ 7.888105] Testing event system irq: OK
[ 7.892085] Testing event system lockdep: OK
[ 7.896154] Testing event system sched: OK
[ 7.901156] Running tests on all trace events:
[ 7.903170] Testing all events: <4>------------[ cut here ]------------
[ 7.909137] WARNING: at kernel/trace/ring_buffer.c:1245 __rb_reserve_next+0x130/0x40c()
[ 7.909137] Modules linked in:
[ 7.909137] Pid: 0, comm: swapper Not tainted 2.6.30-rc2-tip #38087
[ 7.909137] Call Trace:
[ 7.909137] <IRQ> [<ffffffff8025a273>] warn_slowpath+0xca/0x115
[ 7.909137] [<ffffffff802bf4fe>] ? __rb_reserve_next+0x2a5/0x40c
[ 7.909137] [<ffffffff8021ab74>] ? native_sched_clock+0x4f/0x6a
[ 7.909137] [<ffffffff8021ab74>] ? native_sched_clock+0x4f/0x6a
[ 7.909137] [<ffffffff8021ab74>] ? native_sched_clock+0x4f/0x6a
[ 7.909137] [<ffffffff802bf2c6>] ? __rb_reserve_next+0x6d/0x40c
[ 7.909137] [<ffffffff802cb54e>] ? time_hardirqs_off+0x31/0x47
[ 7.909137] [<ffffffff802bf389>] __rb_reserve_next+0x130/0x40c
[ 7.909137] [<ffffffff802c00ad>] rb_reserve_next_event+0x1d3/0x31a
[ 7.909137] [<ffffffff802c04c9>] ring_buffer_lock_reserve+0x16b/0x1cb
[ 7.909137] [<ffffffff802c4f4b>] trace_buffer_lock_reserve+0x34/0x7e
[ 7.909137] [<ffffffff802c542c>] trace_vbprintk+0x136/0x275
[ 7.909137] [<ffffffff8028534f>] ? trace_hardirqs_on_caller+0x132/0x18e
[ 7.909137] [<ffffffff80286cfb>] ? lock_acquire+0x74/0x15b
[ 7.909137] [<ffffffff802ca54b>] __trace_bprintk+0x80/0x9d
[ 7.909137] [<ffffffff802661bc>] ? run_timer_softirq+0x16f/0x281
[ 7.909137] [<ffffffff802ca54b>] ? __trace_bprintk+0x80/0x9d
[ 7.909137] [<ffffffff80283871>] ? put_lock_stats+0x44/0x5a
[ 7.909137] [<ffffffff802c5d68>] ? tracing_record_cmdline+0x3d/0x141
[ 7.909137] [<ffffffff802661bc>] ? run_timer_softirq+0x16f/0x281
[ 7.909137] [<ffffffff8028259c>] ftrace_event_lock_acquire+0x6a/0x89
[ 7.909137] [<ffffffff80266273>] ? run_timer_softirq+0x226/0x281
[ 7.909137] [<ffffffff80286cfb>] lock_acquire+0x74/0x15b
[ 7.909137] [<ffffffff80266273>] ? run_timer_softirq+0x226/0x281
[ 7.909137] [<ffffffff802668c1>] ? process_timeout+0x0/0x37
[ 7.909137] [<ffffffff80a246af>] _spin_lock_irq+0x5c/0xa5
[ 7.909137] [<ffffffff80266273>] ? run_timer_softirq+0x226/0x281
[ 7.909137] [<ffffffff80266273>] run_timer_softirq+0x226/0x281
[ 7.909137] [<ffffffff802661bc>] ? run_timer_softirq+0x16f/0x281
[ 7.909137] [<ffffffff802133d6>] ? ftrace_call+0x5/0x2b
[ 7.909137] [<ffffffff80260bd3>] __do_softirq+0xff/0x22c
[ 7.909137] [<ffffffff802149bc>] call_softirq+0x1c/0x30
[ 7.909137] [<ffffffff802163df>] do_softirq+0x5f/0xdb
[ 7.909137] [<ffffffff802601ac>] irq_exit+0x5d/0x7d
[ 7.909137] [<ffffffff80a2a076>] do_IRQ+0xbe/0xeb
[ 7.909137] [<ffffffff80214153>] ret_from_intr+0x0/0x16
[ 7.909137] <EOI> [<ffffffff802bf2a9>] ? __rb_reserve_next+0x50/0x40c
[ 7.909137] [<ffffffff802c00ad>] ? rb_reserve_next_event+0x1d3/0x31a
[ 7.909137] [<ffffffff802c00ad>] ? rb_reserve_next_event+0x1d3/0x31a
[ 7.909137] [<ffffffff802c04c9>] ? ring_buffer_lock_reserve+0x16b/0x1cb
[ 7.909137] [<ffffffff802c4f4b>] ? trace_buffer_lock_reserve+0x34/0x7e
[ 7.909137] [<ffffffff8021b722>] ? test_ti_thread_flag+0x8/0x38
[ 7.909137] [<ffffffff802c5d15>] ? trace_current_buffer_lock_reserve+0x36/0x4c
[ 7.909137] [<ffffffff802d3227>] ? function_test_events_call+0x8f/0x12e
[ 7.909137] [<ffffffff8021b841>] ? need_resched+0x36/0x54
[ 7.909137] [<ffffffff802d3251>] ? function_test_events_call+0xb9/0x12e
[ 7.909137] [<ffffffff802133d6>] ? ftrace_call+0x5/0x2b
[ 7.909137] [<ffffffff8021b727>] ? test_ti_thread_flag+0xd/0x38
[ 7.909137] [<ffffffff8021b841>] ? need_resched+0x36/0x54
[ 7.909137] [<ffffffff8021b945>] ? poll_idle+0x41/0x64
[ 7.909137] [<ffffffff802cb871>] ? stop_critical_timings+0x3d/0x54
[ 7.909137] [<ffffffff802127b2>] ? cpu_idle+0xb1/0x112
[ 7.909137] [<ffffffff809e6369>] ? rest_init+0x8d/0xa3
[ 7.909137] [<ffffffff8108709a>] ? start_kernel+0x49c/0x4bd
[ 7.909137] [<ffffffff81086140>] ? early_idt_handler+0x0/0x71
[ 7.909137] [<ffffffff810862ce>] ? x86_64_start_reservations+0xb9/0xd4
[ 7.909137] [<ffffffff81086000>] ? _sinittext+0x0/0x140
[ 7.909137] [<ffffffff81086441>] ? x86_64_start_kernel+0x158/0x17b
[ 7.909137] ---[ end trace 3b85485e39bcb7cf ]---
[ 7.912099] OK
[ 7.913421] initcall event_trace_self_tests_init+0x0/0x5f returned 0 after 233398 usecs
[ 7.914172] calling random32_reseed+0x0/0xce @ 1
[ 7.915159] initcall random32_reseed+0x0/0xce returned 0 after 0 usecs
[ 7.916145] calling pci_resource_alignment_sysfs_init+0x0/0x40 @ 1


Attachments:
(No filename) (8.25 kB)
config (68.90 kB)
Download all attachments

2009-04-21 13:08:36

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes


On Tue, 21 Apr 2009, Ingo Molnar wrote:

>
> * Ingo Molnar <[email protected]> wrote:
>
> > Pulled, thanks guys! I've reactivated the self-tests in tip:master
> > - lets see how it goes today.
>
> i'm still getting a ring-buffer warning - see below. Config
> attached.
>
> Ingo
>
> [ 7.671145] calling clear_boot_tracer+0x0/0x52 @ 1
> [ 7.672145] initcall clear_boot_tracer+0x0/0x52 returned 0 after 0 usecs
> [ 7.673145] calling event_trace_self_tests_init+0x0/0x5f @ 1
> [ 7.674206] Running tests on trace events:
> [ 7.675143] Testing event kfree_skb: OK
> [ 7.680145] Testing event kmalloc: OK
> [ 7.683819] Testing event kmem_cache_alloc: OK
> [ 7.686755] Testing event kmalloc_node: OK
> [ 7.689744] Testing event kmem_cache_alloc_node: OK
> [ 7.692711] Testing event kfree: OK
> [ 7.695712] Testing event kmem_cache_free: OK
> [ 7.698744] Testing event irq_handler_exit: OK
> [ 7.701724] Testing event irq_handler_entry: OK
> [ 7.704710] Testing event softirq_entry: OK
> [ 7.707743] Testing event softirq_exit: OK
> [ 7.710768] Testing event lock_acquired: OK
> [ 7.713738] Testing event lock_acquire: OK
> [ 7.716782] Testing event lock_release: OK
> [ 7.719771] Testing event lock_contended: OK
> [ 7.722740] Testing event sched_kthread_stop: OK
> [ 7.726741] Testing event sched_kthread_stop_ret: OK
> [ 7.729710] Testing event sched_wait_task: OK
> [ 7.732706] Testing event sched_wakeup: OK
> [ 7.735713] Testing event sched_wakeup_new: OK
> [ 7.738727] Testing event sched_switch: OK
> [ 7.741724] Testing event sched_migrate_task: OK
> [ 7.745711] Testing event sched_process_free: OK
> [ 7.749704] Testing event sched_process_exit: OK
> [ 7.753699] Testing event sched_process_wait: OK
> [ 7.757692] Testing event sched_process_fork: OK
> [ 7.761711] Testing event sched_signal_send: OK
> [ 7.764709] Running tests on trace event systems:
> [ 7.765144] Testing event system skb: OK
> [ 7.768776] Testing event system kmem: OK
> [ 7.771817] Testing event system irq: OK
> [ 7.774778] Testing event system lockdep: OK
> [ 7.777921] Testing event system sched: OK
> [ 7.780902] Running tests on all trace events:
> [ 7.781148] Testing all events: OK
> [ 7.787765] Running tests again, along with the function tracer
> [ 7.788156] Running tests on trace events:
> [ 7.789154] Testing event kfree_skb: OK
> [ 7.792858] Testing event kmalloc: OK
> [ 7.795853] Testing event kmem_cache_alloc: OK
> [ 7.799848] Testing event kmalloc_node: OK
> [ 7.802966] Testing event kmem_cache_alloc_node: OK
> [ 7.805970] Testing event kfree: OK
> [ 7.808942] Testing event kmem_cache_free: OK
> [ 7.812935] Testing event irq_handler_exit: OK
> [ 7.815850] Testing event irq_handler_entry: OK
> [ 7.818877] Testing event softirq_entry: OK
> [ 7.822846] Testing event softirq_exit: OK
> [ 7.825895] Testing event lock_acquired: OK
> [ 7.830045] Testing event lock_acquire: OK
> [ 7.834025] Testing event lock_release: OK
> [ 7.837920] Testing event lock_contended: OK
> [ 7.841854] Testing event sched_kthread_stop: OK
> [ 7.844883] Testing event sched_kthread_stop_ret: OK
> [ 7.847850] Testing event sched_wait_task: OK
> [ 7.851938] Testing event sched_wakeup: OK
> [ 7.855960] Testing event sched_wakeup_new: OK
> [ 7.858884] Testing event sched_switch: OK
> [ 7.861941] Testing event sched_migrate_task: OK
> [ 7.864972] Testing event sched_process_free: OK
> [ 7.867853] Testing event sched_process_exit: OK
> [ 7.870890] Testing event sched_process_wait: OK
> [ 7.873856] Testing event sched_process_fork: OK
> [ 7.876930] Testing event sched_signal_send: OK
> [ 7.879853] Running tests on trace event systems:
> [ 7.880154] Testing event system skb: OK
> [ 7.883940] Testing event system kmem: OK
> [ 7.888105] Testing event system irq: OK
> [ 7.892085] Testing event system lockdep: OK
> [ 7.896154] Testing event system sched: OK
> [ 7.901156] Running tests on all trace events:
> [ 7.903170] Testing all events: <4>------------[ cut here ]------------
> [ 7.909137] WARNING: at kernel/trace/ring_buffer.c:1245 __rb_reserve_next+0x130/0x40c()

The warning means that we wrapped the ring buffer before finishing a
single commit. But on start up, we only have two pages. This can easily
happen if we do not expand the ring buffer before tracing.

This warning does not turn off tracing or disable anything else, because
it is perfectly valid to happen, but on large ring buffers it could be a
sign of something wrong. Maybe I should only warn if the ring buffer has
more than two pages?

-- Steve


> [ 7.909137] Modules linked in:
> [ 7.909137] Pid: 0, comm: swapper Not tainted 2.6.30-rc2-tip #38087
> [ 7.909137] Call Trace:
> [ 7.909137] <IRQ> [<ffffffff8025a273>] warn_slowpath+0xca/0x115
> [ 7.909137] [<ffffffff802bf4fe>] ? __rb_reserve_next+0x2a5/0x40c
> [ 7.909137] [<ffffffff8021ab74>] ? native_sched_clock+0x4f/0x6a
> [ 7.909137] [<ffffffff8021ab74>] ? native_sched_clock+0x4f/0x6a
> [ 7.909137] [<ffffffff8021ab74>] ? native_sched_clock+0x4f/0x6a
> [ 7.909137] [<ffffffff802bf2c6>] ? __rb_reserve_next+0x6d/0x40c
> [ 7.909137] [<ffffffff802cb54e>] ? time_hardirqs_off+0x31/0x47
> [ 7.909137] [<ffffffff802bf389>] __rb_reserve_next+0x130/0x40c
> [ 7.909137] [<ffffffff802c00ad>] rb_reserve_next_event+0x1d3/0x31a
> [ 7.909137] [<ffffffff802c04c9>] ring_buffer_lock_reserve+0x16b/0x1cb
> [ 7.909137] [<ffffffff802c4f4b>] trace_buffer_lock_reserve+0x34/0x7e
> [ 7.909137] [<ffffffff802c542c>] trace_vbprintk+0x136/0x275
> [ 7.909137] [<ffffffff8028534f>] ? trace_hardirqs_on_caller+0x132/0x18e
> [ 7.909137] [<ffffffff80286cfb>] ? lock_acquire+0x74/0x15b
> [ 7.909137] [<ffffffff802ca54b>] __trace_bprintk+0x80/0x9d
> [ 7.909137] [<ffffffff802661bc>] ? run_timer_softirq+0x16f/0x281
> [ 7.909137] [<ffffffff802ca54b>] ? __trace_bprintk+0x80/0x9d
> [ 7.909137] [<ffffffff80283871>] ? put_lock_stats+0x44/0x5a
> [ 7.909137] [<ffffffff802c5d68>] ? tracing_record_cmdline+0x3d/0x141
> [ 7.909137] [<ffffffff802661bc>] ? run_timer_softirq+0x16f/0x281
> [ 7.909137] [<ffffffff8028259c>] ftrace_event_lock_acquire+0x6a/0x89
> [ 7.909137] [<ffffffff80266273>] ? run_timer_softirq+0x226/0x281
> [ 7.909137] [<ffffffff80286cfb>] lock_acquire+0x74/0x15b
> [ 7.909137] [<ffffffff80266273>] ? run_timer_softirq+0x226/0x281
> [ 7.909137] [<ffffffff802668c1>] ? process_timeout+0x0/0x37
> [ 7.909137] [<ffffffff80a246af>] _spin_lock_irq+0x5c/0xa5
> [ 7.909137] [<ffffffff80266273>] ? run_timer_softirq+0x226/0x281
> [ 7.909137] [<ffffffff80266273>] run_timer_softirq+0x226/0x281
> [ 7.909137] [<ffffffff802661bc>] ? run_timer_softirq+0x16f/0x281
> [ 7.909137] [<ffffffff802133d6>] ? ftrace_call+0x5/0x2b
> [ 7.909137] [<ffffffff80260bd3>] __do_softirq+0xff/0x22c
> [ 7.909137] [<ffffffff802149bc>] call_softirq+0x1c/0x30
> [ 7.909137] [<ffffffff802163df>] do_softirq+0x5f/0xdb
> [ 7.909137] [<ffffffff802601ac>] irq_exit+0x5d/0x7d
> [ 7.909137] [<ffffffff80a2a076>] do_IRQ+0xbe/0xeb
> [ 7.909137] [<ffffffff80214153>] ret_from_intr+0x0/0x16
> [ 7.909137] <EOI> [<ffffffff802bf2a9>] ? __rb_reserve_next+0x50/0x40c
> [ 7.909137] [<ffffffff802c00ad>] ? rb_reserve_next_event+0x1d3/0x31a
> [ 7.909137] [<ffffffff802c00ad>] ? rb_reserve_next_event+0x1d3/0x31a
> [ 7.909137] [<ffffffff802c04c9>] ? ring_buffer_lock_reserve+0x16b/0x1cb
> [ 7.909137] [<ffffffff802c4f4b>] ? trace_buffer_lock_reserve+0x34/0x7e
> [ 7.909137] [<ffffffff8021b722>] ? test_ti_thread_flag+0x8/0x38
> [ 7.909137] [<ffffffff802c5d15>] ? trace_current_buffer_lock_reserve+0x36/0x4c
> [ 7.909137] [<ffffffff802d3227>] ? function_test_events_call+0x8f/0x12e
> [ 7.909137] [<ffffffff8021b841>] ? need_resched+0x36/0x54
> [ 7.909137] [<ffffffff802d3251>] ? function_test_events_call+0xb9/0x12e
> [ 7.909137] [<ffffffff802133d6>] ? ftrace_call+0x5/0x2b
> [ 7.909137] [<ffffffff8021b727>] ? test_ti_thread_flag+0xd/0x38
> [ 7.909137] [<ffffffff8021b841>] ? need_resched+0x36/0x54
> [ 7.909137] [<ffffffff8021b945>] ? poll_idle+0x41/0x64
> [ 7.909137] [<ffffffff802cb871>] ? stop_critical_timings+0x3d/0x54
> [ 7.909137] [<ffffffff802127b2>] ? cpu_idle+0xb1/0x112
> [ 7.909137] [<ffffffff809e6369>] ? rest_init+0x8d/0xa3
> [ 7.909137] [<ffffffff8108709a>] ? start_kernel+0x49c/0x4bd
> [ 7.909137] [<ffffffff81086140>] ? early_idt_handler+0x0/0x71
> [ 7.909137] [<ffffffff810862ce>] ? x86_64_start_reservations+0xb9/0xd4
> [ 7.909137] [<ffffffff81086000>] ? _sinittext+0x0/0x140
> [ 7.909137] [<ffffffff81086441>] ? x86_64_start_kernel+0x158/0x17b
> [ 7.909137] ---[ end trace 3b85485e39bcb7cf ]---
> [ 7.912099] OK
> [ 7.913421] initcall event_trace_self_tests_init+0x0/0x5f returned 0 after 233398 usecs
> [ 7.914172] calling random32_reseed+0x0/0xce @ 1
> [ 7.915159] initcall random32_reseed+0x0/0xce returned 0 after 0 usecs
> [ 7.916145] calling pci_resource_alignment_sysfs_init+0x0/0x40 @ 1
>

2009-04-21 13:55:53

by Steven Rostedt

[permalink] [raw]
Subject: [PATCH][GIT PULL] ring-buffer: only warn on wrap if buffer is bigger than two pages


Ingo,

Please pull the latest tip/tracing/ftrace tree, which can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
tip/tracing/ftrace


Steven Rostedt (1):
ring-buffer: only warn on wrap if buffer is bigger than two pages

----
kernel/trace/ring_buffer.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
---------------------------
commit b76929b6d87e98e49a0988b863debb1a12bcf42b
Author: Steven Rostedt <[email protected]>
Date: Tue Apr 21 09:41:26 2009 -0400

ring-buffer: only warn on wrap if buffer is bigger than two pages

On boot up, to save memory, ftrace allocates the minimum buffer
which is two pages. Ftrace also goes through a series of tests
(when configured) on boot up. These tests can fill up a page within
a single interrupt.

The ring buffer also has a WARN_ON when it detects that the buffer was
completely filled within a single commit (other commits are allowed to
be nested).

Combine the small buffer on start up, with the tests that can fill more
than a single page within an interrupt, this can trigger the WARN_ON.

This patch makes the WARN_ON only happen when the ring buffer consists
of more than two pages.

[ Impact: prevent false WARN_ON in ftrace startup tests ]

Reported-by: Ingo Molnar <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 7bcfd3e..61dbdf2 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1241,7 +1241,8 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
* about it.
*/
if (unlikely(next_page == commit_page)) {
- WARN_ON_ONCE(1);
+ /* This can easily happen on small ring buffers */
+ WARN_ON_ONCE(buffer->pages > 2);
goto out_reset;
}


2009-04-21 14:04:21

by Steven Rostedt

[permalink] [raw]
Subject: [tip:tracing/core] ring-buffer: only warn on wrap if buffer is bigger than two pages

Commit-ID: 3554228d4289098a8fe5cfd87512ec32a19bbe5a
Gitweb: http://git.kernel.org/tip/3554228d4289098a8fe5cfd87512ec32a19bbe5a
Author: Steven Rostedt <[email protected]>
AuthorDate: Tue, 21 Apr 2009 09:41:26 -0400
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 21 Apr 2009 16:00:45 +0200

ring-buffer: only warn on wrap if buffer is bigger than two pages

On boot up, to save memory, ftrace allocates the minimum buffer
which is two pages. Ftrace also goes through a series of tests
(when configured) on boot up. These tests can fill up a page within
a single interrupt.

The ring buffer also has a WARN_ON when it detects that the buffer was
completely filled within a single commit (other commits are allowed to
be nested).

Combine the small buffer on start up, with the tests that can fill more
than a single page within an interrupt, this can trigger the WARN_ON.

This patch makes the WARN_ON only happen when the ring buffer consists
of more than two pages.

[ Impact: prevent false WARN_ON in ftrace startup tests ]

Reported-by: Ingo Molnar <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
kernel/trace/ring_buffer.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 7bcfd3e..61dbdf2 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1241,7 +1241,8 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer,
* about it.
*/
if (unlikely(next_page == commit_page)) {
- WARN_ON_ONCE(1);
+ /* This can easily happen on small ring buffers */
+ WARN_ON_ONCE(buffer->pages > 2);
goto out_reset;
}

2009-04-21 14:35:44

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH][GIT PULL] ring-buffer: only warn on wrap if buffer is bigger than two pages


here's a new one:

Testing event sched_kthread_stop: <4>------------[ cut here ]------------
WARNING: at kernel/fork.c:988 copy_process+0x147/0xf7d()
Modules linked in:

config attached, bootlog below. This warning has the signs of some
sort of lockdep interaction. (this too is a new one - it never
triggered before)

Ingo

device: 'network_throughput': device_add
PM: Adding info for No Bus:network_throughput
initcall pm_qos_power_init+0x0/0xca returned 0 after 22529 usecs
calling software_resume+0x0/0x1b9 @ 1
initcall software_resume+0x0/0x1b9 returned -2 after 46 usecs
initcall software_resume+0x0/0x1b9 returned with error code -2
calling debugfs_kprobe_init+0x0/0x89 @ 1
initcall debugfs_kprobe_init+0x0/0x89 returned 0 after 34 usecs
calling taskstats_init+0x0/0x95 @ 1
registered taskstats version 1
initcall taskstats_init+0x0/0x95 returned 0 after 2760 usecs
calling clear_boot_tracer+0x0/0x2d @ 1
initcall clear_boot_tracer+0x0/0x2d returned 0 after 1 usecs
calling event_trace_self_tests_init+0x0/0x38 @ 1
Running tests on trace events:
Testing event kfree_skb: OK
Testing event kmalloc: OK
Testing event kmem_cache_alloc: OK
Testing event kmalloc_node: OK
Testing event kmem_cache_alloc_node: OK
Testing event kfree: OK
Testing event kmem_cache_free: OK
Testing event irq_handler_exit: OK
Testing event irq_handler_entry: OK
Testing event softirq_entry: OK
Testing event softirq_exit: OK
Testing event lock_acquired: OK
Testing event lock_acquire: OK
Testing event lock_release: OK
Testing event lock_contended: OK
Testing event sched_kthread_stop: OK
Testing event sched_kthread_stop_ret: OK
Testing event sched_wait_task: OK
Testing event sched_wakeup: OK
Testing event sched_wakeup_new: OK
Testing event sched_switch: OK
Testing event sched_migrate_task: OK
Testing event sched_process_free: OK
Testing event sched_process_exit: OK
Testing event sched_process_wait: OK
Testing event sched_process_fork: OK
Testing event sched_signal_send: OK
Running tests on trace event systems:
Testing event system skb: OK
Testing event system kmem: OK
Testing event system irq: OK
Testing event system lockdep: OK
Testing event system sched: OK
Running tests on all trace events:
Testing all events: OK
Running tests again, along with the function tracer
Running tests on trace events:
Testing event kfree_skb: OK
Testing event kmalloc: OK
Testing event kmem_cache_alloc: OK
Testing event kmalloc_node: OK
Testing event kmem_cache_alloc_node: OK
Testing event kfree: OK
Testing event kmem_cache_free: OK
Testing event irq_handler_exit: OK
Testing event irq_handler_entry: OK
Testing event softirq_entry: OK
Testing event softirq_exit: OK
Testing event lock_acquired: OK
Testing event lock_acquire: OK
Testing event lock_release: OK
Testing event lock_contended: OK
Testing event sched_kthread_stop: <4>------------[ cut here ]------------
WARNING: at kernel/fork.c:988 copy_process+0x147/0xf7d()
Modules linked in:
Pid: 2, comm: kthreadd Not tainted 2.6.30-rc2-tip-01581-gf4879a9-dirty #34509
Call Trace:
[<ffffffff80249801>] warn_slowpath+0x96/0xca
[<ffffffff8026f200>] ? trace_hardirqs_on_caller+0xff/0x14c
[<ffffffff8026f25a>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff80261cb6>] ? __mutex_init+0x5a/0x64
[<ffffffff802accdf>] ? perf_counter_init_task+0x65/0x17b
[<ffffffff80247ca6>] copy_process+0x147/0xf7d
[<ffffffff80211d57>] ? trace+0x3a/0x73
[<ffffffff80248c7b>] do_fork+0x19f/0x478
[<ffffffff8025f03c>] ? kthreadd+0xb8/0x11c
[<ffffffff80212f12>] kernel_thread+0x82/0xe0
[<ffffffff808cbad4>] ? schedule+0x11f/0x5e1
[<ffffffff8025f0a0>] ? kthread+0x0/0x80
[<ffffffff80212f70>] ? child_rip+0x0/0x20
[<ffffffff8025f050>] ? kthreadd+0xcc/0x11c
[<ffffffff80fde140>] ? early_idt_handler+0x0/0x71
[<ffffffff80212f7a>] child_rip+0xa/0x20
[<ffffffff80212a52>] ? restore_args+0x0/0x30
[<ffffffff8025ef84>] ? kthreadd+0x0/0x11c
[<ffffffff80212f70>] ? child_rip+0x0/0x20
---[ end trace 0843ba422c5d704e ]---
OK
Testing event sched_kthread_stop_ret: OK
Testing event sched_wait_task: OK
Testing event sched_wakeup: OK
Testing event sched_wakeup_new: OK
Testing event sched_switch: OK
Testing event sched_migrate_task: OK
Testing event sched_process_free: OK
Testing event sched_process_exit: OK
Testing event sched_process_wait: OK
Testing event sched_process_fork: OK
Testing event sched_signal_send: OK
Running tests on trace event systems:
Testing event system skb: OK
Testing event system kmem: OK
Testing event system irq: OK
Testing event system lockdep: OK
Testing event system sched: OK
Running tests on all trace events:
Testing all events: OK
initcall event_trace_self_tests_init+0x0/0x38 returned 0 after 444440 usecs
calling fail_page_alloc_debugfs+0x0/0xe4 @ 1
initcall fail_page_alloc_debugfs+0x0/0xe4 returned 0 after 78 usecs
calling max_swapfiles_check+0x0/0x8 @ 1
initcall max_swapfiles_check+0x0/0x8 returned 0 after 1 usecs


Attachments:
(No filename) (4.76 kB)
config (63.56 kB)
Download all attachments

2009-04-21 14:54:25

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH][GIT PULL] ring-buffer: only warn on wrap if buffer is bigger than two pages




On Tue, 21 Apr 2009, Ingo Molnar wrote:

>
> here's a new one:
>
> Testing event sched_kthread_stop: <4>------------[ cut here ]------------
> WARNING: at kernel/fork.c:988 copy_process+0x147/0xf7d()
> Modules linked in:
>
> config attached, bootlog below. This warning has the signs of some
> sort of lockdep interaction. (this too is a new one - it never
> triggered before)
>
> Ingo
>
> Running tests again, along with the function tracer
> Running tests on trace events:
> Testing event kfree_skb: OK
> Testing event kmalloc: OK
> Testing event kmem_cache_alloc: OK
> Testing event kmalloc_node: OK
> Testing event kmem_cache_alloc_node: OK
> Testing event kfree: OK
> Testing event kmem_cache_free: OK
> Testing event irq_handler_exit: OK
> Testing event irq_handler_entry: OK
> Testing event softirq_entry: OK
> Testing event softirq_exit: OK
> Testing event lock_acquired: OK
> Testing event lock_acquire: OK
> Testing event lock_release: OK
> Testing event lock_contended: OK
> Testing event sched_kthread_stop: <4>------------[ cut here ]------------
> WARNING: at kernel/fork.c:988 copy_process+0x147/0xf7d()
> Modules linked in:
> Pid: 2, comm: kthreadd Not tainted 2.6.30-rc2-tip-01581-gf4879a9-dirty #34509
> Call Trace:
> [<ffffffff80249801>] warn_slowpath+0x96/0xca
> [<ffffffff8026f200>] ? trace_hardirqs_on_caller+0xff/0x14c
> [<ffffffff8026f25a>] ? trace_hardirqs_on+0xd/0xf
> [<ffffffff80261cb6>] ? __mutex_init+0x5a/0x64
> [<ffffffff802accdf>] ? perf_counter_init_task+0x65/0x17b
> [<ffffffff80247ca6>] copy_process+0x147/0xf7d
> [<ffffffff80211d57>] ? trace+0x3a/0x73
> [<ffffffff80248c7b>] do_fork+0x19f/0x478
> [<ffffffff8025f03c>] ? kthreadd+0xb8/0x11c
> [<ffffffff80212f12>] kernel_thread+0x82/0xe0
> [<ffffffff808cbad4>] ? schedule+0x11f/0x5e1
> [<ffffffff8025f0a0>] ? kthread+0x0/0x80
> [<ffffffff80212f70>] ? child_rip+0x0/0x20
> [<ffffffff8025f050>] ? kthreadd+0xcc/0x11c
> [<ffffffff80fde140>] ? early_idt_handler+0x0/0x71
> [<ffffffff80212f7a>] child_rip+0xa/0x20
> [<ffffffff80212a52>] ? restore_args+0x0/0x30
> [<ffffffff8025ef84>] ? kthreadd+0x0/0x11c
> [<ffffffff80212f70>] ? child_rip+0x0/0x20
> ---[ end trace 0843ba422c5d704e ]---
> OK
> Testing event sched_kthread_stop_ret: OK

The above looks like the tracer slowed things down enough to trigger a
race elsewhere.

We have function tracing happening, and here we enabled the
sched_kthread_stop tracepoint. Each test starts and stops a kernel thread.

I'm not sure how this could have triggered.

I'll investigate a little more.

-- Steve

2009-04-21 16:22:40

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH][GIT PULL] ring-buffer: only warn on wrap if buffer is bigger than two pages


On Tue, 21 Apr 2009, Ingo Molnar wrote:

>
> here's a new one:
>
> Testing event sched_kthread_stop: <4>------------[ cut here ]------------
> WARNING: at kernel/fork.c:988 copy_process+0x147/0xf7d()
> Modules linked in:
>
> config attached, bootlog below. This warning has the signs of some
> sort of lockdep interaction. (this too is a new one - it never
> triggered before)
>
> Ingo
>

> Testing event sched_process_fork: OK

Interesting, with your config I hit the bug here. But it was the
hardirqs_enabled that was hit (was that the same you had?)

I noticed that you do not have the DEBUG_LOCKDEP set. I turned it on and
it booted fine :-/

-- Steve

> Testing event sched_signal_send: OK
> Running tests on trace event systems:
> Testing event system skb: OK
> Testing event system kmem: OK
> Testing event system irq: OK
> Testing event system lockdep: OK
> Testing event system sched: OK
> Running tests on all trace events:
> Testing all events: OK
> Running tests again, along with the function tracer
> Running tests on trace events:
> Testing event kfree_skb: OK
> Testing event kmalloc: OK
> Testing event kmem_cache_alloc: OK
> Testing event kmalloc_node: OK
> Testing event kmem_cache_alloc_node: OK
> Testing event kfree: OK
> Testing event kmem_cache_free: OK
> Testing event irq_handler_exit: OK
> Testing event irq_handler_entry: OK
> Testing event softirq_entry: OK
> Testing event softirq_exit: OK
> Testing event lock_acquired: OK
> Testing event lock_acquire: OK
> Testing event lock_release: OK
> Testing event lock_contended: OK
> Testing event sched_kthread_stop: <4>------------[ cut here ]------------
> WARNING: at kernel/fork.c:988 copy_process+0x147/0xf7d()
> Modules linked in:
> Pid: 2, comm: kthreadd Not tainted 2.6.30-rc2-tip-01581-gf4879a9-dirty #34509
> Call Trace:
> [<ffffffff80249801>] warn_slowpath+0x96/0xca
> [<ffffffff8026f200>] ? trace_hardirqs_on_caller+0xff/0x14c
> [<ffffffff8026f25a>] ? trace_hardirqs_on+0xd/0xf
> [<ffffffff80261cb6>] ? __mutex_init+0x5a/0x64
> [<ffffffff802accdf>] ? perf_counter_init_task+0x65/0x17b
> [<ffffffff80247ca6>] copy_process+0x147/0xf7d
> [<ffffffff80211d57>] ? trace+0x3a/0x73
> [<ffffffff80248c7b>] do_fork+0x19f/0x478
> [<ffffffff8025f03c>] ? kthreadd+0xb8/0x11c
> [<ffffffff80212f12>] kernel_thread+0x82/0xe0
> [<ffffffff808cbad4>] ? schedule+0x11f/0x5e1
> [<ffffffff8025f0a0>] ? kthread+0x0/0x80
> [<ffffffff80212f70>] ? child_rip+0x0/0x20
> [<ffffffff8025f050>] ? kthreadd+0xcc/0x11c
> [<ffffffff80fde140>] ? early_idt_handler+0x0/0x71
> [<ffffffff80212f7a>] child_rip+0xa/0x20
> [<ffffffff80212a52>] ? restore_args+0x0/0x30
> [<ffffffff8025ef84>] ? kthreadd+0x0/0x11c
> [<ffffffff80212f70>] ? child_rip+0x0/0x20
> ---[ end trace 0843ba422c5d704e ]---
> OK
> Testing event sched_kthread_stop_ret: OK
> Testing event sched_wait_task: OK
> Testing event sched_wakeup: OK
> Testing event sched_wakeup_new: OK
> Testing event sched_switch: OK
> Testing event sched_migrate_task: OK
> Testing event sched_process_free: OK
> Testing event sched_process_exit: OK
> Testing event sched_process_wait: OK
> Testing event sched_process_fork: OK
> Testing event sched_signal_send: OK
> Running tests on trace event systems:
> Testing event system skb: OK
> Testing event system kmem: OK
> Testing event system irq: OK
> Testing event system lockdep: OK
> Testing event system sched: OK
> Running tests on all trace events:
> Testing all events: OK
> initcall event_trace_self_tests_init+0x0/0x38 returned 0 after 444440 usecs
> calling fail_page_alloc_debugfs+0x0/0xe4 @ 1
> initcall fail_page_alloc_debugfs+0x0/0xe4 returned 0 after 78 usecs
> calling max_swapfiles_check+0x0/0x8 @ 1
> initcall max_swapfiles_check+0x0/0x8 returned 0 after 1 usecs
>

2009-04-22 06:48:23

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes



On Tue, 21 Apr 2009, Ingo Molnar wrote:

>
> * Ingo Molnar <[email protected]> wrote:
>
> > Pulled, thanks guys! I've reactivated the self-tests in tip:master
> > - lets see how it goes today.
>
> i'm still getting a ring-buffer warning - see below. Config
> attached.
>
> Ingo
>
> [ 7.671145] calling clear_boot_tracer+0x0/0x52 @ 1
> [ 7.672145] initcall clear_boot_tracer+0x0/0x52 returned 0 after 0 usecs
> [ 7.673145] calling event_trace_self_tests_init+0x0/0x5f @ 1
> [ 7.674206] Running tests on trace events:
> [ 7.675143] Testing event kfree_skb: OK
> [ 7.680145] Testing event kmalloc: OK
> [ 7.683819] Testing event kmem_cache_alloc: OK
> [ 7.686755] Testing event kmalloc_node: OK
> [ 7.689744] Testing event kmem_cache_alloc_node: OK
> [ 7.692711] Testing event kfree: OK
> [ 7.695712] Testing event kmem_cache_free: OK
> [ 7.698744] Testing event irq_handler_exit: OK
> [ 7.701724] Testing event irq_handler_entry: OK
> [ 7.704710] Testing event softirq_entry: OK
> [ 7.707743] Testing event softirq_exit: OK
> [ 7.710768] Testing event lock_acquired: OK
> [ 7.713738] Testing event lock_acquire: OK
> [ 7.716782] Testing event lock_release: OK
> [ 7.719771] Testing event lock_contended: OK
> [ 7.722740] Testing event sched_kthread_stop: OK
> [ 7.726741] Testing event sched_kthread_stop_ret: OK
> [ 7.729710] Testing event sched_wait_task: OK
> [ 7.732706] Testing event sched_wakeup: OK
> [ 7.735713] Testing event sched_wakeup_new: OK
> [ 7.738727] Testing event sched_switch: OK
> [ 7.741724] Testing event sched_migrate_task: OK
> [ 7.745711] Testing event sched_process_free: OK
> [ 7.749704] Testing event sched_process_exit: OK
> [ 7.753699] Testing event sched_process_wait: OK
> [ 7.757692] Testing event sched_process_fork: OK
> [ 7.761711] Testing event sched_signal_send: OK
> [ 7.764709] Running tests on trace event systems:
> [ 7.765144] Testing event system skb: OK
> [ 7.768776] Testing event system kmem: OK
> [ 7.771817] Testing event system irq: OK
> [ 7.774778] Testing event system lockdep: OK
> [ 7.777921] Testing event system sched: OK
> [ 7.780902] Running tests on all trace events:
> [ 7.781148] Testing all events: OK
> [ 7.787765] Running tests again, along with the function tracer
> [ 7.788156] Running tests on trace events:
> [ 7.789154] Testing event kfree_skb: OK
> [ 7.792858] Testing event kmalloc: OK
> [ 7.795853] Testing event kmem_cache_alloc: OK
> [ 7.799848] Testing event kmalloc_node: OK
> [ 7.802966] Testing event kmem_cache_alloc_node: OK
> [ 7.805970] Testing event kfree: OK
> [ 7.808942] Testing event kmem_cache_free: OK
> [ 7.812935] Testing event irq_handler_exit: OK
> [ 7.815850] Testing event irq_handler_entry: OK
> [ 7.818877] Testing event softirq_entry: OK
> [ 7.822846] Testing event softirq_exit: OK
> [ 7.825895] Testing event lock_acquired: OK
> [ 7.830045] Testing event lock_acquire: OK
> [ 7.834025] Testing event lock_release: OK
> [ 7.837920] Testing event lock_contended: OK
> [ 7.841854] Testing event sched_kthread_stop: OK
> [ 7.844883] Testing event sched_kthread_stop_ret: OK
> [ 7.847850] Testing event sched_wait_task: OK
> [ 7.851938] Testing event sched_wakeup: OK
> [ 7.855960] Testing event sched_wakeup_new: OK
> [ 7.858884] Testing event sched_switch: OK
> [ 7.861941] Testing event sched_migrate_task: OK
> [ 7.864972] Testing event sched_process_free: OK
> [ 7.867853] Testing event sched_process_exit: OK
> [ 7.870890] Testing event sched_process_wait: OK
> [ 7.873856] Testing event sched_process_fork: OK
> [ 7.876930] Testing event sched_signal_send: OK
> [ 7.879853] Running tests on trace event systems:
> [ 7.880154] Testing event system skb: OK
> [ 7.883940] Testing event system kmem: OK
> [ 7.888105] Testing event system irq: OK
> [ 7.892085] Testing event system lockdep: OK
> [ 7.896154] Testing event system sched: OK
> [ 7.901156] Running tests on all trace events:
> [ 7.903170] Testing all events: <4>------------[ cut here ]------------
> [ 7.909137] WARNING: at kernel/trace/ring_buffer.c:1245 __rb_reserve_next+0x130/0x40c()
> [ 7.909137] Modules linked in:
> [ 7.909137] Pid: 0, comm: swapper Not tainted 2.6.30-rc2-tip #38087
> [ 7.909137] Call Trace:
> [ 7.909137] <IRQ> [<ffffffff8025a273>] warn_slowpath+0xca/0x115
> [ 7.909137] [<ffffffff802bf4fe>] ? __rb_reserve_next+0x2a5/0x40c
> [ 7.909137] [<ffffffff8021ab74>] ? native_sched_clock+0x4f/0x6a
> [ 7.909137] [<ffffffff8021ab74>] ? native_sched_clock+0x4f/0x6a
> [ 7.909137] [<ffffffff8021ab74>] ? native_sched_clock+0x4f/0x6a
> [ 7.909137] [<ffffffff802bf2c6>] ? __rb_reserve_next+0x6d/0x40c
> [ 7.909137] [<ffffffff802cb54e>] ? time_hardirqs_off+0x31/0x47
> [ 7.909137] [<ffffffff802bf389>] __rb_reserve_next+0x130/0x40c
> [ 7.909137] [<ffffffff802c00ad>] rb_reserve_next_event+0x1d3/0x31a
> [ 7.909137] [<ffffffff802c04c9>] ring_buffer_lock_reserve+0x16b/0x1cb
> [ 7.909137] [<ffffffff802c4f4b>] trace_buffer_lock_reserve+0x34/0x7e
> [ 7.909137] [<ffffffff802c542c>] trace_vbprintk+0x136/0x275
> [ 7.909137] [<ffffffff8028534f>] ? trace_hardirqs_on_caller+0x132/0x18e
> [ 7.909137] [<ffffffff80286cfb>] ? lock_acquire+0x74/0x15b
> [ 7.909137] [<ffffffff802ca54b>] __trace_bprintk+0x80/0x9d
> [ 7.909137] [<ffffffff802661bc>] ? run_timer_softirq+0x16f/0x281
> [ 7.909137] [<ffffffff802ca54b>] ? __trace_bprintk+0x80/0x9d
> [ 7.909137] [<ffffffff80283871>] ? put_lock_stats+0x44/0x5a
> [ 7.909137] [<ffffffff802c5d68>] ? tracing_record_cmdline+0x3d/0x141
> [ 7.909137] [<ffffffff802661bc>] ? run_timer_softirq+0x16f/0x281
> [ 7.909137] [<ffffffff8028259c>] ftrace_event_lock_acquire+0x6a/0x89
> [ 7.909137] [<ffffffff80266273>] ? run_timer_softirq+0x226/0x281
> [ 7.909137] [<ffffffff80286cfb>] lock_acquire+0x74/0x15b
> [ 7.909137] [<ffffffff80266273>] ? run_timer_softirq+0x226/0x281
> [ 7.909137] [<ffffffff802668c1>] ? process_timeout+0x0/0x37
> [ 7.909137] [<ffffffff80a246af>] _spin_lock_irq+0x5c/0xa5
> [ 7.909137] [<ffffffff80266273>] ? run_timer_softirq+0x226/0x281
> [ 7.909137] [<ffffffff80266273>] run_timer_softirq+0x226/0x281
> [ 7.909137] [<ffffffff802661bc>] ? run_timer_softirq+0x16f/0x281
> [ 7.909137] [<ffffffff802133d6>] ? ftrace_call+0x5/0x2b
> [ 7.909137] [<ffffffff80260bd3>] __do_softirq+0xff/0x22c
> [ 7.909137] [<ffffffff802149bc>] call_softirq+0x1c/0x30
> [ 7.909137] [<ffffffff802163df>] do_softirq+0x5f/0xdb
> [ 7.909137] [<ffffffff802601ac>] irq_exit+0x5d/0x7d
> [ 7.909137] [<ffffffff80a2a076>] do_IRQ+0xbe/0xeb
> [ 7.909137] [<ffffffff80214153>] ret_from_intr+0x0/0x16
> [ 7.909137] <EOI> [<ffffffff802bf2a9>] ? __rb_reserve_next+0x50/0x40c
> [ 7.909137] [<ffffffff802c00ad>] ? rb_reserve_next_event+0x1d3/0x31a
> [ 7.909137] [<ffffffff802c00ad>] ? rb_reserve_next_event+0x1d3/0x31a
> [ 7.909137] [<ffffffff802c04c9>] ? ring_buffer_lock_reserve+0x16b/0x1cb
> [ 7.909137] [<ffffffff802c4f4b>] ? trace_buffer_lock_reserve+0x34/0x7e
> [ 7.909137] [<ffffffff8021b722>] ? test_ti_thread_flag+0x8/0x38
> [ 7.909137] [<ffffffff802c5d15>] ? trace_current_buffer_lock_reserve+0x36/0x4c
> [ 7.909137] [<ffffffff802d3227>] ? function_test_events_call+0x8f/0x12e
> [ 7.909137] [<ffffffff8021b841>] ? need_resched+0x36/0x54
> [ 7.909137] [<ffffffff802d3251>] ? function_test_events_call+0xb9/0x12e
> [ 7.909137] [<ffffffff802133d6>] ? ftrace_call+0x5/0x2b
> [ 7.909137] [<ffffffff8021b727>] ? test_ti_thread_flag+0xd/0x38
> [ 7.909137] [<ffffffff8021b841>] ? need_resched+0x36/0x54
> [ 7.909137] [<ffffffff8021b945>] ? poll_idle+0x41/0x64
> [ 7.909137] [<ffffffff802cb871>] ? stop_critical_timings+0x3d/0x54
> [ 7.909137] [<ffffffff802127b2>] ? cpu_idle+0xb1/0x112
> [ 7.909137] [<ffffffff809e6369>] ? rest_init+0x8d/0xa3
> [ 7.909137] [<ffffffff8108709a>] ? start_kernel+0x49c/0x4bd
> [ 7.909137] [<ffffffff81086140>] ? early_idt_handler+0x0/0x71
> [ 7.909137] [<ffffffff810862ce>] ? x86_64_start_reservations+0xb9/0xd4
> [ 7.909137] [<ffffffff81086000>] ? _sinittext+0x0/0x140
> [ 7.909137] [<ffffffff81086441>] ? x86_64_start_kernel+0x158/0x17b
> [ 7.909137] ---[ end trace 3b85485e39bcb7cf ]---
> [ 7.912099] OK
> [ 7.913421] initcall event_trace_self_tests_init+0x0/0x5f returned 0 after 233398 usecs
> [ 7.914172] calling random32_reseed+0x0/0xce @ 1
> [ 7.915159] initcall random32_reseed+0x0/0xce returned 0 after 0 usecs
> [ 7.916145] calling pci_resource_alignment_sysfs_init+0x0/0x40 @ 1
>

I spent the entire day (and half the night) debugging this. I was fighting
a case where the hardirqs_enabled flag in the task struct (lockdep flag)
was mysteriously being set and cleared. I stepped through the entire
kernel thread fork process (that was an exercise) and could not find
anything wrong.

Sometimes it would go away with printk's sometimes it would not. This was
driving me crazy, until I noticed that paravirt was enabled.

Turning off paravirtualization here (so far) makes everything run
smoothly.

Thus my theory is that there's something fishy with the modifying of the
irq enable/disable code when the system detects that it is running on bare
hardware.

I'm too tired to look at this more. Ingo supplied a config to play with.
You can disable VSMP too and it will still trigger the crash.

-- Steve

2009-04-22 11:48:13

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes

On Wed, Apr 22, 2009 at 02:48:09AM -0400, Steven Rostedt wrote:
>
>
> On Tue, 21 Apr 2009, Ingo Molnar wrote:
>
> >
> > * Ingo Molnar <[email protected]> wrote:
> >
> > > Pulled, thanks guys! I've reactivated the self-tests in tip:master
> > > - lets see how it goes today.
> >
> > i'm still getting a ring-buffer warning - see below. Config
> > attached.
> >
> > Ingo
> >
> > [ 7.671145] calling clear_boot_tracer+0x0/0x52 @ 1
> > [ 7.672145] initcall clear_boot_tracer+0x0/0x52 returned 0 after 0 usecs
> > [ 7.673145] calling event_trace_self_tests_init+0x0/0x5f @ 1
> > [ 7.674206] Running tests on trace events:
> > [ 7.675143] Testing event kfree_skb: OK
> > [ 7.680145] Testing event kmalloc: OK
> > [ 7.683819] Testing event kmem_cache_alloc: OK
> > [ 7.686755] Testing event kmalloc_node: OK
> > [ 7.689744] Testing event kmem_cache_alloc_node: OK
> > [ 7.692711] Testing event kfree: OK
> > [ 7.695712] Testing event kmem_cache_free: OK
> > [ 7.698744] Testing event irq_handler_exit: OK
> > [ 7.701724] Testing event irq_handler_entry: OK
> > [ 7.704710] Testing event softirq_entry: OK
> > [ 7.707743] Testing event softirq_exit: OK
> > [ 7.710768] Testing event lock_acquired: OK
> > [ 7.713738] Testing event lock_acquire: OK
> > [ 7.716782] Testing event lock_release: OK
> > [ 7.719771] Testing event lock_contended: OK
> > [ 7.722740] Testing event sched_kthread_stop: OK
> > [ 7.726741] Testing event sched_kthread_stop_ret: OK
> > [ 7.729710] Testing event sched_wait_task: OK
> > [ 7.732706] Testing event sched_wakeup: OK
> > [ 7.735713] Testing event sched_wakeup_new: OK
> > [ 7.738727] Testing event sched_switch: OK
> > [ 7.741724] Testing event sched_migrate_task: OK
> > [ 7.745711] Testing event sched_process_free: OK
> > [ 7.749704] Testing event sched_process_exit: OK
> > [ 7.753699] Testing event sched_process_wait: OK
> > [ 7.757692] Testing event sched_process_fork: OK
> > [ 7.761711] Testing event sched_signal_send: OK
> > [ 7.764709] Running tests on trace event systems:
> > [ 7.765144] Testing event system skb: OK
> > [ 7.768776] Testing event system kmem: OK
> > [ 7.771817] Testing event system irq: OK
> > [ 7.774778] Testing event system lockdep: OK
> > [ 7.777921] Testing event system sched: OK
> > [ 7.780902] Running tests on all trace events:
> > [ 7.781148] Testing all events: OK
> > [ 7.787765] Running tests again, along with the function tracer
> > [ 7.788156] Running tests on trace events:
> > [ 7.789154] Testing event kfree_skb: OK
> > [ 7.792858] Testing event kmalloc: OK
> > [ 7.795853] Testing event kmem_cache_alloc: OK
> > [ 7.799848] Testing event kmalloc_node: OK
> > [ 7.802966] Testing event kmem_cache_alloc_node: OK
> > [ 7.805970] Testing event kfree: OK
> > [ 7.808942] Testing event kmem_cache_free: OK
> > [ 7.812935] Testing event irq_handler_exit: OK
> > [ 7.815850] Testing event irq_handler_entry: OK
> > [ 7.818877] Testing event softirq_entry: OK
> > [ 7.822846] Testing event softirq_exit: OK
> > [ 7.825895] Testing event lock_acquired: OK
> > [ 7.830045] Testing event lock_acquire: OK
> > [ 7.834025] Testing event lock_release: OK
> > [ 7.837920] Testing event lock_contended: OK
> > [ 7.841854] Testing event sched_kthread_stop: OK
> > [ 7.844883] Testing event sched_kthread_stop_ret: OK
> > [ 7.847850] Testing event sched_wait_task: OK
> > [ 7.851938] Testing event sched_wakeup: OK
> > [ 7.855960] Testing event sched_wakeup_new: OK
> > [ 7.858884] Testing event sched_switch: OK
> > [ 7.861941] Testing event sched_migrate_task: OK
> > [ 7.864972] Testing event sched_process_free: OK
> > [ 7.867853] Testing event sched_process_exit: OK
> > [ 7.870890] Testing event sched_process_wait: OK
> > [ 7.873856] Testing event sched_process_fork: OK
> > [ 7.876930] Testing event sched_signal_send: OK
> > [ 7.879853] Running tests on trace event systems:
> > [ 7.880154] Testing event system skb: OK
> > [ 7.883940] Testing event system kmem: OK
> > [ 7.888105] Testing event system irq: OK
> > [ 7.892085] Testing event system lockdep: OK
> > [ 7.896154] Testing event system sched: OK
> > [ 7.901156] Running tests on all trace events:
> > [ 7.903170] Testing all events: <4>------------[ cut here ]------------
> > [ 7.909137] WARNING: at kernel/trace/ring_buffer.c:1245 __rb_reserve_next+0x130/0x40c()
> > [ 7.909137] Modules linked in:
> > [ 7.909137] Pid: 0, comm: swapper Not tainted 2.6.30-rc2-tip #38087
> > [ 7.909137] Call Trace:
> > [ 7.909137] <IRQ> [<ffffffff8025a273>] warn_slowpath+0xca/0x115
> > [ 7.909137] [<ffffffff802bf4fe>] ? __rb_reserve_next+0x2a5/0x40c
> > [ 7.909137] [<ffffffff8021ab74>] ? native_sched_clock+0x4f/0x6a
> > [ 7.909137] [<ffffffff8021ab74>] ? native_sched_clock+0x4f/0x6a
> > [ 7.909137] [<ffffffff8021ab74>] ? native_sched_clock+0x4f/0x6a
> > [ 7.909137] [<ffffffff802bf2c6>] ? __rb_reserve_next+0x6d/0x40c
> > [ 7.909137] [<ffffffff802cb54e>] ? time_hardirqs_off+0x31/0x47
> > [ 7.909137] [<ffffffff802bf389>] __rb_reserve_next+0x130/0x40c
> > [ 7.909137] [<ffffffff802c00ad>] rb_reserve_next_event+0x1d3/0x31a
> > [ 7.909137] [<ffffffff802c04c9>] ring_buffer_lock_reserve+0x16b/0x1cb
> > [ 7.909137] [<ffffffff802c4f4b>] trace_buffer_lock_reserve+0x34/0x7e
> > [ 7.909137] [<ffffffff802c542c>] trace_vbprintk+0x136/0x275
> > [ 7.909137] [<ffffffff8028534f>] ? trace_hardirqs_on_caller+0x132/0x18e
> > [ 7.909137] [<ffffffff80286cfb>] ? lock_acquire+0x74/0x15b
> > [ 7.909137] [<ffffffff802ca54b>] __trace_bprintk+0x80/0x9d
> > [ 7.909137] [<ffffffff802661bc>] ? run_timer_softirq+0x16f/0x281
> > [ 7.909137] [<ffffffff802ca54b>] ? __trace_bprintk+0x80/0x9d
> > [ 7.909137] [<ffffffff80283871>] ? put_lock_stats+0x44/0x5a
> > [ 7.909137] [<ffffffff802c5d68>] ? tracing_record_cmdline+0x3d/0x141
> > [ 7.909137] [<ffffffff802661bc>] ? run_timer_softirq+0x16f/0x281
> > [ 7.909137] [<ffffffff8028259c>] ftrace_event_lock_acquire+0x6a/0x89
> > [ 7.909137] [<ffffffff80266273>] ? run_timer_softirq+0x226/0x281
> > [ 7.909137] [<ffffffff80286cfb>] lock_acquire+0x74/0x15b
> > [ 7.909137] [<ffffffff80266273>] ? run_timer_softirq+0x226/0x281
> > [ 7.909137] [<ffffffff802668c1>] ? process_timeout+0x0/0x37
> > [ 7.909137] [<ffffffff80a246af>] _spin_lock_irq+0x5c/0xa5
> > [ 7.909137] [<ffffffff80266273>] ? run_timer_softirq+0x226/0x281
> > [ 7.909137] [<ffffffff80266273>] run_timer_softirq+0x226/0x281
> > [ 7.909137] [<ffffffff802661bc>] ? run_timer_softirq+0x16f/0x281
> > [ 7.909137] [<ffffffff802133d6>] ? ftrace_call+0x5/0x2b
> > [ 7.909137] [<ffffffff80260bd3>] __do_softirq+0xff/0x22c
> > [ 7.909137] [<ffffffff802149bc>] call_softirq+0x1c/0x30
> > [ 7.909137] [<ffffffff802163df>] do_softirq+0x5f/0xdb
> > [ 7.909137] [<ffffffff802601ac>] irq_exit+0x5d/0x7d
> > [ 7.909137] [<ffffffff80a2a076>] do_IRQ+0xbe/0xeb
> > [ 7.909137] [<ffffffff80214153>] ret_from_intr+0x0/0x16
> > [ 7.909137] <EOI> [<ffffffff802bf2a9>] ? __rb_reserve_next+0x50/0x40c
> > [ 7.909137] [<ffffffff802c00ad>] ? rb_reserve_next_event+0x1d3/0x31a
> > [ 7.909137] [<ffffffff802c00ad>] ? rb_reserve_next_event+0x1d3/0x31a
> > [ 7.909137] [<ffffffff802c04c9>] ? ring_buffer_lock_reserve+0x16b/0x1cb
> > [ 7.909137] [<ffffffff802c4f4b>] ? trace_buffer_lock_reserve+0x34/0x7e
> > [ 7.909137] [<ffffffff8021b722>] ? test_ti_thread_flag+0x8/0x38
> > [ 7.909137] [<ffffffff802c5d15>] ? trace_current_buffer_lock_reserve+0x36/0x4c
> > [ 7.909137] [<ffffffff802d3227>] ? function_test_events_call+0x8f/0x12e
> > [ 7.909137] [<ffffffff8021b841>] ? need_resched+0x36/0x54
> > [ 7.909137] [<ffffffff802d3251>] ? function_test_events_call+0xb9/0x12e
> > [ 7.909137] [<ffffffff802133d6>] ? ftrace_call+0x5/0x2b
> > [ 7.909137] [<ffffffff8021b727>] ? test_ti_thread_flag+0xd/0x38
> > [ 7.909137] [<ffffffff8021b841>] ? need_resched+0x36/0x54
> > [ 7.909137] [<ffffffff8021b945>] ? poll_idle+0x41/0x64
> > [ 7.909137] [<ffffffff802cb871>] ? stop_critical_timings+0x3d/0x54
> > [ 7.909137] [<ffffffff802127b2>] ? cpu_idle+0xb1/0x112
> > [ 7.909137] [<ffffffff809e6369>] ? rest_init+0x8d/0xa3
> > [ 7.909137] [<ffffffff8108709a>] ? start_kernel+0x49c/0x4bd
> > [ 7.909137] [<ffffffff81086140>] ? early_idt_handler+0x0/0x71
> > [ 7.909137] [<ffffffff810862ce>] ? x86_64_start_reservations+0xb9/0xd4
> > [ 7.909137] [<ffffffff81086000>] ? _sinittext+0x0/0x140
> > [ 7.909137] [<ffffffff81086441>] ? x86_64_start_kernel+0x158/0x17b
> > [ 7.909137] ---[ end trace 3b85485e39bcb7cf ]---
> > [ 7.912099] OK
> > [ 7.913421] initcall event_trace_self_tests_init+0x0/0x5f returned 0 after 233398 usecs
> > [ 7.914172] calling random32_reseed+0x0/0xce @ 1
> > [ 7.915159] initcall random32_reseed+0x0/0xce returned 0 after 0 usecs
> > [ 7.916145] calling pci_resource_alignment_sysfs_init+0x0/0x40 @ 1
> >
>
> I spent the entire day (and half the night) debugging this. I was fighting
> a case where the hardirqs_enabled flag in the task struct (lockdep flag)
> was mysteriously being set and cleared. I stepped through the entire
> kernel thread fork process (that was an exercise) and could not find
> anything wrong.
>
> Sometimes it would go away with printk's sometimes it would not. This was
> driving me crazy, until I noticed that paravirt was enabled.
>
> Turning off paravirtualization here (so far) makes everything run
> smoothly.
>
> Thus my theory is that there's something fishy with the modifying of the
> irq enable/disable code when the system detects that it is running on bare
> hardware.
>
> I'm too tired to look at this more. Ingo supplied a config to play with.
> You can disable VSMP too and it will still trigger the crash.
>
> -- Steve
>

It's indeed a tricky one. I can reproduce it too, I will
try to manage having an irqsoff trace at this point, hopefully I
could get the source of this irq disabling...

2009-04-22 13:49:26

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes




On Wed, 22 Apr 2009, Frederic Weisbecker wrote:
> >
> > I spent the entire day (and half the night) debugging this. I was fighting
> > a case where the hardirqs_enabled flag in the task struct (lockdep flag)
> > was mysteriously being set and cleared. I stepped through the entire
> > kernel thread fork process (that was an exercise) and could not find
> > anything wrong.
> >
> > Sometimes it would go away with printk's sometimes it would not. This was
> > driving me crazy, until I noticed that paravirt was enabled.
> >
> > Turning off paravirtualization here (so far) makes everything run
> > smoothly.
> >
> > Thus my theory is that there's something fishy with the modifying of the
> > irq enable/disable code when the system detects that it is running on bare
> > hardware.
> >
> > I'm too tired to look at this more. Ingo supplied a config to play with.
> > You can disable VSMP too and it will still trigger the crash.
> >
> > -- Steve
> >
>
> It's indeed a tricky one. I can reproduce it too, I will
> try to manage having an irqsoff trace at this point, hopefully I
> could get the source of this irq disabling...

It doesn't disable interrupts :-/

It is the hardirqs_enabled flag in the task struct that mysteriously turns
off and back on. I put in printks when it is off in fork, and the next
printk shows that it turns back on (between the printks!!!).

I printed the output of "irqs_disabled()" on each of these printks and
interrupts are always enabled. It is only the hardirqs_enabled flag that
is giving strange outputs.

Do you have CONFIG_PARAVIRT on? When I disabled it, I have yet to
reproduce the bug. But I've only rebooted a few times. I'm going to
continue to reboot to see if I can trigger it.

I'm thinking that the paravirt alternative code may have clobbered a
register in either the enable or disabling of interrupts. This might cause
a strange value to go into the hardirqs_enabled flag.

Thanks,

-- Steve

2009-04-22 17:11:12

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes

On Wed, Apr 22, 2009 at 09:49:14AM -0400, Steven Rostedt wrote:
>
>
>
> On Wed, 22 Apr 2009, Frederic Weisbecker wrote:
> > >
> > > I spent the entire day (and half the night) debugging this. I was fighting
> > > a case where the hardirqs_enabled flag in the task struct (lockdep flag)
> > > was mysteriously being set and cleared. I stepped through the entire
> > > kernel thread fork process (that was an exercise) and could not find
> > > anything wrong.
> > >
> > > Sometimes it would go away with printk's sometimes it would not. This was
> > > driving me crazy, until I noticed that paravirt was enabled.
> > >
> > > Turning off paravirtualization here (so far) makes everything run
> > > smoothly.
> > >
> > > Thus my theory is that there's something fishy with the modifying of the
> > > irq enable/disable code when the system detects that it is running on bare
> > > hardware.
> > >
> > > I'm too tired to look at this more. Ingo supplied a config to play with.
> > > You can disable VSMP too and it will still trigger the crash.
> > >
> > > -- Steve
> > >
> >
> > It's indeed a tricky one. I can reproduce it too, I will
> > try to manage having an irqsoff trace at this point, hopefully I
> > could get the source of this irq disabling...
>
> It doesn't disable interrupts :-/
>
> It is the hardirqs_enabled flag in the task struct that mysteriously turns
> off and back on. I put in printks when it is off in fork, and the next
> printk shows that it turns back on (between the printks!!!).
>
> I printed the output of "irqs_disabled()" on each of these printks and
> interrupts are always enabled. It is only the hardirqs_enabled flag that
> is giving strange outputs.


Oh, weird...


> Do you have CONFIG_PARAVIRT on? When I disabled it, I have yet to
> reproduce the bug. But I've only rebooted a few times. I'm going to
> continue to reboot to see if I can trigger it.


Yes it is enabled.



> I'm thinking that the paravirt alternative code may have clobbered a
> register in either the enable or disabling of interrupts. This might cause
> a strange value to go into the hardirqs_enabled flag.



Ok I will try it without PARAVIRT and tell you if I can reproduce it.



> Thanks,
>
> -- Steve
>

2009-04-22 17:17:58

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes

Frederic Weisbecker wrote:
> On Wed, Apr 22, 2009 at 09:49:14AM -0400, Steven Rostedt wrote:
>
>>
>> On Wed, 22 Apr 2009, Frederic Weisbecker wrote:
>>
>>>> I spent the entire day (and half the night) debugging this. I was fighting
>>>> a case where the hardirqs_enabled flag in the task struct (lockdep flag)
>>>> was mysteriously being set and cleared. I stepped through the entire
>>>> kernel thread fork process (that was an exercise) and could not find
>>>> anything wrong.
>>>>
>>>> Sometimes it would go away with printk's sometimes it would not. This was
>>>> driving me crazy, until I noticed that paravirt was enabled.
>>>>
>>>> Turning off paravirtualization here (so far) makes everything run
>>>> smoothly.
>>>>
>>>> Thus my theory is that there's something fishy with the modifying of the
>>>> irq enable/disable code when the system detects that it is running on bare
>>>> hardware.
>>>>
>>>> I'm too tired to look at this more. Ingo supplied a config to play with.
>>>> You can disable VSMP too and it will still trigger the crash.
>>>>
>>>> -- Steve
>>>>
>>>>
>>> It's indeed a tricky one. I can reproduce it too, I will
>>> try to manage having an irqsoff trace at this point, hopefully I
>>> could get the source of this irq disabling...
>>>
>> It doesn't disable interrupts :-/
>>
>> It is the hardirqs_enabled flag in the task struct that mysteriously turns
>> off and back on. I put in printks when it is off in fork, and the next
>> printk shows that it turns back on (between the printks!!!).
>>
>> I printed the output of "irqs_disabled()" on each of these printks and
>> interrupts are always enabled. It is only the hardirqs_enabled flag that
>> is giving strange outputs.
>>
>
>
> Oh, weird...
>
>
>
>> Do you have CONFIG_PARAVIRT on? When I disabled it, I have yet to
>> reproduce the bug. But I've only rebooted a few times. I'm going to
>> continue to reboot to see if I can trigger it.
>>
>
>
> Yes it is enabled.
>
>
>
>
>> I'm thinking that the paravirt alternative code may have clobbered a
>> register in either the enable or disabling of interrupts. This might cause
>> a strange value to go into the hardirqs_enabled flag.
>>
>
>
>
> Ok I will try it without PARAVIRT and tell you if I can reproduce it.
>

Interesting. What code is generated for native_irq_enable/disable?

J

2009-04-22 17:22:47

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes


On Wed, 22 Apr 2009, Jeremy Fitzhardinge wrote:

> >
> >
> > > Do you have CONFIG_PARAVIRT on? When I disabled it, I have yet to
> > > reproduce the bug. But I've only rebooted a few times. I'm going to
> > > continue to reboot to see if I can trigger it.
> > >
> >
> >
> > Yes it is enabled.
> >
> >
> >
> >
> > > I'm thinking that the paravirt alternative code may have clobbered a
> > > register in either the enable or disabling of interrupts. This might cause
> > > a strange value to go into the hardirqs_enabled flag.
> > >
> >
> >
> >
> > Ok I will try it without PARAVIRT and tell you if I can reproduce it.
> >
>
> Interesting. What code is generated for native_irq_enable/disable?

Note, when I take my standard config and enable PARAVIRT and LOCKDEP it
still boots up fine. It may be a combination of some configs :-/

I'm currently doing a "config bisect" to find out what configs are needed.

-- Steve

2009-04-22 21:32:38

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes


On Wed, 22 Apr 2009, Steven Rostedt wrote:

>
> On Wed, 22 Apr 2009, Jeremy Fitzhardinge wrote:
>
> > >
> > >
> > > > Do you have CONFIG_PARAVIRT on? When I disabled it, I have yet to
> > > > reproduce the bug. But I've only rebooted a few times. I'm going to
> > > > continue to reboot to see if I can trigger it.
> > > >
> > >
> > >
> > > Yes it is enabled.
> > >
> > >
> > >
> > >
> > > > I'm thinking that the paravirt alternative code may have clobbered a
> > > > register in either the enable or disabling of interrupts. This might cause
> > > > a strange value to go into the hardirqs_enabled flag.
> > > >
> > >
> > >
> > >
> > > Ok I will try it without PARAVIRT and tell you if I can reproduce it.
> > >
> >
> > Interesting. What code is generated for native_irq_enable/disable?
>
> Note, when I take my standard config and enable PARAVIRT and LOCKDEP it
> still boots up fine. It may be a combination of some configs :-/
>
> I'm currently doing a "config bisect" to find out what configs are needed.

I just finished the config bisect, and it came down to
CONFIG_DEBUG_PAGEALLOC (set is bad, not set is good). I'm sure this is not
the only config that is at issue. It is probably a combination of configs
:-/

I'll try others to see if things change.

-- Steve

2009-04-23 03:35:27

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes



On Wed, 22 Apr 2009, Steven Rostedt wrote:

>
>
> On Tue, 21 Apr 2009, Ingo Molnar wrote:
>
> >
> > * Ingo Molnar <[email protected]> wrote:
> >
> > > Pulled, thanks guys! I've reactivated the self-tests in tip:master
> > > - lets see how it goes today.
> >
> > i'm still getting a ring-buffer warning - see below. Config
> > attached.
> >
> > Ingo
> >
> > [ 7.671145] calling clear_boot_tracer+0x0/0x52 @ 1
> > [ 7.672145] initcall clear_boot_tracer+0x0/0x52 returned 0 after 0 usecs
> > [ 7.673145] calling event_trace_self_tests_init+0x0/0x5f @ 1
> > [ 7.674206] Running tests on trace events:
> > [ 7.675143] Testing event kfree_skb: OK
> > [ 7.680145] Testing event kmalloc: OK
> > [ 7.683819] Testing event kmem_cache_alloc: OK
> > [ 7.686755] Testing event kmalloc_node: OK
> > [ 7.689744] Testing event kmem_cache_alloc_node: OK
> > [ 7.692711] Testing event kfree: OK
> > [ 7.695712] Testing event kmem_cache_free: OK
> > [ 7.698744] Testing event irq_handler_exit: OK
> > [ 7.701724] Testing event irq_handler_entry: OK
> > [ 7.704710] Testing event softirq_entry: OK
> > [ 7.707743] Testing event softirq_exit: OK
> > [ 7.710768] Testing event lock_acquired: OK
> > [ 7.713738] Testing event lock_acquire: OK
> > [ 7.716782] Testing event lock_release: OK
> > [ 7.719771] Testing event lock_contended: OK
> > [ 7.722740] Testing event sched_kthread_stop: OK
> > [ 7.726741] Testing event sched_kthread_stop_ret: OK
> > [ 7.729710] Testing event sched_wait_task: OK
> > [ 7.732706] Testing event sched_wakeup: OK
> > [ 7.735713] Testing event sched_wakeup_new: OK
> > [ 7.738727] Testing event sched_switch: OK
> > [ 7.741724] Testing event sched_migrate_task: OK
> > [ 7.745711] Testing event sched_process_free: OK
> > [ 7.749704] Testing event sched_process_exit: OK
> > [ 7.753699] Testing event sched_process_wait: OK
> > [ 7.757692] Testing event sched_process_fork: OK
> > [ 7.761711] Testing event sched_signal_send: OK
> > [ 7.764709] Running tests on trace event systems:
> > [ 7.765144] Testing event system skb: OK
> > [ 7.768776] Testing event system kmem: OK
> > [ 7.771817] Testing event system irq: OK
> > [ 7.774778] Testing event system lockdep: OK
> > [ 7.777921] Testing event system sched: OK
> > [ 7.780902] Running tests on all trace events:
> > [ 7.781148] Testing all events: OK
> > [ 7.787765] Running tests again, along with the function tracer
> > [ 7.788156] Running tests on trace events:
> > [ 7.789154] Testing event kfree_skb: OK
> > [ 7.792858] Testing event kmalloc: OK
> > [ 7.795853] Testing event kmem_cache_alloc: OK
> > [ 7.799848] Testing event kmalloc_node: OK
> > [ 7.802966] Testing event kmem_cache_alloc_node: OK
> > [ 7.805970] Testing event kfree: OK
> > [ 7.808942] Testing event kmem_cache_free: OK
> > [ 7.812935] Testing event irq_handler_exit: OK
> > [ 7.815850] Testing event irq_handler_entry: OK
> > [ 7.818877] Testing event softirq_entry: OK
> > [ 7.822846] Testing event softirq_exit: OK
> > [ 7.825895] Testing event lock_acquired: OK
> > [ 7.830045] Testing event lock_acquire: OK
> > [ 7.834025] Testing event lock_release: OK
> > [ 7.837920] Testing event lock_contended: OK
> > [ 7.841854] Testing event sched_kthread_stop: OK
> > [ 7.844883] Testing event sched_kthread_stop_ret: OK
> > [ 7.847850] Testing event sched_wait_task: OK
> > [ 7.851938] Testing event sched_wakeup: OK
> > [ 7.855960] Testing event sched_wakeup_new: OK
> > [ 7.858884] Testing event sched_switch: OK
> > [ 7.861941] Testing event sched_migrate_task: OK
> > [ 7.864972] Testing event sched_process_free: OK
> > [ 7.867853] Testing event sched_process_exit: OK
> > [ 7.870890] Testing event sched_process_wait: OK
> > [ 7.873856] Testing event sched_process_fork: OK
> > [ 7.876930] Testing event sched_signal_send: OK
> > [ 7.879853] Running tests on trace event systems:
> > [ 7.880154] Testing event system skb: OK
> > [ 7.883940] Testing event system kmem: OK
> > [ 7.888105] Testing event system irq: OK
> > [ 7.892085] Testing event system lockdep: OK
> > [ 7.896154] Testing event system sched: OK
> > [ 7.901156] Running tests on all trace events:
> > [ 7.903170] Testing all events: <4>------------[ cut here ]------------
> > [ 7.909137] WARNING: at kernel/trace/ring_buffer.c:1245 __rb_reserve_next+0x130/0x40c()
> > [ 7.909137] Modules linked in:
> > [ 7.909137] Pid: 0, comm: swapper Not tainted 2.6.30-rc2-tip #38087
> > [ 7.909137] Call Trace:
> > [ 7.909137] <IRQ> [<ffffffff8025a273>] warn_slowpath+0xca/0x115
> > [ 7.909137] [<ffffffff802bf4fe>] ? __rb_reserve_next+0x2a5/0x40c
> > [ 7.909137] [<ffffffff8021ab74>] ? native_sched_clock+0x4f/0x6a
> > [ 7.909137] [<ffffffff8021ab74>] ? native_sched_clock+0x4f/0x6a
> > [ 7.909137] [<ffffffff8021ab74>] ? native_sched_clock+0x4f/0x6a
> > [ 7.909137] [<ffffffff802bf2c6>] ? __rb_reserve_next+0x6d/0x40c
> > [ 7.909137] [<ffffffff802cb54e>] ? time_hardirqs_off+0x31/0x47
> > [ 7.909137] [<ffffffff802bf389>] __rb_reserve_next+0x130/0x40c
> > [ 7.909137] [<ffffffff802c00ad>] rb_reserve_next_event+0x1d3/0x31a
> > [ 7.909137] [<ffffffff802c04c9>] ring_buffer_lock_reserve+0x16b/0x1cb
> > [ 7.909137] [<ffffffff802c4f4b>] trace_buffer_lock_reserve+0x34/0x7e
> > [ 7.909137] [<ffffffff802c542c>] trace_vbprintk+0x136/0x275
> > [ 7.909137] [<ffffffff8028534f>] ? trace_hardirqs_on_caller+0x132/0x18e
> > [ 7.909137] [<ffffffff80286cfb>] ? lock_acquire+0x74/0x15b
> > [ 7.909137] [<ffffffff802ca54b>] __trace_bprintk+0x80/0x9d
> > [ 7.909137] [<ffffffff802661bc>] ? run_timer_softirq+0x16f/0x281
> > [ 7.909137] [<ffffffff802ca54b>] ? __trace_bprintk+0x80/0x9d
> > [ 7.909137] [<ffffffff80283871>] ? put_lock_stats+0x44/0x5a
> > [ 7.909137] [<ffffffff802c5d68>] ? tracing_record_cmdline+0x3d/0x141
> > [ 7.909137] [<ffffffff802661bc>] ? run_timer_softirq+0x16f/0x281
> > [ 7.909137] [<ffffffff8028259c>] ftrace_event_lock_acquire+0x6a/0x89
> > [ 7.909137] [<ffffffff80266273>] ? run_timer_softirq+0x226/0x281
> > [ 7.909137] [<ffffffff80286cfb>] lock_acquire+0x74/0x15b
> > [ 7.909137] [<ffffffff80266273>] ? run_timer_softirq+0x226/0x281
> > [ 7.909137] [<ffffffff802668c1>] ? process_timeout+0x0/0x37
> > [ 7.909137] [<ffffffff80a246af>] _spin_lock_irq+0x5c/0xa5
> > [ 7.909137] [<ffffffff80266273>] ? run_timer_softirq+0x226/0x281
> > [ 7.909137] [<ffffffff80266273>] run_timer_softirq+0x226/0x281
> > [ 7.909137] [<ffffffff802661bc>] ? run_timer_softirq+0x16f/0x281
> > [ 7.909137] [<ffffffff802133d6>] ? ftrace_call+0x5/0x2b
> > [ 7.909137] [<ffffffff80260bd3>] __do_softirq+0xff/0x22c
> > [ 7.909137] [<ffffffff802149bc>] call_softirq+0x1c/0x30
> > [ 7.909137] [<ffffffff802163df>] do_softirq+0x5f/0xdb
> > [ 7.909137] [<ffffffff802601ac>] irq_exit+0x5d/0x7d
> > [ 7.909137] [<ffffffff80a2a076>] do_IRQ+0xbe/0xeb
> > [ 7.909137] [<ffffffff80214153>] ret_from_intr+0x0/0x16
> > [ 7.909137] <EOI> [<ffffffff802bf2a9>] ? __rb_reserve_next+0x50/0x40c
> > [ 7.909137] [<ffffffff802c00ad>] ? rb_reserve_next_event+0x1d3/0x31a
> > [ 7.909137] [<ffffffff802c00ad>] ? rb_reserve_next_event+0x1d3/0x31a
> > [ 7.909137] [<ffffffff802c04c9>] ? ring_buffer_lock_reserve+0x16b/0x1cb
> > [ 7.909137] [<ffffffff802c4f4b>] ? trace_buffer_lock_reserve+0x34/0x7e
> > [ 7.909137] [<ffffffff8021b722>] ? test_ti_thread_flag+0x8/0x38
> > [ 7.909137] [<ffffffff802c5d15>] ? trace_current_buffer_lock_reserve+0x36/0x4c
> > [ 7.909137] [<ffffffff802d3227>] ? function_test_events_call+0x8f/0x12e
> > [ 7.909137] [<ffffffff8021b841>] ? need_resched+0x36/0x54
> > [ 7.909137] [<ffffffff802d3251>] ? function_test_events_call+0xb9/0x12e
> > [ 7.909137] [<ffffffff802133d6>] ? ftrace_call+0x5/0x2b
> > [ 7.909137] [<ffffffff8021b727>] ? test_ti_thread_flag+0xd/0x38
> > [ 7.909137] [<ffffffff8021b841>] ? need_resched+0x36/0x54
> > [ 7.909137] [<ffffffff8021b945>] ? poll_idle+0x41/0x64
> > [ 7.909137] [<ffffffff802cb871>] ? stop_critical_timings+0x3d/0x54
> > [ 7.909137] [<ffffffff802127b2>] ? cpu_idle+0xb1/0x112
> > [ 7.909137] [<ffffffff809e6369>] ? rest_init+0x8d/0xa3
> > [ 7.909137] [<ffffffff8108709a>] ? start_kernel+0x49c/0x4bd
> > [ 7.909137] [<ffffffff81086140>] ? early_idt_handler+0x0/0x71
> > [ 7.909137] [<ffffffff810862ce>] ? x86_64_start_reservations+0xb9/0xd4
> > [ 7.909137] [<ffffffff81086000>] ? _sinittext+0x0/0x140
> > [ 7.909137] [<ffffffff81086441>] ? x86_64_start_kernel+0x158/0x17b
> > [ 7.909137] ---[ end trace 3b85485e39bcb7cf ]---
> > [ 7.912099] OK
> > [ 7.913421] initcall event_trace_self_tests_init+0x0/0x5f returned 0 after 233398 usecs
> > [ 7.914172] calling random32_reseed+0x0/0xce @ 1
> > [ 7.915159] initcall random32_reseed+0x0/0xce returned 0 after 0 usecs
> > [ 7.916145] calling pci_resource_alignment_sysfs_init+0x0/0x40 @ 1
> >
>
> I spent the entire day (and half the night) debugging this. I was fighting
> a case where the hardirqs_enabled flag in the task struct (lockdep flag)
> was mysteriously being set and cleared. I stepped through the entire
> kernel thread fork process (that was an exercise) and could not find
> anything wrong.
>
> Sometimes it would go away with printk's sometimes it would not. This was
> driving me crazy, until I noticed that paravirt was enabled.
>
> Turning off paravirtualization here (so far) makes everything run
> smoothly.
>
> Thus my theory is that there's something fishy with the modifying of the
> irq enable/disable code when the system detects that it is running on bare
> hardware.
>
> I'm too tired to look at this more. Ingo supplied a config to play with.
> You can disable VSMP too and it will still trigger the crash.
>

OK, after spending two full days doing a config bisect to find out what is
happening, I've found that CONFIG_PARAVIRT and CONFIG_DEBUG_PAGEALLOC do
not play nicely. But I have yet to find out why.

Here's the situation:

We've added selftests for the event tracer. What this basically does is
enables each event one at a time and runs tests. The tests include
creating a kernel thread, executing workqueues and grabbing locks.

We also have PROVE_LOCKING (LOCKDEP) enabled, that keeps track of
interrupts being enabled. When they are, we set a flag in the task struct
"hardirqs_enabled". When they are disabled, this flag is cleared.

When we fork a process, a test is made to see if the flag is set for the
new process and if it is not, a warning is printed (as is done in the
above dump).

I investigated this and found that the flag is mysteriously being set and
cleared for no apparent reason. The flag is in the task struct and nothing
should be touching it. In fact, it is a full int, not even a bit in a
flags variable.

I had a test that would print the flag and irqs_disabled() in copy_process
before the warning and it showed that the flag was cleared but irqs was
enabled. The funny part is, I if the test triggered, I printed the flag
again, and the second print it was set again!!

if (!p->hardirqs_enabled) {
printk("irqs:%d flag:%d\n", irqs_disabled(),
p->hardirqs_enabled);
printk("try again: %d\n", p->hardirqs_enabled);
}

The first print showed that it was cleared, the second showed it was set
again??

To trigger this I need these config options set:

CONFIG_EVENT_TRACING
CONFIG_FUNCTION_TRACER
CONFIG_PROVE_LOCKING
CONFIG_PARAVIRT
CONFIG_DEBUG_PAGEALLOC

CONFIG_EVENT_TRACING is needed because it does the tests.

CONFIG_PROVE_LOCKING is needed because it is the trigger to the WARN_ON
(the test for hardirqs_enabled being cleared)

CONFIG_FUNCTION_TRACER may be part of the problem, or it just changes the
way things work to cause it.

CONFIG_PARAVIRT and CONFIG_DEBUG_PAGEALLOC, disable either of them, the
problem goes away :-/

Also note, when I did the printks to find where the hardirqs_enabled went
zero, it would consistantly happen after this call:

ti = alloc_thread_info(tsk);

I'll see if I can find exactly where it happens :-/

-- Steve

2009-04-23 08:22:21

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes


* Steven Rostedt <[email protected]> wrote:

> On Wed, 22 Apr 2009, Steven Rostedt wrote:
>
> >
> >
> > On Tue, 21 Apr 2009, Ingo Molnar wrote:
> >
> > >
> > > * Ingo Molnar <[email protected]> wrote:
> > >
> > > > Pulled, thanks guys! I've reactivated the self-tests in tip:master
> > > > - lets see how it goes today.
> > >
> > > i'm still getting a ring-buffer warning - see below. Config
> > > attached.
> > >
> > > Ingo
> > >
> > > [ 7.671145] calling clear_boot_tracer+0x0/0x52 @ 1
> > > [ 7.672145] initcall clear_boot_tracer+0x0/0x52 returned 0 after 0 usecs
> > > [ 7.673145] calling event_trace_self_tests_init+0x0/0x5f @ 1
> > > [ 7.674206] Running tests on trace events:
> > > [ 7.675143] Testing event kfree_skb: OK
> > > [ 7.680145] Testing event kmalloc: OK
> > > [ 7.683819] Testing event kmem_cache_alloc: OK
> > > [ 7.686755] Testing event kmalloc_node: OK
> > > [ 7.689744] Testing event kmem_cache_alloc_node: OK
> > > [ 7.692711] Testing event kfree: OK
> > > [ 7.695712] Testing event kmem_cache_free: OK
> > > [ 7.698744] Testing event irq_handler_exit: OK
> > > [ 7.701724] Testing event irq_handler_entry: OK
> > > [ 7.704710] Testing event softirq_entry: OK
> > > [ 7.707743] Testing event softirq_exit: OK
> > > [ 7.710768] Testing event lock_acquired: OK
> > > [ 7.713738] Testing event lock_acquire: OK
> > > [ 7.716782] Testing event lock_release: OK
> > > [ 7.719771] Testing event lock_contended: OK
> > > [ 7.722740] Testing event sched_kthread_stop: OK
> > > [ 7.726741] Testing event sched_kthread_stop_ret: OK
> > > [ 7.729710] Testing event sched_wait_task: OK
> > > [ 7.732706] Testing event sched_wakeup: OK
> > > [ 7.735713] Testing event sched_wakeup_new: OK
> > > [ 7.738727] Testing event sched_switch: OK
> > > [ 7.741724] Testing event sched_migrate_task: OK
> > > [ 7.745711] Testing event sched_process_free: OK
> > > [ 7.749704] Testing event sched_process_exit: OK
> > > [ 7.753699] Testing event sched_process_wait: OK
> > > [ 7.757692] Testing event sched_process_fork: OK
> > > [ 7.761711] Testing event sched_signal_send: OK
> > > [ 7.764709] Running tests on trace event systems:
> > > [ 7.765144] Testing event system skb: OK
> > > [ 7.768776] Testing event system kmem: OK
> > > [ 7.771817] Testing event system irq: OK
> > > [ 7.774778] Testing event system lockdep: OK
> > > [ 7.777921] Testing event system sched: OK
> > > [ 7.780902] Running tests on all trace events:
> > > [ 7.781148] Testing all events: OK
> > > [ 7.787765] Running tests again, along with the function tracer
> > > [ 7.788156] Running tests on trace events:
> > > [ 7.789154] Testing event kfree_skb: OK
> > > [ 7.792858] Testing event kmalloc: OK
> > > [ 7.795853] Testing event kmem_cache_alloc: OK
> > > [ 7.799848] Testing event kmalloc_node: OK
> > > [ 7.802966] Testing event kmem_cache_alloc_node: OK
> > > [ 7.805970] Testing event kfree: OK
> > > [ 7.808942] Testing event kmem_cache_free: OK
> > > [ 7.812935] Testing event irq_handler_exit: OK
> > > [ 7.815850] Testing event irq_handler_entry: OK
> > > [ 7.818877] Testing event softirq_entry: OK
> > > [ 7.822846] Testing event softirq_exit: OK
> > > [ 7.825895] Testing event lock_acquired: OK
> > > [ 7.830045] Testing event lock_acquire: OK
> > > [ 7.834025] Testing event lock_release: OK
> > > [ 7.837920] Testing event lock_contended: OK
> > > [ 7.841854] Testing event sched_kthread_stop: OK
> > > [ 7.844883] Testing event sched_kthread_stop_ret: OK
> > > [ 7.847850] Testing event sched_wait_task: OK
> > > [ 7.851938] Testing event sched_wakeup: OK
> > > [ 7.855960] Testing event sched_wakeup_new: OK
> > > [ 7.858884] Testing event sched_switch: OK
> > > [ 7.861941] Testing event sched_migrate_task: OK
> > > [ 7.864972] Testing event sched_process_free: OK
> > > [ 7.867853] Testing event sched_process_exit: OK
> > > [ 7.870890] Testing event sched_process_wait: OK
> > > [ 7.873856] Testing event sched_process_fork: OK
> > > [ 7.876930] Testing event sched_signal_send: OK
> > > [ 7.879853] Running tests on trace event systems:
> > > [ 7.880154] Testing event system skb: OK
> > > [ 7.883940] Testing event system kmem: OK
> > > [ 7.888105] Testing event system irq: OK
> > > [ 7.892085] Testing event system lockdep: OK
> > > [ 7.896154] Testing event system sched: OK
> > > [ 7.901156] Running tests on all trace events:
> > > [ 7.903170] Testing all events: <4>------------[ cut here ]------------
> > > [ 7.909137] WARNING: at kernel/trace/ring_buffer.c:1245 __rb_reserve_next+0x130/0x40c()
> > > [ 7.909137] Modules linked in:
> > > [ 7.909137] Pid: 0, comm: swapper Not tainted 2.6.30-rc2-tip #38087
> > > [ 7.909137] Call Trace:
> > > [ 7.909137] <IRQ> [<ffffffff8025a273>] warn_slowpath+0xca/0x115
> > > [ 7.909137] [<ffffffff802bf4fe>] ? __rb_reserve_next+0x2a5/0x40c
> > > [ 7.909137] [<ffffffff8021ab74>] ? native_sched_clock+0x4f/0x6a
> > > [ 7.909137] [<ffffffff8021ab74>] ? native_sched_clock+0x4f/0x6a
> > > [ 7.909137] [<ffffffff8021ab74>] ? native_sched_clock+0x4f/0x6a
> > > [ 7.909137] [<ffffffff802bf2c6>] ? __rb_reserve_next+0x6d/0x40c
> > > [ 7.909137] [<ffffffff802cb54e>] ? time_hardirqs_off+0x31/0x47
> > > [ 7.909137] [<ffffffff802bf389>] __rb_reserve_next+0x130/0x40c
> > > [ 7.909137] [<ffffffff802c00ad>] rb_reserve_next_event+0x1d3/0x31a
> > > [ 7.909137] [<ffffffff802c04c9>] ring_buffer_lock_reserve+0x16b/0x1cb
> > > [ 7.909137] [<ffffffff802c4f4b>] trace_buffer_lock_reserve+0x34/0x7e
> > > [ 7.909137] [<ffffffff802c542c>] trace_vbprintk+0x136/0x275
> > > [ 7.909137] [<ffffffff8028534f>] ? trace_hardirqs_on_caller+0x132/0x18e
> > > [ 7.909137] [<ffffffff80286cfb>] ? lock_acquire+0x74/0x15b
> > > [ 7.909137] [<ffffffff802ca54b>] __trace_bprintk+0x80/0x9d
> > > [ 7.909137] [<ffffffff802661bc>] ? run_timer_softirq+0x16f/0x281
> > > [ 7.909137] [<ffffffff802ca54b>] ? __trace_bprintk+0x80/0x9d
> > > [ 7.909137] [<ffffffff80283871>] ? put_lock_stats+0x44/0x5a
> > > [ 7.909137] [<ffffffff802c5d68>] ? tracing_record_cmdline+0x3d/0x141
> > > [ 7.909137] [<ffffffff802661bc>] ? run_timer_softirq+0x16f/0x281
> > > [ 7.909137] [<ffffffff8028259c>] ftrace_event_lock_acquire+0x6a/0x89
> > > [ 7.909137] [<ffffffff80266273>] ? run_timer_softirq+0x226/0x281
> > > [ 7.909137] [<ffffffff80286cfb>] lock_acquire+0x74/0x15b
> > > [ 7.909137] [<ffffffff80266273>] ? run_timer_softirq+0x226/0x281
> > > [ 7.909137] [<ffffffff802668c1>] ? process_timeout+0x0/0x37
> > > [ 7.909137] [<ffffffff80a246af>] _spin_lock_irq+0x5c/0xa5
> > > [ 7.909137] [<ffffffff80266273>] ? run_timer_softirq+0x226/0x281
> > > [ 7.909137] [<ffffffff80266273>] run_timer_softirq+0x226/0x281
> > > [ 7.909137] [<ffffffff802661bc>] ? run_timer_softirq+0x16f/0x281
> > > [ 7.909137] [<ffffffff802133d6>] ? ftrace_call+0x5/0x2b
> > > [ 7.909137] [<ffffffff80260bd3>] __do_softirq+0xff/0x22c
> > > [ 7.909137] [<ffffffff802149bc>] call_softirq+0x1c/0x30
> > > [ 7.909137] [<ffffffff802163df>] do_softirq+0x5f/0xdb
> > > [ 7.909137] [<ffffffff802601ac>] irq_exit+0x5d/0x7d
> > > [ 7.909137] [<ffffffff80a2a076>] do_IRQ+0xbe/0xeb
> > > [ 7.909137] [<ffffffff80214153>] ret_from_intr+0x0/0x16
> > > [ 7.909137] <EOI> [<ffffffff802bf2a9>] ? __rb_reserve_next+0x50/0x40c
> > > [ 7.909137] [<ffffffff802c00ad>] ? rb_reserve_next_event+0x1d3/0x31a
> > > [ 7.909137] [<ffffffff802c00ad>] ? rb_reserve_next_event+0x1d3/0x31a
> > > [ 7.909137] [<ffffffff802c04c9>] ? ring_buffer_lock_reserve+0x16b/0x1cb
> > > [ 7.909137] [<ffffffff802c4f4b>] ? trace_buffer_lock_reserve+0x34/0x7e
> > > [ 7.909137] [<ffffffff8021b722>] ? test_ti_thread_flag+0x8/0x38
> > > [ 7.909137] [<ffffffff802c5d15>] ? trace_current_buffer_lock_reserve+0x36/0x4c
> > > [ 7.909137] [<ffffffff802d3227>] ? function_test_events_call+0x8f/0x12e
> > > [ 7.909137] [<ffffffff8021b841>] ? need_resched+0x36/0x54
> > > [ 7.909137] [<ffffffff802d3251>] ? function_test_events_call+0xb9/0x12e
> > > [ 7.909137] [<ffffffff802133d6>] ? ftrace_call+0x5/0x2b
> > > [ 7.909137] [<ffffffff8021b727>] ? test_ti_thread_flag+0xd/0x38
> > > [ 7.909137] [<ffffffff8021b841>] ? need_resched+0x36/0x54
> > > [ 7.909137] [<ffffffff8021b945>] ? poll_idle+0x41/0x64
> > > [ 7.909137] [<ffffffff802cb871>] ? stop_critical_timings+0x3d/0x54
> > > [ 7.909137] [<ffffffff802127b2>] ? cpu_idle+0xb1/0x112
> > > [ 7.909137] [<ffffffff809e6369>] ? rest_init+0x8d/0xa3
> > > [ 7.909137] [<ffffffff8108709a>] ? start_kernel+0x49c/0x4bd
> > > [ 7.909137] [<ffffffff81086140>] ? early_idt_handler+0x0/0x71
> > > [ 7.909137] [<ffffffff810862ce>] ? x86_64_start_reservations+0xb9/0xd4
> > > [ 7.909137] [<ffffffff81086000>] ? _sinittext+0x0/0x140
> > > [ 7.909137] [<ffffffff81086441>] ? x86_64_start_kernel+0x158/0x17b
> > > [ 7.909137] ---[ end trace 3b85485e39bcb7cf ]---
> > > [ 7.912099] OK
> > > [ 7.913421] initcall event_trace_self_tests_init+0x0/0x5f returned 0 after 233398 usecs
> > > [ 7.914172] calling random32_reseed+0x0/0xce @ 1
> > > [ 7.915159] initcall random32_reseed+0x0/0xce returned 0 after 0 usecs
> > > [ 7.916145] calling pci_resource_alignment_sysfs_init+0x0/0x40 @ 1
> > >
> >
> > I spent the entire day (and half the night) debugging this. I was fighting
> > a case where the hardirqs_enabled flag in the task struct (lockdep flag)
> > was mysteriously being set and cleared. I stepped through the entire
> > kernel thread fork process (that was an exercise) and could not find
> > anything wrong.
> >
> > Sometimes it would go away with printk's sometimes it would not. This was
> > driving me crazy, until I noticed that paravirt was enabled.
> >
> > Turning off paravirtualization here (so far) makes everything run
> > smoothly.
> >
> > Thus my theory is that there's something fishy with the modifying of the
> > irq enable/disable code when the system detects that it is running on bare
> > hardware.
> >
> > I'm too tired to look at this more. Ingo supplied a config to play with.
> > You can disable VSMP too and it will still trigger the crash.
> >
>
> OK, after spending two full days doing a config bisect to find out what is
> happening, I've found that CONFIG_PARAVIRT and CONFIG_DEBUG_PAGEALLOC do
> not play nicely. But I have yet to find out why.
>
> Here's the situation:
>
> We've added selftests for the event tracer. What this basically does is
> enables each event one at a time and runs tests. The tests include
> creating a kernel thread, executing workqueues and grabbing locks.
>
> We also have PROVE_LOCKING (LOCKDEP) enabled, that keeps track of
> interrupts being enabled. When they are, we set a flag in the task struct
> "hardirqs_enabled". When they are disabled, this flag is cleared.
>
> When we fork a process, a test is made to see if the flag is set for the
> new process and if it is not, a warning is printed (as is done in the
> above dump).
>
> I investigated this and found that the flag is mysteriously being set and
> cleared for no apparent reason. The flag is in the task struct and nothing
> should be touching it. In fact, it is a full int, not even a bit in a
> flags variable.
>
> I had a test that would print the flag and irqs_disabled() in copy_process
> before the warning and it showed that the flag was cleared but irqs was
> enabled. The funny part is, I if the test triggered, I printed the flag
> again, and the second print it was set again!!
>
> if (!p->hardirqs_enabled) {
> printk("irqs:%d flag:%d\n", irqs_disabled(),
> p->hardirqs_enabled);
> printk("try again: %d\n", p->hardirqs_enabled);
> }
>
> The first print showed that it was cleared, the second showed it was set
> again??

function tracer was active? So somewhere there we corrupted this
state? It's unlikely that printk itself did this.

Ingo

2009-04-23 08:29:49

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes


* Steven Rostedt <[email protected]> wrote:

>
> On Wed, 22 Apr 2009, Steven Rostedt wrote:
>
> >
> > On Wed, 22 Apr 2009, Jeremy Fitzhardinge wrote:
> >
> > > >
> > > >
> > > > > Do you have CONFIG_PARAVIRT on? When I disabled it, I have yet to
> > > > > reproduce the bug. But I've only rebooted a few times. I'm going to
> > > > > continue to reboot to see if I can trigger it.
> > > > >
> > > >
> > > >
> > > > Yes it is enabled.
> > > >
> > > >
> > > >
> > > >
> > > > > I'm thinking that the paravirt alternative code may have clobbered a
> > > > > register in either the enable or disabling of interrupts. This might cause
> > > > > a strange value to go into the hardirqs_enabled flag.
> > > > >
> > > >
> > > >
> > > >
> > > > Ok I will try it without PARAVIRT and tell you if I can reproduce it.
> > > >
> > >
> > > Interesting. What code is generated for native_irq_enable/disable?
> >
> > Note, when I take my standard config and enable PARAVIRT and LOCKDEP it
> > still boots up fine. It may be a combination of some configs :-/
> >
> > I'm currently doing a "config bisect" to find out what configs are needed.
>
> I just finished the config bisect, and it came down to
> CONFIG_DEBUG_PAGEALLOC (set is bad, not set is good). I'm sure
> this is not the only config that is at issue. It is probably a
> combination of configs
> :-/
>
> I'll try others to see if things change.

It could also be the timing of an incoming IRQ to hit a particular
race window. PAGEALLOC might trigger the right kind of delays to
make it trigger.

Ingo

2009-04-23 13:54:10

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes


On Thu, 23 Apr 2009, Ingo Molnar wrote:

> >
> > Here's the situation:
> >
> > We've added selftests for the event tracer. What this basically does is
> > enables each event one at a time and runs tests. The tests include
> > creating a kernel thread, executing workqueues and grabbing locks.
> >
> > We also have PROVE_LOCKING (LOCKDEP) enabled, that keeps track of
> > interrupts being enabled. When they are, we set a flag in the task struct
> > "hardirqs_enabled". When they are disabled, this flag is cleared.
> >
> > When we fork a process, a test is made to see if the flag is set for the
> > new process and if it is not, a warning is printed (as is done in the
> > above dump).
> >
> > I investigated this and found that the flag is mysteriously being set and
> > cleared for no apparent reason. The flag is in the task struct and nothing
> > should be touching it. In fact, it is a full int, not even a bit in a
> > flags variable.
> >
> > I had a test that would print the flag and irqs_disabled() in copy_process
> > before the warning and it showed that the flag was cleared but irqs was
> > enabled. The funny part is, I if the test triggered, I printed the flag
> > again, and the second print it was set again!!
> >
> > if (!p->hardirqs_enabled) {
> > printk("irqs:%d flag:%d\n", irqs_disabled(),
> > p->hardirqs_enabled);
> > printk("try again: %d\n", p->hardirqs_enabled);
> > }
> >
> > The first print showed that it was cleared, the second showed it was set
> > again??
>
> function tracer was active? So somewhere there we corrupted this
> state? It's unlikely that printk itself did this.

The problem always arises at the same spot. I'm not saying printk was the
culprit, I'm saying that printk actually "fixed" the issue. Which can be a
sign of a corrupted register somewhere.

I currently tracked it down to something in "prep_new_page". It is hard to
debug because as I add tests into the code, it makes the race window
smaller, and I need to run multiple boots to trigger the code. But when I
do trigger it (and I try to trigger it a few times) it always happens at
the same spot.

Note, everytime I catch the issue, the printk again "fixes" the problem
:-/

Since I need both DEBUG_PAGEALLOC and PARAVIRT on, I'm thinking that the
debug page alloc might be messing with the paravirt "modified" code (or
vice versa). It could also be the function tracer doing it too. But it
looks like the DEBUG_PAGEALLOC code is causing issues because the bug
always appears in the alloc code.

I'm not blaming any subsystem yet. It just seems that the combination of
these systems are broken.

-- Steve

2009-04-23 14:40:01

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes


On Thu, 23 Apr 2009, Steven Rostedt wrote:

>
> On Thu, 23 Apr 2009, Ingo Molnar wrote:
>
> > >
> > > Here's the situation:
> > >
> > > We've added selftests for the event tracer. What this basically does is
> > > enables each event one at a time and runs tests. The tests include
> > > creating a kernel thread, executing workqueues and grabbing locks.
> > >
> > > We also have PROVE_LOCKING (LOCKDEP) enabled, that keeps track of
> > > interrupts being enabled. When they are, we set a flag in the task struct
> > > "hardirqs_enabled". When they are disabled, this flag is cleared.
> > >
> > > When we fork a process, a test is made to see if the flag is set for the
> > > new process and if it is not, a warning is printed (as is done in the
> > > above dump).
> > >
> > > I investigated this and found that the flag is mysteriously being set and
> > > cleared for no apparent reason. The flag is in the task struct and nothing
> > > should be touching it. In fact, it is a full int, not even a bit in a
> > > flags variable.
> > >
> > > I had a test that would print the flag and irqs_disabled() in copy_process
> > > before the warning and it showed that the flag was cleared but irqs was
> > > enabled. The funny part is, I if the test triggered, I printed the flag
> > > again, and the second print it was set again!!
> > >
> > > if (!p->hardirqs_enabled) {
> > > printk("irqs:%d flag:%d\n", irqs_disabled(),
> > > p->hardirqs_enabled);
> > > printk("try again: %d\n", p->hardirqs_enabled);
> > > }
> > >
> > > The first print showed that it was cleared, the second showed it was set
> > > again??
> >
> > function tracer was active? So somewhere there we corrupted this
> > state? It's unlikely that printk itself did this.
>
> The problem always arises at the same spot. I'm not saying printk was the
> culprit, I'm saying that printk actually "fixed" the issue. Which can be a
> sign of a corrupted register somewhere.
>
> I currently tracked it down to something in "prep_new_page". It is hard to
> debug because as I add tests into the code, it makes the race window
> smaller, and I need to run multiple boots to trigger the code. But when I
> do trigger it (and I try to trigger it a few times) it always happens at
> the same spot.
>
> Note, everytime I catch the issue, the printk again "fixes" the problem
> :-/
>
> Since I need both DEBUG_PAGEALLOC and PARAVIRT on, I'm thinking that the
> debug page alloc might be messing with the paravirt "modified" code (or
> vice versa). It could also be the function tracer doing it too. But it
> looks like the DEBUG_PAGEALLOC code is causing issues because the bug
> always appears in the alloc code.
>
> I'm not blaming any subsystem yet. It just seems that the combination of
> these systems are broken.

OK, now I need help :-p

I traced the corruption down to the __flush_tlb_all() in kernel_map_pages.
This is only called when DEBUG_PAGEALLOC is enabled, thus this explains
why DEBUG_PAGEALLOC must be set.

Now it may be an issue between the function tracer and paravirt.

The __flush_tlb_all is:

static inline void __flush_tlb_all(void)
{
if (cpu_has_pge)
__flush_tlb_global();
else
__flush_tlb();
}

Which, on my box calls __flush_tlb_global()

which is a paravirt call:

static void native_flush_tlb_global(void)
{
__native_flush_tlb_global();
}


Even though the alternative code makes this a direct call, it is still
traced by ftrace. Heck, even the calls in __native_flush_tlb_global call
into paravirt (read_cr4 and write_cr4).

static inline void __native_flush_tlb_global(void)
{
unsigned long flags;
unsigned long cr4;

/*
* Read-modify-write to CR4 - protect it from preemption and
* from interrupts. (Use the raw variant because this code can
* be called from deep inside debugging code.)
*/
raw_local_irq_save(flags);

cr4 = read_cr4();
/* clear PGE */
write_cr4(cr4 & ~X86_CR4_PGE);
/* write old PGE again and flush TLBs */
write_cr4(cr4);

raw_local_irq_restore(flags);
}


Each of those read_cr4 and write_cr4 can be traced. But I still do not see
how this can cause corruption in the current task struct.

The TLB just caches the pages we are using, not the contents inside. I
still do not see how this can corrupt a bit. Unless a register leaked :-/

-- Steve

2009-04-23 15:08:52

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes


On Thu, 23 Apr 2009, Steven Rostedt wrote:
>
> static inline void __native_flush_tlb_global(void)
> {
> unsigned long flags;
> unsigned long cr4;
>
> /*
> * Read-modify-write to CR4 - protect it from preemption and
> * from interrupts. (Use the raw variant because this code can
> * be called from deep inside debugging code.)
> */
> raw_local_irq_save(flags);
>
> cr4 = read_cr4();
> /* clear PGE */
> write_cr4(cr4 & ~X86_CR4_PGE);
> /* write old PGE again and flush TLBs */
> write_cr4(cr4);
>
> raw_local_irq_restore(flags);
> }
>
>
> Each of those read_cr4 and write_cr4 can be traced. But I still do not see
> how this can cause corruption in the current task struct.
>
> The TLB just caches the pages we are using, not the contents inside. I
> still do not see how this can corrupt a bit. Unless a register leaked :-/

Ah, this is a lockdep thing.

The raw_local_irq_save/restore in __native_flush_tlb_global does not
update hardirqs_enabled.

When we call into ftrace, when we cross page bounderies, we disable
interrupts using the normal local_irq_save/restore calls.

But when we restore, it detects that interrupts are not going to be
enabled, and keeps hardirqs_enabled off.

The printk solved the issue because it called local_irq_restore, which set
the variable back.

I guess there's two solutions here. One, we can change the
raw_local_irq_enable/disable variants in __native_flush_tlb_global to the
non-raw type (it should protect against recursion).

or we can try to make the ring buffer use the raw_local_irq variants too.
I tried this once before, and it did cause issues.

Note, there's a "check_flags" in lockdep, but it is only called on
locking, it is not called when we only disable/enable interrupts.

-- Steve

2009-04-23 15:13:10

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes


* Steven Rostedt <[email protected]> wrote:

> On Thu, 23 Apr 2009, Steven Rostedt wrote:
> >
> > static inline void __native_flush_tlb_global(void)
> > {
> > unsigned long flags;
> > unsigned long cr4;
> >
> > /*
> > * Read-modify-write to CR4 - protect it from preemption and
> > * from interrupts. (Use the raw variant because this code can
> > * be called from deep inside debugging code.)
> > */
> > raw_local_irq_save(flags);
> >
> > cr4 = read_cr4();
> > /* clear PGE */
> > write_cr4(cr4 & ~X86_CR4_PGE);
> > /* write old PGE again and flush TLBs */
> > write_cr4(cr4);
> >
> > raw_local_irq_restore(flags);
> > }
> >
> >
> > Each of those read_cr4 and write_cr4 can be traced. But I still do not see
> > how this can cause corruption in the current task struct.
> >
> > The TLB just caches the pages we are using, not the contents
> > inside. I still do not see how this can corrupt a bit. Unless a
> > register leaked :-/
>
> Ah, this is a lockdep thing.
>
> The raw_local_irq_save/restore in __native_flush_tlb_global does
> not update hardirqs_enabled.
>
> When we call into ftrace, when we cross page bounderies, we
> disable interrupts using the normal local_irq_save/restore calls.
>
> But when we restore, it detects that interrupts are not going to
> be enabled, and keeps hardirqs_enabled off.

ah, so the problem is ftrace doing a function trace entry _in the
middle_ of the raw-irqs section in __native_flush_tlb_global()? (due
to the cr4 getting virtualized too)

Nasty, really nasty.

Ingo

2009-04-23 16:50:18

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes

Steven Rostedt wrote:
> Ah, this is a lockdep thing.
>
> The raw_local_irq_save/restore in __native_flush_tlb_global does not
> update hardirqs_enabled.
>
> When we call into ftrace, when we cross page bounderies, we disable
> interrupts using the normal local_irq_save/restore calls.
>
> But when we restore, it detects that interrupts are not going to be
> enabled, and keeps hardirqs_enabled off.
>
> The printk solved the issue because it called local_irq_restore, which set
> the variable back.
>
> I guess there's two solutions here. One, we can change the
> raw_local_irq_enable/disable variants in __native_flush_tlb_global to the
> non-raw type (it should protect against recursion).
>
> or we can try to make the ring buffer use the raw_local_irq variants too.
> I tried this once before, and it did cause issues.
>
> Note, there's a "check_flags" in lockdep, but it is only called on
> locking, it is not called when we only disable/enable interrupts.
>

OK, the good news is that its not a callee-save calling convention
problem, which is what I feared. But it does sound pretty awkward to
fix. Does __native_flush_tlb_global() have to use
raw_local_irq_save/restore?

J

2009-04-23 17:27:52

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes

* Jeremy Fitzhardinge ([email protected]) wrote:
> OK, the good news is that its not a callee-save calling convention
> problem, which is what I feared. But it does sound pretty awkward to
> fix. Does __native_flush_tlb_global() have to use
> raw_local_irq_save/restore?

Vaguely related...makes sense to do native_(read/write)_cr4 since it's
the native tlb flushing implementation, no?

looks ok from inspection, not tested yet...

thanks,
-chris
--

Subject: [PATCH] x86: use native register access for native tlb flushing

currently these are paravirtulaized, doesn't appear any callers rely on
this (no pv_ops backends are using native_tlb and overriding cr3/4
access).

Signed-off-by: Chris Wright <[email protected]>
---
arch/x86/include/asm/tlbflush.h | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 16a5c84..a5ecc9c 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -17,7 +17,7 @@

static inline void __native_flush_tlb(void)
{
- write_cr3(read_cr3());
+ native_write_cr3(native_read_cr3());
}

static inline void __native_flush_tlb_global(void)
@@ -32,11 +32,11 @@ static inline void __native_flush_tlb_global(void)
*/
raw_local_irq_save(flags);

- cr4 = read_cr4();
+ cr4 = native_read_cr4();
/* clear PGE */
- write_cr4(cr4 & ~X86_CR4_PGE);
+ native_write_cr4(cr4 & ~X86_CR4_PGE);
/* write old PGE again and flush TLBs */
- write_cr4(cr4);
+ native_write_cr4(cr4);

raw_local_irq_restore(flags);
}

2009-04-23 17:52:35

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes


On Thu, 23 Apr 2009, Jeremy Fitzhardinge wrote:

> Steven Rostedt wrote:
> > Ah, this is a lockdep thing.
> >
> > The raw_local_irq_save/restore in __native_flush_tlb_global does not update
> > hardirqs_enabled.
> >
> > When we call into ftrace, when we cross page bounderies, we disable
> > interrupts using the normal local_irq_save/restore calls.
> >
> > But when we restore, it detects that interrupts are not going to be enabled,
> > and keeps hardirqs_enabled off.
> >
> > The printk solved the issue because it called local_irq_restore, which set
> > the variable back.
> >
> > I guess there's two solutions here. One, we can change the
> > raw_local_irq_enable/disable variants in __native_flush_tlb_global to the
> > non-raw type (it should protect against recursion).
> >
> > or we can try to make the ring buffer use the raw_local_irq variants too. I
> > tried this once before, and it did cause issues.
> >
> > Note, there's a "check_flags" in lockdep, but it is only called on locking,
> > it is not called when we only disable/enable interrupts.
> >
>
> OK, the good news is that its not a callee-save calling convention problem,
> which is what I feared. But it does sound pretty awkward to fix. Does
> __native_flush_tlb_global() have to use raw_local_irq_save/restore?

No, I just investigated (and booted) without the raw versions (using the
normal disable instead). The hooks just update variables in the task
struct. I see no reason to have use it.

I use to use it with ftrace, but I found that issues are caused if ftrace
called something internal that used local_irq_save.

Perhaps that is the issue. If something uses raw_local_irq_save then calls
flush_tlb_global you might get a problem with the flags again (like we did
here). But the only user of raw_* should be lockdep and irqsoff, and they
should be able to deal with recursion. I don't think there should be any
other user.

-- Steve

2009-04-23 18:04:20

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes

* Chris Wright ([email protected]) wrote:
> * Jeremy Fitzhardinge ([email protected]) wrote:
> > OK, the good news is that its not a callee-save calling convention
> > problem, which is what I feared. But it does sound pretty awkward to
> > fix. Does __native_flush_tlb_global() have to use
> > raw_local_irq_save/restore?
>
> Vaguely related...makes sense to do native_(read/write)_cr4 since it's
> the native tlb flushing implementation, no?
>
> looks ok from inspection, not tested yet...

BTW, Steve, does that fix it for you? Sure makes the code cleaner:

410: 0f 20 e2 mov %cr4,%rdx
413: 48 89 d0 mov %rdx,%rax
416: 24 7f and $0x7f,%al
418: 0f 22 e0 mov %rax,%cr4
41b: 0f 22 e2 mov %rdx,%cr4

thanks,
-chris

> Subject: [PATCH] x86: use native register access for native tlb flushing
>
> currently these are paravirtulaized, doesn't appear any callers rely on
> this (no pv_ops backends are using native_tlb and overriding cr3/4
> access).
>
> Signed-off-by: Chris Wright <[email protected]>
> ---
> arch/x86/include/asm/tlbflush.h | 8 ++++----
> 1 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index 16a5c84..a5ecc9c 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -17,7 +17,7 @@
>
> static inline void __native_flush_tlb(void)
> {
> - write_cr3(read_cr3());
> + native_write_cr3(native_read_cr3());
> }
>
> static inline void __native_flush_tlb_global(void)
> @@ -32,11 +32,11 @@ static inline void __native_flush_tlb_global(void)
> */
> raw_local_irq_save(flags);
>
> - cr4 = read_cr4();
> + cr4 = native_read_cr4();
> /* clear PGE */
> - write_cr4(cr4 & ~X86_CR4_PGE);
> + native_write_cr4(cr4 & ~X86_CR4_PGE);
> /* write old PGE again and flush TLBs */
> - write_cr4(cr4);
> + native_write_cr4(cr4);
>
> raw_local_irq_restore(flags);
> }

2009-04-23 18:33:31

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes



On Thu, 23 Apr 2009, Chris Wright wrote:

> * Chris Wright ([email protected]) wrote:
> > * Jeremy Fitzhardinge ([email protected]) wrote:
> > > OK, the good news is that its not a callee-save calling convention
> > > problem, which is what I feared. But it does sound pretty awkward to
> > > fix. Does __native_flush_tlb_global() have to use
> > > raw_local_irq_save/restore?
> >
> > Vaguely related...makes sense to do native_(read/write)_cr4 since it's
> > the native tlb flushing implementation, no?
> >
> > looks ok from inspection, not tested yet...
>
> BTW, Steve, does that fix it for you? Sure makes the code cleaner:

Hi Chris,

I bet it does, but I'm currently reviewing if we need the raw there. I may
take your patch instead.

I'm about to leave for school so I'll look at this later tonight.

-- Steve

>
> 410: 0f 20 e2 mov %cr4,%rdx
> 413: 48 89 d0 mov %rdx,%rax
> 416: 24 7f and $0x7f,%al
> 418: 0f 22 e0 mov %rax,%cr4
> 41b: 0f 22 e2 mov %rdx,%cr4
>
> thanks,
> -chris
>
> > Subject: [PATCH] x86: use native register access for native tlb flushing
> >
> > currently these are paravirtulaized, doesn't appear any callers rely on
> > this (no pv_ops backends are using native_tlb and overriding cr3/4
> > access).
> >
> > Signed-off-by: Chris Wright <[email protected]>
> > ---
> > arch/x86/include/asm/tlbflush.h | 8 ++++----
> > 1 files changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> > index 16a5c84..a5ecc9c 100644
> > --- a/arch/x86/include/asm/tlbflush.h
> > +++ b/arch/x86/include/asm/tlbflush.h
> > @@ -17,7 +17,7 @@
> >
> > static inline void __native_flush_tlb(void)
> > {
> > - write_cr3(read_cr3());
> > + native_write_cr3(native_read_cr3());
> > }
> >
> > static inline void __native_flush_tlb_global(void)
> > @@ -32,11 +32,11 @@ static inline void __native_flush_tlb_global(void)
> > */
> > raw_local_irq_save(flags);
> >
> > - cr4 = read_cr4();
> > + cr4 = native_read_cr4();
> > /* clear PGE */
> > - write_cr4(cr4 & ~X86_CR4_PGE);
> > + native_write_cr4(cr4 & ~X86_CR4_PGE);
> > /* write old PGE again and flush TLBs */
> > - write_cr4(cr4);
> > + native_write_cr4(cr4);
> >
> > raw_local_irq_restore(flags);
> > }
>

2009-04-23 18:45:48

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes

* Steven Rostedt ([email protected]) wrote:
> I bet it does, but I'm currently reviewing if we need the raw there. I may
> take your patch instead.

OK, saves a whopping 200 bytes on my .config too ;-)

text data bss dec hex filename
8041385 1816009 1822720 11680114 b23972 vmlinux.orig

text data bss dec hex filename
8041181 1816009 1822720 11679910 b238a6 vmlinux

2009-04-24 08:36:22

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes


* Steven Rostedt <[email protected]> wrote:

>
> On Thu, 23 Apr 2009, Ingo Molnar wrote:
>
> > >
> > > Here's the situation:
> > >
> > > We've added selftests for the event tracer. What this basically does is
> > > enables each event one at a time and runs tests. The tests include
> > > creating a kernel thread, executing workqueues and grabbing locks.
> > >
> > > We also have PROVE_LOCKING (LOCKDEP) enabled, that keeps track of
> > > interrupts being enabled. When they are, we set a flag in the task struct
> > > "hardirqs_enabled". When they are disabled, this flag is cleared.
> > >
> > > When we fork a process, a test is made to see if the flag is set for the
> > > new process and if it is not, a warning is printed (as is done in the
> > > above dump).
> > >
> > > I investigated this and found that the flag is mysteriously being set and
> > > cleared for no apparent reason. The flag is in the task struct and nothing
> > > should be touching it. In fact, it is a full int, not even a bit in a
> > > flags variable.
> > >
> > > I had a test that would print the flag and irqs_disabled() in copy_process
> > > before the warning and it showed that the flag was cleared but irqs was
> > > enabled. The funny part is, I if the test triggered, I printed the flag
> > > again, and the second print it was set again!!
> > >
> > > if (!p->hardirqs_enabled) {
> > > printk("irqs:%d flag:%d\n", irqs_disabled(),
> > > p->hardirqs_enabled);
> > > printk("try again: %d\n", p->hardirqs_enabled);
> > > }
> > >
> > > The first print showed that it was cleared, the second showed it was set
> > > again??
> >
> > function tracer was active? So somewhere there we corrupted this
> > state? It's unlikely that printk itself did this.
>
> The problem always arises at the same spot. I'm not saying printk was the
> culprit, I'm saying that printk actually "fixed" the issue. Which can be a
> sign of a corrupted register somewhere.
>
> I currently tracked it down to something in "prep_new_page". It is
> hard to debug because as I add tests into the code, it makes the
> race window smaller, and I need to run multiple boots to trigger
> the code. But when I do trigger it (and I try to trigger it a few
> times) it always happens at the same spot.
>
> Note, everytime I catch the issue, the printk again "fixes" the
> problem
> :-/

btw., you could perhaps use early_printk() - which is a lot less
intrusive. It has no locking nor any irq-flags manipulation. (for
the serial bits at least)

Ingo

2009-04-24 08:38:19

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes


* Chris Wright <[email protected]> wrote:

> * Steven Rostedt ([email protected]) wrote:
> > I bet it does, but I'm currently reviewing if we need the raw there. I may
> > take your patch instead.
>
> OK, saves a whopping 200 bytes on my .config too ;-)
>
> text data bss dec hex filename
> 8041385 1816009 1822720 11680114 b23972 vmlinux.orig
>
> text data bss dec hex filename
> 8041181 1816009 1822720 11679910 b238a6 vmlinux

This really demonstrates the cost of paravirt :-/

Please lets not forget this patch - it makes sense regardless of
whether it works around the bug i reported.

Ingo

2009-04-24 15:14:16

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH 0/2] [GIT PULL] tracing: various bug fixes

* Ingo Molnar ([email protected]) wrote:
> * Chris Wright <[email protected]> wrote:
> > OK, saves a whopping 200 bytes on my .config too ;-)
> >
> > text data bss dec hex filename
> > 8041385 1816009 1822720 11680114 b23972 vmlinux.orig
> >
> > text data bss dec hex filename
> > 8041181 1816009 1822720 11679910 b238a6 vmlinux
>
> This really demonstrates the cost of paravirt :-/
>
> Please lets not forget this patch - it makes sense regardless of
> whether it works around the bug i reported.

Yup, Steve picked it up and it's in his most recent ftrace pull request
to you.

thanks,
-chris

2009-04-29 06:18:22

by Chris Wright

[permalink] [raw]
Subject: [tip:tracing/core] x86: use native register access for native tlb flushing

Commit-ID: d7285c6b5c54397fdf112c2fb98ee43193173aa9
Gitweb: http://git.kernel.org/tip/d7285c6b5c54397fdf112c2fb98ee43193173aa9
Author: Chris Wright <[email protected]>
AuthorDate: Thu, 23 Apr 2009 10:21:38 -0700
Committer: Steven Rostedt <[email protected]>
CommitDate: Thu, 23 Apr 2009 23:05:03 -0400

x86: use native register access for native tlb flushing

currently these are paravirtulaized, doesn't appear any callers rely on
this (no pv_ops backends are using native_tlb and overriding cr3/4
access).

[ Impact: fix lockdep warning with paravirt and function tracer ]

Signed-off-by: Chris Wright <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>


---
arch/x86/include/asm/tlbflush.h | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index d3539f9..e2927c5 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -17,7 +17,7 @@

static inline void __native_flush_tlb(void)
{
- write_cr3(read_cr3());
+ native_write_cr3(native_read_cr3());
}

static inline void __native_flush_tlb_global(void)
@@ -32,11 +32,11 @@ static inline void __native_flush_tlb_global(void)
*/
raw_local_irq_save(flags);

- cr4 = read_cr4();
+ cr4 = native_read_cr4();
/* clear PGE */
- write_cr4(cr4 & ~X86_CR4_PGE);
+ native_write_cr4(cr4 & ~X86_CR4_PGE);
/* write old PGE again and flush TLBs */
- write_cr4(cr4);
+ native_write_cr4(cr4);

raw_local_irq_restore(flags);
}