Current git of my perf_fuzzer tool (recently enhanced to fork and prctl)
https://github.com/deater/perf_event_tests/tree/master/fuzzer
rapidly crashes my machine with a
Warning followed by NMI errors then an rcu_stall then stuck cpus.
This is on a core2 machine.
I don't think these traces look like the other perf RCU bugs that have
been reported but it's hard to tell.
This is with linux 3.10-rc4 with the
[tip:perf/urgent] perf: Fix perf mmap bugs
patch applied. (What's the status of that patch?)
Vince
[ 2716.472065] ------------[ cut here ]------------
[ 2716.476035] WARNING: at kernel/events/core.c:2122 task_ctx_sched_out+0x3c/0x)
[ 2716.476035] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs locn
[ 2716.476035] CPU: 0 PID: 3164 Comm: perf_fuzzer Not tainted 3.10.0-rc4 #2
[ 2716.476035] Hardware name: AOpen DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BI2
[ 2716.476035] 0000000000000000 ffffffff8102e215 0000000000000000 ffff88011fc18
[ 2716.476035] ffff8801175557f0 0000000000000000 ffff880119fda88c ffffffff810ad
[ 2716.476035] ffff880119fda880 ffffffff810af02a 0000000000000009 ffff880117550
[ 2716.476035] Call Trace:
[ 2716.476035] [<ffffffff8102e215>] ? warn_slowpath_common+0x5b/0x70
[ 2716.476035] [<ffffffff810ab2bd>] ? task_ctx_sched_out+0x3c/0x5f
[ 2716.476035] [<ffffffff810af02a>] ? perf_event_exit_task+0xbf/0x194
[ 2716.476035] [<ffffffff81032a37>] ? do_exit+0x3e7/0x90c
[ 2716.476035] [<ffffffff810cd5ab>] ? __do_fault+0x359/0x394
[ 2716.476035] [<ffffffff81032fe6>] ? do_group_exit+0x66/0x98
[ 2716.476035] [<ffffffff8103dbcd>] ? get_signal_to_deliver+0x479/0x4ad
[ 2716.476035] [<ffffffff810ac05c>] ? __perf_event_task_sched_out+0x230/0x2d1
[ 2716.476035] [<ffffffff8100205d>] ? do_signal+0x3c/0x432
[ 2716.476035] [<ffffffff810abbf9>] ? ctx_sched_in+0x43/0x141
[ 2716.476035] [<ffffffff810ac2ca>] ? perf_event_context_sched_in+0x7a/0x90
[ 2716.476035] [<ffffffff810ac311>] ? __perf_event_task_sched_in+0x31/0x118
[ 2716.476035] [<ffffffff81050dd9>] ? mmdrop+0xd/0x1c
[ 2716.476035] [<ffffffff81051a39>] ? finish_task_switch+0x7d/0xa6
[ 2716.476035] [<ffffffff81002473>] ? do_notify_resume+0x20/0x5d
[ 2716.476035] [<ffffffff813654f5>] ? retint_signal+0x3d/0x78
[ 2716.476035] ---[ end trace 827178d8a5966c3d ]---
[ 2718.046014] hrtimer: interrupt took 7185 ns
[ 2738.441822] NOHZ: local_softirq_pending 100
[ 2748.725529] Uhhuh. NMI received for unknown reason 3d on CPU 0.
[ 2748.728006] Do you have a strange power saving mode enabled?
[ 2748.728006] Dazed and confused, but trying to continue
[ 2748.780067] Uhhuh. NMI received for unknown reason 2d on CPU 0.
[ 2748.784003] Do you have a strange power saving mode enabled?
[ 2748.784003] Dazed and confused, but trying to continue
[ 2756.248003] [sched_delayed] sched: RT throttling activated
[ 2777.263999] INFO: rcu_sched self-detected stall on CPU[ 2777.272002] INFO: r)
[ 2777.272005] sending NMI to all CPUs:
[ 2777.264001] NMI backtrace for cpu 0
[ 2777.264001] CPU: 0 PID: 3103 Comm: perf_fuzzer Tainted: G W 3.10.02
[ 2777.264001] Hardware name: AOpen DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BI2
[ 2777.264001] task: ffff88011766a040 ti: ffff880117400000 task.ti: ffff88011740
[ 2777.264001] RIP: 0010:[<ffffffff812450b8>] [<ffffffff812450b8>] io_serial_i0
[ 2777.264001] RSP: 0000:ffff88011fc03ca0 EFLAGS: 00000002
[ 2777.264001] RAX: ffffffff81245000 RBX: ffffffff8183c7b0 RCX: 0000000000000000
[ 2777.264001] RDX: 00000000000003fd RSI: 00000000000003fd RDI: ffffffff8183c7b0
[ 2777.264001] RBP: 00000000000026c8 R08: 0000000000000000 R09: ffffffff817caa5f
[ 2777.264001] R10: 0000000000000000 R11: ffff88011f008400 R12: 0000000000000020
[ 2777.264001] R13: ffffffff81246509 R14: 0000000000000039 R15: 0000000000000039
[ 2777.264001] FS: 00007ff55cc8a700(0000) GS:ffff88011fc00000(0000) knlGS:00000
[ 2777.264001] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2777.264001] CR2: 0000000000404850 CR3: 0000000119557000 CR4: 00000000000407f0
[ 2777.264001] DR0: 0000000000000020 DR1: 0000000000000020 DR2: 0000000000000000
[ 2777.264001] DR3: 0000000000001000 DR6: 00000000ffff0ff0 DR7: 00000000000f0602
[ 2777.264001] Stack:
[ 2777.264001] ffffffff812464a4 ffffffff8183c7b0 0000000000000065 ffffffff81830
[ 2777.264001] ffffffff8124651a ffffffff817caa71 ffffffff817caa50 ffffffff81249
[ 2777.264001] ffffffff8183c7b0 0000000000000000 0000000000000016 0000000000001
[ 2777.264001] Call Trace:
[ 2777.264001] <IRQ> [<ffffffff812464a4>] ? wait_for_xmitr+0x1c/0x81
[ 2777.264001] [<ffffffff8124651a>] ? serial8250_console_putchar+0x11/0x1f
[ 2777.264001] [<ffffffff812421d9>] ? uart_console_write+0x35/0x47
[ 2777.264001] [<ffffffff812465c6>] ? serial8250_console_write+0x9e/0xe9
[ 2777.264001] [<ffffffff8102eb60>] ? call_console_drivers.constprop.21+0x9e/0e
[ 2777.264001] [<ffffffff8102fa62>] ? console_unlock+0x12e/0x2f9
[ 2777.264001] [<ffffffff81030158>] ? vprintk_emit+0x3ab/0x3d6
[ 2777.264001] [<ffffffff8135fa26>] ? printk+0x4f/0x51
[ 2777.264001] [<ffffffff810656a0>] ? ktime_get+0x5f/0x6b
[ 2777.264001] [<ffffffff810926fa>] ? rcu_check_callbacks+0x106/0x4bb
[ 2777.264001] [<ffffffff8106b8d6>] ? tick_sched_do_timer+0x25/0x25
[ 2777.264001] [<ffffffff8103aa43>] ? update_process_times+0x31/0x5c
[ 2777.264001] [<ffffffff8106b63a>] ? tick_sched_handle+0x3e/0x4a
[ 2777.264001] [<ffffffff8106b906>] ? tick_sched_timer+0x30/0x4c
[ 2777.264001] [<ffffffff8104bf18>] ? __run_hrtimer+0xa9/0x14e
[ 2777.264001] [<ffffffff8104c525>] ? hrtimer_interrupt+0xbd/0x19e
[ 2777.264001] [<ffffffff810a9c63>] ? perf_remove_from_context+0x89/0x89
[ 2777.264001] [<ffffffff8101bca0>] ? smp_apic_timer_interrupt+0x6d/0x7e
[ 2777.264001] [<ffffffff8136adca>] ? apic_timer_interrupt+0x6a/0x70
[ 2777.264001] <EOI> [<ffffffff810aa7de>] ? group_sched_out+0x62/0x62
[ 2777.264001] [<ffffffff810a9cc7>] ? perf_event_disable+0x64/0x92
[ 2777.264001] [<ffffffff810a82ce>] ? perf_event_for_each_child+0x62/0x83
[ 2777.264001] [<ffffffff810ab398>] ? perf_event_task_disable+0x3f/0x6d
[ 2777.264001] [<ffffffff8104190e>] ? SyS_prctl+0x16a/0x32f
[ 2777.264001] [<ffffffff8136a212>] ? system_call_fastpath+0x16/0x1b
[ 2777.264001] Code: b6 4f 61 d3 e6 48 63 f6 48 03 77 10 89 16 c3 0f b6 4f 61 d
[ 2777.272117] NMI backtrace for cpu 1
[ 2777.272119] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 3.10.0-rc4 2
[ 2777.272120] Hardware name: AOpen DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BI2
[ 2777.272121] task: ffff88011a1c2770 ti: ffff88011a1f0000 task.ti: ffff88011a10
[ 2777.272122] RIP: 0010:[<ffffffff811b8d30>] [<ffffffff811b8d30>] __delay+0x79
[ 2777.272126] RSP: 0018:ffff88011fc83e60 EFLAGS: 00000847
[ 2777.272127] RAX: ffffffff811b8d67 RBX: 0000000000002710 RCX: 0000000000000040
[ 2777.272127] RDX: 000000000026587d RSI: 0000000000000200 RDI: 000000000026587e
[ 2777.272128] RBP: ffff88011fc8d9a0 R08: 0000000000000000 R09: ffffffff817b8cb4
[ 2777.272129] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81637dc0
[ 2777.272130] R13: 0000000000000001 R14: ffff88011a1f0000 R15: ffffffff81637e80
[ 2777.272131] FS: 0000000000000000(0000) GS:ffff88011fc80000(0000) knlGS:00000
[ 2777.272132] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2777.272133] CR2: 000000000068ce30 CR3: 0000000117917000 CR4: 00000000000407e0
[ 2777.272134] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2777.272135] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2777.272135] Stack:
[ 2777.272136] ffffffff8101c6bf ffffffff81637dc0 ffffffff810929d0 0000000000008
[ 2777.272138] 0000000000000083 0000000000000001 ffff88011a1c2770 0000000000000
[ 2777.272140] 0000000000000001 ffff88011fc8d370 ffffffff8106b8d6 ffff88011fc80
[ 2777.272141] Call Trace:
[ 2777.272142] <IRQ> [<ffffffff8101c6bf>] ? arch_trigger_all_cpu_backtrace+0xf
[ 2777.272145] [<ffffffff810929d0>] ? rcu_check_callbacks+0x3dc/0x4bb
[ 2777.272148] [<ffffffff8106b8d6>] ? tick_sched_do_timer+0x25/0x25
[ 2777.272149] [<ffffffff8103aa43>] ? update_process_times+0x31/0x5c
[ 2777.272152] [<ffffffff8106b63a>] ? tick_sched_handle+0x3e/0x4a
[ 2777.272153] [<ffffffff8106b906>] ? tick_sched_timer+0x30/0x4c
[ 2777.272155] [<ffffffff8104bf18>] ? __run_hrtimer+0xa9/0x14e
[ 2777.272157] [<ffffffff8104c525>] ? hrtimer_interrupt+0xbd/0x19e
[ 2777.272159] [<ffffffff8101bca0>] ? smp_apic_timer_interrupt+0x6d/0x7e
[ 2777.272161] [<ffffffff8136adca>] ? apic_timer_interrupt+0x6a/0x70
[ 2777.272163] <EOI> [<ffffffff81008184>] ? default_idle+0x14/0x3a
[ 2777.272167] [<ffffffff81064648>] ? cpu_startup_entry+0xbf/0x133
[ 2777.272169] [<ffffffff813574b2>] ? start_secondary+0x1d9/0x1dd
[ 2777.272172] Code: c0 74 1d eb 06 66 0f 1f 44 00 00 eb 0e 66 66 66 66 66 2e 0
[ 2777.264001] { 0} (t=5696 jiffies g=963 c=962 q=24)
[ 2777.264001] sending NMI to all CPUs:
[ 2777.264001] NMI backtrace for cpu 0
[ 2777.264001] CPU: 0 PID: 3103 Comm: perf_fuzzer Tainted: G W 3.10.02
[ 2777.264001] Hardware name: AOpen DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BI2
[ 2777.264001] task: ffff88011766a040 ti: ffff880117400000 task.ti: ffff88011740
[ 2777.264001] RIP: 0010:[<ffffffff811b8d77>] [<ffffffff811b8d77>] delay_tsc+06
[ 2777.264001] RSP: 0000:ffff88011fc03e60 EFLAGS: 00000847
[ 2777.264001] RAX: 00000000e3cac8fa RBX: 0000000000002710 RCX: 0000000000000040
[ 2777.264001] RDX: 000000000000066e RSI: 0000000000000000 RDI: 000000000026587e
[ 2777.264001] RBP: ffff88011fc0d9a0 R08: 0000000000000000 R09: 0000000000000000
[ 2777.264001] R10: ffffffffffffffff R11: 0000000000000000 R12: ffffffff81637dc0
[ 2777.264001] R13: ffff88011fc0d370 R14: ffff880117400000 R15: 0000000000000000
[ 2777.264001] FS: 00007ff55cc8a700(0000) GS:ffff88011fc00000(0000) knlGS:00000
[ 2777.264001] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2777.264001] CR2: 0000000000404850 CR3: 0000000119557000 CR4: 00000000000407f0
[ 2777.264001] DR0: 0000000000000020 DR1: 0000000000000020 DR2: 0000000000000000
[ 2777.264001] DR3: 0000000000001000 DR6: 00000000ffff0ff0 DR7: 00000000000f0602
[ 2777.264001] Stack:
[ 2777.264001] ffffffff8101c6bf ffffffff81637dc0 ffffffff81092798 0000000000008
[ 2777.264001] 00000000000af4f1 0000000000000000 ffff88011766a040 0000000000000
[ 2777.264001] 0000000000000000 ffff88011fc0d370 ffffffff8106b8d6 ffff88011fc00
[ 2777.264001] Call Trace:
[ 2777.264001] <IRQ>
[ 2777.264001] [<ffffffff8101c6bf>] ? arch_trigger_all_cpu_backtrace+0x76/0x7f
[ 2777.264001] [<ffffffff81092798>] ? rcu_check_callbacks+0x1a4/0x4bb
[ 2777.264001] [<ffffffff8106b8d6>] ? tick_sched_do_timer+0x25/0x25
[ 2777.264001] [<ffffffff8103aa43>] ? update_process_times+0x31/0x5c
[ 2777.264001] [<ffffffff8106b63a>] ? tick_sched_handle+0x3e/0x4a
[ 2777.264001] [<ffffffff8106b906>] ? tick_sched_timer+0x30/0x4c
[ 2777.264001] [<ffffffff8104bf18>] ? __run_hrtimer+0xa9/0x14e
[ 2777.264001] [<ffffffff8104c525>] ? hrtimer_interrupt+0xbd/0x19e
[ 2777.264001] [<ffffffff810a9c63>] ? perf_remove_from_context+0x89/0x89
[ 2777.264001] [<ffffffff8101bca0>] ? smp_apic_timer_interrupt+0x6d/0x7e
[ 2777.264001] [<ffffffff8136adca>] ? apic_timer_interrupt+0x6a/0x70
[ 2777.264001] <EOI>
[ 2777.264001] [<ffffffff810aa7de>] ? group_sched_out+0x62/0x62
[ 2777.264001] [<ffffffff810a9cc7>] ? perf_event_disable+0x64/0x92
[ 2777.264001] [<ffffffff810a82ce>] ? perf_event_for_each_child+0x62/0x83
[ 2777.264001] [<ffffffff810ab398>] ? perf_event_task_disable+0x3f/0x6d
[ 2777.264001] [<ffffffff8104190e>] ? SyS_prctl+0x16a/0x32f
[ 2777.264001] [<ffffffff8136a212>] ? system_call_fastpath+0x16/0x1b
[ 2777.264001] Code: 01 e9 d7 ff ff ff 48 69 ff c7 10 00 00 e9 d4 ff ff ff 48 8
[ 2779.076001] NMI backtrace for cpu 1
[ 2779.076001] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 3.10.0-rc4 2
[ 2779.076001] Hardware name: AOpen DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BI2
[ 2779.076001] task: ffff88011a1c2770 ti: ffff88011a1f0000 task.ti: ffff88011a10
[ 2779.076001] RIP: 0010:[<ffffffff81008184>] [<ffffffff81008184>] default_idla
[ 2779.076001] RSP: 0018:ffff88011a1f1f20 EFLAGS: 00000246
[ 2779.076001] RAX: 0000000000000000 RBX: ffff88011a1f1fd8 RCX: 00000000ffffffff
[ 2779.076001] RDX: 0100000000000000 RSI: 0000000000000001 RDI: ffff88011fc8d81c
[ 2779.076001] RBP: ffff88011a1f1fd8 R08: 0000000000000000 R09: 0000000000000000
[ 2779.076001] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 2779.076001] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 2779.076001] FS: 0000000000000000(0000) GS:ffff88011fc80000(0000) knlGS:00000
[ 2779.076001] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2779.076001] CR2: 000000000068ce30 CR3: 0000000117917000 CR4: 00000000000407e0
[ 2779.076001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2779.076001] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2779.076001] Stack:
[ 2779.076001] ffffffff81064648 0000000000000001 0000000000000000 ffffffff81352
[ 2779.076001] 06d775b6cc484548 0000000000000000 0000000000000000 0000000000000
[ 2779.076001] 0000000000000000 0000000000000000 0000000000000000 0000000000000
[ 2779.076001] Call Trace:
[ 2779.076001] [<ffffffff81064648>] ? cpu_startup_entry+0xbf/0x133
[ 2779.076001] [<ffffffff813574b2>] ? start_secondary+0x1d9/0x1dd
[ 2779.076001] Code: 08 00 48 8b 7b 08 44 89 e2 89 ee ff 13 48 83 c3 10 48 83 3
[ 2804.244000] BUG: soft lockup - CPU#0 stuck for 22s! [perf_fuzzer:3103]
[ 2804.244000] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs locn
[ 2804.244000] CPU: 0 PID: 3103 Comm: perf_fuzzer Tainted: G W 3.10.02
[ 2804.244000] Hardware name: AOpen DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BI2
[ 2804.244000] task: ffff88011766a040 ti: ffff880117400000 task.ti: ffff88011740
[ 2804.244000] RIP: 0010:[<ffffffff810a9cc7>] [<ffffffff810a9cc7>] perf_event_2
[ 2804.244000] RSP: 0000:ffff880117401ed0 EFLAGS: 00000282
[ 2804.244000] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
[ 2804.244000] RDX: 0000000000008b6a RSI: ffffffff810aa7de RDI: ffff8801179069cc
[ 2804.244000] RBP: ffff8801179069cc R08: 0000000000000000 R09: 0000000000000000
[ 2804.244000] R10: 0000000000000000 R11: ffff88011f008440 R12: 0000000000000000
[ 2804.244000] R13: ffff8801179069cc R14: 0000000000000000 R15: ffff88011f008440
[ 2804.244000] FS: 00007ff55cc8a700(0000) GS:ffff88011fc00000(0000) knlGS:00000
[ 2804.244000] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2804.244000] CR2: 0000000000404850 CR3: 0000000119557000 CR4: 00000000000407f0
[ 2804.244000] DR0: 0000000000000020 DR1: 0000000000000020 DR2: 0000000000000000
[ 2804.244000] DR3: 0000000000001000 DR6: 00000000ffff0ff0 DR7: 00000000000f0602
[ 2804.244000] Stack:
[ 2804.244000] ffff8801195ac000 ffff8801168b5000 ffff8801195ac1d0 ffffffff810ae
[ 2804.244000] ffff8801195ac000 ffff88011766a040 ffff88011766a6c8 00007ff55ca74
[ 2804.244000] ffffffff810ab398 00007ff55ca700f4 00007fffe6c59c2c ffff880117660
[ 2804.244000] Call Trace:
[ 2804.244000] [<ffffffff810a82ce>] ? perf_event_for_each_child+0x62/0x83
[ 2804.244000] [<ffffffff810ab398>] ? perf_event_task_disable+0x3f/0x6d
[ 2804.244000] [<ffffffff8104190e>] ? SyS_prctl+0x16a/0x32f
[ 2804.244000] [<ffffffff8136a212>] ? system_call_fastpath+0x16/0x1b
[ 2804.244000] Code: 48 89 da 48 c7 c6 de a7 0a 81 e8 cc e4 ff ff 85 c0 74 41 4
[ 2832.244001] BUG: soft lockup - CPU#0 stuck for 22s! [perf_fuzzer:3103]
[ 2832.244001] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs locn
[ 2832.244001] CPU: 0 PID: 3103 Comm: perf_fuzzer Tainted: G W 3.10.02
[ 2832.244001] Hardware name: AOpen DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BI2
[ 2832.244001] task: ffff88011766a040 ti: ffff880117400000 task.ti: ffff88011740
[ 2832.244001] RIP: 0010:[<ffffffff810a8177>] [<ffffffff810a8177>] perf_event_9
[ 2832.244001] RSP: 0000:ffff880117401ec8 EFLAGS: 00000202
[ 2832.244001] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
[ 2832.244001] RDX: ffff8801168b5000 RSI: ffffffff810aa7de RDI: ffff880118416770
[ 2832.244001] RBP: ffff8801179069cc R08: 0000000000000000 R09: 0000000000000000
[ 2832.244001] R10: 0000000000000000 R11: ffff88011f008440 R12: 0000000000000000
[ 2832.244001] R13: ffff88011f008440 R14: 0000000000000000 R15: ffff8801179069cc
[ 2832.244001] FS: 00007ff55cc8a700(0000) GS:ffff88011fc00000(0000) knlGS:00000
[ 2832.244001] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2832.244001] CR2: 0000000000404850 CR3: 0000000119557000 CR4: 00000000000407f0
[ 2832.244001] DR0: 0000000000000020 DR1: 0000000000000020 DR2: 0000000000000000
[ 2832.244001] DR3: 0000000000001000 DR6: 00000000ffff0ff0 DR7: 00000000000f0602
[ 2832.244001] Stack:
[ 2832.244001] ffffffff810a9cab ffff8801195ac000 ffff8801168b5000 ffff8801195a0
[ 2832.244001] ffffffff810a82ce ffff8801195ac000 ffff88011766a040 ffff880117668
[ 2832.244001] 00007ff55ca700e4 ffffffff810ab398 00007ff55ca700f4 00007fffe6c5c
[ 2832.244001] Call Trace:
[ 2832.244001] [<ffffffff810a9cab>] ? perf_event_disable+0x48/0x92
[ 2832.244001] [<ffffffff810a82ce>] ? perf_event_for_each_child+0x62/0x83
[ 2832.244001] [<ffffffff810ab398>] ? perf_event_task_disable+0x3f/0x6d
[ 2832.244001] [<ffffffff8104190e>] ? SyS_prctl+0x16a/0x32f
[ 2832.244001] [<ffffffff8136a212>] ? system_call_fastpath+0x16/0x1b
[ 2832.244001] Code: 48 89 47 38 c3 c7 87 68 01 00 00 00 00 00 00 c3 c7 87 68 0
OK, I haven't managed to get a small reproducible test case for the system
crash yet, but I do have one for the initial WARN_ONCE that happens.
This is on 3.10-rc5
As a refresher, it looks like this:
[ 44.688044] ------------[ cut here ]------------
[ 44.692023] WARNING: at kernel/events/core.c:2122 task_ctx_sched_out+0x3c/0x5f()
[ 44.692023] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs lockd dns_resolver fscache sunrpc loop fuse snd_hda_codec_hdmi snd_hda_codec_realtek coretemp kvm_intel kvm nouveau mxm_wmi evdev ttm microcode drm_kms_helper drm i2c_algo_bit video wmi snd_hda_intel acpi_cpufreq snd_hda_codec mperf snd_hwdep snd_pcm snd_page_alloc snd_seq snd_seq_device snd_timer shpchp snd soundcore button psmouse pcspkr processor i2c_nforce2 thermal_sys serio_raw i2c_core ext4 crc16 jbd2 mbcache sg sd_mod crc_t10dif ata_generic ahci libahci ohci_hcd libata ehci_pci ehci_hcd r8169 mii scsi_mod usbcore usb_common
[ 44.692023] CPU: 0 PID: 2992 Comm: out Not tainted 3.10.0-rc5 #2
[ 44.692023] Hardware name: AOpen DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BIOS 080015 10/19/2012
[ 44.692023] 0000000000000000 ffffffff8102e215 0000000000000000 ffff88011fc15b28
[ 44.692023] ffff8801165a10c0 0000000000000000 ffff88011977458c ffffffff810ab2e1
[ 44.692023] ffff880119774580 ffffffff810af04e 0000000000000009 ffff8801165a10c0
[ 44.692023] Call Trace:
[ 44.692023] [<ffffffff8102e215>] ? warn_slowpath_common+0x5b/0x70
[ 44.692023] [<ffffffff810ab2e1>] ? task_ctx_sched_out+0x3c/0x5f
[ 44.692023] [<ffffffff810af04e>] ? perf_event_exit_task+0xbf/0x194
[ 44.692023] [<ffffffff81032a37>] ? do_exit+0x3e7/0x90c
[ 44.692023] [<ffffffff810cd5cf>] ? __do_fault+0x359/0x394
[ 44.692023] [<ffffffff81032fe6>] ? do_group_exit+0x66/0x98
[ 44.692023] [<ffffffff8103dbcd>] ? get_signal_to_deliver+0x479/0x4ad
[ 44.692023] [<ffffffff810cdd24>] ? __pte_alloc+0x83/0xf2
[ 44.692023] [<ffffffff8100205d>] ? do_signal+0x3c/0x432
[ 44.692023] [<ffffffff81368160>] ? __do_page_fault+0x31c/0x3ba
[ 44.692023] [<ffffffff81002473>] ? do_notify_resume+0x20/0x5d
[ 44.692023] [<ffffffff813655b5>] ? retint_signal+0x3d/0x78
[ 44.692023] ---[ end trace 38bad7d3252e7847 ]---
And the test code that consistently reproduces this:
/* log_to_code output from ./newcrash-bisect31 */
/* by Vince Weaver <vincent.weaver _at_ maine.edu */
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <sys/prctl.h>
#include <linux/hw_breakpoint.h>
#include <linux/perf_event.h>
int fd[1024];
struct perf_event_attr pe[1024];
char *mmap_result[1024];
int forked_pid;
int perf_event_open(struct perf_event_attr *hw_event_uptr,
pid_t pid, int cpu, int group_fd, unsigned long flags) {
return syscall(__NR_perf_event_open,hw_event_uptr, pid, cpu,
group_fd, flags);
}
int main(int argc, char **argv) {
memset(&pe[20],0,sizeof(struct perf_event_attr));
pe[20].type=PERF_TYPE_SOFTWARE;
pe[20].config=PERF_COUNT_SW_TASK_CLOCK;
pe[20].sample_type=0; /* 0 */
pe[20].read_format=0x0ULL; /* 0 */
pe[20].inherit=1;
pe[20].pinned=1;
pe[20].exclusive=1;
pe[20].exclude_user=1;
pe[20].exclude_kernel=1;
pe[20].exclude_hv=1;
pe[20].inherit_stat=1;
pe[20].precise_ip=0; /* arbitrary skid */
pe[20].exclude_guest=1;
pe[20].wakeup_events=0;
pe[20].bp_type=HW_BREAKPOINT_EMPTY;
fd[20]=perf_event_open(&pe[20],0,0,-1,0x20000000000000ULL /*20000000000000*/ );
memset(&pe[23],0,sizeof(struct perf_event_attr));
pe[23].type=PERF_TYPE_TRACEPOINT;
pe[23].size=96;
pe[23].config=0x1d;
pe[23].sample_type=0; /* 0 */
pe[23].read_format=PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_GROUP; /* a */
pe[23].exclude_user=1;
pe[23].exclude_hv=1;
pe[23].comm=1;
pe[23].inherit_stat=1;
pe[23].enable_on_exec=1;
pe[23].precise_ip=0; /* arbitrary skid */
pe[23].exclude_host=1;
pe[23].exclude_guest=1;
pe[23].wakeup_events=0;
pe[23].bp_type=HW_BREAKPOINT_EMPTY;
pe[23].config1=0x26;
fd[23]=perf_event_open(&pe[23],0,0,-1,0 /*0*/ );
memset(&pe[12],0,sizeof(struct perf_event_attr));
pe[12].type=PERF_TYPE_SOFTWARE;
pe[12].size=96;
pe[12].config=PERF_COUNT_SW_TASK_CLOCK;
pe[12].sample_type=0; /* 0 */
pe[12].read_format=PERF_FORMAT_ID; /* 4 */
pe[12].inherit=1;
pe[12].exclude_kernel=1;
pe[12].exclude_idle=1;
pe[12].mmap=1;
pe[12].freq=1;
pe[12].task=1;
pe[12].watermark=1;
pe[12].precise_ip=1; /* constant skid */
pe[12].sample_id_all=1;
pe[12].exclude_host=1;
pe[12].wakeup_watermark=0;
pe[12].bp_type=HW_BREAKPOINT_EMPTY;
fd[12]=perf_event_open(&pe[12],0,0,fd[23],PERF_FLAG_FD_NO_GROUP /*1*/ );
memset(&pe[30],0,sizeof(struct perf_event_attr));
pe[30].type=PERF_TYPE_HARDWARE;
pe[30].size=72;
pe[30].config=PERF_COUNT_HW_INSTRUCTIONS;
pe[30].sample_type=0; /* 0 */
pe[30].read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_ID; /* 5 */
pe[30].exclude_user=1;
pe[30].exclude_hv=1;
pe[30].mmap=1;
pe[30].comm=1;
pe[30].freq=1;
pe[30].task=1;
pe[30].watermark=1;
pe[30].precise_ip=2; /* request zero skid */
pe[30].sample_id_all=1;
pe[30].wakeup_watermark=0;
pe[30].bp_type=HW_BREAKPOINT_EMPTY;
fd[30]=perf_event_open(&pe[30],0,0,fd[20],0 /*0*/ );
memset(&pe[11],0,sizeof(struct perf_event_attr));
pe[11].type=PERF_TYPE_SOFTWARE;
pe[11].size=64;
pe[11].config=PERF_COUNT_SW_TASK_CLOCK;
pe[11].sample_type=0; /* 0 */
pe[11].read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_GROUP; /* 9 */
pe[11].exclude_idle=1;
pe[11].comm=1;
pe[11].enable_on_exec=1;
pe[11].task=1;
pe[11].watermark=1;
pe[11].precise_ip=0; /* arbitrary skid */
pe[11].exclude_host=1;
pe[11].exclude_guest=1;
pe[11].wakeup_watermark=0;
pe[11].bp_type=HW_BREAKPOINT_EMPTY;
fd[11]=perf_event_open(&pe[11],0,0,fd[23],PERF_FLAG_FD_NO_GROUP|PERF_FLAG_FD_OUTPUT /*3*/ );
forked_pid=fork();
if (forked_pid==0) while(1);
close(fd[23]);
kill(forked_pid,SIGKILL);
/* Replayed 8 syscalls */
return 0;
}
On Fri, 14 Jun 2013, Vince Weaver wrote:
> OK, I haven't managed to get a small reproducible test case for the system
> crash yet
I wasted the last 2 days bisecting a 10000 syscall trace, but below is a
20-syscall testcase that rapidly makes a core2 machine running 3.10-rc7
unusable.
You may have to run this twice (first triggers WARNING second crashes the
machine).
/* This code quickly causes 3.10-rc7 on core2 to get stuck */
/* process never ends, lots of garbage in logs, can't reboot */
/* see the error log at end. */
/* log_to_code output from ./bad.trace.bisect54 */
/* by Vince Weaver <vincent.weaver _at_ maine.edu */
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <sys/prctl.h>
#include <linux/hw_breakpoint.h>
#include <linux/perf_event.h>
int fd[1024];
struct perf_event_attr pe[1024];
int forked_pid;
int perf_event_open(struct perf_event_attr *hw_event_uptr,
pid_t pid, int cpu, int group_fd, unsigned long flags) {
return syscall(__NR_perf_event_open,hw_event_uptr, pid, cpu,
group_fd, flags);
}
int main(int argc, char **argv) {
/* 1 */
memset(&pe[29],0,sizeof(struct perf_event_attr));
pe[29].type=PERF_TYPE_HW_CACHE;
pe[29].size=96;
pe[29].config=PERF_COUNT_HW_CACHE_DTLB | ( PERF_COUNT_HW_CACHE_OP_READ << 8) | ( PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16 );
pe[29].sample_type=0; /* 0 */
pe[29].read_format=PERF_FORMAT_GROUP; /* 8 */
pe[29].pinned=1;
pe[29].exclude_kernel=1;
pe[29].exclude_hv=1;
pe[29].exclude_idle=1;
pe[29].mmap=1;
pe[29].freq=1;
pe[29].inherit_stat=1;
pe[29].watermark=1;
pe[29].precise_ip=0; /* arbitrary skid */
pe[29].wakeup_watermark=0;
pe[29].bp_type=HW_BREAKPOINT_EMPTY;
fd[29]=perf_event_open(&pe[29],0,0,-1,PERF_FLAG_FD_NO_GROUP /*1*/ );
/* 2 */
memset(&pe[42],0,sizeof(struct perf_event_attr));
pe[42].type=PERF_TYPE_SOFTWARE;
pe[42].size=80;
pe[42].config=PERF_COUNT_SW_TASK_CLOCK;
pe[42].sample_type=0; /* 0 */
pe[42].read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID; /* 7 */
pe[42].inherit=1;
pe[42].exclusive=1;
pe[42].exclude_kernel=1;
pe[42].exclude_idle=1;
pe[42].freq=1;
pe[42].task=1;
pe[42].watermark=1;
pe[42].precise_ip=3; /* must have zero skid */
pe[42].sample_id_all=1;
pe[42].exclude_host=1;
pe[42].wakeup_watermark=0;
pe[42].bp_type=HW_BREAKPOINT_EMPTY;
fd[42]=perf_event_open(&pe[42],0,0,-1,0 /*0*/ );
/* 3 */
forked_pid=fork();
if (forked_pid==0) while(1);
/* 4 */
prctl(PR_TASK_PERF_EVENTS_DISABLE);
/* 5 */
kill(forked_pid,SIGKILL);
/* 6 */
prctl(PR_TASK_PERF_EVENTS_ENABLE);
/* 7 */
memset(&pe[15],0,sizeof(struct perf_event_attr));
pe[15].type=PERF_TYPE_SOFTWARE;
pe[15].config=PERF_COUNT_SW_TASK_CLOCK;
pe[15].sample_type=0; /* 0 */
pe[15].read_format=PERF_FORMAT_ID; /* 4 */
pe[15].disabled=1;
pe[15].inherit=1;
pe[15].exclude_user=1;
pe[15].exclude_hv=1;
pe[15].enable_on_exec=1;
pe[15].task=1;
pe[15].watermark=1;
pe[15].precise_ip=2; /* request zero skid */
pe[15].mmap_data=1;
pe[15].sample_id_all=1;
pe[15].exclude_guest=1;
pe[15].wakeup_watermark=0;
pe[15].bp_type=HW_BREAKPOINT_EMPTY;
fd[15]=perf_event_open(&pe[15],0,0,fd[29],0 /*0*/ );
/* 8 */
close(fd[29]);
/* 9 */
memset(&pe[12],0,sizeof(struct perf_event_attr));
pe[12].type=PERF_TYPE_SOFTWARE;
pe[12].config=PERF_COUNT_SW_TASK_CLOCK;
pe[12].sample_type=0; /* 0 */
pe[12].read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_ID; /* 5 */
pe[12].disabled=1;
pe[12].inherit=1;
pe[12].exclusive=1;
pe[12].mmap=1;
pe[12].freq=1;
pe[12].enable_on_exec=1;
pe[12].watermark=1;
pe[12].precise_ip=0; /* arbitrary skid */
pe[12].mmap_data=1;
pe[12].exclude_guest=1;
pe[12].wakeup_watermark=0;
pe[12].bp_type=HW_BREAKPOINT_EMPTY;
fd[12]=perf_event_open(&pe[12],0,0,-1,PERF_FLAG_FD_NO_GROUP /*1*/ );
/* 10 */
prctl(PR_TASK_PERF_EVENTS_DISABLE);
/* 11 */
prctl(PR_TASK_PERF_EVENTS_ENABLE);
/* 12 */
memset(&pe[4],0,sizeof(struct perf_event_attr));
pe[4].type=PERF_TYPE_SOFTWARE;
pe[4].size=96;
pe[4].config=PERF_COUNT_SW_TASK_CLOCK;
pe[4].sample_type=0; /* 0 */
pe[4].read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID; /* 7 */
pe[4].pinned=1;
pe[4].exclude_user=1;
pe[4].exclude_hv=1;
pe[4].exclude_idle=1;
pe[4].mmap=1;
pe[4].freq=1;
pe[4].inherit_stat=1;
pe[4].task=1;
pe[4].precise_ip=0; /* arbitrary skid */
pe[4].wakeup_events=0;
pe[4].bp_type=HW_BREAKPOINT_EMPTY;
fd[4]=perf_event_open(&pe[4],0,0,-1,0 /*0*/ );
/* 13 */
memset(&pe[7],0,sizeof(struct perf_event_attr));
pe[7].type=PERF_TYPE_SOFTWARE;
pe[7].size=96;
pe[7].config=PERF_COUNT_SW_TASK_CLOCK;
pe[7].sample_type=0; /* 0 */
pe[7].read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_GROUP; /* 9 */
pe[7].exclusive=1;
pe[7].exclude_user=1;
pe[7].exclude_kernel=1;
pe[7].exclude_hv=1;
pe[7].exclude_idle=1;
pe[7].comm=1;
pe[7].inherit_stat=1;
pe[7].precise_ip=1; /* constant skid */
pe[7].exclude_host=1;
pe[7].wakeup_events=0;
pe[7].bp_type=HW_BREAKPOINT_EMPTY;
fd[7]=perf_event_open(&pe[7],0,0,-1,PERF_FLAG_FD_NO_GROUP /*1*/ );
/* 14 */
forked_pid=fork();
if (forked_pid==0) while(1);
/* 15 */
kill(forked_pid,SIGKILL);
/* 16 */
forked_pid=fork();
if (forked_pid==0) while(1);
/* 17 */
kill(forked_pid,SIGKILL);
/* 18 */
close(fd[4]);
/* 19 */
memset(&pe[4],0,sizeof(struct perf_event_attr));
pe[4].type=PERF_TYPE_HARDWARE;
pe[4].size=72;
pe[4].config=PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
pe[4].sample_type=0; /* 0 */
pe[4].read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING|PERF_FORMAT_ID; /* 7 */
pe[4].exclude_user=1;
pe[4].exclude_idle=1;
pe[4].enable_on_exec=1;
pe[4].precise_ip=0; /* arbitrary skid */
pe[4].mmap_data=1;
pe[4].exclude_host=1;
pe[4].exclude_guest=1;
pe[4].wakeup_events=0;
pe[4].bp_type=HW_BREAKPOINT_EMPTY;
fd[4]=perf_event_open(&pe[4],0,0,-1,PERF_FLAG_FD_OUTPUT /*2*/ );
/* 20 */
prctl(PR_TASK_PERF_EVENTS_DISABLE);
/* Replayed 20 syscalls */
return 0;
}
/*
Message from syslogd@core2 at Jun 28 15:11:34 ...
kernel:[ 140.248002] BUG: soft lockup - CPU#1 stuck for 23s! [out:3049]
Message from syslogd@core2 at Jun 28 15:12:02 ...
kernel:[ 168.248002] BUG: soft lockup - CPU#1 stuck for 22s! [out:3049]
Message from syslogd@core2 at Jun 28 15:12:38 ...
kernel:[ 204.248002] BUG: soft lockup - CPU#1 stuck for 24s! [out:3049]
Message from syslogd@core2 at Jun 28 15:13:06 ...
kernel:[ 232.248002] BUG: soft lockup - CPU#1 stuck for 23s! [out:3049]
Message from syslogd@core2 at Jun 28 15:13:38 ...
kernel:[ 264.248002] BUG: soft lockup - CPU#1 stuck for 22s! [out:3049]
[ 82.239623] ------------[ cut here ]------------
[ 82.243537] WARNING: at kernel/events/core.c:2122 task_ctx_sched_out+0x3c/0x)
[ 82.243537] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs locn
[ 82.243537] CPU: 0 PID: 3047 Comm: out Not tainted 3.10.0-rc7 #1
[ 82.243537] Hardware name: AOpen DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BI2
[ 82.243537] 0000000000000000 ffffffff8102e0f5 0000000000000000 ffff88011fc18
[ 82.243537] ffff88011799b040 0000000000000000 ffff8801177acecc ffffffff810af
[ 82.243537] ffff8801177acec0 ffffffff810af06f 0000000000000009 ffff880117990
[ 82.243537] Call Trace:
[ 82.243537] [<ffffffff8102e0f5>] ? warn_slowpath_common+0x5b/0x70
[ 82.243537] [<ffffffff810ab2ff>] ? task_ctx_sched_out+0x3c/0x5f
[ 82.243537] [<ffffffff810af06f>] ? perf_event_exit_task+0xbf/0x194
[ 82.243537] [<ffffffff8103291b>] ? do_exit+0x3ef/0x90c
[ 82.243537] [<ffffffff810ce101>] ? do_wp_page+0x275/0x5a6
[ 82.243537] [<ffffffff81032ec2>] ? do_group_exit+0x66/0x98
[ 82.243537] [<ffffffff8103dab9>] ? get_signal_to_deliver+0x479/0x4ad
[ 82.243537] [<ffffffff810cdd50>] ? __pte_alloc+0x83/0xf2
[ 82.243537] [<ffffffff8100205d>] ? do_signal+0x3c/0x432
[ 82.243537] [<ffffffff813682a0>] ? __do_page_fault+0x31c/0x3ba
[ 82.243537] [<ffffffff81002473>] ? do_notify_resume+0x20/0x5d
[ 82.243537] [<ffffffff813656f5>] ? retint_signal+0x3d/0x78
[ 82.243537] ---[ end trace 6e0d112623f126d5 ]---
[ 111.664002] INFO: rcu_sched self-detected stall on CPU[ 111.676008] INFO: r)
[ 111.676018] sending NMI to all CPUs:
[ 111.664002] NMI backtrace for cpu 1
[ 111.664002] CPU: 1 PID: 3049 Comm: out Tainted: G W 3.10.0-rc7 #1
[ 111.664002] Hardware name: AOpen DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BI2
[ 111.664002] task: ffff880117b62080 ti: ffff880117a6a000 task.ti: ffff880117a0
[ 111.664002] RIP: 0010:[<ffffffff811b8deb>] [<ffffffff811b8deb>] delay_tsc+06
[ 111.664002] RSP: 0018:ffff88011fc83ca0 EFLAGS: 00000083
[ 111.664002] RAX: 000000000dfb29b6 RBX: ffffffff8183c770 RCX: 000000000dfb218c
[ 111.664002] RDX: 000000000000082a RSI: 0000000000000001 RDI: 00000000000009d2
[ 111.664002] RBP: 000000000000270b R08: 0000000000000001 R09: ffffffff817caa1f
[ 111.664002] R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000020
[ 111.664002] R13: ffffffff812465bc R14: 0000000000000039 R15: 0000000000000039
[ 111.664002] FS: 00007fa834fef700(0000) GS:ffff88011fc80000(0000) knlGS:00000
[ 111.664002] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 111.664002] CR2: 0000000000600e70 CR3: 00000001171da000 CR4: 00000000000407e0
[ 111.664002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 111.664002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 111.664002] Stack:
[ 111.664002] ffffffff81246578 ffffffff8183c770 000000000000006c ffffffff81830
[ 111.664002] ffffffff812465cd ffffffff817caa41 ffffffff817caa10 ffffffff81245
[ 111.664002] ffffffff8183c770 0000000000000000 0000000000000016 0000000000001
[ 111.664002] Call Trace:
[ 111.664002] <IRQ> [<ffffffff81246578>] ? wait_for_xmitr+0x3d/0x81
[ 111.664002] [<ffffffff812465cd>] ? serial8250_console_putchar+0x11/0x1f
[ 111.664002] [<ffffffff81242285>] ? uart_console_write+0x35/0x47
[ 111.664002] [<ffffffff81246679>] ? serial8250_console_write+0x9e/0xe9
[ 111.664002] [<ffffffff8102ea40>] ? call_console_drivers.constprop.22+0x9e/0e
[ 111.664002] [<ffffffff8102f8b0>] ? console_unlock+0x12e/0x2f9
[ 111.664002] [<ffffffff8102ffa6>] ? vprintk_emit+0x3ab/0x3d6
[ 111.664002] [<ffffffff8135fc3f>] ? printk+0x4f/0x51
[ 111.664002] [<ffffffff8109273e>] ? rcu_check_callbacks+0x106/0x4bb
[ 111.664002] [<ffffffff8106b90e>] ? tick_sched_do_timer+0x25/0x25
[ 111.664002] [<ffffffff8103a92f>] ? update_process_times+0x31/0x5c
[ 111.664002] [<ffffffff8106b672>] ? tick_sched_handle+0x3e/0x4a
[ 111.664002] [<ffffffff8106b93e>] ? tick_sched_timer+0x30/0x4c
[ 111.664002] [<ffffffff8104be54>] ? __run_hrtimer+0xa9/0x14e
[ 111.664002] [<ffffffff8104c461>] ? hrtimer_interrupt+0xbd/0x19e
[ 111.664002] [<ffffffff810a9c9f>] ? perf_remove_from_context+0x89/0x89
[ 111.664002] [<ffffffff8101bc60>] ? smp_apic_timer_interrupt+0x6d/0x7e
[ 111.664002] [<ffffffff8136b00a>] ? apic_timer_interrupt+0x6a/0x70
[ 111.664002] <EOI> [<ffffffff810aa81a>] ? group_sched_out+0x62/0x62
[ 111.664002] [<ffffffff810a9d03>] ? perf_event_disable+0x64/0x92
[ 111.664002] [<ffffffff810a830a>] ? perf_event_for_each_child+0x62/0x83
[ 111.664002] [<ffffffff810ab3da>] ? perf_event_task_disable+0x3f/0x6d
[ 111.664002] [<ffffffff8104184c>] ? SyS_prctl+0x16a/0x32f
[ 111.664002] [<ffffffff8136a452>] ? system_call_fastpath+0x16/0x1b
[ 111.664002] Code: 3c bf e9 cb ff ff ff 65 8b 34 25 c4 b0 00 00 66 66 90 0f a
[ 111.676334] NMI backtrace for cpu 0
[ 111.676339] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 3.10.0-rc7 1
[ 111.676341] Hardware name: AOpen DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BI2
[ 111.676345] task: ffffffff81610400 ti: ffffffff81600000 task.ti: ffffffff8160
[ 111.676348] RIP: 0010:[<ffffffff811b8d9b>] [<ffffffff811b8d9b>] __const_ude0
[ 111.676357] RSP: 0018:ffff88011fc03e60 EFLAGS: 00000046
[ 111.676360] RAX: 0000000000000000 RBX: 0000000000002710 RCX: 0000000000000040
[ 111.676363] RDX: 00000000009aa06c RSI: 0000000000000200 RDI: 0000000000418958
[ 111.676365] RBP: ffff88011fc0d9a0 R08: 0000000000000000 R09: ffffffff817b87ec
[ 111.676368] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81637dc0
[ 111.676371] R13: 0000000000000001 R14: ffffffff81600000 R15: ffffffff81637e80
[ 111.676374] FS: 0000000000000000(0000) GS:ffff88011fc00000(0000) knlGS:00000
[ 111.676377] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 111.676380] CR2: 00007fa834dd8a60 CR3: 00000001176de000 CR4: 00000000000407f0
[ 111.676383] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 111.676385] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 111.676387] Stack:
[ 111.676388] ffffffff8101c67f ffffffff81637dc0 ffffffff81092a14 0000000000002
[ 111.676394] 0000000000000083 0000000000000000 ffffffff81610400 0000000000000
[ 111.676399] 0000000000000000 ffff88011fc0d370 ffffffff8106b90e ffff88011fc00
[ 111.676405] Call Trace:
[ 111.676407] <IRQ> [<ffffffff8101c67f>] ? arch_trigger_all_cpu_backtrace+0xf
[ 111.676416] [<ffffffff81092a14>] ? rcu_check_callbacks+0x3dc/0x4bb
[ 111.676424] [<ffffffff8106b90e>] ? tick_sched_do_timer+0x25/0x25
[ 111.676429] [<ffffffff8103a92f>] ? update_process_times+0x31/0x5c
[ 111.676436] [<ffffffff8106b672>] ? tick_sched_handle+0x3e/0x4a
[ 111.676441] [<ffffffff8106b93e>] ? tick_sched_timer+0x30/0x4c
[ 111.676447] [<ffffffff8104be54>] ? __run_hrtimer+0xa9/0x14e
[ 111.676452] [<ffffffff8104c461>] ? hrtimer_interrupt+0xbd/0x19e
[ 111.676458] [<ffffffff8101bc60>] ? smp_apic_timer_interrupt+0x6d/0x7e
[ 111.676464] [<ffffffff8136b00a>] ? apic_timer_interrupt+0x6a/0x70
[ 111.676470] <EOI> [<ffffffff81008184>] ? default_idle+0x14/0x3a
[ 111.676480] [<ffffffff810086ec>] ? arch_cpu_idle+0x5/0x14
[ 111.676486] [<ffffffff8106465d>] ? cpu_startup_entry+0x103/0x176
[ 111.676493] [<ffffffff816aacaf>] ? start_kernel+0x3d6/0x3e1
[ 111.676499] [<ffffffff816aa6fb>] ? repair_env_string+0x54/0x54
[ 111.676505] Code: eb 0e 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 48 ff c8 7
[ 111.664002] { 1} (t=5733 jiffies g=359 c=358 q=18)
[ 111.664002] sending NMI to all CPUs:
[ 111.664002] NMI backtrace for cpu 1
[ 111.664002] CPU: 1 PID: 3049 Comm: out Tainted: G W 3.10.0-rc7 #1
[ 111.664002] Hardware name: AOpen DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BI2
[ 111.664002] task: ffff880117b62080 ti: ffff880117a6a000 task.ti: ffff880117a0
[ 111.664002] RIP: 0010:[<ffffffff811b8ddf>] [<ffffffff811b8ddf>] delay_tsc+06
[ 111.664002] RSP: 0018:ffff88011fc83e60 EFLAGS: 00000847
[ 111.664002] RAX: 00000000357ac791 RBX: 0000000000002710 RCX: 00000000357ac791
[ 111.664002] RDX: 000000000000004f RSI: 0000000000000001 RDI: 0000000000265903
[ 111.664002] RBP: ffff88011fc8d9a0 R08: 0000000000000000 R09: 0000000000000000
[ 111.664002] R10: 00000000ffffffff R11: ffff880118676260 R12: ffffffff81637dc0
[ 111.664002] R13: ffff88011fc8d370 R14: ffff880117a6a000 R15: 0000000000000001
[ 111.664002] FS: 00007fa834fef700(0000) GS:ffff88011fc80000(0000) knlGS:00000
[ 111.664002] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 111.664002] CR2: 0000000000600e70 CR3: 00000001171da000 CR4: 00000000000407e0
[ 111.664002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 111.664002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 111.664002] Stack:
[ 111.664002] ffffffff8101c67f ffffffff81637dc0 ffffffff810927dc 0000000000002
[ 111.664002] 0000000000353d39 0000000000000001 ffff880117b62080 0000000000000
[ 111.664002] 0000000000000001 ffff88011fc8d370 ffffffff8106b90e ffff88011fc80
[ 111.664002] Call Trace:
[ 111.664002] <IRQ>
[ 111.664002] [<ffffffff8101c67f>] ? arch_trigger_all_cpu_backtrace+0x76/0x7f
[ 111.664002] [<ffffffff810927dc>] ? rcu_check_callbacks+0x1a4/0x4bb
[ 111.664002] [<ffffffff8106b90e>] ? tick_sched_do_timer+0x25/0x25
[ 111.664002] [<ffffffff8103a92f>] ? update_process_times+0x31/0x5c
[ 111.664002] [<ffffffff8106b672>] ? tick_sched_handle+0x3e/0x4a
[ 111.664002] [<ffffffff8106b93e>] ? tick_sched_timer+0x30/0x4c
[ 111.664002] [<ffffffff8104be54>] ? __run_hrtimer+0xa9/0x14e
[ 111.664002] [<ffffffff8104c461>] ? hrtimer_interrupt+0xbd/0x19e
[ 111.664002] [<ffffffff810a9c9f>] ? perf_remove_from_context+0x89/0x89
[ 111.664002] [<ffffffff8101bc60>] ? smp_apic_timer_interrupt+0x6d/0x7e
[ 111.664002] [<ffffffff8136b00a>] ? apic_timer_interrupt+0x6a/0x70
[ 111.664002] <EOI>
[ 111.664002] [<ffffffff810aa81a>] ? group_sched_out+0x62/0x62
[ 111.664002] [<ffffffff810a9d03>] ? perf_event_disable+0x64/0x92
[ 111.664002] [<ffffffff810a830a>] ? perf_event_for_each_child+0x62/0x83
[ 111.664002] [<ffffffff810ab3da>] ? perf_event_task_disable+0x3f/0x6d
[ 111.664002] [<ffffffff8104184c>] ? SyS_prctl+0x16a/0x32f
[ 111.664002] [<ffffffff8136a452>] ? system_call_fastpath+0x16/0x1b
[ 111.664002] Code: ff c7 10 00 00 e9 d4 ff ff ff 48 8d 3c bf e9 cb ff ff ff 6
[ 113.636004] NMI backtrace for cpu 0
[ 113.636004] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 3.10.0-rc7 1
[ 113.636004] Hardware name: AOpen DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BI2
[ 113.636004] task: ffffffff81610400 ti: ffffffff81600000 task.ti: ffffffff8160
[ 113.636004] RIP: 0010:[<ffffffff81008184>] [<ffffffff81008184>] default_idla
[ 113.636004] RSP: 0018:ffffffff81601f98 EFLAGS: 00000246
[ 113.636004] RAX: 0000000000000000 RBX: ffffffff81601fd8 RCX: 00000000ffffffff
[ 113.636004] RDX: 0100000000000000 RSI: 0000000000000000 RDI: ffff88011fc0d81c
[ 113.636004] RBP: ffffffff81601fd8 R08: 0000000000000000 R09: 0000000000000000
[ 113.636004] R10: 0000000000000001 R11: 0000000000000002 R12: ffffffff817332d0
[ 113.636004] R13: ffff88011ffad540 R14: 0000000000000000 R15: 0000000000000000
[ 113.636004] FS: 0000000000000000(0000) GS:ffff88011fc00000(0000) knlGS:00000
[ 113.636004] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 113.636004] CR2: 00007fa834dd8a60 CR3: 0000000117988000 CR4: 00000000000407f0
[ 113.636004] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 113.636004] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 113.636004] Stack:
[ 113.636004] ffffffff8106465d 516a4e0130c2f192 ffffffffffffffff ffffffff81720
[ 113.636004] ffffffff816aacaf ffffffff816aa6fb ffffffff817332d0 0000000000000
[ 113.636004] 0000000000000000 0000000000000000 0000000000000000 0000000000000
[ 113.636004] Call Trace:
[ 113.636004] [<ffffffff8106465d>] ? cpu_startup_entry+0x103/0x176
[ 113.636004] [<ffffffff816aacaf>] ? start_kernel+0x3d6/0x3e1
[ 113.636004] [<ffffffff816aa6fb>] ? repair_env_string+0x54/0x54
[ 113.636004] Code: 08 00 48 8b 7b 08 44 89 e2 89 ee ff 13 48 83 c3 10 48 83 3
[ 140.248002] BUG: soft lockup - CPU#1 stuck for 23s! [out:3049]
[ 140.248002] Modules linked in: nfsd auth_rpcgss oid_registry nfs_acl nfs locn
[ 140.248002] CPU: 1 PID: 3049 Comm: out Tainted: G W 3.10.0-rc7 #1
[ 140.248002] Hardware name: AOpen DE7000/nMCP7ALPx-DE R1.06 Oct.19.2012, BI2
[ 140.248002] task: ffff880117b62080 ti: ffff880117a6a000 task.ti: ffff880117a0
[ 140.248002] RIP: 0010:[<ffffffff810a9d03>] [<ffffffff810a9d03>] perf_event_2
[ 140.248002] RSP: 0018:ffff880117a6bed0 EFLAGS: 00000282
[ 140.248002] RAX: 0000000000000001 RBX: 00007fa834b12259 RCX: 0000000000000001
[ 140.248002] RDX: 000000000000fe0d RSI: ffffffff810aa81a RDI: ffff8801177ac7cc
[ 140.248002] RBP: ffff8801177ac7cc R08: 0000000000000002 R09: 0000000000602000
[ 140.248002] R10: 00007fa834b12259 R11: 0000000000000246 R12: 0000000000000002
[ 140.248002] R13: ffff8801177ac7cc R14: 00007fa834b12259 R15: 0000000000000246
[ 140.248002] FS: 00007fa834fef700(0000) GS:ffff88011fc80000(0000) knlGS:00000
[ 140.248002] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 140.248002] CR2: 0000000000600e70 CR3: 00000001171da000 CR4: 00000000000407e0
[ 140.248002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 140.248002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 140.248002] Stack:
[ 140.248002] ffff880118b96400 ffff880117abf400 ffff880118b965d0 ffffffff810aa
[ 140.248002] ffff880118b96400 ffff880117b62080 ffff880117b62708 0000000000002
[ 140.248002] ffffffff810ab3da 0000000000000000 0000000000000000 ffff880117b60
[ 140.248002] Call Trace:
[ 140.248002] [<ffffffff810a830a>] ? perf_event_for_each_child+0x62/0x83
[ 140.248002] [<ffffffff810ab3da>] ? perf_event_task_disable+0x3f/0x6d
[ 140.248002] [<ffffffff8104184c>] ? SyS_prctl+0x16a/0x32f
[ 140.248002] [<ffffffff8136a452>] ? system_call_fastpath+0x16/0x1b
[ 140.248002] Code: 48 89 da 48 c7 c6 1a a8 0a 81 e8 cc e4 ff ff 85 c0 74 41 4
*/
On Fri, 28 Jun 2013, Vince Weaver wrote:
> On Fri, 14 Jun 2013, Vince Weaver wrote:
>
> > OK, I haven't managed to get a small reproducible test case for the system
> > crash yet
>
> I wasted the last 2 days bisecting a 10000 syscall trace, but below is a
> 20-syscall testcase that rapidly makes a core2 machine running 3.10-rc7
> unusable.
and it turns out I might have bisected down too much, as though that
crashes my core2 system it doesn't crash newer machines.
I'm too lazy to re-bisect today, but the much longer program here:
http://web.eece.maine.edu/~vweaver/files/nmi_bug_snb.c
reliably causes the same crash on a Sandybridge machine I have running 3.9
Vince Weaver
[email protected]
http://www.eece.maine.edu/~vweaver/
On Fri, Jun 28, 2013 at 05:07:38PM -0400, Vince Weaver wrote:
> On Fri, 28 Jun 2013, Vince Weaver wrote:
>
> > On Fri, 14 Jun 2013, Vince Weaver wrote:
> >
> > > OK, I haven't managed to get a small reproducible test case for the system
> > > crash yet
> >
> > I wasted the last 2 days bisecting a 10000 syscall trace, but below is a
> > 20-syscall testcase that rapidly makes a core2 machine running 3.10-rc7
> > unusable.
>
> and it turns out I might have bisected down too much, as though that
> crashes my core2 system it doesn't crash newer machines.
>
> I'm too lazy to re-bisect today, but the much longer program here:
> http://web.eece.maine.edu/~vweaver/files/nmi_bug_snb.c
> reliably causes the same crash on a Sandybridge machine I have running 3.9
OK, so on my westmere it triggers that WARN in task_ctx_sched_out() a
_lot_ (I removed the ONCE for easier debugging earlier -- still kinda
stumped there).
Then this thing causes an RCU stall and starts triggering NMI watchdog
msgs.. so YAY! :-)
I'll see what I can find.
On Mon, 1 Jul 2013, Peter Zijlstra wrote:
> OK, so on my westmere it triggers that WARN in task_ctx_sched_out() a
> _lot_ (I removed the ONCE for easier debugging earlier -- still kinda
> stumped there).
>
> Then this thing causes an RCU stall and starts triggering NMI watchdog
> msgs.. so YAY! :-)
I'm glad it's not just me.
I'm in the process of kernel bisecting this, though all I can tell you so
far is 2.6.32 is unaffected by the bug but 3.2 and later are.
Vince
On Mon, 1 Jul 2013, Vince Weaver wrote:
> I'm in the process of kernel bisecting this, though all I can tell you so
> far is 2.6.32 is unaffected by the bug but 3.2 and later are.
well, I spent 2 days bisecting between 3.1 and 3.2 (after
wasting time bisecting between 3.0 and 3.1 by mistake)
(also, a pox on whoever introduced that ethernet .config change between
3.1. 3.2).
and of course what it gives me is
a240f76165e6255384d4bdb8139895fac7988799
perf, core: Introduce attrs to count in either host or guest mode
so the test case wasn't crashing because the auto-generated file
was setting exclude_guest and exclude_host causing perf_event_open()
to fail. If I drop setting those bits it still crashes.
I guess back to the drawing board.
Vince
OK, I have bisected this problem to the following change:
commit 8dc85d547285668e509f86c177bcd4ea055bcaaf
Author: Peter Zijlstra <[email protected]>
Date: Thu Sep 2 16:50:03 2010 +0200
perf: Multiple task contexts
Provide the infrastructure for multiple task contexts.
On Tue, Jul 02, 2013 at 11:15:59PM -0400, Vince Weaver wrote:
>
> OK, I have bisected this problem to the following change:
>
> commit 8dc85d547285668e509f86c177bcd4ea055bcaaf
> Author: Peter Zijlstra <[email protected]>
> Date: Thu Sep 2 16:50:03 2010 +0200
>
> perf: Multiple task contexts
>
> Provide the infrastructure for multiple task contexts.
>
Thanks, that narrows down the scope a bit. As it happens I was also
narrowing down on that particular area so that's good.
On Wed, Jul 03, 2013 at 09:49:29AM +0200, Peter Zijlstra wrote:
> On Tue, Jul 02, 2013 at 11:15:59PM -0400, Vince Weaver wrote:
> >
> > OK, I have bisected this problem to the following change:
> >
> > commit 8dc85d547285668e509f86c177bcd4ea055bcaaf
> > Author: Peter Zijlstra <[email protected]>
> > Date: Thu Sep 2 16:50:03 2010 +0200
> >
> > perf: Multiple task contexts
> >
> > Provide the infrastructure for multiple task contexts.
> >
>
> Thanks, that narrows down the scope a bit. As it happens I was also
> narrowing down on that particular area so that's good.
>
hi,
patch below gets rid of the warning, and probably the
crash as well
the reason seems to be when HW context is cloned based
on SW event that happened to get there because of the
HW leader, which got closed just before (fd[15] event
in the reproducer)
and the consequence is that in HW context we access
SW's cpuctx -> and thus overwriting its task_ctx variable
any time there's *sched_in routine
need to check if that does not break anything else ;-)
jirka
---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 1833bc5..1d1f030 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7465,7 +7465,7 @@ inherit_task_group(struct perf_event *event, struct task_struct *parent,
* child.
*/
- child_ctx = alloc_perf_context(event->pmu, child);
+ child_ctx = alloc_perf_context(parent_ctx->pmu, child);
if (!child_ctx)
return -ENOMEM;
On Mon, 8 Jul 2013, Jiri Olsa wrote:
> On Wed, Jul 03, 2013 at 09:49:29AM +0200, Peter Zijlstra wrote:
> > On Tue, Jul 02, 2013 at 11:15:59PM -0400, Vince Weaver wrote:
> need to check if that does not break anything else ;-)
I've applied your patch to stock 3.10 and am now unable to hang my Core2
machine that was exhibiting the problem in the first place.
I've also run the various test suites I have and haven't seen any
regressions from the patch.
Vince
On Mon, Jul 08, 2013 at 03:50:48PM +0200, Jiri Olsa wrote:
> patch below gets rid of the warning, and probably the
> crash as well
>
> the reason seems to be when HW context is cloned based
> on SW event that happened to get there because of the
> HW leader, which got closed just before (fd[15] event
> in the reproducer)
>
> and the consequence is that in HW context we access
> SW's cpuctx -> and thus overwriting its task_ctx variable
> any time there's *sched_in routine
>
> need to check if that does not break anything else ;-)
>
> jirka
>
> ---
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 1833bc5..1d1f030 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -7465,7 +7465,7 @@ inherit_task_group(struct perf_event *event, struct task_struct *parent,
> * child.
> */
>
> - child_ctx = alloc_perf_context(event->pmu, child);
> + child_ctx = alloc_perf_context(parent_ctx->pmu, child);
> if (!child_ctx)
> return -ENOMEM;
>
Ha! very good catch. Thanks!
On Mon, Jul 08, 2013 at 03:50:48PM +0200, Jiri Olsa wrote:
> need to check if that does not break anything else ;-)
Vince's test-case triggered the below; so there might still be a few loose
ends.
[ 324.983534] ------------[ cut here ]------------
[ 324.984420] WARNING: at /usr/src/linux-2.6/kernel/events/core.c:1953 __perf_event_enable+0x187/0x190()
[ 324.984420] Modules linked in:
[ 324.984420] CPU: 19 PID: 2715 Comm: nmi_bug_snb Not tainted 3.10.0+ #246
[ 324.984420] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010
[ 324.984420] 0000000000000009 ffff88043fce3ec8 ffffffff8160ea0b ffff88043fce3f00
[ 324.984420] ffffffff81080ff0 ffff8802314fdc00 ffff880231a8f800 ffff88043fcf7860
[ 324.984420] 0000000000000286 ffff880231a8f800 ffff88043fce3f10 ffffffff8108103a
[ 324.984420] Call Trace:
[ 324.984420] <IRQ> [<ffffffff8160ea0b>] dump_stack+0x19/0x1b
[ 324.984420] [<ffffffff81080ff0>] warn_slowpath_common+0x70/0xa0
[ 324.984420] [<ffffffff8108103a>] warn_slowpath_null+0x1a/0x20
[ 324.984420] [<ffffffff81134437>] __perf_event_enable+0x187/0x190
[ 324.984420] [<ffffffff81130030>] remote_function+0x40/0x50
[ 324.984420] [<ffffffff810e51de>] generic_smp_call_function_single_interrupt+0xbe/0x130
[ 324.984420] [<ffffffff81066a47>] smp_call_function_single_interrupt+0x27/0x40
[ 324.984420] [<ffffffff8161fd2f>] call_function_single_interrupt+0x6f/0x80
[ 324.984420] <EOI> [<ffffffff816161a1>] ? _raw_spin_unlock_irqrestore+0x41/0x70
[ 324.984420] [<ffffffff8113799d>] perf_event_exit_task+0x14d/0x210
[ 324.984420] [<ffffffff810acd04>] ? switch_task_namespaces+0x24/0x60
[ 324.984420] [<ffffffff81086946>] do_exit+0x2b6/0xa40
[ 324.984420] [<ffffffff8161615c>] ? _raw_spin_unlock_irq+0x2c/0x30
[ 324.984420] [<ffffffff81087279>] do_group_exit+0x49/0xc0
[ 324.984420] [<ffffffff81096854>] get_signal_to_deliver+0x254/0x620
[ 324.984420] [<ffffffff81043057>] do_signal+0x57/0x5a0
[ 324.984420] [<ffffffff8161a164>] ? __do_page_fault+0x2a4/0x4e0
[ 324.984420] [<ffffffff8161665c>] ? retint_restore_args+0xe/0xe
[ 324.984420] [<ffffffff816166cd>] ? retint_signal+0x11/0x84
[ 324.984420] [<ffffffff81043605>] do_notify_resume+0x65/0x80
[ 324.984420] [<ffffffff81616702>] retint_signal+0x46/0x84
[ 324.984420] ---[ end trace 442ec2f04db3771a ]---
On Mon, Jul 08, 2013 at 06:50:23PM +0200, Peter Zijlstra wrote:
>
> [ 324.983534] ------------[ cut here ]------------
> [ 324.984420] WARNING: at /usr/src/linux-2.6/kernel/events/core.c:1953 __perf_event_enable+0x187/0x190()
> [ 324.984420] Modules linked in:
> [ 324.984420] CPU: 19 PID: 2715 Comm: nmi_bug_snb Not tainted 3.10.0+ #246
> [ 324.984420] Hardware name: Supermicro X8DTN/X8DTN, BIOS 4.6.3 01/08/2010
> [ 324.984420] 0000000000000009 ffff88043fce3ec8 ffffffff8160ea0b ffff88043fce3f00
> [ 324.984420] ffffffff81080ff0 ffff8802314fdc00 ffff880231a8f800 ffff88043fcf7860
> [ 324.984420] 0000000000000286 ffff880231a8f800 ffff88043fce3f10 ffffffff8108103a
> [ 324.984420] Call Trace:
> [ 324.984420] <IRQ> [<ffffffff8160ea0b>] dump_stack+0x19/0x1b
> [ 324.984420] [<ffffffff81080ff0>] warn_slowpath_common+0x70/0xa0
> [ 324.984420] [<ffffffff8108103a>] warn_slowpath_null+0x1a/0x20
> [ 324.984420] [<ffffffff81134437>] __perf_event_enable+0x187/0x190
> [ 324.984420] [<ffffffff81130030>] remote_function+0x40/0x50
> [ 324.984420] [<ffffffff810e51de>] generic_smp_call_function_single_interrupt+0xbe/0x130
> [ 324.984420] [<ffffffff81066a47>] smp_call_function_single_interrupt+0x27/0x40
> [ 324.984420] [<ffffffff8161fd2f>] call_function_single_interrupt+0x6f/0x80
> [ 324.984420] <EOI> [<ffffffff816161a1>] ? _raw_spin_unlock_irqrestore+0x41/0x70
> [ 324.984420] [<ffffffff8113799d>] perf_event_exit_task+0x14d/0x210
> [ 324.984420] [<ffffffff810acd04>] ? switch_task_namespaces+0x24/0x60
> [ 324.984420] [<ffffffff81086946>] do_exit+0x2b6/0xa40
> [ 324.984420] [<ffffffff8161615c>] ? _raw_spin_unlock_irq+0x2c/0x30
> [ 324.984420] [<ffffffff81087279>] do_group_exit+0x49/0xc0
> [ 324.984420] [<ffffffff81096854>] get_signal_to_deliver+0x254/0x620
> [ 324.984420] [<ffffffff81043057>] do_signal+0x57/0x5a0
> [ 324.984420] [<ffffffff8161a164>] ? __do_page_fault+0x2a4/0x4e0
> [ 324.984420] [<ffffffff8161665c>] ? retint_restore_args+0xe/0xe
> [ 324.984420] [<ffffffff816166cd>] ? retint_signal+0x11/0x84
> [ 324.984420] [<ffffffff81043605>] do_notify_resume+0x65/0x80
> [ 324.984420] [<ffffffff81616702>] retint_signal+0x46/0x84
> [ 324.984420] ---[ end trace 442ec2f04db3771a ]---
OK, this looks like an unrelated issue. Still wants fixing though.
It looks like we get an IPI from perf_event_enable() right after we release
child_ctx->lock in perf_event_exit_task_context().
I don't see what we could do here other than write a comment and remove the
WARN. It seems a valid, albeit, unlikely scenario.