2015-06-12 03:09:46

by Vince Weaver

[permalink] [raw]
Subject: perf: aux area related crash and warnings


The fuzzer turned these up (this is 4.1-rc7 with the fasync patch
applied) on a Haswell. I'm listing the crash first, but the warning
happened earlier, not sure if it is related.

[36298.986117] BUG: spinlock recursion on CPU#4, perf_fuzzer/3410
[36298.992915] lock: 0xffff88011edf7cd0, .magic: dead4ead, .owner: perf_fuzzer/3410, .owner_cpu: 4
[36299.002919] CPU: 4 PID: 3410 Comm: perf_fuzzer Tainted: G W 4.1.0-rc7+ #155
[36299.012152] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[36299.020606] ffff88011edf7cd0 ffff88011eb059a0 ffffffff816d7229 0000000000000054
[36299.029199] ffff8800c2f4ac50 ffff88011eb059c0 ffffffff810c2895 ffff88011edf7cd0
[36299.037796] ffffffff81a1e481 ffff88011eb059e0 ffffffff810c2916 ffff88011edf7cd0
[36299.046338] Call Trace:
[36299.049501] <NMI> [<ffffffff816d7229>] dump_stack+0x45/0x57
[36299.056284] [<ffffffff810c2895>] spin_dump+0x85/0xe0
[36299.062282] [<ffffffff810c2916>] spin_bug+0x26/0x30
[36299.068111] [<ffffffff810c2acf>] do_raw_spin_lock+0x13f/0x180
[36299.074897] [<ffffffff816de6e9>] _raw_spin_lock+0x39/0x40
[36299.081276] [<ffffffff8117a039>] ? free_pcppages_bulk+0x39/0x620
[36299.088340] [<ffffffff8117a039>] free_pcppages_bulk+0x39/0x620
[36299.095182] [<ffffffff81177e14>] ? free_pages_prepare+0x3a4/0x550
[36299.102291] [<ffffffff811c9936>] ? kfree_debugcheck+0x16/0x40
[36299.108987] [<ffffffff8117a938>] free_hot_cold_page+0x178/0x1a0
[36299.115850] [<ffffffff8117aa47>] __free_pages+0x37/0x50
[36299.121991] [<ffffffff8116ae0a>] rb_free_aux+0xba/0xf0
[36299.128034] [<ffffffff8116b0e7>] perf_aux_output_end+0xb7/0xf0
[36299.134793] [<ffffffff81037b0e>] intel_bts_interrupt+0x8e/0xd0
[36299.141543] [<ffffffff810338bf>] intel_pmu_handle_irq+0x4f/0x450
[36299.148482] [<ffffffff810bc288>] ? check_chain_key+0x128/0x1e0
[36299.155249] [<ffffffff8102a4ab>] perf_event_nmi_handler+0x2b/0x50
[36299.162273] [<ffffffff810185d0>] nmi_handle+0xa0/0x150
[36299.168278] [<ffffffff81018535>] ? nmi_handle+0x5/0x150
[36299.174377] [<ffffffff8101887a>] default_do_nmi+0x4a/0x140
[36299.180735] [<ffffffff81018a08>] do_nmi+0x98/0xe0
[36299.186219] [<ffffffff816e13ef>] end_repeat_nmi+0x1e/0x2e
[36299.192501] [<ffffffff810bdc4e>] ? __lock_acquire.isra.31+0x27e/0x1000
[36299.199951] [<ffffffff810bdc4e>] ? __lock_acquire.isra.31+0x27e/0x1000
[36299.207410] [<ffffffff810bdc4e>] ? __lock_acquire.isra.31+0x27e/0x1000
[36299.214898] <<EOE>> [<ffffffff810bdd89>] ? __lock_acquire.isra.31+0x3b9/0x1000

and while I was trying to cut and paste that, the locked haswell just took
down the network switch so I can't get the rest until tomorrow.

The warning was
[27716.785131] WARNING: CPU: 2 PID: 17655 at kernel/events/ring_buffer.c:282 perf_aux_output_begin+0x1ce/0x1f0()

which corresponds to
/*
* Nesting is not supported for AUX area, make sure nested
* writers are caught early
*/
if (WARN_ON_ONCE(local_xchg(&rb->aux_nest, 1)))
goto err_put;

again just lost access to the machine with the serial console, for the
full backtrace it will have to wait until I'm not remote.

Vince


2015-06-12 18:36:50

by Vince Weaver

[permalink] [raw]
Subject: Re: perf: aux area related crash and warnings

On Thu, 11 Jun 2015, Vince Weaver wrote:

> and while I was trying to cut and paste that, the locked haswell just took
> down the network switch so I can't get the rest until tomorrow.

here are the full dumps if anyone is interested

the warning is reproducible, the spinlock disaster is not.

[36298.986117] BUG: spinlock recursion on CPU#4, perf_fuzzer/3410
[36298.992915] lock: 0xffff88011edf7cd0, .magic: dead4ead, .owner: perf_fuzzer/3410, .owner_cpu: 4
[36299.002919] CPU: 4 PID: 3410 Comm: perf_fuzzer Tainted: G W 4.1.0-rc7+ #155
[36299.012152] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[36299.020606] ffff88011edf7cd0 ffff88011eb059a0 ffffffff816d7229 0000000000000054
[36299.029199] ffff8800c2f4ac50 ffff88011eb059c0 ffffffff810c2895 ffff88011edf7cd0
[36299.037796] ffffffff81a1e481 ffff88011eb059e0 ffffffff810c2916 ffff88011edf7cd0
[36299.046338] Call Trace:
[36299.049501] <NMI> [<ffffffff816d7229>] dump_stack+0x45/0x57
[36299.056284] [<ffffffff810c2895>] spin_dump+0x85/0xe0
[36299.062282] [<ffffffff810c2916>] spin_bug+0x26/0x30
[36299.068111] [<ffffffff810c2acf>] do_raw_spin_lock+0x13f/0x180
[36299.074897] [<ffffffff816de6e9>] _raw_spin_lock+0x39/0x40
[36299.081276] [<ffffffff8117a039>] ? free_pcppages_bulk+0x39/0x620
[36299.088340] [<ffffffff8117a039>] free_pcppages_bulk+0x39/0x620
[36299.095182] [<ffffffff81177e14>] ? free_pages_prepare+0x3a4/0x550
[36299.102291] [<ffffffff811c9936>] ? kfree_debugcheck+0x16/0x40
[36299.108987] [<ffffffff8117a938>] free_hot_cold_page+0x178/0x1a0
[36299.115850] [<ffffffff8117aa47>] __free_pages+0x37/0x50
[36299.121991] [<ffffffff8116ae0a>] rb_free_aux+0xba/0xf0
[36299.128034] [<ffffffff8116b0e7>] perf_aux_output_end+0xb7/0xf0
[36299.134793] [<ffffffff81037b0e>] intel_bts_interrupt+0x8e/0xd0
[36299.141543] [<ffffffff810338bf>] intel_pmu_handle_irq+0x4f/0x450
[36299.148482] [<ffffffff810bc288>] ? check_chain_key+0x128/0x1e0
[36299.155249] [<ffffffff8102a4ab>] perf_event_nmi_handler+0x2b/0x50
[36299.162273] [<ffffffff810185d0>] nmi_handle+0xa0/0x150
[36299.168278] [<ffffffff81018535>] ? nmi_handle+0x5/0x150
[36299.174377] [<ffffffff8101887a>] default_do_nmi+0x4a/0x140
[36299.180735] [<ffffffff81018a08>] do_nmi+0x98/0xe0
[36299.186219] [<ffffffff816e13ef>] end_repeat_nmi+0x1e/0x2e
[36299.192501] [<ffffffff810bdc4e>] ? __lock_acquire.isra.31+0x27e/0x1000
[36299.199951] [<ffffffff810bdc4e>] ? __lock_acquire.isra.31+0x27e/0x1000
[36299.207410] [<ffffffff810bdc4e>] ? __lock_acquire.isra.31+0x27e/0x1000
[36299.214898] <<EOE>> [<ffffffff810bdd89>] ? __lock_acquire.isra.31+0x3b9/0x1000
[36299.223222] [<ffffffff810bf365>] lock_acquire+0xa5/0x130
[36299.229409] [<ffffffff8117b379>] ? __rmqueue+0x109/0x510
[36299.235551] [<ffffffff8117b379>] ? __rmqueue+0x109/0x510
[36299.241720] [<ffffffff8117810b>] bad_range+0x6b/0x150
[36299.247612] [<ffffffff8117b379>] ? __rmqueue+0x109/0x510
[36299.253757] [<ffffffff8116a8c5>] ? perf_output_begin+0x5/0x270
[36299.260489] [<ffffffff8117b379>] __rmqueue+0x109/0x510
[36299.266501] [<ffffffff8117bf63>] get_page_from_freelist+0x7e3/0xa50
[36299.273680] [<ffffffff8117c4cb>] __alloc_pages_nodemask+0x1fb/0xa90
[36299.280872] [<ffffffff811c416b>] ? alloc_pages_vma+0x10b/0x290
[36299.287586] [<ffffffff810bdd89>] ? __lock_acquire.isra.31+0x3b9/0x1000
[36299.295017] [<ffffffff811c416b>] alloc_pages_vma+0x10b/0x290
[36299.301544] [<ffffffff8119f1a4>] ? wp_page_copy.isra.51+0x84/0x4f0
[36299.308613] [<ffffffff8119f1a4>] wp_page_copy.isra.51+0x84/0x4f0
[36299.315506] [<ffffffff811a0d94>] ? do_wp_page+0x84/0x6a0
[36299.321633] [<ffffffff811a0dad>] do_wp_page+0x9d/0x6a0
[36299.327533] [<ffffffff811a2b69>] ? handle_mm_fault+0x309/0x1770
[36299.334245] [<ffffffff811a364c>] handle_mm_fault+0xdec/0x1770
[36299.340854] [<ffffffff81063034>] __do_page_fault+0x1b4/0x460
[36299.347337] [<ffffffff8106334e>] trace_do_page_fault+0x3e/0xc0
[36299.354021] [<ffffffff816e1068>] trace_page_fault+0x28/0x30
[36315.086708] ------------[ cut here ]------------


and

[27716.785131] WARNING: CPU: 2 PID: 17655 at kernel/events/ring_buffer.c:282 perf_aux_output_begin+0x1ce/0x1f0()
[27716.796157] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel snd_hda_controller crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec aesni_intel snd_hda_core aes_x86_64 iTCO_wdt lrw snd_hwdep gf128mul snd_pcm iTCO_vendor_support ppdev i915snd_timer glue_helper evdev ablk_helper snd cryptd mei_me mei psmouse pcspkr soundcore serio_raw lpc_ich drm_kms_helper parport_pc battery tpm_tis mfd_core parport drm video tpm i2c_i801 i2c_algo_bit wmi processor button sg sr_mod cdrom sd_mod ehci_pci ehci_hcd ahci libahci libata xhci_pci xhci_hcd e1000e ptp scsi_mod crc32c_intel usbcore pps_core usb_common thermal fan thermal_sys
[27716.873350] CPU: 2 PID: 17655 Comm: perf_fuzzer Tainted: G W 4.1.0-rc7+ #155
[27716.882671] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[27716.891168] ffffffff81a230f2 ffff8800cf9afc68 ffffffff816d7229 0000000000000000
[27716.899782] 0000000000000000 ffff8800cf9afca8 ffffffff81072eba 000000000000cf80
[27716.908372] ffff880036d85a00 ffff88011ea8cf80 ffff880119106000 000000000000cf80
[27716.916981] Call Trace:
[27716.920180] [<ffffffff816d7229>] dump_stack+0x45/0x57
[27716.926250] [<ffffffff81072eba>] warn_slowpath_common+0x8a/0xc0
[27716.933244] [<ffffffff81072faa>] warn_slowpath_null+0x1a/0x20
[27716.939999] [<ffffffff8116b00e>] perf_aux_output_begin+0x1ce/0x1f0
[27716.947225] [<ffffffff81164785>] ? perf_event_update_userpage+0x5/0x170
[27716.954875] [<ffffffff8103767d>] bts_event_add+0x6d/0x150
[27716.961259] [<ffffffff81162ddb>] event_sched_in.isra.85+0xfb/0x330
[27716.968495] [<ffffffff811630ca>] group_sched_in+0xba/0x1e0
[27716.974957] [<ffffffff8101df2a>] ? native_sched_clock+0x2a/0x90
[27716.981888] [<ffffffff811633c0>] ctx_sched_in+0x1d0/0x3f0
[27716.988274] [<ffffffff81163649>] perf_event_sched_in+0x69/0xa0
[27716.995082] [<ffffffff81163bc0>] perf_event_context_sched_in+0x90/0x160
[27717.002714] [<ffffffff81164148>] __perf_event_task_sched_in+0xa8/0xc0
[27717.010160] [<ffffffff8109897d>] finish_task_switch+0xed/0x140
[27717.016988] [<ffffffff81012723>] ? __switch_to+0x263/0x5d0
[27717.023400] [<ffffffff8109e5a1>] schedule_tail+0x11/0xa0
[27717.029652] [<ffffffff816df57f>] ret_from_fork+0xf/0x70
[27717.035806] ---[ end trace bdbfeef5d81e5fa4 ]---

which corresponds to
/*
* Nesting is not supported for AUX area, make sure nested
* writers are caught early
*/
if (WARN_ON_ONCE(local_xchg(&rb->aux_nest, 1)))
goto err_put;

2015-06-15 12:21:11

by Peter Zijlstra

[permalink] [raw]
Subject: Re: perf: aux area related crash and warnings


Alex, any clue?

On Fri, Jun 12, 2015 at 02:42:36PM -0400, Vince Weaver wrote:
> On Thu, 11 Jun 2015, Vince Weaver wrote:
>
> > and while I was trying to cut and paste that, the locked haswell just took
> > down the network switch so I can't get the rest until tomorrow.
>
> here are the full dumps if anyone is interested
>
> the warning is reproducible, the spinlock disaster is not.
>
> [36298.986117] BUG: spinlock recursion on CPU#4, perf_fuzzer/3410
> [36298.992915] lock: 0xffff88011edf7cd0, .magic: dead4ead, .owner: perf_fuzzer/3410, .owner_cpu: 4
> [36299.002919] CPU: 4 PID: 3410 Comm: perf_fuzzer Tainted: G W 4.1.0-rc7+ #155
> [36299.012152] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> [36299.020606] ffff88011edf7cd0 ffff88011eb059a0 ffffffff816d7229 0000000000000054
> [36299.029199] ffff8800c2f4ac50 ffff88011eb059c0 ffffffff810c2895 ffff88011edf7cd0
> [36299.037796] ffffffff81a1e481 ffff88011eb059e0 ffffffff810c2916 ffff88011edf7cd0
> [36299.046338] Call Trace:
> [36299.049501] <NMI> [<ffffffff816d7229>] dump_stack+0x45/0x57
> [36299.056284] [<ffffffff810c2895>] spin_dump+0x85/0xe0
> [36299.062282] [<ffffffff810c2916>] spin_bug+0x26/0x30
> [36299.068111] [<ffffffff810c2acf>] do_raw_spin_lock+0x13f/0x180
> [36299.074897] [<ffffffff816de6e9>] _raw_spin_lock+0x39/0x40
> [36299.081276] [<ffffffff8117a039>] ? free_pcppages_bulk+0x39/0x620
> [36299.088340] [<ffffffff8117a039>] free_pcppages_bulk+0x39/0x620
> [36299.095182] [<ffffffff81177e14>] ? free_pages_prepare+0x3a4/0x550
> [36299.102291] [<ffffffff811c9936>] ? kfree_debugcheck+0x16/0x40
> [36299.108987] [<ffffffff8117a938>] free_hot_cold_page+0x178/0x1a0
> [36299.115850] [<ffffffff8117aa47>] __free_pages+0x37/0x50
> [36299.121991] [<ffffffff8116ae0a>] rb_free_aux+0xba/0xf0
> [36299.128034] [<ffffffff8116b0e7>] perf_aux_output_end+0xb7/0xf0
> [36299.134793] [<ffffffff81037b0e>] intel_bts_interrupt+0x8e/0xd0
> [36299.141543] [<ffffffff810338bf>] intel_pmu_handle_irq+0x4f/0x450
> [36299.148482] [<ffffffff810bc288>] ? check_chain_key+0x128/0x1e0
> [36299.155249] [<ffffffff8102a4ab>] perf_event_nmi_handler+0x2b/0x50
> [36299.162273] [<ffffffff810185d0>] nmi_handle+0xa0/0x150
> [36299.168278] [<ffffffff81018535>] ? nmi_handle+0x5/0x150
> [36299.174377] [<ffffffff8101887a>] default_do_nmi+0x4a/0x140
> [36299.180735] [<ffffffff81018a08>] do_nmi+0x98/0xe0
> [36299.186219] [<ffffffff816e13ef>] end_repeat_nmi+0x1e/0x2e
> [36299.192501] [<ffffffff810bdc4e>] ? __lock_acquire.isra.31+0x27e/0x1000
> [36299.199951] [<ffffffff810bdc4e>] ? __lock_acquire.isra.31+0x27e/0x1000
> [36299.207410] [<ffffffff810bdc4e>] ? __lock_acquire.isra.31+0x27e/0x1000
> [36299.214898] <<EOE>> [<ffffffff810bdd89>] ? __lock_acquire.isra.31+0x3b9/0x1000
> [36299.223222] [<ffffffff810bf365>] lock_acquire+0xa5/0x130
> [36299.229409] [<ffffffff8117b379>] ? __rmqueue+0x109/0x510
> [36299.235551] [<ffffffff8117b379>] ? __rmqueue+0x109/0x510
> [36299.241720] [<ffffffff8117810b>] bad_range+0x6b/0x150
> [36299.247612] [<ffffffff8117b379>] ? __rmqueue+0x109/0x510
> [36299.253757] [<ffffffff8116a8c5>] ? perf_output_begin+0x5/0x270
> [36299.260489] [<ffffffff8117b379>] __rmqueue+0x109/0x510
> [36299.266501] [<ffffffff8117bf63>] get_page_from_freelist+0x7e3/0xa50
> [36299.273680] [<ffffffff8117c4cb>] __alloc_pages_nodemask+0x1fb/0xa90
> [36299.280872] [<ffffffff811c416b>] ? alloc_pages_vma+0x10b/0x290
> [36299.287586] [<ffffffff810bdd89>] ? __lock_acquire.isra.31+0x3b9/0x1000
> [36299.295017] [<ffffffff811c416b>] alloc_pages_vma+0x10b/0x290
> [36299.301544] [<ffffffff8119f1a4>] ? wp_page_copy.isra.51+0x84/0x4f0
> [36299.308613] [<ffffffff8119f1a4>] wp_page_copy.isra.51+0x84/0x4f0
> [36299.315506] [<ffffffff811a0d94>] ? do_wp_page+0x84/0x6a0
> [36299.321633] [<ffffffff811a0dad>] do_wp_page+0x9d/0x6a0
> [36299.327533] [<ffffffff811a2b69>] ? handle_mm_fault+0x309/0x1770
> [36299.334245] [<ffffffff811a364c>] handle_mm_fault+0xdec/0x1770
> [36299.340854] [<ffffffff81063034>] __do_page_fault+0x1b4/0x460
> [36299.347337] [<ffffffff8106334e>] trace_do_page_fault+0x3e/0xc0
> [36299.354021] [<ffffffff816e1068>] trace_page_fault+0x28/0x30
> [36315.086708] ------------[ cut here ]------------
>
>
> and
>
> [27716.785131] WARNING: CPU: 2 PID: 17655 at kernel/events/ring_buffer.c:282 perf_aux_output_begin+0x1ce/0x1f0()
> [27716.796157] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel snd_hda_controller crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec aesni_intel snd_hda_core aes_x86_64 iTCO_wdt lrw snd_hwdep gf128mul snd_pcm iTCO_vendor_support ppdev i915snd_timer glue_helper evdev ablk_helper snd cryptd mei_me mei psmouse pcspkr soundcore serio_raw lpc_ich drm_kms_helper parport_pc battery tpm_tis mfd_core parport drm video tpm i2c_i801 i2c_algo_bit wmi processor button sg sr_mod cdrom sd_mod ehci_pci ehci_hcd ahci libahci libata xhci_pci xhci_hcd e1000e ptp scsi_mod crc32c_intel usbcore pps_core usb_common thermal fan thermal_sys
> [27716.873350] CPU: 2 PID: 17655 Comm: perf_fuzzer Tainted: G W 4.1.0-rc7+ #155
> [27716.882671] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> [27716.891168] ffffffff81a230f2 ffff8800cf9afc68 ffffffff816d7229 0000000000000000
> [27716.899782] 0000000000000000 ffff8800cf9afca8 ffffffff81072eba 000000000000cf80
> [27716.908372] ffff880036d85a00 ffff88011ea8cf80 ffff880119106000 000000000000cf80
> [27716.916981] Call Trace:
> [27716.920180] [<ffffffff816d7229>] dump_stack+0x45/0x57
> [27716.926250] [<ffffffff81072eba>] warn_slowpath_common+0x8a/0xc0
> [27716.933244] [<ffffffff81072faa>] warn_slowpath_null+0x1a/0x20
> [27716.939999] [<ffffffff8116b00e>] perf_aux_output_begin+0x1ce/0x1f0
> [27716.947225] [<ffffffff81164785>] ? perf_event_update_userpage+0x5/0x170
> [27716.954875] [<ffffffff8103767d>] bts_event_add+0x6d/0x150
> [27716.961259] [<ffffffff81162ddb>] event_sched_in.isra.85+0xfb/0x330
> [27716.968495] [<ffffffff811630ca>] group_sched_in+0xba/0x1e0
> [27716.974957] [<ffffffff8101df2a>] ? native_sched_clock+0x2a/0x90
> [27716.981888] [<ffffffff811633c0>] ctx_sched_in+0x1d0/0x3f0
> [27716.988274] [<ffffffff81163649>] perf_event_sched_in+0x69/0xa0
> [27716.995082] [<ffffffff81163bc0>] perf_event_context_sched_in+0x90/0x160
> [27717.002714] [<ffffffff81164148>] __perf_event_task_sched_in+0xa8/0xc0
> [27717.010160] [<ffffffff8109897d>] finish_task_switch+0xed/0x140
> [27717.016988] [<ffffffff81012723>] ? __switch_to+0x263/0x5d0
> [27717.023400] [<ffffffff8109e5a1>] schedule_tail+0x11/0xa0
> [27717.029652] [<ffffffff816df57f>] ret_from_fork+0xf/0x70
> [27717.035806] ---[ end trace bdbfeef5d81e5fa4 ]---
>
> which corresponds to
> /*
> * Nesting is not supported for AUX area, make sure nested
> * writers are caught early
> */
> if (WARN_ON_ONCE(local_xchg(&rb->aux_nest, 1)))
> goto err_put;
>

2015-06-15 12:49:30

by Alexander Shishkin

[permalink] [raw]
Subject: Re: perf: aux area related crash and warnings

Peter Zijlstra <[email protected]> writes:

> Alex, any clue?

Let me look into it. Definitely haven't seen anything like that in my
tests.

> On Fri, Jun 12, 2015 at 02:42:36PM -0400, Vince Weaver wrote:
>> On Thu, 11 Jun 2015, Vince Weaver wrote:
>>
>> > and while I was trying to cut and paste that, the locked haswell just took
>> > down the network switch so I can't get the rest until tomorrow.
>>
>> here are the full dumps if anyone is interested
>>
>> the warning is reproducible, the spinlock disaster is not.
>>
>> [36298.986117] BUG: spinlock recursion on CPU#4, perf_fuzzer/3410
>> [36298.992915] lock: 0xffff88011edf7cd0, .magic: dead4ead, .owner: perf_fuzzer/3410, .owner_cpu: 4
>> [36299.002919] CPU: 4 PID: 3410 Comm: perf_fuzzer Tainted: G W 4.1.0-rc7+ #155
>> [36299.012152] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
>> [36299.020606] ffff88011edf7cd0 ffff88011eb059a0 ffffffff816d7229 0000000000000054
>> [36299.029199] ffff8800c2f4ac50 ffff88011eb059c0 ffffffff810c2895 ffff88011edf7cd0
>> [36299.037796] ffffffff81a1e481 ffff88011eb059e0 ffffffff810c2916 ffff88011edf7cd0
>> [36299.046338] Call Trace:
>> [36299.049501] <NMI> [<ffffffff816d7229>] dump_stack+0x45/0x57
>> [36299.056284] [<ffffffff810c2895>] spin_dump+0x85/0xe0
>> [36299.062282] [<ffffffff810c2916>] spin_bug+0x26/0x30
>> [36299.068111] [<ffffffff810c2acf>] do_raw_spin_lock+0x13f/0x180
>> [36299.074897] [<ffffffff816de6e9>] _raw_spin_lock+0x39/0x40
>> [36299.081276] [<ffffffff8117a039>] ? free_pcppages_bulk+0x39/0x620
>> [36299.088340] [<ffffffff8117a039>] free_pcppages_bulk+0x39/0x620
>> [36299.095182] [<ffffffff81177e14>] ? free_pages_prepare+0x3a4/0x550
>> [36299.102291] [<ffffffff811c9936>] ? kfree_debugcheck+0x16/0x40
>> [36299.108987] [<ffffffff8117a938>] free_hot_cold_page+0x178/0x1a0
>> [36299.115850] [<ffffffff8117aa47>] __free_pages+0x37/0x50
>> [36299.121991] [<ffffffff8116ae0a>] rb_free_aux+0xba/0xf0

This one goes to free aux pages from nmi context, looks like aux buffer
was unmapped while the event was running, so here it dropped the last
reference.

>> [36299.128034] [<ffffffff8116b0e7>] perf_aux_output_end+0xb7/0xf0
>> [36299.134793] [<ffffffff81037b0e>] intel_bts_interrupt+0x8e/0xd0
>> [36299.141543] [<ffffffff810338bf>] intel_pmu_handle_irq+0x4f/0x450
>> [36299.148482] [<ffffffff810bc288>] ? check_chain_key+0x128/0x1e0
>> [36299.155249] [<ffffffff8102a4ab>] perf_event_nmi_handler+0x2b/0x50
>> [36299.162273] [<ffffffff810185d0>] nmi_handle+0xa0/0x150

[snip]

>>
>> and
>>
>> [27716.785131] WARNING: CPU: 2 PID: 17655 at kernel/events/ring_buffer.c:282 perf_aux_output_begin+0x1ce/0x1f0()
>> [27716.796157] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel snd_hda_controller crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec aesni_intel snd_hda_core aes_x86_64 iTCO_wdt lrw snd_hwdep gf128mul snd_pcm iTCO_vendor_support ppdev i915snd_timer glue_helper evdev ablk_helper snd cryptd mei_me mei psmouse pcspkr soundcore serio_raw lpc_ich drm_kms_helper parport_pc battery tpm_tis mfd_core parport drm video tpm i2c_i801 i2c_algo_bit wmi processor button sg sr_mod cdrom sd_mod ehci_pci ehci_hcd ahci libahci libata xhci_pci xhci_hcd e1000e ptp scsi_mod crc32c_intel usbcore pps_core usb_common thermal fan thermal_sys
>> [27716.873350] CPU: 2 PID: 17655 Comm: perf_fuzzer Tainted: G W 4.1.0-rc7+ #155
>> [27716.882671] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
>> [27716.891168] ffffffff81a230f2 ffff8800cf9afc68 ffffffff816d7229 0000000000000000
>> [27716.899782] 0000000000000000 ffff8800cf9afca8 ffffffff81072eba 000000000000cf80
>> [27716.908372] ffff880036d85a00 ffff88011ea8cf80 ffff880119106000 000000000000cf80
>> [27716.916981] Call Trace:
>> [27716.920180] [<ffffffff816d7229>] dump_stack+0x45/0x57
>> [27716.926250] [<ffffffff81072eba>] warn_slowpath_common+0x8a/0xc0
>> [27716.933244] [<ffffffff81072faa>] warn_slowpath_null+0x1a/0x20
>> [27716.939999] [<ffffffff8116b00e>] perf_aux_output_begin+0x1ce/0x1f0
>> [27716.947225] [<ffffffff81164785>] ? perf_event_update_userpage+0x5/0x170
>> [27716.954875] [<ffffffff8103767d>] bts_event_add+0x6d/0x150
>> [27716.961259] [<ffffffff81162ddb>] event_sched_in.isra.85+0xfb/0x330
>> [27716.968495] [<ffffffff811630ca>] group_sched_in+0xba/0x1e0
>> [27716.974957] [<ffffffff8101df2a>] ? native_sched_clock+0x2a/0x90
>> [27716.981888] [<ffffffff811633c0>] ctx_sched_in+0x1d0/0x3f0
>> [27716.988274] [<ffffffff81163649>] perf_event_sched_in+0x69/0xa0
>> [27716.995082] [<ffffffff81163bc0>] perf_event_context_sched_in+0x90/0x160
>> [27717.002714] [<ffffffff81164148>] __perf_event_task_sched_in+0xa8/0xc0
>> [27717.010160] [<ffffffff8109897d>] finish_task_switch+0xed/0x140
>> [27717.016988] [<ffffffff81012723>] ? __switch_to+0x263/0x5d0
>> [27717.023400] [<ffffffff8109e5a1>] schedule_tail+0x11/0xa0
>> [27717.029652] [<ffffffff816df57f>] ret_from_fork+0xf/0x70
>> [27717.035806] ---[ end trace bdbfeef5d81e5fa4 ]---
>>
>> which corresponds to
>> /*
>> * Nesting is not supported for AUX area, make sure nested
>> * writers are caught early
>> */
>> if (WARN_ON_ONCE(local_xchg(&rb->aux_nest, 1)))
>> goto err_put;
>>

This really shouldn't happen, I'd need to check with the fuzzer.

Thanks,
--
Alex

2015-06-15 20:24:49

by Ingo Molnar

[permalink] [raw]
Subject: Re: perf: aux area related crash and warnings


* Alexander Shishkin <[email protected]> wrote:

> Peter Zijlstra <[email protected]> writes:
>
> > Alex, any clue?
>
> Let me look into it. Definitely haven't seen anything like that in my
> tests.

That's natural: Vince is running randomize fuzzing tests, so you should look out
for boundary conditions and 'nonsensical' values that won't normally trigger in
functional testing.

In fact Vince is using 'directed fuzzing': i.e. the fuzzer is aware of the general
perf ABI structure and will try to generate partially valid, partially randomized
requests, to be able to test 'leaf' functionality of the perf ABI as well, which
would otherwise need astronomical odds to occur in a pure fuzzing test.

These crashes started popping up when Vince added 'AUX area awareness' to the
fuzzer.

> >> [36299.068111] [<ffffffff810c2acf>] do_raw_spin_lock+0x13f/0x180
> >> [36299.074897] [<ffffffff816de6e9>] _raw_spin_lock+0x39/0x40
> >> [36299.081276] [<ffffffff8117a039>] ? free_pcppages_bulk+0x39/0x620
> >> [36299.088340] [<ffffffff8117a039>] free_pcppages_bulk+0x39/0x620
> >> [36299.095182] [<ffffffff81177e14>] ? free_pages_prepare+0x3a4/0x550
> >> [36299.102291] [<ffffffff811c9936>] ? kfree_debugcheck+0x16/0x40
> >> [36299.108987] [<ffffffff8117a938>] free_hot_cold_page+0x178/0x1a0
> >> [36299.115850] [<ffffffff8117aa47>] __free_pages+0x37/0x50
> >> [36299.121991] [<ffffffff8116ae0a>] rb_free_aux+0xba/0xf0
>
> This one goes to free aux pages from nmi context, looks like aux buffer was
> unmapped while the event was running, so here it dropped the last reference.

Yeah, that in itself is an absolute no-no - so I guess refcounting went wrong
somewhere? (assuming it exists properly).

Thanks,

Ingo

2015-06-16 11:37:17

by Alexander Shishkin

[permalink] [raw]
Subject: Re: perf: aux area related crash and warnings

Alexander Shishkin <[email protected]> writes:

> Peter Zijlstra <[email protected]> writes:
>
>> Alex, any clue?
>
> Let me look into it. Definitely haven't seen anything like that in my
> tests.
>
>> On Fri, Jun 12, 2015 at 02:42:36PM -0400, Vince Weaver wrote:
>>> On Thu, 11 Jun 2015, Vince Weaver wrote:
>>>
>>> > and while I was trying to cut and paste that, the locked haswell just took
>>> > down the network switch so I can't get the rest until tomorrow.
>>>
>>> here are the full dumps if anyone is interested
>>>
>>> the warning is reproducible, the spinlock disaster is not.
>>>
>>> [36298.986117] BUG: spinlock recursion on CPU#4, perf_fuzzer/3410
>>> [36298.992915] lock: 0xffff88011edf7cd0, .magic: dead4ead, .owner: perf_fuzzer/3410, .owner_cpu: 4
>>> [36299.002919] CPU: 4 PID: 3410 Comm: perf_fuzzer Tainted: G W 4.1.0-rc7+ #155
>>> [36299.012152] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
>>> [36299.020606] ffff88011edf7cd0 ffff88011eb059a0 ffffffff816d7229 0000000000000054
>>> [36299.029199] ffff8800c2f4ac50 ffff88011eb059c0 ffffffff810c2895 ffff88011edf7cd0
>>> [36299.037796] ffffffff81a1e481 ffff88011eb059e0 ffffffff810c2916 ffff88011edf7cd0
>>> [36299.046338] Call Trace:
>>> [36299.049501] <NMI> [<ffffffff816d7229>] dump_stack+0x45/0x57
>>> [36299.056284] [<ffffffff810c2895>] spin_dump+0x85/0xe0
>>> [36299.062282] [<ffffffff810c2916>] spin_bug+0x26/0x30
>>> [36299.068111] [<ffffffff810c2acf>] do_raw_spin_lock+0x13f/0x180
>>> [36299.074897] [<ffffffff816de6e9>] _raw_spin_lock+0x39/0x40
>>> [36299.081276] [<ffffffff8117a039>] ? free_pcppages_bulk+0x39/0x620
>>> [36299.088340] [<ffffffff8117a039>] free_pcppages_bulk+0x39/0x620
>>> [36299.095182] [<ffffffff81177e14>] ? free_pages_prepare+0x3a4/0x550
>>> [36299.102291] [<ffffffff811c9936>] ? kfree_debugcheck+0x16/0x40
>>> [36299.108987] [<ffffffff8117a938>] free_hot_cold_page+0x178/0x1a0
>>> [36299.115850] [<ffffffff8117aa47>] __free_pages+0x37/0x50
>>> [36299.121991] [<ffffffff8116ae0a>] rb_free_aux+0xba/0xf0
>
> This one goes to free aux pages from nmi context, looks like aux buffer
> was unmapped while the event was running, so here it dropped the last
> reference.

Ok, here's what I propose for this one.

>From b8cd18bc440c318e8b30825bf654c815b42aa1e0 Mon Sep 17 00:00:00 2001
From: Alexander Shishkin <[email protected]>
Date: Tue, 16 Jun 2015 14:14:04 +0300
Subject: [PATCH] perf: Free AUX area pages in rcu callback

Currently, if the user unmaps AUX area while the corresponding event
is active, perf_aux_output_end() may be the last one to drop the aux
area refcount, and end up freeing the pages in NMI context or scheduler's
fast path. Same can happen in the error path of perf_aux_output_begin().

To avoid the bug, this patch moves actual freeing code to rb_free_rcu(),
which will know whether it is called for AUX area or the ring buffer
proper and act accordingly.

Signed-off-by: Alexander Shishkin <[email protected]>
Reported-by: Vince Weaver <[email protected]>
---
kernel/events/core.c | 31 ++++++++++++++++++++++++++++++-
kernel/events/internal.h | 1 +
kernel/events/ring_buffer.c | 8 +-------
3 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index eddf1ed415..5f1cc5976f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4381,7 +4381,36 @@ static void rb_free_rcu(struct rcu_head *rcu_head)
struct ring_buffer *rb;

rb = container_of(rcu_head, struct ring_buffer, rcu_head);
- rb_free(rb);
+
+ /*
+ * are we called for AUX or the rb:
+ * AUX always goes first, then if rb::refcount drops to zero,
+ * free rb synchronously
+ */
+ if (atomic_read(&rb->refcount)) {
+ __rb_free_aux(rb);
+
+ /* matches the increment in rb_free_aux() */
+ if (atomic_dec_and_test(&rb->refcount))
+ rb_free(rb);
+ } else {
+ rb_free(rb);
+ }
+}
+
+void rb_free_aux(struct ring_buffer *rb)
+{
+ /*
+ * hold rb::refcount to make sure rb doesn't disappear
+ * before aux pages are freed
+ */
+ if (WARN_ON_ONCE(!atomic_inc_not_zero(&rb->refcount)))
+ return;
+
+ if (atomic_dec_and_test(&rb->aux_refcount))
+ call_rcu(&rb->rcu_head, rb_free_rcu);
+ else
+ ring_buffer_put(rb); /* matches the increment above */
}

struct ring_buffer *ring_buffer_get(struct perf_event *event)
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 9f6ce9ba4a..7f8242ed85 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -61,6 +61,7 @@ extern void perf_event_wakeup(struct perf_event *event);
extern int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
pgoff_t pgoff, int nr_pages, long watermark, int flags);
extern void rb_free_aux(struct ring_buffer *rb);
+extern void __rb_free_aux(struct ring_buffer *rb);
extern struct ring_buffer *ring_buffer_get(struct perf_event *event);
extern void ring_buffer_put(struct ring_buffer *rb);

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 725c416085..343121e943 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -537,7 +537,7 @@ out:
return ret;
}

-static void __rb_free_aux(struct ring_buffer *rb)
+void __rb_free_aux(struct ring_buffer *rb)
{
int pg;

@@ -554,12 +554,6 @@ static void __rb_free_aux(struct ring_buffer *rb)
rb->aux_nr_pages = 0;
}

-void rb_free_aux(struct ring_buffer *rb)
-{
- if (atomic_dec_and_test(&rb->aux_refcount))
- __rb_free_aux(rb);
-}
-
#ifndef CONFIG_PERF_USE_VMALLOC

/*
--
2.1.4

2015-06-18 09:10:08

by Ingo Molnar

[permalink] [raw]
Subject: Re: perf: aux area related crash and warnings


* Alexander Shishkin <[email protected]> wrote:

> +void rb_free_aux(struct ring_buffer *rb)
> +{
> + /*
> + * hold rb::refcount to make sure rb doesn't disappear
> + * before aux pages are freed
> + */
> + if (WARN_ON_ONCE(!atomic_inc_not_zero(&rb->refcount)))
> + return;
> +
> + if (atomic_dec_and_test(&rb->aux_refcount))
> + call_rcu(&rb->rcu_head, rb_free_rcu);
> + else
> + ring_buffer_put(rb); /* matches the increment above */

Is call_rcu() NMI-safe? I don't think so ...

I think the life time rules of this object are really messed up if they can be
freed from any fast path. How come the freeing can happen in NMI context?
Shouldn't the hardware first stop, then we can free things from the system call
path, or so?

Thanks,

Ingo

2015-06-18 09:35:41

by Alexander Shishkin

[permalink] [raw]
Subject: Re: perf: aux area related crash and warnings

Ingo Molnar <[email protected]> writes:

> * Alexander Shishkin <[email protected]> wrote:
>
>> +void rb_free_aux(struct ring_buffer *rb)
>> +{
>> + /*
>> + * hold rb::refcount to make sure rb doesn't disappear
>> + * before aux pages are freed
>> + */
>> + if (WARN_ON_ONCE(!atomic_inc_not_zero(&rb->refcount)))
>> + return;
>> +
>> + if (atomic_dec_and_test(&rb->aux_refcount))
>> + call_rcu(&rb->rcu_head, rb_free_rcu);
>> + else
>> + ring_buffer_put(rb); /* matches the increment above */
>
> Is call_rcu() NMI-safe? I don't think so ...

Actually, we couldn't establish that without Paul's help.

> I think the life time rules of this object are really messed up if
> they can be freed from any fast path. How come the freeing can happen
> in NMI context?

We ended up like this because we didn't want to stop the event once its
AUX area gets unmapped, to be consistent with the behavior of normal
perf events [1]. And somehow I completely missed the fact that this
means freeing stuff in fast path.

> Shouldn't the hardware first stop, then we can free things from the system call
> path, or so?

Yes, that's how we'll do it.

[1] http://marc.info/?l=linux-kernel&m=141017607804348

Regards,
--
Alex

2015-06-18 10:24:28

by Peter Zijlstra

[permalink] [raw]
Subject: Re: perf: aux area related crash and warnings

On Thu, Jun 18, 2015 at 12:35:31PM +0300, Alexander Shishkin wrote:
> Ingo Molnar <[email protected]> writes:
>
> > * Alexander Shishkin <[email protected]> wrote:
> >
> >> +void rb_free_aux(struct ring_buffer *rb)
> >> +{
> >> + /*
> >> + * hold rb::refcount to make sure rb doesn't disappear
> >> + * before aux pages are freed
> >> + */
> >> + if (WARN_ON_ONCE(!atomic_inc_not_zero(&rb->refcount)))
> >> + return;
> >> +
> >> + if (atomic_dec_and_test(&rb->aux_refcount))
> >> + call_rcu(&rb->rcu_head, rb_free_rcu);
> >> + else
> >> + ring_buffer_put(rb); /* matches the increment above */
> >
> > Is call_rcu() NMI-safe? I don't think so ...
>
> Actually, we couldn't establish that without Paul's help.

I was fairly sure it was not, Paul did confirm last night it is indeed
not.

2015-06-18 10:33:04

by Peter Zijlstra

[permalink] [raw]
Subject: Re: perf: aux area related crash and warnings

On Tue, Jun 16, 2015 at 02:37:06PM +0300, Alexander Shishkin wrote:
> Ok, here's what I propose for this one.

As discussed yesterday, I think we want to change to how we do reference
counting for the regular buffers, but that will require quite a bit of
work.

For the interim we can avoid the issue by something like the below;
compile tested only.

---
kernel/events/core.c | 8 --------
kernel/events/internal.h | 10 ++++++++++
kernel/events/ring_buffer.c | 27 +++++++++++++++++++++++++--
3 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 1e33b9141f03..76b7694b7909 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4352,14 +4352,6 @@ static void ring_buffer_wakeup(struct perf_event *event)
rcu_read_unlock();
}

-static void rb_free_rcu(struct rcu_head *rcu_head)
-{
- struct ring_buffer *rb;
-
- rb = container_of(rcu_head, struct ring_buffer, rcu_head);
- rb_free(rb);
-}
-
struct ring_buffer *ring_buffer_get(struct perf_event *event)
{
struct ring_buffer *rb;
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 2deb24c7a40d..2bbad9c1274c 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -11,6 +11,7 @@
struct ring_buffer {
atomic_t refcount;
struct rcu_head rcu_head;
+ struct irq_work irq_work;
#ifdef CONFIG_PERF_USE_VMALLOC
struct work_struct work;
int page_order; /* allocation order */
@@ -55,6 +56,15 @@ struct ring_buffer {
};

extern void rb_free(struct ring_buffer *rb);
+
+static inline void rb_free_rcu(struct rcu_head *rcu_head)
+{
+ struct ring_buffer *rb;
+
+ rb = container_of(rcu_head, struct ring_buffer, rcu_head);
+ rb_free(rb);
+}
+
extern struct ring_buffer *
rb_alloc(int nr_pages, long watermark, int cpu, int flags);
extern void perf_event_wakeup(struct perf_event *event);
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 96472824a752..b2be01b1aa9d 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -221,6 +221,8 @@ void perf_output_end(struct perf_output_handle *handle)
rcu_read_unlock();
}

+static void rb_irq_work(struct irq_work *work);
+
static void
ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
{
@@ -241,6 +243,16 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)

INIT_LIST_HEAD(&rb->event_list);
spin_lock_init(&rb->event_lock);
+ init_irq_work(&rb->irq_work, rb_irq_work);
+}
+
+static void ring_buffer_put_async(struct ring_buffer *rb)
+{
+ if (!atomic_dec_and_test(&rb->refcount))
+ return;
+
+ rb->rcu_head.next = (void *)rb;
+ irq_work_queue(&rb->irq_work);
}

/*
@@ -319,7 +331,7 @@ void *perf_aux_output_begin(struct perf_output_handle *handle,
rb_free_aux(rb);

err:
- ring_buffer_put(rb);
+ ring_buffer_put_async(rb);
handle->event = NULL;

return NULL;
@@ -370,7 +382,7 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size,

local_set(&rb->aux_nest, 0);
rb_free_aux(rb);
- ring_buffer_put(rb);
+ ring_buffer_put_async(rb);
}

/*
@@ -557,7 +569,18 @@ static void __rb_free_aux(struct ring_buffer *rb)
void rb_free_aux(struct ring_buffer *rb)
{
if (atomic_dec_and_test(&rb->aux_refcount))
+ irq_work_queue(&rb->irq_work);
+}
+
+static void rb_irq_work(struct irq_work *work)
+{
+ struct ring_buffer *rb = container_of(work, struct ring_buffer, irq_work);
+
+ if (!atomic_read(&rb->aux_refcount))
__rb_free_aux(rb);
+
+ if (rb->rcu_head.next == (void *)rb)
+ call_rcu(&rb->rcu_head, rb_free_rcu);
}

#ifndef CONFIG_PERF_USE_VMALLOC

2015-06-18 13:11:13

by Alexander Shishkin

[permalink] [raw]
Subject: Re: perf: aux area related crash and warnings

Peter Zijlstra <[email protected]> writes:

> On Tue, Jun 16, 2015 at 02:37:06PM +0300, Alexander Shishkin wrote:
>> Ok, here's what I propose for this one.
>
> As discussed yesterday, I think we want to change to how we do reference
> counting for the regular buffers, but that will require quite a bit of
> work.
>
> For the interim we can avoid the issue by something like the below;
> compile tested only.

Looks good and works fine on my tests. Only I'd maybe add a small
comment on why reusing rcu_head.next doesn't set off call_rcu()
debugging. Other than that, fwiw,

Reviewed-and-tested-by: Alexander Shishkin <[email protected]>

Thanks,
--
Alex

2015-06-18 16:30:34

by Paul E. McKenney

[permalink] [raw]
Subject: Re: perf: aux area related crash and warnings

On Thu, Jun 18, 2015 at 11:09:55AM +0200, Ingo Molnar wrote:
>
> * Alexander Shishkin <[email protected]> wrote:
>
> > +void rb_free_aux(struct ring_buffer *rb)
> > +{
> > + /*
> > + * hold rb::refcount to make sure rb doesn't disappear
> > + * before aux pages are freed
> > + */
> > + if (WARN_ON_ONCE(!atomic_inc_not_zero(&rb->refcount)))
> > + return;
> > +
> > + if (atomic_dec_and_test(&rb->aux_refcount))
> > + call_rcu(&rb->rcu_head, rb_free_rcu);
> > + else
> > + ring_buffer_put(rb); /* matches the increment above */
>
> Is call_rcu() NMI-safe? I don't think so ...

Definitely not! ;-)

> I think the life time rules of this object are really messed up if they can be
> freed from any fast path. How come the freeing can happen in NMI context?
> Shouldn't the hardware first stop, then we can free things from the system call
> path, or so?
>
> Thanks,
>
> Ingo
>

2015-06-19 14:15:43

by Vince Weaver

[permalink] [raw]
Subject: Re: perf: aux area related crash and warnings

On Thu, 18 Jun 2015, Peter Zijlstra wrote:

> On Tue, Jun 16, 2015 at 02:37:06PM +0300, Alexander Shishkin wrote:
> > Ok, here's what I propose for this one.
>
> As discussed yesterday, I think we want to change to how we do reference
> counting for the regular buffers, but that will require quite a bit of
> work.
>
> For the interim we can avoid the issue by something like the below;
> compile tested only.

I did manage to crash the system with Alexander Shishkin's patch applied
(See below).

I will try with this updated patch instead, but the bug usually takes a
few days to trigger with the fuzzer and I won't be able to let it run
that long this time.

Vince

[183592.949539] BUG: spinlock recursion on CPU#2, perf_fuzzer/434
[183592.956559] lock: pool_lock+0x0/0x40, .magic: dead4ead, .owner: perf_fuzzer/434, .owner_cpu: 2
[183592.966783] CPU: 2 PID: 434 Comm: perf_fuzzer Tainted: G W 4.1.0-rc8+ #157
[183592.976228] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[183592.984997] ffffffff81cc2120 ffff88011ea85970 ffffffff816d7419 0000000000000053
[183592.993871] ffff8800cf26a8d0 ffff88011ea85990 ffffffff810c28a5 ffffffff81cc2120
[183593.002787] ffffffff81a1e4d1 ffff88011ea859b0 ffffffff810c2926 ffffffff81cc2120
[183593.011632] Call Trace:
[183593.015016] <NMI> [<ffffffff816d7419>] dump_stack+0x45/0x57
[183593.021966] [<ffffffff810c28a5>] spin_dump+0x85/0xe0
[183593.028153] [<ffffffff810c2926>] spin_bug+0x26/0x30
[183593.034234] [<ffffffff810c2adf>] do_raw_spin_lock+0x13f/0x180
[183593.041233] [<ffffffff816de8d9>] _raw_spin_lock+0x39/0x40
[183593.047849] [<ffffffff81428d2d>] ? __debug_object_init+0xad/0x450
[183593.055189] [<ffffffff81428d2d>] __debug_object_init+0xad/0x450
[183593.062375] [<ffffffff814290eb>] debug_object_init+0x1b/0x20
[183593.069267] [<ffffffff810d8fb8>] rcuhead_fixup_activate+0x28/0x40
[183593.076638] [<ffffffff81429201>] debug_object_activate+0xf1/0x1e0
[183593.083969] [<ffffffff8115e690>] ? free_event_rcu+0x40/0x40
[183593.090781] [<ffffffff810dc8cb>] __call_rcu.constprop.58+0x3b/0x340
[183593.098321] [<ffffffff8115711c>] ? irq_work_queue+0x6c/0x80
[183593.105123] [<ffffffff810dcc27>] call_rcu_sched+0x17/0x20
[183593.111709] [<ffffffff81166680>] rb_free_aux+0x40/0x90
[183593.118046] [<ffffffff8116b067>] perf_aux_output_end+0xb7/0xf0
[183593.125089] [<ffffffff81037b0e>] intel_bts_interrupt+0x8e/0xd0
[183593.132069] [<ffffffff810338bf>] intel_pmu_handle_irq+0x4f/0x450
[183593.139335] [<ffffffff8102a4ab>] perf_event_nmi_handler+0x2b/0x50
[183593.146687] [<ffffffff810185d0>] nmi_handle+0xa0/0x150
[183593.153012] [<ffffffff81018535>] ? nmi_handle+0x5/0x150
[183593.159406] [<ffffffff8101887a>] default_do_nmi+0x4a/0x140
[183593.166117] [<ffffffff81018a08>] do_nmi+0x98/0xe0
[183593.171983] [<ffffffff816e15ef>] end_repeat_nmi+0x1e/0x2e
[183593.178613] [<ffffffff810c2b70>] ? do_raw_spin_trylock+0x50/0x50
[183593.185823] [<ffffffff810c2b70>] ? do_raw_spin_trylock+0x50/0x50
[183593.193059] [<ffffffff810c2b70>] ? do_raw_spin_trylock+0x50/0x50
[183593.200306] <<EOE>> [<ffffffff816dea7b>] ? _raw_spin_unlock+0x2b/0x40
[183593.208107] [<ffffffff81428ded>] __debug_object_init+0x16d/0x450
[183593.215328] [<ffffffff8142919a>] ? debug_object_activate+0x8a/0x1e0
[183593.222812] [<ffffffff814291ea>] ? debug_object_activate+0xda/0x1e0
[183593.230328] [<ffffffff814290eb>] debug_object_init+0x1b/0x20
[183593.237143] [<ffffffff810d8fb8>] rcuhead_fixup_activate+0x28/0x40
[183593.244413] [<ffffffff81429201>] debug_object_activate+0xf1/0x1e0
[183593.251702] [<ffffffff8115e650>] ? perf_unpin_context+0x40/0x40
[183593.258799] [<ffffffff810dc8cb>] __call_rcu.constprop.58+0x3b/0x340
[183593.266311] [<ffffffff81170b57>] ? __static_key_slow_dec+0x27/0xc0
[183593.273694] [<ffffffff810dcc27>] call_rcu_sched+0x17/0x20
[183593.280180] [<ffffffff81165439>] _free_event+0x109/0x290
[183593.286619] [<ffffffff811656cf>] put_event+0x10f/0x160
[183593.292855] [<ffffffff811655f0>] ? put_event+0x30/0x160
[183593.299153] [<ffffffff81165755>] perf_release+0x15/0x20
[183593.305442] [<ffffffff811edeec>] __fput+0xdc/0x1f0
[183593.311296] [<ffffffff811ee04e>] ____fput+0xe/0x10
[183593.317138] [<ffffffff81091577>] task_work_run+0xb7/0xf0
[183593.323509] [<ffffffff81014189>] do_notify_resume+0x59/0x80
[183593.330207] [<ffffffff816df550>] int_signal+0x12/0x17
[183602.817462] ------------[ cut here ]------------

Subject: [tip:perf/urgent] perf: Fix AUX buffer refcounting

Commit-ID: 57ffc5ca679f499f4704fd9b6a372916f59930ee
Gitweb: http://git.kernel.org/tip/57ffc5ca679f499f4704fd9b6a372916f59930ee
Author: Peter Zijlstra <[email protected]>
AuthorDate: Thu, 18 Jun 2015 12:32:49 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Mon, 6 Jul 2015 14:08:30 +0200

perf: Fix AUX buffer refcounting

Its currently possible to drop the last refcount to the aux buffer
from NMI context, which results in the expected fireworks.

The refcounting needs a bigger overhaul, but to cure the immediate
problem, delay the freeing by using an irq_work.

Reviewed-and-tested-by: Alexander Shishkin <[email protected]>
Reported-by: Vince Weaver <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/events/core.c | 8 --------
kernel/events/internal.h | 10 ++++++++++
kernel/events/ring_buffer.c | 27 +++++++++++++++++++++++++--
3 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index e965cfa..d3dae34 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4358,14 +4358,6 @@ static void ring_buffer_wakeup(struct perf_event *event)
rcu_read_unlock();
}

-static void rb_free_rcu(struct rcu_head *rcu_head)
-{
- struct ring_buffer *rb;
-
- rb = container_of(rcu_head, struct ring_buffer, rcu_head);
- rb_free(rb);
-}
-
struct ring_buffer *ring_buffer_get(struct perf_event *event)
{
struct ring_buffer *rb;
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 2deb24c..2bbad9c 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -11,6 +11,7 @@
struct ring_buffer {
atomic_t refcount;
struct rcu_head rcu_head;
+ struct irq_work irq_work;
#ifdef CONFIG_PERF_USE_VMALLOC
struct work_struct work;
int page_order; /* allocation order */
@@ -55,6 +56,15 @@ struct ring_buffer {
};

extern void rb_free(struct ring_buffer *rb);
+
+static inline void rb_free_rcu(struct rcu_head *rcu_head)
+{
+ struct ring_buffer *rb;
+
+ rb = container_of(rcu_head, struct ring_buffer, rcu_head);
+ rb_free(rb);
+}
+
extern struct ring_buffer *
rb_alloc(int nr_pages, long watermark, int cpu, int flags);
extern void perf_event_wakeup(struct perf_event *event);
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 9647282..b2be01b 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -221,6 +221,8 @@ void perf_output_end(struct perf_output_handle *handle)
rcu_read_unlock();
}

+static void rb_irq_work(struct irq_work *work);
+
static void
ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
{
@@ -241,6 +243,16 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)

INIT_LIST_HEAD(&rb->event_list);
spin_lock_init(&rb->event_lock);
+ init_irq_work(&rb->irq_work, rb_irq_work);
+}
+
+static void ring_buffer_put_async(struct ring_buffer *rb)
+{
+ if (!atomic_dec_and_test(&rb->refcount))
+ return;
+
+ rb->rcu_head.next = (void *)rb;
+ irq_work_queue(&rb->irq_work);
}

/*
@@ -319,7 +331,7 @@ err_put:
rb_free_aux(rb);

err:
- ring_buffer_put(rb);
+ ring_buffer_put_async(rb);
handle->event = NULL;

return NULL;
@@ -370,7 +382,7 @@ void perf_aux_output_end(struct perf_output_handle *handle, unsigned long size,

local_set(&rb->aux_nest, 0);
rb_free_aux(rb);
- ring_buffer_put(rb);
+ ring_buffer_put_async(rb);
}

/*
@@ -557,7 +569,18 @@ static void __rb_free_aux(struct ring_buffer *rb)
void rb_free_aux(struct ring_buffer *rb)
{
if (atomic_dec_and_test(&rb->aux_refcount))
+ irq_work_queue(&rb->irq_work);
+}
+
+static void rb_irq_work(struct irq_work *work)
+{
+ struct ring_buffer *rb = container_of(work, struct ring_buffer, irq_work);
+
+ if (!atomic_read(&rb->aux_refcount))
__rb_free_aux(rb);
+
+ if (rb->rcu_head.next == (void *)rb)
+ call_rcu(&rb->rcu_head, rb_free_rcu);
}

#ifndef CONFIG_PERF_USE_VMALLOC