From: Vince Weaver
Date: Fri, 3 Jul 2015 15:03:05 -0400 (EDT)
To: Peter Zijlstra
cc: Vince Weaver, linux-kernel@vger.kernel.org, Ingo Molnar, Arnaldo Carvalho de Melo, Stephane Eranian, kan.liang@intel.com
Subject: Re: perf: fuzzer triggered warning in intel_pmu_drain_pebs_nhm()
In-Reply-To: <20150703131336.GI19282@twins.programming.kicks-ass.net>

On Fri, 3 Jul 2015, Peter Zijlstra wrote:

> On Thu, Jul 02, 2015 at 11:18:10AM -0400, Vince Weaver wrote:
> >
> > So sad to say the lack of fuzzer reports was because I was out of town for
> > a bit, not due to the kernel suddenly getting amazingly better.
> >
> > In any case I am running against current git and getting a lot of
> > warnings, but most of them seem to be old ones. This following one looks
> > new though.
> >
> > This is current linus-git on a Haswell machine with peterz's patch to fix
> > the aux buffer spinlock recursion (I can still crash the kernel if that
> > patch is not applied).
> >
> > It corresponds to:
> >
> >	WARN_ON_ONCE(!event->attr.precise_ip);
> >
> > [ 584.352324] WARNING: CPU: 2 PID: 18924 at arch/x86/kernel/cpu/perf_event_intel_ds.c:1198 intel_pmu_drain_pebs_nhm+0x283/0x2e0()
>
> I've not yet tried to reproduce, but the below could explain things.
>
> On disabling an event we first clear our cpuc->pebs_enabled bits, only
> to then check them to see if there are any set, and if so, drain the
> buffer.
>
> If we just cleared the last bit, we'll fail to drain the buffer.
>
> If we then program another event on that counter and another PEBS event,
> we can hit the above WARN with the 'stale' entries left over from the
> previous event.

with that patch applied I still managed to hit this:
	WARN_ON_ONCE(!event->attr.precise_ip);

I'll let it run some more and see if the watchdog still gets triggered.

[ 2217.544901] ------------[ cut here ]------------
[ 2217.550351] WARNING: CPU: 2 PID: 9136 at arch/x86/kernel/cpu/perf_event_intel_ds.c:1198 intel_pmu_drain_pebs_nhm+0x283/0x2e0()
[ 2217.563534] Modules linked in: fuse snd_hda_codec_hdmi i915 x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel psmouse hmac drbg evdev serio_raw ansi_cprng snd_hda_codec_realtek drm_kms_helper snd_hda_codec_generic ppdev iTCO_wdt iTCO_vendor_support pcspkr drm i2c_algo_bit aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper snd_hda_intel cryptd mei_me mei snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer tpm_tis tpm wmi button processor video battery i2c_i801 parport_pc parport snd lpc_ich mfd_core soundcore sg sr_mod sd_mod cdrom ehci_pci ehci_hcd ahci libahci xhci_pci xhci_hcd e1000e libata ptp crc32c_intel scsi_mod pps_core usbcore usb_common fan thermal thermal_sys
[ 2217.640998] CPU: 2 PID: 9136 Comm: perf_fuzzer Tainted: G W 4.1.0+ #163
[ 2217.649810] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[ 2217.658281] ffffffff81a105a0 ffff88011ea85b10 ffffffff8169f823 0000000000000000
[ 2217.666818] 0000000000000000 ffff88011ea85b50 ffffffff8106ec8a ffff88011ea85ba0
[ 2217.675329] 0000000000000002 0000000000000001 ffff88011ea8bd80 ffff8801190400c0
[ 2217.683821] Call Trace:
[ 2217.686960] [] dump_stack+0x45/0x57
[ 2217.693638] [] warn_slowpath_common+0x8a/0xc0
[ 2217.700549] [] warn_slowpath_null+0x1a/0x20
[ 2217.707296] [] intel_pmu_drain_pebs_nhm+0x283/0x2e0
[ 2217.714775] [] ? intel_pmu_disable_event+0xa4/0x130
[ 2217.722216] [] intel_pmu_handle_irq+0x255/0x440
[ 2217.729339] [] ? perf_event_ctx_lock_nested+0x5e/0xf0
[ 2217.737026] [] perf_event_nmi_handler+0x26/0x40
[ 2217.744070] [] nmi_handle+0x9d/0x140
[ 2217.750160] [] ? nmi_handle+0x5/0x140
[ 2217.756290] [] default_do_nmi+0x4a/0x120
[ 2217.762688] [] do_nmi+0x8d/0xc0
[ 2217.768280] [] end_repeat_nmi+0x1e/0x2e
[ 2217.774627] [] ? __intel_pmu_enable_all+0x5a/0xc0
[ 2217.781894] [] ? __intel_pmu_enable_all+0x5a/0xc0
[ 2217.789153] [] ? __intel_pmu_enable_all+0x5a/0xc0
[ 2217.796415] <> [] intel_pmu_enable_all+0x10/0x20
[ 2217.804847] [] x86_pmu_enable+0x25c/0x2e0
[ 2217.811383] [] perf_pmu_enable+0x22/0x30
[ 2217.817837] [] perf_mux_hrtimer_handler+0x120/0x1f0
[ 2217.825316] [] ? perf_event_context_sched_in+0x150/0x150
[ 2217.833239] [] __hrtimer_run_queues+0xd3/0x260
[ 2217.840239] [] hrtimer_interrupt+0xab/0x1b0
[ 2217.846930] [] local_apic_timer_interrupt+0x3c/0x70
[ 2217.854367] [] smp_apic_timer_interrupt+0x41/0x60
[ 2217.861630] [] apic_timer_interrupt+0x6b/0x70
[ 2217.868540]
[ 2217.870633] ---[ end trace 3a31b4d07b4f3450 ]---
[ 2353.824071] Uhhuh. NMI received for unknown reason 31 on CPU 1.
[ 2353.831238] Do you have a strange power saving mode enabled?
[ 2353.838120] Dazed and confused, but trying to continue
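
For reference, the ordering problem described in the quoted explanation (clear the cpuc->pebs_enabled bit first, then test the mask to decide whether to drain) can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: struct toy_cpuc, toy_drain() and the two toy_disable_*() helpers are hypothetical names invented for illustration, not the kernel's code and not the patch under discussion. As the report above shows, the WARN was still hit with that patch applied, so this models the suspected mechanism only, not a confirmed root cause.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for per-CPU PEBS state (hypothetical, not struct cpu_hw_events). */
struct toy_cpuc {
	uint64_t pebs_enabled;	/* one bit per counter with PEBS active */
	int buffered;		/* records still sitting in the toy PEBS buffer */
};

/* Pretend to flush every buffered record. */
static void toy_drain(struct toy_cpuc *c)
{
	c->buffered = 0;
}

/* Ordering from the quoted text: clear the bit, then test the mask. */
static void toy_disable_clear_first(struct toy_cpuc *c, int idx)
{
	assert(idx >= 0 && idx < 64);
	c->pebs_enabled &= ~(1ULL << idx);
	if (c->pebs_enabled)	/* already zero if idx was the last PEBS event */
		toy_drain(c);
}

/* Alternative ordering: test the mask while the bit is still set. */
static void toy_disable_drain_first(struct toy_cpuc *c, int idx)
{
	assert(idx >= 0 && idx < 64);
	if (c->pebs_enabled)
		toy_drain(c);
	c->pebs_enabled &= ~(1ULL << idx);
}

/* Disable the only PEBS event and report how many stale records remain. */
static int toy_stale_after_disable(bool clear_first)
{
	struct toy_cpuc c = { .pebs_enabled = 1ULL << 0, .buffered = 3 };

	if (clear_first)
		toy_disable_clear_first(&c, 0);
	else
		toy_disable_drain_first(&c, 0);
	return c.buffered;
}
```

In this model toy_stale_after_disable(true) returns 3, because the last bit is gone before the check and the drain is skipped, while toy_stale_after_disable(false) returns 0. The leftover records in the first case correspond to the 'stale' entries that a later event on the same counter could trip over.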