Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753249AbbD3U7D (ORCPT ); Thu, 30 Apr 2015 16:59:03 -0400 Received: from mail-qc0-f170.google.com ([209.85.216.170]:34951 "EHLO mail-qc0-f170.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753228AbbD3U65 (ORCPT ); Thu, 30 Apr 2015 16:58:57 -0400 From: Vince Weaver X-Google-Original-From: Vince Weaver Date: Thu, 30 Apr 2015 17:04:00 -0400 (EDT) To: linux-kernel@vger.kernel.org cc: Peter Zijlstra , Arnaldo Carvalho de Melo , Jiri Olsa , Ingo Molnar , Paul Mackerras Subject: perf: WARNING perfevents: irq loop stuck! Message-ID: User-Agent: Alpine 2.11 (DEB 23 2013-08-11) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 8143 Lines: 126 So this is just a warning, and I've reported it before, but the perf_fuzzer triggers this fairly regularly on my Haswell system. It looks like fixed counter 0 (retired instructions) being set to 0000fffffffffffe occasionally causes an irq loop storm and gets stuck until the PMU state is cleared. [ 8224.179407] ------------[ cut here ]------------ [ 8224.184368] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/perf_event_intel.c:1602 intel_pmu_handle_irq+0x2bc/0x450() [ 8224.195686] perfevents: irq loop stuck! [ 8224.199835] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm crct10dif_pclmul snd_hda_codec_hdmi cr$ [ 8224.273188] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 4.0.0+ #136 [ 8224.281012] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014 [ 8224.288988] ffffffff81a093a8 ffff88011ea05b50 ffffffff816d5639 0000000000000000 [ 8224.297058] ffff88011ea05ba0 ffff88011ea05b90 ffffffff81072e4a ffff880119b87090 [ 8224.306453] 0000000000000064 ffff88011ea0bd80 ffff8800c38c7800 0000000000000040 [ 8224.315894] Call Trace: [ 8224.319886] [] dump_stack+0x45/0x57 [ 8224.327439] [] warn_slowpath_common+0x8a/0xc0 [ 8224.335238] [] warn_slowpath_fmt+0x46/0x50 [ 8224.342812] [] intel_pmu_handle_irq+0x2bc/0x450 [ 8224.350830] [] ? perf_event_task_tick+0x1d5/0x2b0 [ 8224.359025] [] perf_event_nmi_handler+0x2b/0x50 [ 8224.367069] [] nmi_handle+0xa0/0x150 [ 8224.374067] [] ? nmi_handle+0x5/0x150 [ 8224.381169] [] default_do_nmi+0x4a/0x140 [ 8224.388487] [] do_nmi+0x98/0xe0 [ 8224.394886] [] end_repeat_nmi+0x1e/0x2e [ 8224.402082] [] ? native_write_msr_safe+0xa/0x10 [ 8224.409958] [] ? native_write_msr_safe+0xa/0x10 [ 8224.417785] [] ? native_write_msr_safe+0xa/0x10 [ 8224.425536] <> [] intel_pmu_enable_event+0x20e/0x230 [ 8224.435001] [] x86_pmu_start+0x81/0x120 [ 8224.442007] [] x86_pmu_enable+0x295/0x310 [ 8224.449150] [] perf_pmu_enable+0x2a/0x30 [ 8224.456173] [] x86_pmu_commit_txn+0x78/0xa0 [ 8224.456173] [] x86_pmu_commit_txn+0x78/0xa0 [ 8224.463561] [] ? debug_object_activate+0x14b/0x1e0 [ 8224.471591] [] ? __lock_acquire.isra.31+0x3b9/0x1000 [ 8224.479715] [] ? debug_object_activate+0x14b/0x1e0 [ 8224.487676] [] ? __lock_is_held+0x54/0x80 [ 8224.494794] [] ? event_sched_in.isra.85+0x190/0x330 [ 8224.502894] [] group_sched_in+0x1b8/0x1e0 [ 8224.510082] [] ? native_sched_clock+0x2a/0x90 [ 8224.517646] [] __perf_event_enable+0x25c/0x2a0 [ 8224.525297] [] ? tick_nohz_irq_exit+0x29/0x30 [ 8224.532776] [] ? cpu_clock_event_start+0x40/0x40 [ 8224.540595] [] remote_function+0x50/0x60 [ 8224.547668] [] flush_smp_call_function_queue+0x81/0x180 [ 8224.556126] [] generic_smp_call_function_single_interrupt+0x13/0x60 [ 8224.565749] [] smp_trace_call_function_single_interrupt+0x38/0xc0 [ 8224.575132] [] trace_call_function_single_interrupt+0x6e/0x80 [ 8224.584168] [] ? cpuidle_enter_state+0x65/0x160 [ 8224.584168] [] ? cpuidle_enter_state+0x65/0x160 [ 8224.592646] [] ? cpuidle_enter_state+0x51/0x160 [ 8224.600408] [] cpuidle_enter+0x17/0x20 [ 8224.607285] [] cpu_startup_entry+0x399/0x440 [ 8224.614712] [] rest_init+0xbb/0xd0 [ 8224.621283] [] start_kernel+0x44e/0x45b [ 8224.628212] [] ? early_idt_handlers+0x120/0x120 [ 8224.635980] [] x86_64_start_reservations+0x2a/0x2c [ 8224.643999] [] x86_64_start_kernel+0x13b/0x14a [ 8224.651691] ---[ end trace 5679ca0875946dba ]--- [ 8224.657764] [ 8224.660447] CPU#0: ctrl: 0000000000000000 [ 8224.666448] CPU#0: status: 0000000000000000 [ 8224.672414] CPU#0: overflow: 0000000000000000 [ 8224.678373] CPU#0: fixed: 00000000000000b9 [ 8224.684338] CPU#0: pebs: 0000000000000000 [ 8224.690322] CPU#0: debugctl: 0000000000000000 [ 8224.696253] CPU#0: active: 0000000300000000 [ 8224.702232] CPU#0: gen-PMC0 ctrl: 00000000108001c2 [ 8224.708823] CPU#0: gen-PMC0 count: 0000000000000001 [ 8224.715371] CPU#0: gen-PMC0 left: 0000ffffffffffff [ 8224.721914] CPU#0: gen-PMC1 ctrl: 00000000001382d0 [ 8224.728402] CPU#0: gen-PMC1 count: 0000000000ff3792 [ 8224.734849] CPU#0: gen-PMC1 left: 0000ffffff012119 [ 8224.741271] CPU#0: gen-PMC2 ctrl: 00000003ff96764b [ 8224.747600] CPU#0: gen-PMC2 count: 0000000000000001 [ 8224.753884] CPU#0: gen-PMC2 left: 0000ffffffffffff [ 8224.760100] CPU#0: gen-PMC3 ctrl: 0000000000131acd [ 8224.766320] CPU#0: gen-PMC3 count: 000000000000288a [ 8224.772585] CPU#0: gen-PMC3 left: 0000ffffffffffff [ 8224.778769] CPU#0: fixed-PMC0 count: 0000fffffffffffe [ 8224.784973] CPU#0: fixed-PMC1 count: 0000fffda1877268 [ 8224.791196] CPU#0: fixed-PMC2 count: 0000000000f222eb [ 8224.797450] perf_event_intel: clearing PMU state on CPU#0 [ 8224.804063] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 624.630 msecs [ 8224.813789] perf interrupt took too long (4880480 > 2500), lowering kernel.perf_event_max_sample_rate to 50000 [ 8224.825400] perf interrupt took too long (4842512 > 5000), lowering kernel.perf_event_max_sample_rate to 25000 [ 8225.007411] perf interrupt took too long (4804688 > 10000), lowering kernel.perf_event_max_sample_rate to 12500 [ 8225.244832] perf interrupt took too long (4767159 > 20000), lowering kernel.perf_event_max_sample_rate to 6250 [ 8225.295535] perf interrupt took too long (4729923 > 38461), lowering kernel.perf_event_max_sample_rate to 3250 [ 8225.747314] perf interrupt took too long (4692980 > 71428), lowering kernel.perf_event_max_sample_rate to 1750 ... [202220.612133] perf_event_intel: clearing PMU state on CPU#6 [202220.665533] perf_event_intel: clearing PMU state on CPU#6 [202220.674658] perf_event_intel: clearing PMU state on CPU#6 [202220.734200] perf_event_intel: clearing PMU state on CPU#6 [202220.966526] perf_event_intel: clearing PMU state on CPU#6 [202220.976874] perf_event_intel: clearing PMU state on CPU#6 [202222.950479] perf_event_intel: clearing PMU state on CPU#6 [202223.183925] perf_event_intel: clearing PMU state on CPU#6 [212335.610297] perf_event_intel: clearing PMU state on CPU#2 [212335.922425] perf_event_intel: clearing PMU state on CPU#2 [212336.014458] perf_event_intel: clearing PMU state on CPU#2 [212336.258568] perf_event_intel: clearing PMU state on CPU#2 [226783.599841] perf_event_intel: clearing PMU state on CPU#0 [269662.113390] perf_event_intel: clearing PMU state on CPU#4 [401640.647353] perf_event_intel: clearing PMU state on CPU#3 [409463.976440] perf_event_intel: clearing PMU state on CPU#6 [409464.172527] perf_event_intel: clearing PMU state on CPU#6 [409464.692722] perf_event_intel: clearing PMU state on CPU#6 [422018.413829] perf_event_intel: clearing PMU state on CPU#1 [422018.458279] perf_event_intel: clearing PMU state on CPU#1 [422018.496802] perf_event_intel: clearing PMU state on CPU#1 [422018.510058] perf_event_intel: clearing PMU state on CPU#1 [422018.523668] perf_event_intel: clearing PMU state on CPU#1 [422018.549935] perf_event_intel: clearing PMU state on CPU#1 [422019.274748] perf_event_intel: clearing PMU state on CPU#1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/