Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1757488AbcLOHwK (ORCPT ); Thu, 15 Dec 2016 02:52:10 -0500
Received: from mx1.redhat.com ([209.132.183.28]:60030 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1757356AbcLOHwJ (ORCPT ); Thu, 15 Dec 2016 02:52:09 -0500
Date: Thu, 15 Dec 2016 08:52:05 +0100
From: Jiri Olsa
To: Stephane Eranian
Cc: Peter Zijlstra, Andi Kleen, LKML, Arnaldo Carvalho de Melo,
        "mingo@elte.hu", "Liang, Kan", Namhyung Kim, Adrian Hunter
Subject: Re: [PATCH 2/3] perf/x86/pebs: add workaround for broken OVFL status on HSW
Message-ID: <20161215075205.GA19558@krava>
References: <20160307202556.GQ6344@twins.programming.kicks-ass.net>
        <20160308210707.GG6344@twins.programming.kicks-ass.net>
        <20160310104236.GV6344@twins.programming.kicks-ass.net>
        <20161214175552.GW3207@twins.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.7.1 (2016-10-04)
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
        (mx1.redhat.com [10.5.110.30]); Thu, 15 Dec 2016 07:52:08 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 6828
Lines: 109

On Wed, Dec 14, 2016 at 11:26:49PM -0800, Stephane Eranian wrote:
> On Wed, Dec 14, 2016 at 9:55 AM, Peter Zijlstra wrote:
> >
> > Just spotted this again, ping?
> >
> Ok, on what processor running what command, so I can try and reproduce?

for me it's snb_x (model 45) and peter's ivb-ep model 62
after several hours of fuzzer test, log below..

I'll try again with the change

jirka


---
[14404.947844] perfevents: irq loop stuck!
[14404.952560] ------------[ cut here ]------------
[14404.957720] WARNING: CPU: 0 PID: 0 at arch/x86/events/intel/core.c:2093 intel_pmu_handle_irq+0x2f8/0x4c0
[14404.968305] Modules linked in: intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp ipmi_devintf crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support ghash_clmulni_intel pcspkr ipmi_ssif tpm_tis i2c_i801 tpm_tis_core ipmi_si tpm i2c_smbus ipmi_msghandler cdc_ether usbnet mii shpchp ioatdma wmi lpc_ich xfs libcrc32c mgag200 drm_kms_helper ttm drm igb ptp crc32c_intel pps_core dca i2c_algo_bit megaraid_sas fjes
[14405.019901] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc8+ #51
[14405.026985] Hardware name: IBM System x3650 M4 : -[7915E2G]-/00Y7683, BIOS -[VVE124AUS-1.30]- 11/21/2012
[14405.037568] ffff880277a05b08 ffffffff81463243 ffff880277a05b58 0000000000000000
[14405.046601] ffff880277a05b48 ffffffff810b698b 0000082d81133a1d 0000000000000064
[14405.055634] ffff880277a0a380 ffff880276208800 0000000000000040 ffff880277a0a580
[14405.064665] Call Trace:
[14405.067394] [] dump_stack+0x86/0xc3
[14405.073807] [] __warn+0xcb/0xf0
[14405.079156] [] warn_slowpath_fmt+0x5f/0x80
[14405.085569] [] ? warn_slowpath_fmt+0x5/0x80
[14405.092081] [] intel_pmu_handle_irq+0x2f8/0x4c0
[14405.098971] [] ? perf_event_nmi_handler+0x2c/0x50
[14405.106065] [] ? intel_pmu_save_and_restart+0x50/0x50
[14405.113547] [] ? nmi_raise_cpu_backtrace+0x20/0x20
[14405.120737] [] ? ftrace_ops_test.isra.23+0x65/0xa0
[14405.127917] [] ? bsearch+0x5e/0x90
[14405.133556] [] ? __add_hash_entry+0x50/0x50
[14405.140066] [] ? bsearch+0x5e/0x90
[14405.145704] [] ? __add_hash_entry+0x50/0x50
[14405.152214] [] ? nmi_raise_cpu_backtrace+0x20/0x20
[14405.159403] [] ? nmi_raise_cpu_backtrace+0x20/0x20
[14405.166594] [] ? debug_lockdep_rcu_enabled+0x1d/0x20
[14405.173979] [] ? ftrace_ops_list_func+0xce/0x1d0
[14405.180974] [] ? ftrace_call+0x5/0x34
[14405.186904] [] ? ftrace_call+0x5/0x34
[14405.192824] [] ? printk_nmi_enter+0x20/0x20
[14405.199337] [] ? intel_pmu_handle_irq+0x5/0x4c0
[14405.206235] [] ? perf_event_nmi_handler+0x5/0x50
[14405.213231] [] perf_event_nmi_handler+0x2c/0x50
[14405.220121] [] nmi_handle+0xbd/0x2e0
[14405.225954] [] ? nmi_handle+0x5/0x2e0
[14405.231875] [] ? nmi_handle+0x5/0x2e0
[14405.237804] [] default_do_nmi+0x53/0x100
[14405.244025] [] do_nmi+0x11f/0x170
[14405.249557] [] end_repeat_nmi+0x1a/0x1e
[14405.255680] [] ? native_write_msr+0x6/0x30
[14405.262093] [] ? native_write_msr+0x6/0x30
[14405.268507] [] ? native_write_msr+0x6/0x30
[14405.274914] [] ? intel_pmu_pebs_enable_all+0x34/0x40
[14405.283656] [] __intel_pmu_enable_all.constprop.17+0x23/0xa0
[14405.291815] [] intel_pmu_enable_all+0x10/0x20
[14405.298520] [] x86_pmu_enable+0x256/0x2e0
[14405.304836] [] perf_pmu_enable.part.86+0x7/0x10
[14405.311736] [] perf_mux_hrtimer_handler+0x22e/0x2c0
[14405.319014] [] __hrtimer_run_queues+0xfb/0x510
[14405.325808] [] ? ctx_resched+0x90/0x90
[14405.331834] [] hrtimer_interrupt+0x9d/0x1a0
[14405.338343] [] local_apic_timer_interrupt+0x38/0x60
[14405.345629] [] smp_trace_apic_timer_interrupt+0x5b/0x25f
[14405.353402] [] trace_apic_timer_interrupt+0x96/0xa0
[14405.360689] [] ? cpuidle_enter_state+0x124/0x380
[14405.368354] [] ? cpuidle_enter_state+0x120/0x380
[14405.375349] [] cpuidle_enter+0x17/0x20
[14405.381375] [] call_cpuidle+0x23/0x40
[14405.387303] [] cpu_startup_entry+0x160/0x250
[14405.393910] [] rest_init+0x135/0x140
[14405.399743] [] start_kernel+0x45e/0x47f
[14405.405866] [] ? early_idt_handler_array+0x120/0x120
[14405.413250] [] x86_64_start_reservations+0x2a/0x2c
[14405.420432] [] x86_64_start_kernel+0x14c/0x16f
[14405.427224] ---[ end trace 62b08c15aaa2825d ]---
[14405.432378]
[14405.434043] CPU#0: ctrl: 0000000000000000
[14405.439099] CPU#0: status: 0000000000000008
[14405.444157] CPU#0: overflow: 0000000000000000
[14405.449214] CPU#0: fixed: 00000000000000b0
[14405.454271] CPU#0: pebs: 0000000000000000
[14405.459326] CPU#0: debugctl: 0000000000000000
[14405.464383] CPU#0: active: 000000020000000f
[14405.469431] CPU#0: gen-PMC0 ctrl: 0000000001d301b1
[14405.475069] CPU#0: gen-PMC0 count: 0000800090b1c37e
[14405.480706] CPU#0: gen-PMC0 left: 00007fff6fb96d3a
[14405.486344] CPU#0: gen-PMC1 ctrl: 00000000baf733b1
[14405.491981] CPU#0: gen-PMC1 count: 0000800000000009
[14405.497618] CPU#0: gen-PMC1 left: 00007ffffffffff7
[14405.503256] CPU#0: gen-PMC2 ctrl: 0000000000530020
[14405.508894] CPU#0: gen-PMC2 count: 00008000000000e8
[14405.514534] CPU#0: gen-PMC2 left: 00007fffffffff18
[14405.520172] CPU#0: gen-PMC3 ctrl: 00000000004200c0
[14405.525809] CPU#0: gen-PMC3 count: 0000fffffffffffe
[14405.531446] CPU#0: gen-PMC3 left: 0000000000000002
[14405.537085] CPU#0: fixed-PMC0 count: 000080000010c91d
[14405.542722] CPU#0: fixed-PMC1 count: 0000fffc1b31bacf
[14405.548360] CPU#0: fixed-PMC2 count: 000080000318bf99
[14405.554000] core: clearing PMU state on CPU#0
[14405.559598] core: clearing PMU state on CPU#0