Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757633AbcLOIM1 (ORCPT ); Thu, 15 Dec 2016 03:12:27 -0500
Received: from mail-yw0-f182.google.com ([209.85.161.182]:36034 "EHLO mail-yw0-f182.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757613AbcLOIMZ (ORCPT ); Thu, 15 Dec 2016 03:12:25 -0500
MIME-Version: 1.0
In-Reply-To: <20161215075205.GA19558@krava>
References: <20160307202556.GQ6344@twins.programming.kicks-ass.net> <20160308210707.GG6344@twins.programming.kicks-ass.net> <20160310104236.GV6344@twins.programming.kicks-ass.net> <20161214175552.GW3207@twins.programming.kicks-ass.net> <20161215075205.GA19558@krava>
From: Stephane Eranian
Date: Thu, 15 Dec 2016 00:04:08 -0800
Message-ID:
Subject: Re: [PATCH 2/3] perf/x86/pebs: add workaround for broken OVFL status on HSW
To: Jiri Olsa
Cc: Peter Zijlstra, Andi Kleen, LKML, Arnaldo Carvalho de Melo, "mingo@elte.hu", "Liang, Kan", Namhyung Kim, Adrian Hunter
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 7260
Lines: 114

On Wed, Dec 14, 2016 at 11:52 PM, Jiri Olsa wrote:
> On Wed, Dec 14, 2016 at 11:26:49PM -0800, Stephane Eranian wrote:
>> On Wed, Dec 14, 2016 at 9:55 AM, Peter Zijlstra wrote:
>> >
>> > Just spotted this again, ping?
>> >
>> Ok, on what processor running what command, so I can try and reproduce?
>
> for me it's snb_x (model 45) and peter's ivb-ep model 62
>
> after several hours of fuzzer test, log below.. I'll try again with the change
>
Ok, but the problem with the fuzzer is that you have no idea whether you
were using PEBS or no PEBS, or one or multiple events, so it becomes hard
to reproduce.

> jirka
>
>
> ---
> [14404.947844] perfevents: irq loop stuck!
> [14404.952560] ------------[ cut here ]------------
> [14404.957720] WARNING: CPU: 0 PID: 0 at arch/x86/events/intel/core.c:2093 intel_pmu_handle_irq+0x2f8/0x4c0
> [14404.968305] Modules linked in: intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp ipmi_devintf crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support ghash_clmulni_intel pcspkr ipmi_ssif tpm_tis i2c_i801 tpm_tis_core ipmi_si tpm i2c_smbus ipmi_msghandler cdc_ether usbnet mii shpchp ioatdma wmi lpc_ich xfs libcrc32c mgag200 drm_kms_helper ttm drm igb ptp crc32c_intel pps_core dca i2c_algo_bit megaraid_sas fjes
> [14405.019901] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc8+ #51
> [14405.026985] Hardware name: IBM System x3650 M4 : -[7915E2G]-/00Y7683, BIOS -[VVE124AUS-1.30]- 11/21/2012
> [14405.037568] ffff880277a05b08 ffffffff81463243 ffff880277a05b58 0000000000000000
> [14405.046601] ffff880277a05b48 ffffffff810b698b 0000082d81133a1d 0000000000000064
> [14405.055634] ffff880277a0a380 ffff880276208800 0000000000000040 ffff880277a0a580
> [14405.064665] Call Trace:
> [14405.067394] [] dump_stack+0x86/0xc3
> [14405.073807] [] __warn+0xcb/0xf0
> [14405.079156] [] warn_slowpath_fmt+0x5f/0x80
> [14405.085569] [] ? warn_slowpath_fmt+0x5/0x80
> [14405.092081] [] intel_pmu_handle_irq+0x2f8/0x4c0
> [14405.098971] [] ? perf_event_nmi_handler+0x2c/0x50
> [14405.106065] [] ? intel_pmu_save_and_restart+0x50/0x50
> [14405.113547] [] ? nmi_raise_cpu_backtrace+0x20/0x20
> [14405.120737] [] ? ftrace_ops_test.isra.23+0x65/0xa0
> [14405.127917] [] ? bsearch+0x5e/0x90
> [14405.133556] [] ? __add_hash_entry+0x50/0x50
> [14405.140066] [] ? bsearch+0x5e/0x90
> [14405.145704] [] ? __add_hash_entry+0x50/0x50
> [14405.152214] [] ? nmi_raise_cpu_backtrace+0x20/0x20
> [14405.159403] [] ? nmi_raise_cpu_backtrace+0x20/0x20
> [14405.166594] [] ? debug_lockdep_rcu_enabled+0x1d/0x20
> [14405.173979] [] ? ftrace_ops_list_func+0xce/0x1d0
> [14405.180974] [] ? ftrace_call+0x5/0x34
> [14405.186904] [] ? ftrace_call+0x5/0x34
> [14405.192824] [] ? printk_nmi_enter+0x20/0x20
> [14405.199337] [] ? intel_pmu_handle_irq+0x5/0x4c0
> [14405.206235] [] ? perf_event_nmi_handler+0x5/0x50
> [14405.213231] [] perf_event_nmi_handler+0x2c/0x50
> [14405.220121] [] nmi_handle+0xbd/0x2e0
> [14405.225954] [] ? nmi_handle+0x5/0x2e0
> [14405.231875] [] ? nmi_handle+0x5/0x2e0
> [14405.237804] [] default_do_nmi+0x53/0x100
> [14405.244025] [] do_nmi+0x11f/0x170
> [14405.249557] [] end_repeat_nmi+0x1a/0x1e
> [14405.255680] [] ? native_write_msr+0x6/0x30
> [14405.262093] [] ? native_write_msr+0x6/0x30
> [14405.268507] [] ? native_write_msr+0x6/0x30
> [14405.274914] [] ? intel_pmu_pebs_enable_all+0x34/0x40
> [14405.283656] [] __intel_pmu_enable_all.constprop.17+0x23/0xa0
> [14405.291815] [] intel_pmu_enable_all+0x10/0x20
> [14405.298520] [] x86_pmu_enable+0x256/0x2e0
> [14405.304836] [] perf_pmu_enable.part.86+0x7/0x10
> [14405.311736] [] perf_mux_hrtimer_handler+0x22e/0x2c0
> [14405.319014] [] __hrtimer_run_queues+0xfb/0x510
> [14405.325808] [] ? ctx_resched+0x90/0x90
> [14405.331834] [] hrtimer_interrupt+0x9d/0x1a0
> [14405.338343] [] local_apic_timer_interrupt+0x38/0x60
> [14405.345629] [] smp_trace_apic_timer_interrupt+0x5b/0x25f
> [14405.353402] [] trace_apic_timer_interrupt+0x96/0xa0
> [14405.360689] [] ? cpuidle_enter_state+0x124/0x380
> [14405.368354] [] ? cpuidle_enter_state+0x120/0x380
> [14405.375349] [] cpuidle_enter+0x17/0x20
> [14405.381375] [] call_cpuidle+0x23/0x40
> [14405.387303] [] cpu_startup_entry+0x160/0x250
> [14405.393910] [] rest_init+0x135/0x140
> [14405.399743] [] start_kernel+0x45e/0x47f
> [14405.405866] [] ? early_idt_handler_array+0x120/0x120
> [14405.413250] [] x86_64_start_reservations+0x2a/0x2c
> [14405.420432] [] x86_64_start_kernel+0x14c/0x16f
> [14405.427224] ---[ end trace 62b08c15aaa2825d ]---
> [14405.432378]
> [14405.434043] CPU#0: ctrl: 0000000000000000
> [14405.439099] CPU#0: status: 0000000000000008
> [14405.444157] CPU#0: overflow: 0000000000000000
> [14405.449214] CPU#0: fixed: 00000000000000b0
> [14405.454271] CPU#0: pebs: 0000000000000000
> [14405.459326] CPU#0: debugctl: 0000000000000000
> [14405.464383] CPU#0: active: 000000020000000f
> [14405.469431] CPU#0: gen-PMC0 ctrl: 0000000001d301b1
> [14405.475069] CPU#0: gen-PMC0 count: 0000800090b1c37e
> [14405.480706] CPU#0: gen-PMC0 left: 00007fff6fb96d3a
> [14405.486344] CPU#0: gen-PMC1 ctrl: 00000000baf733b1
> [14405.491981] CPU#0: gen-PMC1 count: 0000800000000009
> [14405.497618] CPU#0: gen-PMC1 left: 00007ffffffffff7
> [14405.503256] CPU#0: gen-PMC2 ctrl: 0000000000530020
> [14405.508894] CPU#0: gen-PMC2 count: 00008000000000e8
> [14405.514534] CPU#0: gen-PMC2 left: 00007fffffffff18
> [14405.520172] CPU#0: gen-PMC3 ctrl: 00000000004200c0
> [14405.525809] CPU#0: gen-PMC3 count: 0000fffffffffffe
> [14405.531446] CPU#0: gen-PMC3 left: 0000000000000002
> [14405.537085] CPU#0: fixed-PMC0 count: 000080000010c91d
> [14405.542722] CPU#0: fixed-PMC1 count: 0000fffc1b31bacf
> [14405.548360] CPU#0: fixed-PMC2 count: 000080000318bf99
> [14405.554000] core: clearing PMU state on CPU#0
> [14405.559598] core: clearing PMU state on CPU#0