Reply-To: eranian@gmail.com
References: <20150703131336.GI19282@twins.programming.kicks-ass.net> <20150703190420.GS3644@twins.programming.kicks-ass.net>
Date: Wed, 15 Jul 2015 08:42:50 +0200
Subject: Re: perf: fuzzer triggered warning in intel_pmu_drain_pebs_nhm()
From: Stephane Eranian
To: Vince Weaver
Cc: Peter Zijlstra, LKML, Ingo Molnar, Arnaldo Carvalho de Melo, kan.liang@intel.com
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jul 3, 2015 at 9:49 PM, Vince Weaver wrote:
> On Fri, 3 Jul 2015, Peter Zijlstra wrote:
>
>> That said, its far too warm and I might just not be making sense.
>
> you need to come visit Maine! Although I am not sure the cooler weather
> necessarily improves my kernel debugging skills.
>
> I managed to lock the machine (again this is with the patch applied).
>
I can reproduce the problem on my HSW running the fuzzer.

I can see why this could be happening if you are mixing PEBS and non PEBS
events in the bottom 4 counters. I suspect:

	for (bit = 0; bit < x86_pmu.max_pebs_events; bit++) {
		if ((counts[bit] == 0) && (error[bit] == 0))
			continue;

This test is not correct when you have non-PEBS events mixed with PEBS
events and they overflow at the same time. They will have counts[i] != 0
but error[i] == 0, and thus you fall thru the loop and hit the assert.
Or it is something along those lines.

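To make the suspected failure mode concrete, here is a minimal user-space
sketch of the skip test quoted above. It is not the kernel code: struct
sketch_event, count_bad_drains() and the two scenario helpers are
hypothetical names made up for illustration, and the scenario simply assumes,
as suspected above, that a non-PEBS counter can end up with counts[bit] != 0
when it overflows together with a PEBS counter.

```c
#include <stdbool.h>

/* Assumption: 4 PEBS-capable counters, matching "the bottom 4 counters". */
#define MAX_PEBS_EVENTS 4

/* Hypothetical stand-in for a scheduled event; NOT struct perf_event. */
struct sketch_event {
	bool in_use;	/* an event is scheduled on this counter */
	bool precise;	/* true for a PEBS event, false for a regular one */
};

/*
 * Apply the same skip test as the quoted loop and count how many counters
 * get past it while holding no event or a non-PEBS event. In the kernel,
 * that is the situation in which the warning would fire.
 */
int count_bad_drains(const struct sketch_event *events,
		     const short *counts, const short *error)
{
	int bad = 0;

	for (int bit = 0; bit < MAX_PEBS_EVENTS; bit++) {
		if ((counts[bit] == 0) && (error[bit] == 0))
			continue;
		if (!events[bit].in_use || !events[bit].precise)
			bad++;
	}
	return bad;
}

/* Counter 0 is PEBS, counter 1 is non-PEBS, both overflow together. */
int sketch_mixed_overflow(void)
{
	struct sketch_event events[MAX_PEBS_EVENTS] = {
		{ true, true }, { true, false },
	};
	short counts[MAX_PEBS_EVENTS] = { 1, 1, 0, 0 };
	short error[MAX_PEBS_EVENTS] = { 0, 0, 0, 0 };

	return count_bad_drains(events, counts, error);
}

/* Same setup, but only the PEBS counter has a record attributed to it. */
int sketch_pebs_only(void)
{
	struct sketch_event events[MAX_PEBS_EVENTS] = {
		{ true, true }, { true, false },
	};
	short counts[MAX_PEBS_EVENTS] = { 1, 0, 0, 0 };
	short error[MAX_PEBS_EVENTS] = { 0, 0, 0, 0 };

	return count_bad_drains(events, counts, error);
}
```

In this sketch the mixed scenario lets one non-PEBS counter through the
test, while the PEBS-only scenario lets none through. Whether counts[] can
really become non-zero for a non-PEBS counter in the kernel is exactly the
open question here, so treat this as an illustration of the hypothesis
rather than a confirmed diagnosis.
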
> [ 299.366027] ------------[ cut here ]------------
> [ 299.370985] WARNING: CPU: 2 PID: 8241 at arch/x86/kernel/cpu/perf_event_intel_ds.c:1198 intel_pmu_drain_pebs_nhm+0x283/0x2e0()
> [ 299.456929] CPU: 2 PID: 8241 Comm: perf_fuzzer Tainted: G W 4.1.0+ #164
> [ 299.465750] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> [ 299.474274] ffffffff81a105a0 ffff88011ea85b10 ffffffff8169f823 0000000000000000
> [ 299.482864] 0000000000000000 ffff88011ea85b50 ffffffff8106ec8a ffff88011ea85ba0
> [ 299.491488] 0000000000000000 0000000000000001 ffff88011ea8bd80 ffff8801190400c0
> [ 299.500029] Call Trace:
> [ 299.503190] [] dump_stack+0x45/0x57
> [ 299.509936] [] warn_slowpath_common+0x8a/0xc0
> [ 299.516901] [] warn_slowpath_null+0x1a/0x20
> [ 299.523715] [] intel_pmu_drain_pebs_nhm+0x283/0x2e0
> [ 299.531268] [] intel_pmu_handle_irq+0x255/0x440
> [ 299.538487] [] perf_event_nmi_handler+0x26/0x40
> [ 299.545638] [] nmi_handle+0x9d/0x140
> [ 299.551772] [] ? nmi_handle+0x5/0x140
> [ 299.558013] [] default_do_nmi+0x4a/0x120
> [ 299.564527] [] do_nmi+0x8d/0xc0
> [ 299.570185] [] end_repeat_nmi+0x1e/0x2e
> [ 299.576580] [] ? check_poison_obj+0x92/0x230
> [ 299.583390] [] ? check_poison_obj+0x92/0x230
> [ 299.590163] [] ? check_poison_obj+0x92/0x230
> [ 299.596922] <> [] ? perf_event_alloc+0x58/0x680
> [ 299.604594] [] cache_alloc_debugcheck_after.isra.51+0x1cd/0x250
> [ 299.613140] [] kmem_cache_alloc_trace+0xa6/0x510
> [ 299.620330] [] ? perf_event_alloc+0x58/0x680
> [ 299.627088] [] ? get_online_cpus+0x58/0x70
> [ 299.633688] [] perf_event_alloc+0x58/0x680
> [ 299.640319] [] SYSC_perf_event_open+0x3c7/0xd40
> [ 299.647353] [] ? __do_page_fault+0x1ab/0x3f0
> [ 299.654172] [] SyS_perf_event_open+0x9/0x10
> [ 299.660871] [] entry_SYSCALL_64_fastpath+0x16/0x7a
> [ 299.668236] ---[ end trace 3356c74581c13f1d ]---
> [ 299.673648] Uhhuh. NMI received for unknown reason 31 on CPU 2.
> [ 299.680427] Do you have a strange power saving mode enabled?
> [ 299.686963] Dazed and confused, but trying to continue
> [ 299.692904] Uhhuh. NMI received for unknown reason 31 on CPU 2.
> [ 299.699748] Do you have a strange power saving mode enabled?
> [ 299.706227] Dazed and confused, but trying to continue
> [ 299.712172] Uhhuh. NMI received for unknown reason 31 on CPU 2.
> [ 299.718946] Do you have a strange power saving mode enabled?
> [ 299.725446] Dazed and confused, but trying to continue
> [ 299.731419] Uhhuh. NMI received for unknown reason 31 on CPU 2.
> [ 299.738235] Do you have a strange power saving mode enabled?
> [ 299.744740] Dazed and confused, but trying to continue
> [ 299.750660] Uhhuh. NMI received for unknown reason 21 on CPU 2.
> [ 299.757398] Do you have a strange power saving mode enabled?
> [ 299.763862] Dazed and confused, but trying to continue
>
> (machine eventually locks up after lots of these messages)
>