Subject: Re: PEBS bug on HSW: "Unexpected number of pebs records 10" (was: Re: [GIT PULL] perf changes for v3.12)
From: Stephane Eranian
Reply-To: eranian@gmail.com
To: Ingo Molnar
Cc: Peter Zijlstra, Linus Torvalds, Linux Kernel Mailing List,
    Arnaldo Carvalho de Melo, Thomas Gleixner, Andi Kleen
Date: Tue, 10 Sep 2013 07:15:19 -0700
In-Reply-To: <20130910133845.GB7537@gmail.com>
References: <20130903132933.GA24955@gmail.com>
    <20130909100544.GI31370@twins.programming.kicks-ass.net>
    <20130910115306.GA6091@gmail.com>
    <20130910133845.GB7537@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Sep 10, 2013 at 6:38 AM, Ingo Molnar wrote:
>
> * Stephane Eranian wrote:
>
>> Hi,
>>
>> Ok, so I am able to reproduce the problem using a simpler test case
>> with a simple multithreaded program where #threads >> #CPUs.
>
> Does it go away if you use 'perf record --all-cpus'?
>
Haven't tried that yet. But I verified the DS pointers:

init:  CPU6 pebs base=ffff8808262de000 index=ffff8808262de000
       intr=ffff8808262de0c0 max=ffff8808262defc0
crash: CPU6 pebs base=ffff8808262de000 index=ffff8808262de9c0
       intr=ffff8808262de0c0 max=ffff8808262defc0

Neither the base nor the max was modified. The index simply went past
the threshold, which by itself is not a bug: it is 12 records past the
threshold of 1, so 13 records in total in my new crash report.
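Just to make the arithmetic explicit, here is a tiny user-space sketch
(purely illustrative, not kernel code; the 192-byte HSW record size
matches the intr-base delta of 0xc0 above):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* DS pointers taken from the crash dump above: */
        uint64_t base  = 0xffff8808262de000ULL; /* pebs_buffer_base     */
        uint64_t index = 0xffff8808262de9c0ULL; /* pebs_index at crash  */
        uint64_t intr  = 0xffff8808262de0c0ULL; /* interrupt threshold  */
        uint64_t rec   = 192;                   /* HSW PEBS record size */

        /* Prints 13: records written by the time of the crash. */
        printf("records at crash   : %llu\n",
               (unsigned long long)((index - base) / rec));
        /* Prints 1: the threshold fires after a single record. */
        printf("threshold (records): %llu\n",
               (unsigned long long)((intr - base) / rec));
        return 0;
    }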
Two things to try:
  - measure only one thread/core
  - move the threshold a bit farther away (to get 2 or 3 entries)

The threshold is where the interrupt is generated; it does not mark
where PEBS recording stops. So it is possible that on HSW we get into
a situation where it takes time to reach the handler that stops the
PMU. I don't know how, given that we use NMI. Well, unless we were
already servicing an NMI at the time. But given that we stop the PMU
almost immediately in the handler, I don't see how that would be
possible.

The other oddity on HSW is that we clear the NMI on entry to the
handler and not at the end. I never got a good explanation as to why
that was necessary. So maybe it is related...

>> [ 2229.021934] WARNING: CPU: 6 PID: 17496 at
>> arch/x86/kernel/cpu/perf_event_intel_ds.c:1003
>> intel_pmu_drain_pebs_hsw+0xa8/0xc0()
>> [ 2229.021936] Unexpected number of pebs records 21
>>
>> [ 2229.021966] Call Trace:
>> [ 2229.021967]  [] dump_stack+0x46/0x58
>> [ 2229.021976]  [] warn_slowpath_common+0x8c/0xc0
>> [ 2229.021979]  [] warn_slowpath_fmt+0x46/0x50
>> [ 2229.021982]  [] intel_pmu_drain_pebs_hsw+0xa8/0xc0
>> [ 2229.021986]  [] intel_pmu_handle_irq+0x220/0x380
>> [ 2229.021991]  [] ? sched_clock_cpu+0xc5/0x120
>> [ 2229.021995]  [] perf_event_nmi_handler+0x34/0x60
>> [ 2229.021998]  [] nmi_handle.isra.3+0x88/0x180
>> [ 2229.022001]  [] do_nmi+0xe0/0x330
>> [ 2229.022004]  [] end_repeat_nmi+0x1e/0x2e
>> [ 2229.022008]  [] ? intel_pmu_pebs_enable_all+0x33/0x40
>> [ 2229.022011]  [] ? intel_pmu_pebs_enable_all+0x33/0x40
>> [ 2229.022015]  [] ? intel_pmu_pebs_enable_all+0x33/0x40
>> [ 2229.022016]  <>  [] intel_pmu_enable_all+0x23/0xa0
>> [ 2229.022021]  [] x86_pmu_enable+0x274/0x310
>> [ 2229.022025]  [] perf_pmu_enable+0x27/0x30
>> [ 2229.022029]  [] perf_event_context_sched_in+0x79/0xc0
>>
>> Could be a HW race whereby each HT thread's PEBS records get mixed up.
>
> Yes, that seems plausible and would explain why the overrun is usually a
> small integer. We set up the DS with PEBS_BUFFER_SIZE == 4096, so with a
> record size of 192 bytes on HSW we should get index values of 0-21.
>
> That fits within the range of indices reported so far.
>
>> [...] I will add a couple more checks to verify that. The intr_thres
>> should not have changed. Yet it looks like we have a situation where
>> the index is way past the threshold.
>
> Btw., it would also be nice to add a check of ds->pebs_index against
> ds->pebs_absolute_maximum, to make sure the PEBS record index never
> goes outside the DS area, i.e. to protect against random corruption.
>
> Right now we do only half a check:
>
>	n = top - at;
>	if (n <= 0)
>		return;
>
> this still allows an upwards overflow. We check x86_pmu.max_pebs_events
> but then let it continue:
>
>	WARN_ONCE(n > x86_pmu.max_pebs_events,
>		  "Unexpected number of pebs records %d\n", n);
>
>	return __intel_pmu_drain_pebs_nhm(iregs, at, top);
>
> Instead it should be something more robust, like:
>
>	if (WARN_ONCE(n > max ...)) {
>		/* Drain the PEBS buffer: */
>		ds->pebs_index = ds->pebs_buffer_base;
>		return;
>	}
>
> Thanks,
>
>	Ingo
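Putting your two suggestions together with the code quoted above, the
tail of the drain path could look roughly like this (untested sketch,
reusing only the names already quoted in this thread):

	n = top - at;
	if (n <= 0)
		return;

	/* Never trust an index that ran outside the DS area: */
	if (WARN_ONCE(ds->pebs_index > ds->pebs_absolute_maximum,
		      "PEBS index past end of DS area\n")) {
		ds->pebs_index = ds->pebs_buffer_base;
		return;
	}

	/* More records than the buffer can hold: drain and bail out: */
	if (WARN_ONCE(n > x86_pmu.max_pebs_events,
		      "Unexpected number of pebs records %d\n", n)) {
		/* Drain the PEBS buffer: */
		ds->pebs_index = ds->pebs_buffer_base;
		return;
	}

	return __intel_pmu_drain_pebs_nhm(iregs, at, top);

That way a corrupted or runaway index can never feed a bogus record
count into the drain loop.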