Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S933622AbcCIRkO (ORCPT ); Wed, 9 Mar 2016 12:40:14 -0500
Received: from mail-wm0-f48.google.com ([74.125.82.48]:35689 "EHLO
        mail-wm0-f48.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S933573AbcCIRkJ (ORCPT ); Wed, 9 Mar 2016 12:40:09 -0500
MIME-Version: 1.0
In-Reply-To:
References: <1457034642-21837-1-git-send-email-eranian@google.com>
        <1457034642-21837-3-git-send-email-eranian@google.com>
        <20160303214312.GI23621@tassilo.jf.intel.com>
        <20160307102413.GB6356@twins.programming.kicks-ass.net>
        <20160307121840.GF6375@twins.programming.kicks-ass.net>
        <20160307182731.GA12153@krava.redhat.com>
        <20160307202556.GQ6344@twins.programming.kicks-ass.net>
        <20160308210707.GG6344@twins.programming.kicks-ass.net>
Date: Wed, 9 Mar 2016 09:40:07 -0800
Message-ID:
Subject: Re: [PATCH 2/3] perf/x86/pebs: add workaround for broken OVFL status on HSW
From: Stephane Eranian
To: Peter Zijlstra
Cc: Jiri Olsa, Andi Kleen, LKML, Arnaldo Carvalho de Melo,
        "mingo@elte.hu", "Liang, Kan", Namhyung Kim, Adrian Hunter
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2798
Lines: 65

On Tue, Mar 8, 2016 at 9:44 PM, Stephane Eranian wrote:
> On Tue, Mar 8, 2016 at 9:34 PM, Stephane Eranian wrote:
>> On Tue, Mar 8, 2016 at 1:13 PM, Stephane Eranian wrote:
>>> Hi,
>>>
>>> On Tue, Mar 8, 2016 at 1:07 PM, Peter Zijlstra wrote:
>>>> On Tue, Mar 08, 2016 at 12:59:23PM -0800, Stephane Eranian wrote:
>>>>> hi,
>>>>>
>>>>> On Mon, Mar 7, 2016 at 12:25 PM, Peter Zijlstra wrote:
>>>>> >
>>>>> > On Mon, Mar 07, 2016 at 07:27:31PM +0100, Jiri Olsa wrote:
>>>>> > > On Mon, Mar 07, 2016 at 01:18:40PM +0100, Peter Zijlstra wrote:
>>>>> > > > On Mon, Mar 07, 2016 at 11:24:13AM +0100, Peter Zijlstra wrote:
>>>>> > > >
>>>>> > > > > I suspect Andi is having something along:
>>>>> > > > >
>>>>> > > > >   lkml.kernel.org/r/1445458568-16956-1-git-send-email-andi@firstfloor.org
>>>>> > > > >
>>>>> > > > > applied to his tree.
>>>>> > > >
>>>>> > > > OK, I munged a bunch of patches together, please have a hard look at the
>>>>> > > > end result found in:
>>>>> > > >
>>>>> > > >   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/core
>>>>> > > >
>>>>>
>>>>> I ran this kernel on Haswell. Even with Andi's fixes the problem I identified is
>>>>> still there, so my patch is still needed.
>>>>
>>>> Right, your patch should be included in that kernel, or did I make a
>>>> royal mess of things?
>>>>
>>> No, it is as expected for the OVF PMI fix.
>>>
>>>> I put Andi's late status ack on top of your patch.
>>>>
>> Ok, I ran into a problem on Broadwell with your branch with Andi's
>> patches. I see
>
> Sorry this is with tip.git and not your branch. Will try with it too.

With your queue.tip perf/core branch, I run into another problem.
I am monitoring with 2 PEBS events and I have the NMI watchdog enabled.

I see non-EXACT PEBS records again, despite my change (which is in).
I tracked it down to the following issue after the testing of bit 62:

[31137.273061] CPU71 status=0x200000001 orig_status=0x200000001 bit62=0

The IRQ handler is called because the fixed counter for the NMI has
overflowed and it sees this in bit 33, but it also sees that one of the
PEBS events has also overflowed, yet bit 62 is not set. Therefore both
overflows are treated as regular and the drain_pebs() is not called
generating a non-EXACT record for the PEBS counter (counter 0). So
something is wrong still and this is on Broadwell.

First, I don't understand why the OVF bit for counter 0 is set. It
should not according to specs because the counter is in PEBS mode.
There must be a race there. So we have to handle it by relying on
cpuc->pebs_enabled. I will try that. We likely also need to force OVF
bit 62 to 1 so we can ack it in the end (and in case it gets set).
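
To make the idea concrete, here is a minimal userspace sketch of the
status handling described above. It is not the kernel code and not a
tested patch. The names split_overflow_status() and struct split_status
are hypothetical, and pebs_enabled is assumed to be a plain bitmask of
the general-purpose counters programmed for PEBS (standing in for
cpuc->pebs_enabled, whose real layout may carry other bits). Any
overflow bit that belongs to a PEBS counter is pulled out of the regular
set and forces the drain path, whether or not bit 62 is set.

```c
#include <stdint.h>

/* Bit 62 of the global overflow status: PEBS buffer overflow. */
#define PEBS_OVF_BIT 62

/* Hypothetical result type, for illustration only. */
struct split_status {
    uint64_t regular; /* overflow bits to handle as regular samples */
    uint64_t pebs;    /* overflow bits that belong to PEBS counters */
    int      drain;   /* nonzero if the PEBS drain path should run */
};

/*
 * Split a raw overflow status into regular and PEBS parts.
 * Assumption: pebs_enabled has one bit per general-purpose counter
 * in PEBS mode and no other bits set.
 */
static struct split_status
split_overflow_status(uint64_t status, uint64_t pebs_enabled)
{
    struct split_status s;
    uint64_t bit62 = 1ULL << PEBS_OVF_BIT;

    /* Counter bits that should never be treated as regular overflows. */
    s.pebs = status & pebs_enabled;

    /* Drain if the hardware set bit 62 or a PEBS counter bit leaked in. */
    s.drain = (status & bit62) != 0 || s.pebs != 0;

    /* Everything else, minus bit 62 itself, is a regular overflow. */
    s.regular = status & ~pebs_enabled & ~bit62;

    return s;
}
```

With the logged value status=0x200000001 and counter 0 in PEBS mode
(pebs_enabled=0x1), this leaves only bit 33 in the regular set and
requests a drain for counter 0, which is the behavior the workaround is
after.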