From: Stephane Eranian <eranian@google.com>
To: linux-kernel@vger.kernel.org
Cc: acme@redhat.com, peterz@infradead.org, mingo@elte.hu, ak@linux.intel.com,
        kan.liang@intel.com, jolsa@redhat.com, namhyung@kernel.org,
        adrian.hunter@intel.com
Subject: [PATCH 2/3] perf/x86/pebs: add workaround for broken OVFL status on HSW
Date: Thu,  3 Mar 2016 20:50:41 +0100
Message-Id: <1457034642-21837-3-git-send-email-eranian@google.com>
In-Reply-To: <1457034642-21837-1-git-send-email-eranian@google.com>
References: <1457034642-21837-1-git-send-email-eranian@google.com>
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2968
Lines: 71

This patch fixes an issue with the GLOBAL_OVERFLOW_STATUS bits
on Haswell, Broadwell and Skylake processors when using PEBS.

The SDM stipulates that when the PEBS interrupt threshold is crossed, an
interrupt is posted and the kernel is interrupted. The kernel will find
GLOBAL_OVF_STATUS bit 62 set, indicating there are PEBS records
to drain. But the bits corresponding to the actual counters should
NOT be set. The kernel follows the SDM and assumes that all PEBS
events are processed in the drain_pebs() callback. The kernel then
checks for remaining overflows on any other (non-PEBS) events and
processes these in the for_each_bit_set(&status) loop.

As it turns out, under certain conditions on HSW and later processors,
on PEBS buffer interrupt, bit 62 is set but the counter bits may be set
as well. In that case, the kernel drains PEBS and generates SAMPLES with
the EXACT tag, then it processes the counter bits, and generates
normal (non-EXACT) SAMPLES.
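
To make the double processing concrete, here is a toy user-space model
of the handler flow described above. It is only a sketch and not the
kernel code: handle_irq_unpatched(), struct sample_counts, the
four-counter limit and the "one exact sample per PEBS counter"
simplification are all assumptions made for illustration.

```c
#include <stdint.h>

#define PEBS_OVF_BIT	62	/* status bit: PEBS buffer has records */
#define NUM_COUNTERS	4	/* assumed counter count for this model */

struct sample_counts {
	int exact;	/* samples emitted by the PEBS drain path */
	int non_exact;	/* samples emitted by the per-counter loop */
};

/*
 * Toy model of the overflow handler without the workaround.
 * status:       simulated GLOBAL_OVF_STATUS value
 * pebs_enabled: bitmask of counters programmed for PEBS
 */
static struct sample_counts handle_irq_unpatched(uint64_t status,
						 uint64_t pebs_enabled)
{
	struct sample_counts c = { 0, 0 };
	int bit;

	if (status & (1ULL << PEBS_OVF_BIT)) {
		status &= ~(1ULL << PEBS_OVF_BIT);
		/* drain: one exact sample per PEBS counter (simplified) */
		for (bit = 0; bit < NUM_COUNTERS; bit++)
			if (pebs_enabled & (1ULL << bit))
				c.exact++;
	}

	/* any counter bit still set is handled as a regular overflow */
	for (bit = 0; bit < NUM_COUNTERS; bit++)
		if (status & (1ULL << bit))
			c.non_exact++;

	return c;
}
```

With bit 62 and the bit of PEBS counter 0 both set in status, the model
reports one exact and one non-exact sample for the same counter.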

I ran into this problem by trying to understand why on HSW sampling
on a PEBS event was sometimes returning SAMPLES without the EXACT tag.
This should not happen on user-level code because HSW has the
eventing_ip, which always points to the instruction that caused the
event.

The workaround in this patch simply ensures that the bits for
the counters used for PEBS events are cleared after the PEBS
buffer has been drained. With this fix 100% of the PEBS
samples on my user code report the EXACT tag.
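
As a rough illustration of the masking, the helper below mimics the two
statements the patch adds after the drain. It is a stand-alone sketch:
mask_after_drain() and its valid_mask parameter are hypothetical names,
with valid_mask standing in for the set of status bits the handler is
allowed to act on.

```c
#include <stdint.h>

/*
 * Toy model of the masking done after the PEBS buffer is drained.
 * status:       simulated GLOBAL_OVF_STATUS value (bit 62 already cleared)
 * pebs_enabled: bitmask of counters programmed for PEBS
 * valid_mask:   hypothetical mask of status bits the handler may process
 */
static uint64_t mask_after_drain(uint64_t status, uint64_t pebs_enabled,
				 uint64_t valid_mask)
{
	/* PEBS counters were already reported as exact samples */
	status &= ~pebs_enabled;
	/* keep only the bits the handler knows how to process */
	status &= valid_mask;
	return status;
}
```

For example, with counters 0 and 2 flagged in status and only counter 0
used for PEBS, the result keeps just the bit for counter 2, so the
regular loop sees only the non-PEBS overflow.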

Before:
$ perf record -e cpu/event=0xd0,umask=0x81/upp ./multichase
$ perf report -D | fgrep SAMPLES
PERF_RECORD_SAMPLE(IP, 0x2): 11775/11775: 0x406de5 period: 73469 addr: 0 exact=Y
                         \--- EXACT tag is missing

After:
$ perf record -e cpu/event=0xd0,umask=0x81/upp ./multichase
$ perf report -D | fgrep SAMPLES
PERF_RECORD_SAMPLE(IP, 0x4002): 11775/11775: 0x406de5 period: 73469 addr: 0 exact=Y
                         \--- EXACT tag is set

The problem tends to appear more often when multiple PEBS events are used.

Signed-off-by: Stephane Eranian <eranian@google.com>
---
 arch/x86/events/intel/core.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index a7ec685..bdb77ed 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -1884,6 +1884,16 @@ again:
 	if (__test_and_clear_bit(62, (unsigned long *)&status)) {
 		handled++;
 		x86_pmu.drain_pebs(regs);
+		/*
+		 * There are cases where, even though the PEBS ovfl bit is set in
+		 * GLOBAL_OVF_STATUS, the PEBS events may also have their overflow bits
+		 * set for their counters. We must clear them here because they have
+		 * been processed as exact samples in the drain_pebs() routine. They
+		 * must not be processed again in the for_each_bit_set() loop for
+		 * regular samples below.
+		 */
+		status &= ~cpuc->pebs_enabled;
+		status &= x86_pmu.intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;
 	}
 
 	/*
-- 
2.5.0