This short patch series fixes several issues related to PEBS on Haswell
and later processors.
- The HT corruption bug (HSD29) is fixed on Broadwell, so there is no need
to mark events 0xd0-0xd3 as exclusive there. This was fixed for non-PEBS
mode but not for PEBS mode.
- On Haswell and later, there is a bug with the GLOBAL_OVF_STATUS bits
reporting counter overflows for PEBS events. The SDM says they should not
be set, but I ran into cases where bits 0-3 were set, causing double
processing of the samples and samples missing the EXACT tag. See the
changelogs for details.
Tested on Haswell, Broadwell, Skylake.
Patches are relative to tip.git.
Stephane Eranian (3):
perf/x86/intel: add definition for PT PMI bit
perf/x86/pebs: add workaround for broken OVFL status on HSW
perf/x86/pebs: add proper PEBS constraints for Broadwell
arch/x86/events/intel/core.c | 12 +++++++++++-
arch/x86/events/intel/ds.c | 24 ++++++++++++++++++++++++
arch/x86/include/asm/perf_event.h | 1 +
3 files changed, 36 insertions(+), 1 deletion(-)
--
2.5.0
This patch adds a definition for GLOBAL_OVFL_STATUS bit 55
which is used with the Processor Trace (PT) feature.
Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/include/asm/perf_event.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 7bcb861..5a2ed3e 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -165,6 +165,7 @@ struct x86_pmu_capability {
#define GLOBAL_STATUS_ASIF BIT_ULL(60)
#define GLOBAL_STATUS_COUNTERS_FROZEN BIT_ULL(59)
#define GLOBAL_STATUS_LBRS_FROZEN BIT_ULL(58)
+#define GLOBAL_STATUS_TRACE_TOPAPMI BIT_ULL(55)
/*
* IBS cpuid feature detection
--
2.5.0
This patch fixes an issue with the GLOBAL_OVERFLOW_STATUS bits
on Haswell, Broadwell and Skylake processors when using PEBS.
The SDM stipulates that when the PEBS interrupt threshold is crossed, an
interrupt is posted and the kernel is interrupted. The kernel will find
GLOBAL_OVF_STATUS bit 62 set indicating there are PEBS records
to drain. But the bits corresponding to the actual counters should
NOT be set. The kernel follows the SDM and assumes that all PEBS
events are processed in the drain_pebs() callback. The kernel then
checks for remaining overflows on any other (non-PEBS) events and
processes these in the for_each_bit_set(&status) loop.
As it turns out, under certain conditions on HSW and later processors,
on PEBS buffer interrupt, bit 62 is set but the counter bits may be set
as well. In that case, the kernel drains PEBS and generates SAMPLES with
the EXACT tag, then it processes the counter bits, and generates
normal (non-EXACT) SAMPLES.
I ran into this problem by trying to understand why on HSW sampling
on a PEBS event was sometimes returning SAMPLES without the EXACT tag.
This should not happen on user level code because HSW has the
eventing_ip which always points to the instruction that caused the
event.
The workaround in this patch simply ensures that the bits for
the counters used for PEBS events are cleared after the PEBS
buffer has been drained. With this fix 100% of the PEBS
samples on my user code report the EXACT tag.
Before:
$ perf record -e cpu/event=0xd0,umask=0x81/upp ./multichase
$ perf report -D | fgrep SAMPLES
PERF_RECORD_SAMPLE(IP, 0x2): 11775/11775: 0x406de5 period: 73469 addr: 0 exact=Y
\--- EXACT tag is missing
After:
$ perf record -e cpu/event=0xd0,umask=0x81/upp ./multichase
$ perf report -D | fgrep SAMPLES
PERF_RECORD_SAMPLE(IP, 0x4002): 11775/11775: 0x406de5 period: 73469 addr: 0 exact=Y
\--- EXACT tag is set
The problem tends to appear more often when multiple PEBS events are used.
Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/events/intel/core.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index a7ec685..bdb77ed 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -1884,6 +1884,16 @@ again:
if (__test_and_clear_bit(62, (unsigned long *)&status)) {
handled++;
x86_pmu.drain_pebs(regs);
+ /*
+ * There are cases where, even though, the PEBS ovfl bit is set in
+ * GLOBAL_OVF_STATUS, the PEBS events may also have their overflow bits
+ * set for their counters. We must clear them here because they have
+ * been processed as exact samples in the drain_pebs() routine. They
+ * must not be processed again in the for_each_bit_set() loop for
+ * regular samples below.
+ */
+ status &= ~cpuc->pebs_enabled;
+ status &= x86_pmu.intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;
}
/*
--
2.5.0
This patch adds a Broadwell specific PEBS event constraint table.
Broadwell has a fix for the HT corruption bug erratum HSD29 on Haswell.
Therefore, there is no need to mark events 0xd0, 0xd1, 0xd2, 0xd3 as
requiring the exclusive mode across both sibling HT threads. This holds
true for regular counting and sampling (see core.c) and PEBS (ds.c) which
we fix in this patch.
In doing so, we relax event scheduling for these events, they can
now be programmed on any 4 counters without impacting what is measured
on the sibling thread.
Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/events/intel/core.c | 2 +-
arch/x86/events/intel/ds.c | 24 ++++++++++++++++++++++++
2 files changed, 25 insertions(+), 1 deletion(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index bdb77ed..a70d150 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3591,7 +3591,7 @@ __init int intel_pmu_init(void)
intel_pmu_lbr_init_hsw();
x86_pmu.event_constraints = intel_bdw_event_constraints;
- x86_pmu.pebs_constraints = intel_hsw_pebs_event_constraints;
+ x86_pmu.pebs_constraints = intel_bdw_pebs_event_constraints;
x86_pmu.extra_regs = intel_snbep_extra_regs;
x86_pmu.pebs_aliases = intel_pebs_aliases_ivb;
x86_pmu.pebs_prec_dist = true;
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index c8a243d..c6a2f45 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -722,6 +722,30 @@ struct event_constraint intel_hsw_pebs_event_constraints[] = {
EVENT_CONSTRAINT_END
};
+struct event_constraint intel_bdw_pebs_event_constraints[] = {
+ INTEL_FLAGS_UEVENT_CONSTRAINT(0x01c0, 0x2), /* INST_RETIRED.PRECDIST */
+ INTEL_PLD_CONSTRAINT(0x01cd, 0xf), /* MEM_TRANS_RETIRED.* */
+ /* UOPS_RETIRED.ALL, inv=1, cmask=16 (cycles:p). */
+ INTEL_FLAGS_EVENT_CONSTRAINT(0x108001c2, 0xf),
+ /* INST_RETIRED.PREC_DIST, inv=1, cmask=16 (cycles:ppp). */
+ INTEL_FLAGS_EVENT_CONSTRAINT(0x108001c0, 0x2),
+ INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_NA(0x01c2, 0xf), /* UOPS_RETIRED.ALL */
+ INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x11d0, 0xf), /* MEM_UOPS_RETIRED.STLB_MISS_LOADS */
+ INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x21d0, 0xf), /* MEM_UOPS_RETIRED.LOCK_LOADS */
+ INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x41d0, 0xf), /* MEM_UOPS_RETIRED.SPLIT_LOADS */
+ INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x81d0, 0xf), /* MEM_UOPS_RETIRED.ALL_LOADS */
+ INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x12d0, 0xf), /* MEM_UOPS_RETIRED.STLB_MISS_STORES */
+ INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x42d0, 0xf), /* MEM_UOPS_RETIRED.SPLIT_STORES */
+ INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x82d0, 0xf), /* MEM_UOPS_RETIRED.ALL_STORES */
+ INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD(0xd1, 0xf), /* MEM_LOAD_UOPS_RETIRED.* */
+ INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD(0xd2, 0xf), /* MEM_LOAD_UOPS_L3_HIT_RETIRED.* */
+ INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD(0xd3, 0xf), /* MEM_LOAD_UOPS_L3_MISS_RETIRED.* */
+ /* Allow all events as PEBS with no flags */
+ INTEL_ALL_EVENT_CONSTRAINT(0, 0xf),
+ EVENT_CONSTRAINT_END
+};
+
+
struct event_constraint intel_skl_pebs_event_constraints[] = {
INTEL_FLAGS_UEVENT_CONSTRAINT(0x1c0, 0x2), /* INST_RETIRED.PREC_DIST */
/* INST_RETIRED.PREC_DIST, inv=1, cmask=16 (cycles:ppp). */
--
2.5.0
> + /*
> + * There are cases where, even though, the PEBS ovfl bit is set in
> + * GLOBAL_OVF_STATUS, the PEBS events may also have their overflow bits
> + * set for their counters. We must clear them here because they have
> + * been processed as exact samples in the drain_pebs() routine. They
> + * must not be processed again in the for_each_bit_set() loop for
> + * regular samples below.
> + */
> + status &= ~cpuc->pebs_enabled;
> + status &= x86_pmu.intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;
If you just clear the bits here they will not be acked and stay around
forever in GLOBAL_STATUS, which causes other problems.
You need a separate ack_status variable that contains all bits and is always
acked.
-Andi
--
[email protected] -- Speaking for myself only
On Thu, Mar 3, 2016 at 1:43 PM, Andi Kleen <[email protected]> wrote:
>
> > + /*
> > + * There are cases where, even though, the PEBS ovfl bit is set in
> > + * GLOBAL_OVF_STATUS, the PEBS events may also have their overflow bits
> > + * set for their counters. We must clear them here because they have
> > + * been processed as exact samples in the drain_pebs() routine. They
> > + * must not be processed again in the for_each_bit_set() loop for
> > + * regular samples below.
> > + */
> > + status &= ~cpuc->pebs_enabled;
> > + status &= x86_pmu.intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;
>
> If you just clear the bits here they will not be acked and stay around
> forever in GLOBAL_STATUS, which causes other problems.
>
> You need a separate ack_status variable that contains all bits and is always
> acked.
I understand that. You mean I need a variable that has all the bits that
were set, to call intel_pmu_ack_status().
But if you look at the code, and where I made the change, there is no more
intel_pmu_ack_status() BEFORE you read the status again via
intel_pmu_get_status().
So why would I need to keep another variable around?
>
>
>
> -Andi
> --
> [email protected] -- Speaking for myself only
On Thu, Mar 03, 2016 at 03:40:49PM -0800, Stephane Eranian wrote:
> On Thu, Mar 3, 2016 at 1:43 PM, Andi Kleen <[email protected]> wrote:
> >
> > > + /*
> > > + * There are cases where, even though, the PEBS ovfl bit is set in
> > > + * GLOBAL_OVF_STATUS, the PEBS events may also have their overflow bits
> > > + * set for their counters. We must clear them here because they have
> > > + * been processed as exact samples in the drain_pebs() routine. They
> > > + * must not be processed again in the for_each_bit_set() loop for
> > > + * regular samples below.
> > > + */
> > > + status &= ~cpuc->pebs_enabled;
> > > + status &= x86_pmu.intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;
> >
> > If you just clear the bits here they will not be acked and stay around
> > forever in GLOBAL_STATUS, which causes other problems.
> >
> > You need a separate ack_status variable that contains all bits and is always
> > acked.
>
> I understand that. You mean I need a variable that has all the bits that
> were set, to call intel_pmu_ack_status().
>
> But if you look at the code, and where I made the change, there is no more
> intel_pmu_ack_status() BEFORE you read the status again via
> intel_pmu_get_status().
>
> So why would I need to keep another variable around?
I suspect Andi is having something along:
lkml.kernel.org/r/[email protected]
applied to his tree.
On Mon, Mar 07, 2016 at 11:24:13AM +0100, Peter Zijlstra wrote:
> I suspect Andi is having something along:
>
> lkml.kernel.org/r/[email protected]
>
> applied to his tree.
OK, I munged a bunch of patches together, please have a hard look at the
end result found in:
git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/core
On Mon, Mar 07, 2016 at 01:18:40PM +0100, Peter Zijlstra wrote:
> On Mon, Mar 07, 2016 at 11:24:13AM +0100, Peter Zijlstra wrote:
>
> > I suspect Andi is having something along:
> >
> > lkml.kernel.org/r/[email protected]
> >
> > applied to his tree.
>
> OK, I munged a bunch of patches together, please have a hard look at the
> end result found in:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/core
>
>
I needed the declaration below to compile
jirka
---
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 0e37370..097e4c1 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -862,6 +862,8 @@ extern struct event_constraint intel_hsw_pebs_event_constraints[];
extern struct event_constraint intel_skl_pebs_event_constraints[];
+extern struct event_constraint intel_bdw_pebs_event_constraints[];
+
struct event_constraint *intel_pebs_constraints(struct perf_event *event);
void intel_pmu_pebs_enable(struct perf_event *event);
On Mon, Mar 07, 2016 at 07:27:31PM +0100, Jiri Olsa wrote:
> On Mon, Mar 07, 2016 at 01:18:40PM +0100, Peter Zijlstra wrote:
> > On Mon, Mar 07, 2016 at 11:24:13AM +0100, Peter Zijlstra wrote:
> >
> > > I suspect Andi is having something along:
> > >
> > > lkml.kernel.org/r/[email protected]
> > >
> > > applied to his tree.
> >
> > OK, I munged a bunch of patches together, please have a hard look at the
> > end result found in:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/core
> >
> >
>
> I needed declaration below to compile
Yes indeed, (because Monday) I only compiled part of the patch series
and then the build bot's initial emails all reported build success.
I have since fixed this (and gotten a fail email from the build bot).
In any case, lots of fail, but it should be fixed if you've pulled from
that git branch in the last 5 hours or so.
Commit-ID: 5690ae28e472d25e330ad0c637a5cea3fc39fb32
Gitweb: http://git.kernel.org/tip/5690ae28e472d25e330ad0c637a5cea3fc39fb32
Author: Stephane Eranian <[email protected]>
AuthorDate: Thu, 3 Mar 2016 20:50:40 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 8 Mar 2016 12:18:34 +0100
perf/x86/intel: Add definition for PT PMI bit
This patch adds a definition for GLOBAL_OVFL_STATUS bit 55
which is used with the Processor Trace (PT) feature.
Signed-off-by: Stephane Eranian <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/perf_event.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 7bcb861..5a2ed3e 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -165,6 +165,7 @@ struct x86_pmu_capability {
#define GLOBAL_STATUS_ASIF BIT_ULL(60)
#define GLOBAL_STATUS_COUNTERS_FROZEN BIT_ULL(59)
#define GLOBAL_STATUS_LBRS_FROZEN BIT_ULL(58)
+#define GLOBAL_STATUS_TRACE_TOPAPMI BIT_ULL(55)
/*
* IBS cpuid feature detection
Commit-ID: 8077eca079a212f26419c57226f28696b7100683
Gitweb: http://git.kernel.org/tip/8077eca079a212f26419c57226f28696b7100683
Author: Stephane Eranian <[email protected]>
AuthorDate: Thu, 3 Mar 2016 20:50:41 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 8 Mar 2016 12:18:35 +0100
perf/x86/pebs: Add workaround for broken OVFL status on HSW+
This patch fixes an issue with the GLOBAL_OVERFLOW_STATUS bits on
Haswell, Broadwell and Skylake processors when using PEBS.
The SDM stipulates that when the PEBS interrupt threshold is crossed,
an interrupt is posted and the kernel is interrupted. The kernel will
find GLOBAL_OVF_STATUS bit 62 set indicating there are PEBS records to
drain. But the bits corresponding to the actual counters should NOT be
set. The kernel follows the SDM and assumes that all PEBS events are
processed in the drain_pebs() callback. The kernel then checks for
remaining overflows on any other (non-PEBS) events and processes these
in the for_each_bit_set(&status) loop.
As it turns out, under certain conditions on HSW and later processors,
on PEBS buffer interrupt, bit 62 is set but the counter bits may be
set as well. In that case, the kernel drains PEBS and generates
SAMPLES with the EXACT tag, then it processes the counter bits, and
generates normal (non-EXACT) SAMPLES.
I ran into this problem by trying to understand why on HSW sampling on
a PEBS event was sometimes returning SAMPLES without the EXACT tag.
This should not happen on user level code because HSW has the
eventing_ip which always points to the instruction that caused the
event.
The workaround in this patch simply ensures that the bits for the
counters used for PEBS events are cleared after the PEBS buffer has
been drained. With this fix 100% of the PEBS samples on my user code
report the EXACT tag.
Before:
$ perf record -e cpu/event=0xd0,umask=0x81/upp ./multichase
$ perf report -D | fgrep SAMPLES
PERF_RECORD_SAMPLE(IP, 0x2): 11775/11775: 0x406de5 period: 73469 addr: 0 exact=Y
\--- EXACT tag is missing
After:
$ perf record -e cpu/event=0xd0,umask=0x81/upp ./multichase
$ perf report -D | fgrep SAMPLES
PERF_RECORD_SAMPLE(IP, 0x4002): 11775/11775: 0x406de5 period: 73469 addr: 0 exact=Y
\--- EXACT tag is set
The problem tends to appear more often when multiple PEBS events are used.
Signed-off-by: Stephane Eranian <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vince Weaver <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/events/intel/core.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index b3f6349..6567c62 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -1892,6 +1892,16 @@ again:
if (__test_and_clear_bit(62, (unsigned long *)&status)) {
handled++;
x86_pmu.drain_pebs(regs);
+ /*
+ * There are cases where, even though, the PEBS ovfl bit is set
+ * in GLOBAL_OVF_STATUS, the PEBS events may also have their
+ * overflow bits set for their counters. We must clear them
+ * here because they have been processed as exact samples in
+ * the drain_pebs() routine. They must not be processed again
+ * in the for_each_bit_set() loop for regular samples below.
+ */
+ status &= ~cpuc->pebs_enabled;
+ status &= x86_pmu.intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;
}
/*
Commit-ID: b3e6246336a4a329644418a1c66e2c6bed44ef81
Gitweb: http://git.kernel.org/tip/b3e6246336a4a329644418a1c66e2c6bed44ef81
Author: Stephane Eranian <[email protected]>
AuthorDate: Thu, 3 Mar 2016 20:50:42 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 8 Mar 2016 12:19:12 +0100
perf/x86/pebs: Add proper PEBS constraints for Broadwell
This patch adds a Broadwell specific PEBS event constraint table.
Broadwell has a fix for the HT corruption bug erratum HSD29 on
Haswell. Therefore, there is no need to mark events 0xd0, 0xd1, 0xd2,
0xd3 as requiring the exclusive mode across both sibling HT threads.
This holds true for regular counting and sampling (see core.c) and
PEBS (ds.c) which we fix in this patch.
In doing so, we relax event scheduling for these events, they can now
be programmed on any 4 counters without impacting what is measured on
the sibling thread.
Signed-off-by: Stephane Eranian <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/events/intel/core.c | 2 +-
arch/x86/events/intel/ds.c | 24 ++++++++++++++++++++++++
arch/x86/events/perf_event.h | 2 ++
3 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 6567c62..edac81c 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3602,7 +3602,7 @@ __init int intel_pmu_init(void)
intel_pmu_lbr_init_hsw();
x86_pmu.event_constraints = intel_bdw_event_constraints;
- x86_pmu.pebs_constraints = intel_hsw_pebs_event_constraints;
+ x86_pmu.pebs_constraints = intel_bdw_pebs_event_constraints;
x86_pmu.extra_regs = intel_snbep_extra_regs;
x86_pmu.pebs_aliases = intel_pebs_aliases_ivb;
x86_pmu.pebs_prec_dist = true;
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 22ece02..a99a8cb 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -722,6 +722,30 @@ struct event_constraint intel_hsw_pebs_event_constraints[] = {
EVENT_CONSTRAINT_END
};
+struct event_constraint intel_bdw_pebs_event_constraints[] = {
+ INTEL_FLAGS_UEVENT_CONSTRAINT(0x01c0, 0x2), /* INST_RETIRED.PRECDIST */
+ INTEL_PLD_CONSTRAINT(0x01cd, 0xf), /* MEM_TRANS_RETIRED.* */
+ /* UOPS_RETIRED.ALL, inv=1, cmask=16 (cycles:p). */
+ INTEL_FLAGS_EVENT_CONSTRAINT(0x108001c2, 0xf),
+ /* INST_RETIRED.PREC_DIST, inv=1, cmask=16 (cycles:ppp). */
+ INTEL_FLAGS_EVENT_CONSTRAINT(0x108001c0, 0x2),
+ INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_NA(0x01c2, 0xf), /* UOPS_RETIRED.ALL */
+ INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x11d0, 0xf), /* MEM_UOPS_RETIRED.STLB_MISS_LOADS */
+ INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x21d0, 0xf), /* MEM_UOPS_RETIRED.LOCK_LOADS */
+ INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x41d0, 0xf), /* MEM_UOPS_RETIRED.SPLIT_LOADS */
+ INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x81d0, 0xf), /* MEM_UOPS_RETIRED.ALL_LOADS */
+ INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x12d0, 0xf), /* MEM_UOPS_RETIRED.STLB_MISS_STORES */
+ INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x42d0, 0xf), /* MEM_UOPS_RETIRED.SPLIT_STORES */
+ INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x82d0, 0xf), /* MEM_UOPS_RETIRED.ALL_STORES */
+ INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD(0xd1, 0xf), /* MEM_LOAD_UOPS_RETIRED.* */
+ INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD(0xd2, 0xf), /* MEM_LOAD_UOPS_L3_HIT_RETIRED.* */
+ INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD(0xd3, 0xf), /* MEM_LOAD_UOPS_L3_MISS_RETIRED.* */
+ /* Allow all events as PEBS with no flags */
+ INTEL_ALL_EVENT_CONSTRAINT(0, 0xf),
+ EVENT_CONSTRAINT_END
+};
+
+
struct event_constraint intel_skl_pebs_event_constraints[] = {
INTEL_FLAGS_UEVENT_CONSTRAINT(0x1c0, 0x2), /* INST_RETIRED.PREC_DIST */
/* INST_RETIRED.PREC_DIST, inv=1, cmask=16 (cycles:ppp). */
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 1ab6279..24e259e 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -861,6 +861,8 @@ extern struct event_constraint intel_ivb_pebs_event_constraints[];
extern struct event_constraint intel_hsw_pebs_event_constraints[];
+extern struct event_constraint intel_bdw_pebs_event_constraints[];
+
extern struct event_constraint intel_skl_pebs_event_constraints[];
struct event_constraint *intel_pebs_constraints(struct perf_event *event);
hi,
On Mon, Mar 7, 2016 at 12:25 PM, Peter Zijlstra <[email protected]> wrote:
>
> On Mon, Mar 07, 2016 at 07:27:31PM +0100, Jiri Olsa wrote:
> > On Mon, Mar 07, 2016 at 01:18:40PM +0100, Peter Zijlstra wrote:
> > > On Mon, Mar 07, 2016 at 11:24:13AM +0100, Peter Zijlstra wrote:
> > >
> > > > I suspect Andi is having something along:
> > > >
> > > > lkml.kernel.org/r/[email protected]
> > > >
> > > > applied to his tree.
> > >
> > > OK, I munged a bunch of patches together, please have a hard look at the
> > > end result found in:
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/core
> > >
I ran this kernel on Haswell. Even with Andi's fixes the problem I identified is
still there, so my patch is still needed.
>
> > >
> >
> > I needed declaration below to compile
>
> Yes indeed, (because Monday) I only compiled part of the patch series
> and then the build bot initial emails were all builds success.
>
> I have since fixed this (and gotten a fail email from the build bot).
>
> In any case, lots of fail, but it should be fixed if you've pulled from
> that git branch in the last 5 hours or so.
On Tue, Mar 08, 2016 at 12:59:23PM -0800, Stephane Eranian wrote:
> hi,
>
> On Mon, Mar 7, 2016 at 12:25 PM, Peter Zijlstra <[email protected]> wrote:
> >
> > On Mon, Mar 07, 2016 at 07:27:31PM +0100, Jiri Olsa wrote:
> > > On Mon, Mar 07, 2016 at 01:18:40PM +0100, Peter Zijlstra wrote:
> > > > On Mon, Mar 07, 2016 at 11:24:13AM +0100, Peter Zijlstra wrote:
> > > >
> > > > > I suspect Andi is having something along:
> > > > >
> > > > > lkml.kernel.org/r/[email protected]
> > > > >
> > > > > applied to his tree.
> > > >
> > > > OK, I munged a bunch of patches together, please have a hard look at the
> > > > end result found in:
> > > >
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/core
> > > >
>
> I ran this kernel on Haswell. Even with Andi's fixes the problem I identified is
> still there, so my patch is still needed.
Right, your patch should be included in that kernel, or did I make a
royal mess of things?
I put Andi's late status ack on top of your patch.
Also note, Ingo merged most of those patches today, all except the top
3, because Andi wanted to double check something.
Hi,
On Tue, Mar 8, 2016 at 1:07 PM, Peter Zijlstra <[email protected]> wrote:
> On Tue, Mar 08, 2016 at 12:59:23PM -0800, Stephane Eranian wrote:
>> hi,
>>
>> On Mon, Mar 7, 2016 at 12:25 PM, Peter Zijlstra <[email protected]> wrote:
>> >
>> > On Mon, Mar 07, 2016 at 07:27:31PM +0100, Jiri Olsa wrote:
>> > > On Mon, Mar 07, 2016 at 01:18:40PM +0100, Peter Zijlstra wrote:
>> > > > On Mon, Mar 07, 2016 at 11:24:13AM +0100, Peter Zijlstra wrote:
>> > > >
>> > > > > I suspect Andi is having something along:
>> > > > >
>> > > > > lkml.kernel.org/r/[email protected]
>> > > > >
>> > > > > applied to his tree.
>> > > >
>> > > > OK, I munged a bunch of patches together, please have a hard look at the
>> > > > end result found in:
>> > > >
>> > > > git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/core
>> > > >
>>
>> I ran this kernel on Haswell. Even with Andi's fixes the problem I identified is
>> still there, so my patch is still needed.
>
> Right, your patch should be included in that kernel, or did I make a
> royal mess of things?
>
No, it is as expected for the OVF PMI fix.
> I put Andi's late status ack on top of your patch.
>
> Also note, Ingo merged most of those patches today, all except the top
> 3, because Andi wanted to double check something.
Ok. I will try on SKL today.
On Tue, Mar 8, 2016 at 1:13 PM, Stephane Eranian <[email protected]> wrote:
> Hi,
>
> On Tue, Mar 8, 2016 at 1:07 PM, Peter Zijlstra <[email protected]> wrote:
>> On Tue, Mar 08, 2016 at 12:59:23PM -0800, Stephane Eranian wrote:
>>> hi,
>>>
>>> On Mon, Mar 7, 2016 at 12:25 PM, Peter Zijlstra <[email protected]> wrote:
>>> >
>>> > On Mon, Mar 07, 2016 at 07:27:31PM +0100, Jiri Olsa wrote:
>>> > > On Mon, Mar 07, 2016 at 01:18:40PM +0100, Peter Zijlstra wrote:
>>> > > > On Mon, Mar 07, 2016 at 11:24:13AM +0100, Peter Zijlstra wrote:
>>> > > >
>>> > > > > I suspect Andi is having something along:
>>> > > > >
>>> > > > > lkml.kernel.org/r/[email protected]
>>> > > > >
>>> > > > > applied to his tree.
>>> > > >
>>> > > > OK, I munged a bunch of patches together, please have a hard look at the
>>> > > > end result found in:
>>> > > >
>>> > > > git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/core
>>> > > >
>>>
>>> I ran this kernel on Haswell. Even with Andi's fixes the problem I identified is
>>> still there, so my patch is still needed.
>>
>> Right, your patch should be included in that kernel, or did I make a
>> royal mess of things?
>>
> No, it is as expected for the OVF PMI fix.
>
>> I put Andi's late status ack on top of your patch.
>>
Ok, I ran into a problem on Broadwell with your branch with Andi's
patches. I see
a problem which had disappeared since SandyBridge:
[11551.128422] ------------[ cut here ]------------
[11551.128435] WARNING: CPU: 3 PID: 12114 at arch/x86/events/intel/core.c:1868 intel_pmu_handle_irq+0x2da/0x4b0()
[11551.128437] perfevents: irq loop stuck!
[11551.128469] <NMI> [<ffffffff81663975>] dump_stack+0x4d/0x63
[11551.128479] [<ffffffff810b8657>] warn_slowpath_common+0x97/0xe0
[11551.128482] [<ffffffff810b8756>] warn_slowpath_fmt+0x46/0x50
[11551.128486] [<ffffffff8100b6ca>] intel_pmu_handle_irq+0x2da/0x4b0
[11551.128491] [<ffffffff81004569>] perf_event_nmi_handler+0x39/0x60
[11551.128494] [<ffffffff8107be61>] nmi_handle+0x61/0x110
[11551.128497] [<ffffffff8107c684>] default_do_nmi+0x44/0x110
[11551.128500] [<ffffffff8107c827>] do_nmi+0xd7/0x140
[11551.128504] [<ffffffff8166e127>] end_repeat_nmi+0x1a/0x1e
[11551.128507] [<ffffffff81009dd6>] ? native_write_msr+0x6/0x30
[11551.128510] [<ffffffff81009dd6>] ? native_write_msr+0x6/0x30
[11551.128514] [<ffffffff81009dd6>] ? native_write_msr+0x6/0x30
[11551.128515] <<EOE>> [<ffffffff8100b385>] ? intel_pmu_enable_event+0x215/0x230
[11551.128520] [<ffffffff81005a0d>] x86_pmu_start+0x8d/0x120
[11551.128523] [<ffffffff810061db>] x86_pmu_enable+0x27b/0x2f0
[11551.128527] [<ffffffff8118d63d>] perf_pmu_enable+0x1d/0x30
[11551.128530] [<ffffffff81191bca>] ctx_resched+0x5a/0x70
[11551.128532] [<ffffffff81191d8c>] __perf_event_enable+0x1ac/0x210
[11551.128537] [<ffffffff81188f81>] event_function+0xa1/0x170
[11551.128540] [<ffffffff811899b0>] ? perf_duration_warn+0x70/0x70
[11551.128543] [<ffffffff811899f7>] remote_function+0x47/0x60
[11551.128547] [<ffffffff8112a178>] generic_exec_single+0xa8/0xb0
[11551.128550] [<ffffffff811899b0>] ? perf_duration_warn+0x70/0x70
[11551.128553] [<ffffffff811899b0>] ? perf_duration_warn+0x70/0x70
[11551.128555] [<ffffffff8112a298>] smp_call_function_single+0xa8/0x100
[11551.128559] [<ffffffff8118aec4>] event_function_call+0x84/0x100
[11551.128561] [<ffffffff81191be0>] ? ctx_resched+0x70/0x70
[11551.128564] [<ffffffff81191be0>] ? ctx_resched+0x70/0x70
[11551.128566] [<ffffffff81188ee0>] ? perf_ctx_lock+0x30/0x30
[11551.128570] [<ffffffff8118b050>] _perf_event_enable+0x60/0x80
[11551.128572] [<ffffffff8118fc61>] perf_ioctl+0x271/0x3e0
The infinite loop in the irq handler!
But here it seems there is a race with a perf_event ioctl(), likely
one resetting the period.
I am not using the perf tool here, just running a self-monitoring task.
>> Also note, Ingo merged most of those patches today, all except the top
>> 3, because Andi wanted to double check something.
>
On Tue, Mar 8, 2016 at 9:34 PM, Stephane Eranian <[email protected]> wrote:
> On Tue, Mar 8, 2016 at 1:13 PM, Stephane Eranian <[email protected]> wrote:
>> Hi,
>>
>> On Tue, Mar 8, 2016 at 1:07 PM, Peter Zijlstra <[email protected]> wrote:
>>> On Tue, Mar 08, 2016 at 12:59:23PM -0800, Stephane Eranian wrote:
>>>> hi,
>>>>
>>>> On Mon, Mar 7, 2016 at 12:25 PM, Peter Zijlstra <[email protected]> wrote:
>>>> >
>>>> > On Mon, Mar 07, 2016 at 07:27:31PM +0100, Jiri Olsa wrote:
>>>> > > On Mon, Mar 07, 2016 at 01:18:40PM +0100, Peter Zijlstra wrote:
>>>> > > > On Mon, Mar 07, 2016 at 11:24:13AM +0100, Peter Zijlstra wrote:
>>>> > > >
>>>> > > > > I suspect Andi is having something along:
>>>> > > > >
>>>> > > > > lkml.kernel.org/r/[email protected]
>>>> > > > >
>>>> > > > > applied to his tree.
>>>> > > >
>>>> > > > OK, I munged a bunch of patches together, please have a hard look at the
>>>> > > > end result found in:
>>>> > > >
>>>> > > > git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/core
>>>> > > >
>>>>
>>>> I ran this kernel on Haswell. Even with Andi's fixes the problem I identified is
>>>> still there, so my patch is still needed.
>>>
>>> Right, your patch should be included in that kernel, or did I make a
>>> royal mess of things?
>>>
>> No, it is as expected for the OVF PMI fix.
>>
>>> I put Andi's late status ack on top of your patch.
>>>
> Ok, I ran into a problem on Broadwell with your branch with Andi's
> patches. I see
Sorry, this is with tip.git and not your branch. Will try with it too.
> a problem which had disappeared since SandyBridge:
>
> [11551.128422] ------------[ cut here ]------------
> [11551.128435] WARNING: CPU: 3 PID: 12114 at
> arch/x86/events/intel/core.c:1868 intel_pmu_handle_irq+0x2da/0x4b0()
> [11551.128437] perfevents: irq loop stuck!
> [11551.128469] <NMI> [<ffffffff81663975>] dump_stack+0x4d/0x63
> [11551.128479] [<ffffffff810b8657>] warn_slowpath_common+0x97/0xe0
> [11551.128482] [<ffffffff810b8756>] warn_slowpath_fmt+0x46/0x50
> [11551.128486] [<ffffffff8100b6ca>] intel_pmu_handle_irq+0x2da/0x4b0
> [11551.128491] [<ffffffff81004569>] perf_event_nmi_handler+0x39/0x60
> [11551.128494] [<ffffffff8107be61>] nmi_handle+0x61/0x110
> [11551.128497] [<ffffffff8107c684>] default_do_nmi+0x44/0x110
> [11551.128500] [<ffffffff8107c827>] do_nmi+0xd7/0x140
> [11551.128504] [<ffffffff8166e127>] end_repeat_nmi+0x1a/0x1e
> [11551.128507] [<ffffffff81009dd6>] ? native_write_msr+0x6/0x30
> [11551.128510] [<ffffffff81009dd6>] ? native_write_msr+0x6/0x30
> [11551.128514] [<ffffffff81009dd6>] ? native_write_msr+0x6/0x30
> [11551.128515] <<EOE>> [<ffffffff8100b385>] ?
> intel_pmu_enable_event+0x215/0x230
> [11551.128520] [<ffffffff81005a0d>] x86_pmu_start+0x8d/0x120
> [11551.128523] [<ffffffff810061db>] x86_pmu_enable+0x27b/0x2f0
> [11551.128527] [<ffffffff8118d63d>] perf_pmu_enable+0x1d/0x30
> [11551.128530] [<ffffffff81191bca>] ctx_resched+0x5a/0x70
> [11551.128532] [<ffffffff81191d8c>] __perf_event_enable+0x1ac/0x210
> [11551.128537] [<ffffffff81188f81>] event_function+0xa1/0x170
> [11551.128540] [<ffffffff811899b0>] ? perf_duration_warn+0x70/0x70
> [11551.128543] [<ffffffff811899f7>] remote_function+0x47/0x60
> [11551.128547] [<ffffffff8112a178>] generic_exec_single+0xa8/0xb0
> [11551.128550] [<ffffffff811899b0>] ? perf_duration_warn+0x70/0x70
> [11551.128553] [<ffffffff811899b0>] ? perf_duration_warn+0x70/0x70
> [11551.128555] [<ffffffff8112a298>] smp_call_function_single+0xa8/0x100
> [11551.128559] [<ffffffff8118aec4>] event_function_call+0x84/0x100
> [11551.128561] [<ffffffff81191be0>] ? ctx_resched+0x70/0x70
> [11551.128564] [<ffffffff81191be0>] ? ctx_resched+0x70/0x70
> [11551.128566] [<ffffffff81188ee0>] ? perf_ctx_lock+0x30/0x30
> [11551.128570] [<ffffffff8118b050>] _perf_event_enable+0x60/0x80
> [11551.128572] [<ffffffff8118fc61>] perf_ioctl+0x271/0x3e0
>
> The infinite loop in the irq handler!
>
> But here it seems there is a race with a perf_events ioctl() to likely
> reset the period.
> I am not using the perf tool here just running a self-monitoring task.
>
>
>>> Also note, Ingo merged most of those patches today, all except the top
>>> 3, because Andi wanted to double check something.
>>
On Tue, Mar 8, 2016 at 9:44 PM, Stephane Eranian <[email protected]> wrote:
> On Tue, Mar 8, 2016 at 9:34 PM, Stephane Eranian <[email protected]> wrote:
>> On Tue, Mar 8, 2016 at 1:13 PM, Stephane Eranian <[email protected]> wrote:
>>> Hi,
>>>
>>> On Tue, Mar 8, 2016 at 1:07 PM, Peter Zijlstra <[email protected]> wrote:
>>>> On Tue, Mar 08, 2016 at 12:59:23PM -0800, Stephane Eranian wrote:
>>>>> hi,
>>>>>
>>>>> On Mon, Mar 7, 2016 at 12:25 PM, Peter Zijlstra <[email protected]> wrote:
>>>>> >
>>>>> > On Mon, Mar 07, 2016 at 07:27:31PM +0100, Jiri Olsa wrote:
>>>>> > > On Mon, Mar 07, 2016 at 01:18:40PM +0100, Peter Zijlstra wrote:
>>>>> > > > On Mon, Mar 07, 2016 at 11:24:13AM +0100, Peter Zijlstra wrote:
>>>>> > > >
>>>>> > > > > I suspect Andi is having something along:
>>>>> > > > >
>>>>> > > > > lkml.kernel.org/r/[email protected]
>>>>> > > > >
>>>>> > > > > applied to his tree.
>>>>> > > >
>>>>> > > > OK, I munged a bunch of patches together, please have a hard look at the
>>>>> > > > end result found in:
>>>>> > > >
>>>>> > > > git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/core
>>>>> > > >
>>>>>
>>>>> I ran this kernel on Haswell. Even with Andi's fixes the problem I identified is
>>>>> still there, so my patch is still needed.
>>>>
>>>> Right, your patch should be included in that kernel, or did I make a
>>>> royal mess of things?
>>>>
>>> No, it is as expected for the OVF PMI fix.
>>>
>>>> I put Andi's late status ack on top of your patch.
>>>>
>> Ok, I ran into a problem on Broadwell with your branch with Andi's
>> patches. I see
>
> Sorry this is with tip.git and not your branch. Will try with it too.
With your queue.tip perf/core branch, I run into another problem.
I am monitoring with 2 PEBS events and I have the NMI watchdog enabled.

I see non-EXACT PEBS records again, despite my change (which is in).
I tracked it down to the following condition after the test of bit 62:

[31137.273061] CPU71 status=0x200000001 orig_status=0x200000001 bit62=0

The IRQ handler is called because the fixed counter for the NMI watchdog
has overflowed, and it sees this in bit 33. But it also sees that one of
the PEBS events has overflowed, yet bit 62 is not set. Therefore both
overflows are treated as regular overflows, and drain_pebs() is not
called, generating a non-EXACT record for the PEBS counter (counter 0).
So something is still wrong, and this is on Broadwell.

First, I don't understand why the OVF bit for counter 0 is set. It
should not be, according to the specs, because the counter is in PEBS
mode. There must be a race there, so we have to handle it by relying on
cpuc->pebs_enabled. I will try that. We likely also need to force OVF
bit 62 to 1 so we can ack it at the end, in case it gets set.
On Wed, Mar 09, 2016 at 09:40:07AM -0800, Stephane Eranian wrote:
> With your queue.tip perf/core branch, I run into another problem.
> I am monitoring with 2 PEBS events and I have the NMI watchdog enabled.
>
> I see non-EXACT PEBS records again, despite my change (which is in).
> I tracked it down to the following issue after the testing of bit 62:
>
> [31137.273061] CPU71 status=0x200000001 orig_status=0x200000001 bit62=0
>
> The IRQ handler is called because the fixed counter for the NMI has overflowed
> and it sees this in bit 33, but it also sees that one of the PEBS
> events has also
> overflowed, yet bit 62 is not set. Therefore both overflows are
> treated as regular
> and the drain_pebs() is not called generating a non-EXACT record for the PEBS
> counter (counter 0). So something is wrong still and this is on Broadwell.
>
> First, I don't understand why the OVF bit for counter 0 is set. It
> should not according
> to specs because the counter is in PEBS mode. There must be a race there. So we
> have to handle it by relying on cpuc->pebs_enabled. I will try that.
> We likely also
> need to force OVF bit 62 to 1 so we can ack it in the end (and in case
> it gets set).
How about we make the clear of pebs_enabled unconditional?
---
arch/x86/events/intel/core.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 68fa55b4d42e..dc9579665425 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -1883,6 +1883,16 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
status &= ~(GLOBAL_STATUS_COND_CHG |
GLOBAL_STATUS_ASIF |
GLOBAL_STATUS_LBRS_FROZEN);
+ /*
+ * There are cases where, even though, the PEBS ovfl bit is set
+ * in GLOBAL_OVF_STATUS, the PEBS events may also have their
+ * overflow bits set for their counters. We must clear them
+ * here because they have been processed as exact samples in
+ * the drain_pebs() routine. They must not be processed again
+ * in the for_each_bit_set() loop for regular samples below.
+ */
+ status &= ~cpuc->pebs_enabled;
+
if (!status)
goto done;
@@ -1892,16 +1902,6 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
if (__test_and_clear_bit(62, (unsigned long *)&status)) {
handled++;
x86_pmu.drain_pebs(regs);
- /*
- * There are cases where, even though, the PEBS ovfl bit is set
- * in GLOBAL_OVF_STATUS, the PEBS events may also have their
- * overflow bits set for their counters. We must clear them
- * here because they have been processed as exact samples in
- * the drain_pebs() routine. They must not be processed again
- * in the for_each_bit_set() loop for regular samples below.
- */
- status &= ~cpuc->pebs_enabled;
- status &= x86_pmu.intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;
}
/*
On Mon, Mar 07, 2016 at 01:18:40PM +0100, Peter Zijlstra wrote:
> On Mon, Mar 07, 2016 at 11:24:13AM +0100, Peter Zijlstra wrote:
>
> > I suspect Andi is having something along:
> >
> > lkml.kernel.org/r/[email protected]
> >
> > applied to his tree.
>
> OK, I munged a bunch of patches together, please have a hard look at the
> end result found in:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/core
So while running tip/master I just triggered the "irq loop stuck" warn
on my IVB, with status==1, and PMC0 didn't look weird at all.
I've not seen that for a fair while. But it does show it still happens
on not quite antique hardware.
On Thu, Mar 10, 2016 at 5:53 AM, Peter Zijlstra <[email protected]> wrote:
> On Mon, Mar 07, 2016 at 01:18:40PM +0100, Peter Zijlstra wrote:
>> On Mon, Mar 07, 2016 at 11:24:13AM +0100, Peter Zijlstra wrote:
>>
>> > I suspect Andi is having something along:
>> >
>> > lkml.kernel.org/r/[email protected]
>> >
>> > applied to his tree.
>>
>> OK, I munged a bunch of patches together, please have a hard look at the
>> end result found in:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/core
>
> So while running tip/master I just triggered the "irq loop stuck" warn
> on my IVB, with status==1, and PMC0 didn't look weird at all.
>
> I've not seen that for a fair while. But it does show it still happens
> on not quite antique hardware.
Yes, as I said, tip.git is missing something; with Andi's 3 missing
patches I do not get that. But with them, I was getting the problem I
described earlier, so I just posted a patch to handle that particular
case where bit 62 is not set yet the PEBS counters show an overflow.
Test it.
Thanks.
Just spotted this again, ping?
On Thu, Mar 10, 2016 at 11:42:36AM +0100, Peter Zijlstra wrote:
> On Wed, Mar 09, 2016 at 09:40:07AM -0800, Stephane Eranian wrote:
> > With your queue.tip perf/core branch, I run into another problem.
> > I am monitoring with 2 PEBS events and I have the NMI watchdog enabled.
> >
> > I see non-EXACT PEBS records again, despite my change (which is in).
> > I tracked it down to the following issue after the testing of bit 62:
> >
> > [31137.273061] CPU71 status=0x200000001 orig_status=0x200000001 bit62=0
> >
> > The IRQ handler is called because the fixed counter for the NMI has overflowed
> > and it sees this in bit 33, but it also sees that one of the PEBS
> > events has also
> > overflowed, yet bit 62 is not set. Therefore both overflows are
> > treated as regular
> > and the drain_pebs() is not called generating a non-EXACT record for the PEBS
> > counter (counter 0). So something is wrong still and this is on Broadwell.
> >
> > First, I don't understand why the OVF bit for counter 0 is set. It
> > should not according
> > to specs because the counter is in PEBS mode. There must be a race there. So we
> > have to handle it by relying on cpuc->pebs_enabled. I will try that.
> > We likely also
> > need to force OVF bit 62 to 1 so we can ack it in the end (and in case
> > it gets set).
>
> How about we make the clear of pebs_enabled unconditional?
>
> ---
> arch/x86/events/intel/core.c | 20 ++++++++++----------
> 1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 68fa55b4d42e..dc9579665425 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -1883,6 +1883,16 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
> status &= ~(GLOBAL_STATUS_COND_CHG |
> GLOBAL_STATUS_ASIF |
> GLOBAL_STATUS_LBRS_FROZEN);
> + /*
> + * There are cases where, even though, the PEBS ovfl bit is set
> + * in GLOBAL_OVF_STATUS, the PEBS events may also have their
> + * overflow bits set for their counters. We must clear them
> + * here because they have been processed as exact samples in
> + * the drain_pebs() routine. They must not be processed again
> + * in the for_each_bit_set() loop for regular samples below.
> + */
> + status &= ~cpuc->pebs_enabled;
> +
> if (!status)
> goto done;
>
> @@ -1892,16 +1902,6 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
> if (__test_and_clear_bit(62, (unsigned long *)&status)) {
> handled++;
> x86_pmu.drain_pebs(regs);
> - /*
> - * There are cases where, even though, the PEBS ovfl bit is set
> - * in GLOBAL_OVF_STATUS, the PEBS events may also have their
> - * overflow bits set for their counters. We must clear them
> - * here because they have been processed as exact samples in
> - * the drain_pebs() routine. They must not be processed again
> - * in the for_each_bit_set() loop for regular samples below.
> - */
> - status &= ~cpuc->pebs_enabled;
> - status &= x86_pmu.intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;
> }
>
> /*
On Wed, Dec 14, 2016 at 9:55 AM, Peter Zijlstra <[email protected]> wrote:
>
> Just spotted this again, ping?
>
Ok, on what processor, running what command, so I can try to reproduce?
> On Thu, Mar 10, 2016 at 11:42:36AM +0100, Peter Zijlstra wrote:
>> On Wed, Mar 09, 2016 at 09:40:07AM -0800, Stephane Eranian wrote:
>> > With your queue.tip perf/core branch, I run into another problem.
>> > I am monitoring with 2 PEBS events and I have the NMI watchdog enabled.
>> >
>> > I see non-EXACT PEBS records again, despite my change (which is in).
>> > I tracked it down to the following issue after the testing of bit 62:
>> >
>> > [31137.273061] CPU71 status=0x200000001 orig_status=0x200000001 bit62=0
>> >
>> > The IRQ handler is called because the fixed counter for the NMI has overflowed
>> > and it sees this in bit 33, but it also sees that one of the PEBS
>> > events has also
>> > overflowed, yet bit 62 is not set. Therefore both overflows are
>> > treated as regular
>> > and the drain_pebs() is not called generating a non-EXACT record for the PEBS
>> > counter (counter 0). So something is wrong still and this is on Broadwell.
>> >
>> > First, I don't understand why the OVF bit for counter 0 is set. It
>> > should not according
>> > to specs because the counter is in PEBS mode. There must be a race there. So we
>> > have to handle it by relying on cpuc->pebs_enabled. I will try that.
>> > We likely also
>> > need to force OVF bit 62 to 1 so we can ack it in the end (and in case
>> > it gets set).
>>
>> How about we make the clear of pebs_enabled unconditional?
>>
>> ---
>> arch/x86/events/intel/core.c | 20 ++++++++++----------
>> 1 file changed, 10 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>> index 68fa55b4d42e..dc9579665425 100644
>> --- a/arch/x86/events/intel/core.c
>> +++ b/arch/x86/events/intel/core.c
>> @@ -1883,6 +1883,16 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
>> status &= ~(GLOBAL_STATUS_COND_CHG |
>> GLOBAL_STATUS_ASIF |
>> GLOBAL_STATUS_LBRS_FROZEN);
>> + /*
>> + * There are cases where, even though, the PEBS ovfl bit is set
>> + * in GLOBAL_OVF_STATUS, the PEBS events may also have their
>> + * overflow bits set for their counters. We must clear them
>> + * here because they have been processed as exact samples in
>> + * the drain_pebs() routine. They must not be processed again
>> + * in the for_each_bit_set() loop for regular samples below.
>> + */
>> + status &= ~cpuc->pebs_enabled;
>> +
>> if (!status)
>> goto done;
>>
>> @@ -1892,16 +1902,6 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
>> if (__test_and_clear_bit(62, (unsigned long *)&status)) {
>> handled++;
>> x86_pmu.drain_pebs(regs);
>> - /*
>> - * There are cases where, even though, the PEBS ovfl bit is set
>> - * in GLOBAL_OVF_STATUS, the PEBS events may also have their
>> - * overflow bits set for their counters. We must clear them
>> - * here because they have been processed as exact samples in
>> - * the drain_pebs() routine. They must not be processed again
>> - * in the for_each_bit_set() loop for regular samples below.
>> - */
>> - status &= ~cpuc->pebs_enabled;
>> - status &= x86_pmu.intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;
>> }
>>
>> /*
On Wed, Dec 14, 2016 at 11:26:49PM -0800, Stephane Eranian wrote:
> On Wed, Dec 14, 2016 at 9:55 AM, Peter Zijlstra <[email protected]> wrote:
> >
> > Just spotted this again, ping?
> >
> Ok, on what processor running what command, so I can try and reproduce?
For me it's snb_x (model 45) and Peter's ivb-ep (model 62).
The log below is after several hours of fuzzer testing. I'll try again with the change.
jirka
---
[14404.947844] perfevents: irq loop stuck!
[14404.952560] ------------[ cut here ]------------
[14404.957720] WARNING: CPU: 0 PID: 0 at arch/x86/events/intel/core.c:2093 intel_pmu_handle_irq+0x2f8/0x4c0
[14404.968305] Modules linked in: intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp ipmi_devintf crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support ghash_clmulni_intel pcspkr ipmi_ssif tpm_tis i2c_i801 tpm_tis_core ipmi_si tpm i2c_smbus ipmi_msghandler cdc_ether usbnet mii shpchp ioatdma wmi lpc_ich xfs libcrc32c mgag200 drm_kms_helper ttm drm igb ptp crc32c_intel pps_core dca i2c_algo_bit megaraid_sas fjes
[14405.019901] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc8+ #51
[14405.026985] Hardware name: IBM System x3650 M4 : -[7915E2G]-/00Y7683, BIOS -[VVE124AUS-1.30]- 11/21/2012
[14405.037568] ffff880277a05b08 ffffffff81463243 ffff880277a05b58 0000000000000000
[14405.046601] ffff880277a05b48 ffffffff810b698b 0000082d81133a1d 0000000000000064
[14405.055634] ffff880277a0a380 ffff880276208800 0000000000000040 ffff880277a0a580
[14405.064665] Call Trace:
[14405.067394] <NMI> [<ffffffff81463243>] dump_stack+0x86/0xc3
[14405.073807] [<ffffffff810b698b>] __warn+0xcb/0xf0
[14405.079156] [<ffffffff810b6a0f>] warn_slowpath_fmt+0x5f/0x80
[14405.085569] [<ffffffff810b69b5>] ? warn_slowpath_fmt+0x5/0x80
[14405.092081] [<ffffffff8100d448>] intel_pmu_handle_irq+0x2f8/0x4c0
[14405.098971] [<ffffffff810060dc>] ? perf_event_nmi_handler+0x2c/0x50
[14405.106065] [<ffffffff8100d150>] ? intel_pmu_save_and_restart+0x50/0x50
[14405.113547] [<ffffffff810609f0>] ? nmi_raise_cpu_backtrace+0x20/0x20
[14405.120737] [<ffffffff811a2555>] ? ftrace_ops_test.isra.23+0x65/0xa0
[14405.127917] [<ffffffff8147a2ee>] ? bsearch+0x5e/0x90
[14405.133556] [<ffffffff811a0f50>] ? __add_hash_entry+0x50/0x50
[14405.140066] [<ffffffff8147a2ee>] ? bsearch+0x5e/0x90
[14405.145704] [<ffffffff811a0f50>] ? __add_hash_entry+0x50/0x50
[14405.152214] [<ffffffff810609f0>] ? nmi_raise_cpu_backtrace+0x20/0x20
[14405.159403] [<ffffffff810609f0>] ? nmi_raise_cpu_backtrace+0x20/0x20
[14405.166594] [<ffffffff81133a1d>] ? debug_lockdep_rcu_enabled+0x1d/0x20
[14405.173979] [<ffffffff811a31be>] ? ftrace_ops_list_func+0xce/0x1d0
[14405.180974] [<ffffffff818e2957>] ? ftrace_call+0x5/0x34
[14405.186904] [<ffffffff818e2957>] ? ftrace_call+0x5/0x34
[14405.192824] [<ffffffff81129f00>] ? printk_nmi_enter+0x20/0x20
[14405.199337] [<ffffffff8100d155>] ? intel_pmu_handle_irq+0x5/0x4c0
[14405.206235] [<ffffffff810060b5>] ? perf_event_nmi_handler+0x5/0x50
[14405.213231] [<ffffffff810060dc>] perf_event_nmi_handler+0x2c/0x50
[14405.220121] [<ffffffff8103983d>] nmi_handle+0xbd/0x2e0
[14405.225954] [<ffffffff81039785>] ? nmi_handle+0x5/0x2e0
[14405.231875] [<ffffffff81039785>] ? nmi_handle+0x5/0x2e0
[14405.237804] [<ffffffff81039f13>] default_do_nmi+0x53/0x100
[14405.244025] [<ffffffff8103a0df>] do_nmi+0x11f/0x170
[14405.249557] [<ffffffff818e26f1>] end_repeat_nmi+0x1a/0x1e
[14405.255680] [<ffffffff8106fd16>] ? native_write_msr+0x6/0x30
[14405.262093] [<ffffffff8106fd16>] ? native_write_msr+0x6/0x30
[14405.268507] [<ffffffff8106fd16>] ? native_write_msr+0x6/0x30
[14405.274914] <EOE> <IRQ> [<ffffffff810117e4>] ? intel_pmu_pebs_enable_all+0x34/0x40
[14405.283656] [<ffffffff8100cc43>] __intel_pmu_enable_all.constprop.17+0x23/0xa0
[14405.291815] [<ffffffff8100ccd0>] intel_pmu_enable_all+0x10/0x20
[14405.298520] [<ffffffff81007836>] x86_pmu_enable+0x256/0x2e0
[14405.304836] [<ffffffff811e32e7>] perf_pmu_enable.part.86+0x7/0x10
[14405.311736] [<ffffffff811e73be>] perf_mux_hrtimer_handler+0x22e/0x2c0
[14405.319014] [<ffffffff811418bb>] __hrtimer_run_queues+0xfb/0x510
[14405.325808] [<ffffffff811e7190>] ? ctx_resched+0x90/0x90
[14405.331834] [<ffffffff811424bd>] hrtimer_interrupt+0x9d/0x1a0
[14405.338343] [<ffffffff8105d958>] local_apic_timer_interrupt+0x38/0x60
[14405.345629] [<ffffffff818e3b3b>] smp_trace_apic_timer_interrupt+0x5b/0x25f
[14405.353402] [<ffffffff818e2ce6>] trace_apic_timer_interrupt+0x96/0xa0
[14405.360689] <EOI> [<ffffffff8172f414>] ? cpuidle_enter_state+0x124/0x380
[14405.368354] [<ffffffff8172f410>] ? cpuidle_enter_state+0x120/0x380
[14405.375349] [<ffffffff8172f6a7>] cpuidle_enter+0x17/0x20
[14405.381375] [<ffffffff81109013>] call_cpuidle+0x23/0x40
[14405.387303] [<ffffffff811092a0>] cpu_startup_entry+0x160/0x250
[14405.393910] [<ffffffff818d0725>] rest_init+0x135/0x140
[14405.399743] [<ffffffff81facff9>] start_kernel+0x45e/0x47f
[14405.405866] [<ffffffff81fac120>] ? early_idt_handler_array+0x120/0x120
[14405.413250] [<ffffffff81fac2d6>] x86_64_start_reservations+0x2a/0x2c
[14405.420432] [<ffffffff81fac424>] x86_64_start_kernel+0x14c/0x16f
[14405.427224] ---[ end trace 62b08c15aaa2825d ]---
[14405.432378]
[14405.434043] CPU#0: ctrl: 0000000000000000
[14405.439099] CPU#0: status: 0000000000000008
[14405.444157] CPU#0: overflow: 0000000000000000
[14405.449214] CPU#0: fixed: 00000000000000b0
[14405.454271] CPU#0: pebs: 0000000000000000
[14405.459326] CPU#0: debugctl: 0000000000000000
[14405.464383] CPU#0: active: 000000020000000f
[14405.469431] CPU#0: gen-PMC0 ctrl: 0000000001d301b1
[14405.475069] CPU#0: gen-PMC0 count: 0000800090b1c37e
[14405.480706] CPU#0: gen-PMC0 left: 00007fff6fb96d3a
[14405.486344] CPU#0: gen-PMC1 ctrl: 00000000baf733b1
[14405.491981] CPU#0: gen-PMC1 count: 0000800000000009
[14405.497618] CPU#0: gen-PMC1 left: 00007ffffffffff7
[14405.503256] CPU#0: gen-PMC2 ctrl: 0000000000530020
[14405.508894] CPU#0: gen-PMC2 count: 00008000000000e8
[14405.514534] CPU#0: gen-PMC2 left: 00007fffffffff18
[14405.520172] CPU#0: gen-PMC3 ctrl: 00000000004200c0
[14405.525809] CPU#0: gen-PMC3 count: 0000fffffffffffe
[14405.531446] CPU#0: gen-PMC3 left: 0000000000000002
[14405.537085] CPU#0: fixed-PMC0 count: 000080000010c91d
[14405.542722] CPU#0: fixed-PMC1 count: 0000fffc1b31bacf
[14405.548360] CPU#0: fixed-PMC2 count: 000080000318bf99
[14405.554000] core: clearing PMU state on CPU#0
[14405.559598] core: clearing PMU state on CPU#0
On Wed, Dec 14, 2016 at 11:52 PM, Jiri Olsa <[email protected]> wrote:
> On Wed, Dec 14, 2016 at 11:26:49PM -0800, Stephane Eranian wrote:
>> On Wed, Dec 14, 2016 at 9:55 AM, Peter Zijlstra <[email protected]> wrote:
>> >
>> > Just spotted this again, ping?
>> >
>> Ok, on what processor running what command, so I can try and reproduce?
>
> for me it's snb_x (model 45) and peter's ivb-ep model 62
>
> after several hours of fuzzer test, log below.. I'll try again with the change
>
Ok, but the problem with the fuzzer is that you have no idea whether
you were using PEBS or non-PEBS, one or multiple events, so it becomes
hard to reproduce.
> jirka
>
>
> ---
> [14404.947844] perfevents: irq loop stuck!
> [14404.952560] ------------[ cut here ]------------
> [14404.957720] WARNING: CPU: 0 PID: 0 at arch/x86/events/intel/core.c:2093 intel_pmu_handle_irq+0x2f8/0x4c0
> [14404.968305] Modules linked in: intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp ipmi_devintf crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support ghash_clmulni_intel pcspkr ipmi_ssif tpm_tis i2c_i801 tpm_tis_core ipmi_si tpm i2c_smbus ipmi_msghandler cdc_ether usbnet mii shpchp ioatdma wmi lpc_ich xfs libcrc32c mgag200 drm_kms_helper ttm drm igb ptp crc32c_intel pps_core dca i2c_algo_bit megaraid_sas fjes
> [14405.019901] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc8+ #51
> [14405.026985] Hardware name: IBM System x3650 M4 : -[7915E2G]-/00Y7683, BIOS -[VVE124AUS-1.30]- 11/21/2012
> [14405.037568] ffff880277a05b08 ffffffff81463243 ffff880277a05b58 0000000000000000
> [14405.046601] ffff880277a05b48 ffffffff810b698b 0000082d81133a1d 0000000000000064
> [14405.055634] ffff880277a0a380 ffff880276208800 0000000000000040 ffff880277a0a580
> [14405.064665] Call Trace:
> [14405.067394] <NMI> [<ffffffff81463243>] dump_stack+0x86/0xc3
> [14405.073807] [<ffffffff810b698b>] __warn+0xcb/0xf0
> [14405.079156] [<ffffffff810b6a0f>] warn_slowpath_fmt+0x5f/0x80
> [14405.085569] [<ffffffff810b69b5>] ? warn_slowpath_fmt+0x5/0x80
> [14405.092081] [<ffffffff8100d448>] intel_pmu_handle_irq+0x2f8/0x4c0
> [14405.098971] [<ffffffff810060dc>] ? perf_event_nmi_handler+0x2c/0x50
> [14405.106065] [<ffffffff8100d150>] ? intel_pmu_save_and_restart+0x50/0x50
> [14405.113547] [<ffffffff810609f0>] ? nmi_raise_cpu_backtrace+0x20/0x20
> [14405.120737] [<ffffffff811a2555>] ? ftrace_ops_test.isra.23+0x65/0xa0
> [14405.127917] [<ffffffff8147a2ee>] ? bsearch+0x5e/0x90
> [14405.133556] [<ffffffff811a0f50>] ? __add_hash_entry+0x50/0x50
> [14405.140066] [<ffffffff8147a2ee>] ? bsearch+0x5e/0x90
> [14405.145704] [<ffffffff811a0f50>] ? __add_hash_entry+0x50/0x50
> [14405.152214] [<ffffffff810609f0>] ? nmi_raise_cpu_backtrace+0x20/0x20
> [14405.159403] [<ffffffff810609f0>] ? nmi_raise_cpu_backtrace+0x20/0x20
> [14405.166594] [<ffffffff81133a1d>] ? debug_lockdep_rcu_enabled+0x1d/0x20
> [14405.173979] [<ffffffff811a31be>] ? ftrace_ops_list_func+0xce/0x1d0
> [14405.180974] [<ffffffff818e2957>] ? ftrace_call+0x5/0x34
> [14405.186904] [<ffffffff818e2957>] ? ftrace_call+0x5/0x34
> [14405.192824] [<ffffffff81129f00>] ? printk_nmi_enter+0x20/0x20
> [14405.199337] [<ffffffff8100d155>] ? intel_pmu_handle_irq+0x5/0x4c0
> [14405.206235] [<ffffffff810060b5>] ? perf_event_nmi_handler+0x5/0x50
> [14405.213231] [<ffffffff810060dc>] perf_event_nmi_handler+0x2c/0x50
> [14405.220121] [<ffffffff8103983d>] nmi_handle+0xbd/0x2e0
> [14405.225954] [<ffffffff81039785>] ? nmi_handle+0x5/0x2e0
> [14405.231875] [<ffffffff81039785>] ? nmi_handle+0x5/0x2e0
> [14405.237804] [<ffffffff81039f13>] default_do_nmi+0x53/0x100
> [14405.244025] [<ffffffff8103a0df>] do_nmi+0x11f/0x170
> [14405.249557] [<ffffffff818e26f1>] end_repeat_nmi+0x1a/0x1e
> [14405.255680] [<ffffffff8106fd16>] ? native_write_msr+0x6/0x30
> [14405.262093] [<ffffffff8106fd16>] ? native_write_msr+0x6/0x30
> [14405.268507] [<ffffffff8106fd16>] ? native_write_msr+0x6/0x30
> [14405.274914] <EOE> <IRQ> [<ffffffff810117e4>] ? intel_pmu_pebs_enable_all+0x34/0x40
> [14405.283656] [<ffffffff8100cc43>] __intel_pmu_enable_all.constprop.17+0x23/0xa0
> [14405.291815] [<ffffffff8100ccd0>] intel_pmu_enable_all+0x10/0x20
> [14405.298520] [<ffffffff81007836>] x86_pmu_enable+0x256/0x2e0
> [14405.304836] [<ffffffff811e32e7>] perf_pmu_enable.part.86+0x7/0x10
> [14405.311736] [<ffffffff811e73be>] perf_mux_hrtimer_handler+0x22e/0x2c0
> [14405.319014] [<ffffffff811418bb>] __hrtimer_run_queues+0xfb/0x510
> [14405.325808] [<ffffffff811e7190>] ? ctx_resched+0x90/0x90
> [14405.331834] [<ffffffff811424bd>] hrtimer_interrupt+0x9d/0x1a0
> [14405.338343] [<ffffffff8105d958>] local_apic_timer_interrupt+0x38/0x60
> [14405.345629] [<ffffffff818e3b3b>] smp_trace_apic_timer_interrupt+0x5b/0x25f
> [14405.353402] [<ffffffff818e2ce6>] trace_apic_timer_interrupt+0x96/0xa0
> [14405.360689] <EOI> [<ffffffff8172f414>] ? cpuidle_enter_state+0x124/0x380
> [14405.368354] [<ffffffff8172f410>] ? cpuidle_enter_state+0x120/0x380
> [14405.375349] [<ffffffff8172f6a7>] cpuidle_enter+0x17/0x20
> [14405.381375] [<ffffffff81109013>] call_cpuidle+0x23/0x40
> [14405.387303] [<ffffffff811092a0>] cpu_startup_entry+0x160/0x250
> [14405.393910] [<ffffffff818d0725>] rest_init+0x135/0x140
> [14405.399743] [<ffffffff81facff9>] start_kernel+0x45e/0x47f
> [14405.405866] [<ffffffff81fac120>] ? early_idt_handler_array+0x120/0x120
> [14405.413250] [<ffffffff81fac2d6>] x86_64_start_reservations+0x2a/0x2c
> [14405.420432] [<ffffffff81fac424>] x86_64_start_kernel+0x14c/0x16f
> [14405.427224] ---[ end trace 62b08c15aaa2825d ]---
> [14405.432378]
> [14405.434043] CPU#0: ctrl: 0000000000000000
> [14405.439099] CPU#0: status: 0000000000000008
> [14405.444157] CPU#0: overflow: 0000000000000000
> [14405.449214] CPU#0: fixed: 00000000000000b0
> [14405.454271] CPU#0: pebs: 0000000000000000
> [14405.459326] CPU#0: debugctl: 0000000000000000
> [14405.464383] CPU#0: active: 000000020000000f
> [14405.469431] CPU#0: gen-PMC0 ctrl: 0000000001d301b1
> [14405.475069] CPU#0: gen-PMC0 count: 0000800090b1c37e
> [14405.480706] CPU#0: gen-PMC0 left: 00007fff6fb96d3a
> [14405.486344] CPU#0: gen-PMC1 ctrl: 00000000baf733b1
> [14405.491981] CPU#0: gen-PMC1 count: 0000800000000009
> [14405.497618] CPU#0: gen-PMC1 left: 00007ffffffffff7
> [14405.503256] CPU#0: gen-PMC2 ctrl: 0000000000530020
> [14405.508894] CPU#0: gen-PMC2 count: 00008000000000e8
> [14405.514534] CPU#0: gen-PMC2 left: 00007fffffffff18
> [14405.520172] CPU#0: gen-PMC3 ctrl: 00000000004200c0
> [14405.525809] CPU#0: gen-PMC3 count: 0000fffffffffffe
> [14405.531446] CPU#0: gen-PMC3 left: 0000000000000002
> [14405.537085] CPU#0: fixed-PMC0 count: 000080000010c91d
> [14405.542722] CPU#0: fixed-PMC1 count: 0000fffc1b31bacf
> [14405.548360] CPU#0: fixed-PMC2 count: 000080000318bf99
> [14405.554000] core: clearing PMU state on CPU#0
> [14405.559598] core: clearing PMU state on CPU#0
On Wed, Dec 14, 2016 at 11:26:49PM -0800, Stephane Eranian wrote:
> On Wed, Dec 14, 2016 at 9:55 AM, Peter Zijlstra <[email protected]> wrote:
> >
> > Just spotted this again, ping?
> >
> Ok, on what processor running what command, so I can try and reproduce?
For me it's more of a correctness issue; I've not actually spotted a
problem as such.
But every time I read this code it makes me wonder.
Supposing the hardware sets the CTRL overflow flags but has not
generated the PEBS record yet (or not enough records to reach the PEBS
buffer threshold), we still don't want to process these events as if
they were !PEBS.
That is, we _never_ want to handle the counters in pebs_enabled as
regular overflows, hence the unconditional masking of those bits.
Hardware _should_ not set them in the first place, but clearly it does
sometimes.
> >> How about we make the clear of pebs_enabled unconditional?
> >>
> >> ---
> >> arch/x86/events/intel/core.c | 20 ++++++++++----------
> >> 1 file changed, 10 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> >> index 68fa55b4d42e..dc9579665425 100644
> >> --- a/arch/x86/events/intel/core.c
> >> +++ b/arch/x86/events/intel/core.c
> >> @@ -1883,6 +1883,16 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
> >> status &= ~(GLOBAL_STATUS_COND_CHG |
> >> GLOBAL_STATUS_ASIF |
> >> GLOBAL_STATUS_LBRS_FROZEN);
> >> + /*
> >> + * There are cases where, even though, the PEBS ovfl bit is set
> >> + * in GLOBAL_OVF_STATUS, the PEBS events may also have their
> >> + * overflow bits set for their counters. We must clear them
> >> + * here because they have been processed as exact samples in
> >> + * the drain_pebs() routine. They must not be processed again
> >> + * in the for_each_bit_set() loop for regular samples below.
> >> + */
> >> + status &= ~cpuc->pebs_enabled;
> >> +
> >> if (!status)
> >> goto done;
> >>
> >> @@ -1892,16 +1902,6 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
> >> if (__test_and_clear_bit(62, (unsigned long *)&status)) {
> >> handled++;
> >> x86_pmu.drain_pebs(regs);
> >> - /*
> >> - * There are cases where, even though, the PEBS ovfl bit is set
> >> - * in GLOBAL_OVF_STATUS, the PEBS events may also have their
> >> - * overflow bits set for their counters. We must clear them
> >> - * here because they have been processed as exact samples in
> >> - * the drain_pebs() routine. They must not be processed again
> >> - * in the for_each_bit_set() loop for regular samples below.
> >> - */
> >> - status &= ~cpuc->pebs_enabled;
> >> - status &= x86_pmu.intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;
> >> }
> >>
> >> /*
On Thu, Dec 15, 2016 at 12:42 AM, Peter Zijlstra <[email protected]> wrote:
> On Wed, Dec 14, 2016 at 11:26:49PM -0800, Stephane Eranian wrote:
>> On Wed, Dec 14, 2016 at 9:55 AM, Peter Zijlstra <[email protected]> wrote:
>> >
>> > Just spotted this again, ping?
>> >
>> Ok, on what processor running what command, so I can try and reproduce?
>
> For me its more of a correctness issue, i've not actually spotted a
> problem as such.
>
> But every time I read this code it makes me wonder.
>
> Supposing that the hardware sets the CTRL overflow flags but hasn't
> generated the PEBS record yet (or not enough records to reach the PEBS
> buffer threshold) we still don't want to process these events as if they
> were !PEBS.
>
I am suspicious about the case where you have multiple PEBS events and
they do not quite fire at the same time but close enough that you may have
PEBS in-flight by the time you enter handle_irq.
Last night I ran a simple test on SKL using tip.git:
$ perf record -e
cpu/event=0xd0,umask=0x81/upp,cpu/event=0xc0,umask=1/upp,cpu/event=0xd0,umask=0x81/upp
multichase; perf report -D | fgrep SAMPLE | grep -v 'IP, 0x4' | grep
-v events
Basically, I am looking for samples missing the EXACT tag, i.e., samples
processed as regular events when I only have PEBS events. Over 8h, I got
about 3 or 4 such samples. So there is still a condition where we see
the overflow as regular and not PEBS. We need to examine that code
again, looking for a possible race with a PEBS record in flight and the
PEBS overflow bits not set yet.
> That is, we _never_ want to look at pebs_enabled, hence unconditional
> masking of those bits.
>
> Hardware _should_ not set them in the first place, but clearly it does
> sometimes.
>
>> >> How about we make the clear of pebs_enabled unconditional?
>> >>
>> >> ---
>> >> arch/x86/events/intel/core.c | 20 ++++++++++----------
>> >> 1 file changed, 10 insertions(+), 10 deletions(-)
>> >>
>> >> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>> >> index 68fa55b4d42e..dc9579665425 100644
>> >> --- a/arch/x86/events/intel/core.c
>> >> +++ b/arch/x86/events/intel/core.c
>> >> @@ -1883,6 +1883,16 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
>> >> status &= ~(GLOBAL_STATUS_COND_CHG |
>> >> GLOBAL_STATUS_ASIF |
>> >> GLOBAL_STATUS_LBRS_FROZEN);
>> >> + /*
>> >> + * There are cases where, even though, the PEBS ovfl bit is set
>> >> + * in GLOBAL_OVF_STATUS, the PEBS events may also have their
>> >> + * overflow bits set for their counters. We must clear them
>> >> + * here because they have been processed as exact samples in
>> >> + * the drain_pebs() routine. They must not be processed again
>> >> + * in the for_each_bit_set() loop for regular samples below.
>> >> + */
>> >> + status &= ~cpuc->pebs_enabled;
>> >> +
>> >> if (!status)
>> >> goto done;
>> >>
>> >> @@ -1892,16 +1902,6 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
>> >> if (__test_and_clear_bit(62, (unsigned long *)&status)) {
>> >> handled++;
>> >> x86_pmu.drain_pebs(regs);
>> >> - /*
>> >> - * There are cases where, even though, the PEBS ovfl bit is set
>> >> - * in GLOBAL_OVF_STATUS, the PEBS events may also have their
>> >> - * overflow bits set for their counters. We must clear them
>> >> - * here because they have been processed as exact samples in
>> >> - * the drain_pebs() routine. They must not be processed again
>> >> - * in the for_each_bit_set() loop for regular samples below.
>> >> - */
>> >> - status &= ~cpuc->pebs_enabled;
>> >> - status &= x86_pmu.intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;
>> >> }
>> >>
>> >> /*
On Thu, Dec 15, 2016 at 08:59:56AM -0800, Stephane Eranian wrote:
> On Thu, Dec 15, 2016 at 12:42 AM, Peter Zijlstra <[email protected]> wrote:
> > On Wed, Dec 14, 2016 at 11:26:49PM -0800, Stephane Eranian wrote:
> >> On Wed, Dec 14, 2016 at 9:55 AM, Peter Zijlstra <[email protected]> wrote:
> >> >
> >> > Just spotted this again, ping?
> >> >
> >> Ok, on what processor running what command, so I can try and reproduce?
> >
> > For me its more of a correctness issue, i've not actually spotted a
> > problem as such.
> >
> > But every time I read this code it makes me wonder.
> >
> > Supposing that the hardware sets the CTRL overflow flags but hasn't
> > generated the PEBS record yet (or not enough records to reach the PEBS
> > buffer threshold) we still don't want to process these events as if they
> > were !PEBS.
> >
> I am suspicious about the case where you have multiple PEBS events and
> they do not quite fire at the same time but close enough that you may have
> PEBS in-flight by the time you enter handle_irq.
>
> Last night I ran a simple test on SKL using tip.git:
>
> $ perf record --e
> cpu/event=0xd0,umask=0x81/upp,cpu/event=0xc0,umask=1/upp,cpu/event=0xd0,umask=0x81/upp
> multichase; perf report -D | fgrep SAMPLE | grep -v 'IP, 0x4' | grep
> -v events
>
> Basically, looking for samples missing the EXACT tag, i.e., samples
> processed a regular event when I only have PEBS events. Over 8h, I got
> about 3 or 4 such samples. So there is still a condition where we see
> the overflow as regular and not PEBS. So we need to examine that code
> again looking for possible race with PEBS in flight and not having the
> PEBS overflow bits yet.
Isn't that exactly the case I was talking about? And wouldn't it be
avoided by the proposed patch?
So semantically the counter overflows and then arms PEBS to record a
record on the next event once it's armed (and this can be multiple events
after the overflow, since arming takes a while too).
Now, if the chip manages to raise the regular overflow bit during that
time, you get exactly what is described.
Meaning we should unconditionally clear the pebs_enabled bits.
On Thu, Dec 15, 2016 at 9:10 AM, Peter Zijlstra <[email protected]> wrote:
> On Thu, Dec 15, 2016 at 08:59:56AM -0800, Stephane Eranian wrote:
>> On Thu, Dec 15, 2016 at 12:42 AM, Peter Zijlstra <[email protected]> wrote:
>> > On Wed, Dec 14, 2016 at 11:26:49PM -0800, Stephane Eranian wrote:
>> >> On Wed, Dec 14, 2016 at 9:55 AM, Peter Zijlstra <[email protected]> wrote:
>> >> >
>> >> > Just spotted this again, ping?
>> >> >
>> >> Ok, on what processor running what command, so I can try and reproduce?
>> >
>> > For me its more of a correctness issue, i've not actually spotted a
>> > problem as such.
>> >
>> > But every time I read this code it makes me wonder.
>> >
>> > Supposing that the hardware sets the CTRL overflow flags but hasn't
>> > generated the PEBS record yet (or not enough records to reach the PEBS
>> > buffer threshold) we still don't want to process these events as if they
>> > were !PEBS.
>> >
>> I am suspicious about the case where you have multiple PEBS events and
>> they do not quite fire at the same time but close enough that you may have
>> PEBS in-flight by the time you enter handle_irq.
>>
>> Last night I ran a simple test on SKL using tip.git:
>>
>> $ perf record --e
>> cpu/event=0xd0,umask=0x81/upp,cpu/event=0xc0,umask=1/upp,cpu/event=0xd0,umask=0x81/upp
>> multichase; perf report -D | fgrep SAMPLE | grep -v 'IP, 0x4' | grep
>> -v events
>>
>> Basically, looking for samples missing the EXACT tag, i.e., samples
>> processed a regular event when I only have PEBS events. Over 8h, I got
>> about 3 or 4 such samples. So there is still a condition where we see
>> the overflow as regular and not PEBS. So we need to examine that code
>> again looking for possible race with PEBS in flight and not having the
>> PEBS overflow bits yet.
>
> Isn't that exactly the case I was talking about? and would be avoided by
> the proposed patch?
>
>
> So semantically the counter overflows and then arms PEBS to record a
> record on the next event once its armed (and this can be multiple events
> after the overflow, since arming takes a while too).
>
>
> Now, if the chip manages to raise the regular overflow bit during that
> time, you get exactly what is described.
>
> meaning we should unconditionally clear the pebs_enabled.
If we unconditionally clear the pebs_enabled counters from status, then
we guarantee these counters will never be processed as regular
overflowed counters. I am okay with this. I am testing it right now.
But the part I still don't get is why the status bitmask shows some
PEBS counters set when the counters are explicitly programmed with
their PMI bit cleared. I need to check whether somehow the PMI bit
gets set.
Hi,
Some more updates on this problem. I believe I understand how this can
happen based on my observations and what the SDM describes.
The problem appears only if you are sampling on multiple PEBS events.
The SDM says:
"When a performance counter is configured for PEBS, overflow condition
in the counter generates a performance-monitoring interrupt signaling
a PEBS event. On a PEBS event, the processor stores data records into
the buffer area (see Section 18.15.5), clears the counter overflow
status, and sets the "OvfBuffer" bit in IA32_PERF_GLOBAL_STATUS."
PEBS is a two-step mechanism. When the PEBS-enabled counter overflows,
PEBS arms and the next qualifying instruction that retires is
captured. On the overflow the GLOBAL_STATUS bit is set. It remains set
until the PEBS record is finalized.
But there can be a long delay between the overflow and when a
qualifying instruction is actually captured. This is especially true
when you run PEBS on rare events, or when you only measure in kernel
or user mode and you transition to the other priv level right after
the overflow. Therefore there is a race condition in there, and this
is what we are observing.
When sampling on multiple PEBS events, one event can generate the PMI
when its PEBS record is finalized, but by the time we reach the
handle_irq() handler, another PEBS event has overflowed whose PEBS
record is not yet finalized, and therefore we observe its bit set in
GLOBAL_STATUS.
It is not clear to me why PEBS does not clear the overflow bit
immediately after the overflow at the beginning of the PEBS arming
code.
Given these observations, I believe that unconditionally clearing the
PEBS_ENABLED bits from GLOBAL_STATUS is the solution. Overflows of the
PEBS-enabled counters should always be processed via bit 62 (OvfBuffer)
and never via their individual counter overflow bits.
I have tried that last night and the problem goes away when measuring
multiple PEBS events simultaneously.
On Fri, Dec 16, 2016 at 12:38 AM, Stephane Eranian <[email protected]> wrote:
> On Thu, Dec 15, 2016 at 9:10 AM, Peter Zijlstra <[email protected]> wrote:
>> On Thu, Dec 15, 2016 at 08:59:56AM -0800, Stephane Eranian wrote:
>>> On Thu, Dec 15, 2016 at 12:42 AM, Peter Zijlstra <[email protected]> wrote:
>>> > On Wed, Dec 14, 2016 at 11:26:49PM -0800, Stephane Eranian wrote:
>>> >> On Wed, Dec 14, 2016 at 9:55 AM, Peter Zijlstra <[email protected]> wrote:
>>> >> >
>>> >> > Just spotted this again, ping?
>>> >> >
>>> >> Ok, on what processor running what command, so I can try and reproduce?
>>> >
>>> > For me its more of a correctness issue, i've not actually spotted a
>>> > problem as such.
>>> >
>>> > But every time I read this code it makes me wonder.
>>> >
>>> > Supposing that the hardware sets the CTRL overflow flags but hasn't
>>> > generated the PEBS record yet (or not enough records to reach the PEBS
>>> > buffer threshold) we still don't want to process these events as if they
>>> > were !PEBS.
>>> >
>>> I am suspicious about the case where you have multiple PEBS events and
>>> they do not quite fire at the same time but close enough that you may have
>>> PEBS in-flight by the time you enter handle_irq.
>>>
>>> Last night I ran a simple test on SKL using tip.git:
>>>
>>> $ perf record --e
>>> cpu/event=0xd0,umask=0x81/upp,cpu/event=0xc0,umask=1/upp,cpu/event=0xd0,umask=0x81/upp
>>> multichase; perf report -D | fgrep SAMPLE | grep -v 'IP, 0x4' | grep
>>> -v events
>>>
>>> Basically, looking for samples missing the EXACT tag, i.e., samples
>>> processed a regular event when I only have PEBS events. Over 8h, I got
>>> about 3 or 4 such samples. So there is still a condition where we see
>>> the overflow as regular and not PEBS. So we need to examine that code
>>> again looking for possible race with PEBS in flight and not having the
>>> PEBS overflow bits yet.
>>
>> Isn't that exactly the case I was talking about? and would be avoided by
>> the proposed patch?
>>
>>
>> So semantically the counter overflows and then arms PEBS to record a
>> record on the next event once its armed (and this can be multiple events
>> after the overflow, since arming takes a while too).
>>
>>
>> Now, if the chip manages to raise the regular overflow bit during that
>> time, you get exactly what is described.
>>
>> meaning we should unconditionally clear the pebs_enabled.
>
> If we unconditionally clear the pebs_enabled counter from status, then
> we guarantee
> these counters will never be processed as regular overflowed counters.
> I am okay with this.
> I am testing it right now.
>
> But the part I still don't get is why is the status bitmask showing
> some pebs counters set when
> the counter are explicitly program with their PMI bit cleared. I need
> to check whether somehow
> the PMI bit gets set.