2013-05-03 12:20:26

by Peter Zijlstra

[permalink] [raw]
Subject: [PATCH 1/3] perf, x86: Blacklist all MEM_*_RETIRED events for IVB

Errata BV98 states that all MEM_*_RETIRED events corrupt the counter
value of the SMT sibling's counters. Blacklist these events

Cc: [email protected]
Reported-by: Andi Kleen <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
arch/x86/kernel/cpu/perf_event_intel.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -128,10 +128,15 @@ static struct event_constraint intel_ivb
INTEL_UEVENT_CONSTRAINT(0x08a3, 0x4), /* CYCLE_ACTIVITY.CYCLES_L1D_PENDING */
INTEL_UEVENT_CONSTRAINT(0x0ca3, 0x4), /* CYCLE_ACTIVITY.STALLS_L1D_PENDING */
INTEL_UEVENT_CONSTRAINT(0x01c0, 0x2), /* INST_RETIRED.PREC_DIST */
- INTEL_EVENT_CONSTRAINT(0xd0, 0xf), /* MEM_UOPS_RETIRED.* */
- INTEL_EVENT_CONSTRAINT(0xd1, 0xf), /* MEM_LOAD_UOPS_RETIRED.* */
- INTEL_EVENT_CONSTRAINT(0xd2, 0xf), /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
- INTEL_EVENT_CONSTRAINT(0xd3, 0xf), /* MEM_LOAD_UOPS_LLC_MISS_RETIRED.* */
+ /*
+ * Errata BV98 -- MEM_*_RETIRED events can leak between counters of SMT
+ * siblings; disable these events because they can corrupt unrelated
+ * counters.
+ */
+ INTEL_EVENT_CONSTRAINT(0xd0, 0x0), /* MEM_UOPS_RETIRED.* */
+ INTEL_EVENT_CONSTRAINT(0xd1, 0x0), /* MEM_LOAD_UOPS_RETIRED.* */
+ INTEL_EVENT_CONSTRAINT(0xd2, 0x0), /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
+ INTEL_EVENT_CONSTRAINT(0xd3, 0x0), /* MEM_LOAD_UOPS_LLC_MISS_RETIRED.* */
EVENT_CONSTRAINT_END
};



2013-05-03 14:40:22

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 1/3] perf, x86: Blacklist all MEM_*_RETIRED events for IVB

On Fri, May 03, 2013 at 02:11:23PM +0200, Peter Zijlstra wrote:
> Errata BV98 states that all MEM_*_RETIRED events corrupt the counter
> value of the SMT sibling's counters. Blacklist these events

I disagree with this patch. This is just overkill and not needed
at all.

-Andi

2013-05-03 17:01:40

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/3] perf, x86: Blacklist all MEM_*_RETIRED events for IVB

On Fri, May 03, 2013 at 07:35:07AM -0700, Andi Kleen wrote:
> On Fri, May 03, 2013 at 02:11:23PM +0200, Peter Zijlstra wrote:
> > Errata BV98 states that all MEM_*_RETIRED events corrupt the counter
> > value of the SMT sibling's counters. Blacklist these events
>
> I disagree with this patch. This is just overkill and not needed
> at all.

Then give me a patch that both completely describes the problem and ensures
isolation so that no counters get corrupted (and isn't too invasive).

So far you've failed on both counts.

Subject: [tip:perf/urgent] perf/x86: Blacklist all MEM_*_RETIRED events for Ivy Bridge

Commit-ID: 741a698f420c34c458294a6accecfbad702a7c52
Gitweb: http://git.kernel.org/tip/741a698f420c34c458294a6accecfbad702a7c52
Author: Peter Zijlstra <[email protected]>
AuthorDate: Fri, 3 May 2013 14:11:23 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Sat, 4 May 2013 08:37:46 +0200

perf/x86: Blacklist all MEM_*_RETIRED events for Ivy Bridge

Errata BV98 states that all MEM_*_RETIRED events corrupt the
counter value of the SMT sibling's counters. Blacklist these
events

Reported-by: Andi Kleen <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
arch/x86/kernel/cpu/perf_event_intel.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index cc45deb..4a0a462 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -125,10 +125,15 @@ static struct event_constraint intel_ivb_event_constraints[] __read_mostly =
INTEL_UEVENT_CONSTRAINT(0x08a3, 0x4), /* CYCLE_ACTIVITY.CYCLES_L1D_PENDING */
INTEL_UEVENT_CONSTRAINT(0x0ca3, 0x4), /* CYCLE_ACTIVITY.STALLS_L1D_PENDING */
INTEL_UEVENT_CONSTRAINT(0x01c0, 0x2), /* INST_RETIRED.PREC_DIST */
- INTEL_EVENT_CONSTRAINT(0xd0, 0xf), /* MEM_UOPS_RETIRED.* */
- INTEL_EVENT_CONSTRAINT(0xd1, 0xf), /* MEM_LOAD_UOPS_RETIRED.* */
- INTEL_EVENT_CONSTRAINT(0xd2, 0xf), /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
- INTEL_EVENT_CONSTRAINT(0xd3, 0xf), /* MEM_LOAD_UOPS_LLC_MISS_RETIRED.* */
+ /*
+ * Errata BV98 -- MEM_*_RETIRED events can leak between counters of SMT
+ * siblings; disable these events because they can corrupt unrelated
+ * counters.
+ */
+ INTEL_EVENT_CONSTRAINT(0xd0, 0x0), /* MEM_UOPS_RETIRED.* */
+ INTEL_EVENT_CONSTRAINT(0xd1, 0x0), /* MEM_LOAD_UOPS_RETIRED.* */
+ INTEL_EVENT_CONSTRAINT(0xd2, 0x0), /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
+ INTEL_EVENT_CONSTRAINT(0xd3, 0x0), /* MEM_LOAD_UOPS_LLC_MISS_RETIRED.* */
EVENT_CONSTRAINT_END
};

2013-05-15 14:20:50

by Stephane Eranian

[permalink] [raw]
Subject: Re: [PATCH 1/3] perf, x86: Blacklist all MEM_*_RETIRED events for IVB

Hi,

Based on our testing, it appears the corruption occurs only
when the MEM_* events are used and only on the sibling
counter. In other words, if HT0 has MEM_* in cntr0, then
HT1 cntr0 cannot be used, otherwise whatever is there may
get corrupted. So I think we could enhance Andi's initial patch
to handle this case instead of blacklist those events. They are
very important events.


On Fri, May 3, 2013 at 7:00 PM, Peter Zijlstra <[email protected]> wrote:
> On Fri, May 03, 2013 at 07:35:07AM -0700, Andi Kleen wrote:
>> On Fri, May 03, 2013 at 02:11:23PM +0200, Peter Zijlstra wrote:
>> > Errata BV98 states that all MEM_*_RETIRED events corrupt the counter
>> > value of the SMT sibling's counters. Blacklist these events
>>
>> I disagree with this patch. This is just overkill and not needed
>> at all.
>
> Then give me a patch that both completely describes the problem and ensures
> isolation so that no counters get corrupted (and isn't too invasive).
>
> So far you've failed on both counts.
>
>

2013-05-15 16:51:58

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/3] perf, x86: Blacklist all MEM_*_RETIRED events for IVB

On Wed, May 15, 2013 at 04:20:46PM +0200, Stephane Eranian wrote:
> Hi,
>
> Based on our testing, it appears the corruption occurs only
> when the MEM_* events are used and only on the sibling
> counter. In other words, if HT0 has MEM_* in cntr0, then
> HT1 cntr0 cannot be used, otherwise whatever is there may
> get corrupted.

Ah, great. That is the least horrid case :-)

> So I think we could enhance Andi's initial patch
> to handle this case instead of blacklist those events. They are
> very important events.

Will you take a stab at it? I suppose you'll have to make each counter
have a shared resource and have the mem_*_retired events mark the
resource taken.

Also, did you test what happens when you turn SMT off in the BIOS so you
get double the amount of counters; do we then leak into CNTn+4 or is all
well again?

2013-05-16 15:42:24

by Stephane Eranian

[permalink] [raw]
Subject: Re: [PATCH 1/3] perf, x86: Blacklist all MEM_*_RETIRED events for IVB

On Wed, May 15, 2013 at 6:51 PM, Peter Zijlstra <[email protected]> wrote:
> On Wed, May 15, 2013 at 04:20:46PM +0200, Stephane Eranian wrote:
>> Hi,
>>
>> Based on our testing, it appears the corruption occurs only
>> when the MEM_* events are used and only on the sibling
>> counter. In other words, if HT0 has MEM_* in cntr0, then
>> HT1 cntr0 cannot be used, otherwise whatever is there may
>> get corrupted.
>
> Ah, great. That is the least horrid case :-)
>
Yeah.

>> So I think we could enhance Andi's initial patch
>> to handle this case instead of blacklist those events. They are
>> very important events.
>
> Will you take a stab at it? I suppose you'll have to make each counter
> have a shared resource and have the mem_*_retired events mark the
> resource taken.
>
Yes, I will try to come up with a patch to force multiplexing when those
events are used.

The difficulty is that we cannot use the shared_regs infrastructure because
the constraint is quite different. Here we need to say that if a mem event
uses cnt0 on one thread, then nothing can be using cnt0 on the other thread.
The current shared_regs infrastructure does not work on the counter level.
So we need to invent something else. It cannot even be extended. We need
to schedule the mem event, then look at the assignment and exclude all
the counters used by the mem event from the other thread. Alternatively,
we can only schedule mem event on counters which are not currently used
by the sibling thread. We need to peek at the used counters cross HT threads.
Not so easy to do.


> Also, did you test what happens when you turn SMT off in the BIOS so you
> get double the amount of counters; do we then leak into CNTn+4 or is all
> well again?
>
I have not tried that yet. Hopefully, it won't exhibit the problem.

2013-05-16 16:07:28

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 1/3] perf, x86: Blacklist all MEM_*_RETIRED events for IVB

> The difficulty is that we cannot use the shared_regs infrastructure because
> the constraint is quite different. Here we need to say that if a mem event
> uses cnt0 on one thread, then nothing can be using cnt0 on the other thread.
> The current shared_regs infrastructure does not work on the counter level.
> So we need to invent something else. It cannot even be extended.

Just need a extra reg per counter, that is only triggered with that event.

-Andi

2013-05-16 16:26:30

by Stephane Eranian

[permalink] [raw]
Subject: Re: [PATCH 1/3] perf, x86: Blacklist all MEM_*_RETIRED events for IVB

On Thu, May 16, 2013 at 6:07 PM, Andi Kleen <[email protected]> wrote:
>> The difficulty is that we cannot use the shared_regs infrastructure because
>> the constraint is quite different. Here we need to say that if a mem event
>> uses cnt0 on one thread, then nothing can be using cnt0 on the other thread.
>> The current shared_regs infrastructure does not work on the counter level.
>> So we need to invent something else. It cannot even be extended.
>
> Just need a extra reg per counter, that is only triggered with that event.
>
Yes, it would be per counter and not per-event. But the check has to happen
after the assignment and each time we release the counter.

> -Andi