Hello
so I'm running 4.1-rc4 on my Haswell machine and a lot of my perf_event
testsuite is failing. There seem to be weird event scheduling issues.
I can reproduce it with plain perf; simple event groups that should work
fine can't be scheduled.
vince@haswell:~$ perf stat -e \{cycles,instructions\} sleep 1
Performance counter stats for 'sleep 1':
<not counted> cycles
<not counted> instructions
1.003414661 seconds time elapsed
I just wanted to check to see if this was a known problem before I started
bisecting. I was also possibly seeing this with 4.1-rc2.
Vince
On Tue, May 19, 2015 at 11:07:09PM -0400, Vince Weaver wrote:
> Hello
>
> so I'm running 4.1-rc4 on my Haswell machine and a lot of my perf_event
> testsuite is failing. There seem to be weird event scheduling issues.
>
> I can reproduce it with plain perf; simple event groups that should work
> fine can't be scheduled.
>
> vince@haswell:~$ perf stat -e \{cycles,instructions\} sleep 1
>
Hmm indeed, my ivb-ep does the same thing. Lemme go poke at that.
On Wed, May 20, 2015 at 09:00:57AM +0200, Peter Zijlstra wrote:
> On Tue, May 19, 2015 at 11:07:09PM -0400, Vince Weaver wrote:
> > Hello
> >
> > so I'm running 4.1-rc4 on my Haswell machine and a lot of my perf_event
> > testsuite is failing. There seem to be weird event scheduling issues.
> >
> > I can reproduce it with plain perf; simple event groups that should work
> > fine can't be scheduled.
> >
> > vince@haswell:~$ perf stat -e \{cycles,instructions\} sleep 1
> >
>
> Hmm indeed, my ivb-ep does the same thing. Lemme go poke at that.
snb is ok
jirka
On Wed, May 20, 2015 at 09:00:57AM +0200, Peter Zijlstra wrote:
> On Tue, May 19, 2015 at 11:07:09PM -0400, Vince Weaver wrote:
> > Hello
> >
> > so I'm running 4.1-rc4 on my Haswell machine and a lot of my perf_event
> > testsuite is failing. There seem to be weird event scheduling issues.
> >
> > I can reproduce it with plain perf; simple event groups that should work
> > fine can't be scheduled.
> >
> > vince@haswell:~$ perf stat -e \{cycles,instructions\} sleep 1
> >
>
> Hmm indeed, my ivb-ep does the same thing. Lemme go poke at that.
I've gotta go pick the kid up from school, but the bisect just finished
and threw up the below commit.
I'll go stare at it in an hour or so.
---
c02cdbf60b51b8d98a49185535f5d527a2965142 is the first bad commit
commit c02cdbf60b51b8d98a49185535f5d527a2965142
Author: Stephane Eranian <[email protected]>
Date: Mon Nov 17 20:07:02 2014 +0100
perf/x86/intel: Limit to half counters when the HT workaround is enabled, to avoid exclusive mode starvation
This patch limits the number of counters available to each CPU when
the HT bug workaround is enabled.
This is necessary to avoid a situation of counter starvation. Such a
situation can arise from a configuration where one HT thread, HT0, is using
all 4 counters with corrupting events which require exclusion on the sibling HT, HT1.
In such a case, HT1 would not be able to schedule any event until HT0
is done. To mitigate this problem, this patch artificially limits
the number of counters to 2.
That way, we can guarantee that at least 2 counters are not in exclusive
mode and therefore allow the sibling thread to schedule events of the
same type (system vs. per-thread). The 2 counters are not determined
in advance. We simply set the limit to two events per HT.
This helps mitigate starvation in case of events with specific counter
constraints such as PREC_DIST.
Note that this does not eliminate the starvation in all cases. But
it is better than not having it.
(Solution suggested by Peter Zijlstra.)
Signed-off-by: Stephane Eranian <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
:040000 040000 d78b3927f7fb17a34079012f7184aeef6120e411 84a0237880834d5b58b357fb58ff816124718545 M arch
On Wed, May 20, 2015 at 11:28:42AM +0200, Peter Zijlstra wrote:
> ---
>
> c02cdbf60b51b8d98a49185535f5d527a2965142 is the first bad commit
> commit c02cdbf60b51b8d98a49185535f5d527a2965142
> Author: Stephane Eranian <[email protected]>
> Date: Mon Nov 17 20:07:02 2014 +0100
>
> perf/x86/intel: Limit to half counters when the HT workaround is enabled, to avoid exclusive mode starvation
>
> This patch limits the number of counters available to each CPU when
> the HT bug workaround is enabled.
>
> This is necessary to avoid a situation of counter starvation. Such a
> situation can arise from a configuration where one HT thread, HT0, is using
> all 4 counters with corrupting events which require exclusion on the sibling HT, HT1.
>
> In such a case, HT1 would not be able to schedule any event until HT0
> is done. To mitigate this problem, this patch artificially limits
> the number of counters to 2.
>
> That way, we can guarantee that at least 2 counters are not in exclusive
> mode and therefore allow the sibling thread to schedule events of the
> same type (system vs. per-thread). The 2 counters are not determined
> in advance. We simply set the limit to two events per HT.
>
> This helps mitigate starvation in case of events with specific counter
> constraints such as PREC_DIST.
>
> Note that this does not eliminate the starvation in all cases. But
> it is better than not having it.
>
> (Solution suggested by Peter Zijlstra.)
>
> Signed-off-by: Stephane Eranian <[email protected]>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Link: http://lkml.kernel.org/r/[email protected]
> Signed-off-by: Ingo Molnar <[email protected]>
OK, so if you have the watchdog enabled, that's 1 event, and having a
max of 2 GP events, adding another 2 events is fail.
Jiri, did your SNB have the watchdog disabled?
On Wed, May 20, 2015 at 03:52:09PM +0200, Peter Zijlstra wrote:
> >
> > This patch limits the number of counters available to each CPU when
> > the HT bug workaround is enabled.
> >
> > This is necessary to avoid situation of counter starvation. Such can
> > arise from configuration where one HT thread, HT0, is using all 4 counters
> > with corrupting events which require exclusion the the sibling HT, HT1.
> >
>
> OK, so if you have the watchdog enabled, that's 1 event, and having a
> max of 2 GP events, adding another 2 events is fail.
Hmm, so we count all events, including those scheduled on fixed purpose
counters.
Lemme see if I can cure that.
On Wed, May 20, 2015 at 05:25:34PM +0200, Peter Zijlstra wrote:
> >
> > OK, so if you have the watchdog enabled, that's 1 event, and having a
> > max of 2 GP events, adding another 2 events is fail.
>
> Hmm, so we count all events, including those scheduled on fixed purpose
> counters.
>
> Lemme see if I can cure that.
Stephane, does something like the below make sense?
---
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 3998131..7e779e9 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -2008,27 +2039,15 @@ intel_get_excl_constraints(struct cpu_hw_events *cpuc, struct perf_event *event,
xl = &excl_cntrs->states[tid];
xlo = &excl_cntrs->states[o_tid];
- /*
- * do not allow scheduling of more than max_alloc_cntrs
- * which is set to half the available generic counters.
- * this helps avoid counter starvation of sibling thread
- * by ensuring at most half the counters cannot be in
- * exclusive mode. There is not designated counters for the
- * limits. Any N/2 counters can be used. This helps with
- * events with specifix counter constraints
- */
- if (xl->num_alloc_cntrs++ == xl->max_alloc_cntrs)
- return &emptyconstraint;
-
cx = c;
/*
- * because we modify the constraint, we need
+ * Because we modify the constraint, we need
* to make a copy. Static constraints come
* from static const tables.
*
- * only needed when constraint has not yet
- * been cloned (marked dynamic)
+ * Only needed when constraint has not yet
+ * been cloned (marked dynamic).
*/
if (!(c->flags & PERF_X86_EVENT_DYNAMIC)) {
@@ -2062,6 +2081,22 @@ intel_get_excl_constraints(struct cpu_hw_events *cpuc, struct perf_event *event,
*/
/*
+ * Do not allow scheduling of more than max_alloc_cntrs
+ * which is set to half the available generic counters.
+ *
+ * This helps avoid counter starvation of sibling thread
+ * by ensuring at most half the counters cannot be in
+ * exclusive mode. There are no designated counters for the
+ * limits. Any N/2 counters can be used. This helps with
+ * events with specific counter constraints.
+ */
+ if (xl->num_alloc_cntrs++ >= xl->max_alloc_cntrs) {
+ /* wipe the GP counters */
+ cx->idxmsk64 &= ~((1ULL << INTEL_PMC_IDX_FIXED) - 1);
+ goto done;
+ }
+
+ /*
* Modify static constraint with current dynamic
* state of thread
*
@@ -2086,6 +2121,7 @@ intel_get_excl_constraints(struct cpu_hw_events *cpuc, struct perf_event *event,
__clear_bit(i, cx->idxmsk);
}
+done:
/*
* recompute actual bit weight for scheduling algorithm
*/
On Wed, May 20, 2015 at 03:52:09PM +0200, Peter Zijlstra wrote:
> On Wed, May 20, 2015 at 11:28:42AM +0200, Peter Zijlstra wrote:
>
> > ---
> >
> > c02cdbf60b51b8d98a49185535f5d527a2965142 is the first bad commit
> > commit c02cdbf60b51b8d98a49185535f5d527a2965142
> > Author: Stephane Eranian <[email protected]>
> > Date: Mon Nov 17 20:07:02 2014 +0100
> >
> > perf/x86/intel: Limit to half counters when the HT workaround is enabled, to avoid exclusive mode starvation
> >
> > This patch limits the number of counters available to each CPU when
> > the HT bug workaround is enabled.
> >
> > This is necessary to avoid a situation of counter starvation. Such a
> > situation can arise from a configuration where one HT thread, HT0, is using
> > all 4 counters with corrupting events which require exclusion on the sibling HT, HT1.
> >
> > In such a case, HT1 would not be able to schedule any event until HT0
> > is done. To mitigate this problem, this patch artificially limits
> > the number of counters to 2.
> >
> > That way, we can guarantee that at least 2 counters are not in exclusive
> > mode and therefore allow the sibling thread to schedule events of the
> > same type (system vs. per-thread). The 2 counters are not determined
> > in advance. We simply set the limit to two events per HT.
> >
> > This helps mitigate starvation in case of events with specific counter
> > constraints such as PREC_DIST.
> >
> > Note that this does not eliminate the starvation in all cases. But
> > it is better than not having it.
> >
> > (Solution suggested by Peter Zijlstra.)
> >
> > Signed-off-by: Stephane Eranian <[email protected]>
> > Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
> > Link: http://lkml.kernel.org/r/[email protected]
> > Signed-off-by: Ingo Molnar <[email protected]>
>
> OK, so if you have the watchdog enabled, that's 1 event, and having a
> max of 2 GP events, adding another 2 events is fail.
>
> Jiri, did your SNB have the watchdog disabled?
watchdog was enabled
jirka
On Wed, May 20, 2015 at 06:08:02PM +0200, Peter Zijlstra wrote:
> @@ -2062,6 +2081,22 @@ intel_get_excl_constraints(struct cpu_hw_events *cpuc, struct perf_event *event,
> */
>
> /*
> + * Do not allow scheduling of more than max_alloc_cntrs
> + * which is set to half the available generic counters.
> + *
> + * This helps avoid counter starvation of sibling thread
> + * by ensuring at most half the counters cannot be in
> + * exclusive mode. There are no designated counters for the
> + * limits. Any N/2 counters can be used. This helps with
> + * events with specific counter constraints.
> + */
> + if (xl->num_alloc_cntrs++ >= xl->max_alloc_cntrs) {
> + /* wipe the GP counters */
> + cx->idxmsk64 &= ~((1ULL << INTEL_PMC_IDX_FIXED) - 1);
> + goto done;
> + }
> +
> + /*
> * Modify static constraint with current dynamic
> * state of thread
> *
While this improves things, it's still suboptimal because we should only
increase num_alloc_cntrs when we actually allocate a GP register, but we
do that at commit time, and that callback is too late to back out / retry.
So ideally we'd move the callback into scheduling code, but that means
we also have to move the xlo array into the sched_state etc.
[ which brings me to the whole xl vs xlo thing; I think we did that the
wrong way around. It would be more natural to account to xl and create
constraints based on xlo. ]
Secondly, we should only enforce this limit if and when there are
exclusive events on the system I suppose.
I have some ideas on how to go do this, but I need a break..
Peter Zijlstra <[email protected]> writes:
>
> Secondly, we should only enforce this limit if and when there are
> exclusive events on the system I suppose.
That's the key part really and absolutely needed.
-Andi
--
[email protected] -- Speaking for myself only
On Wed, May 20, 2015 at 4:10 PM, Andi Kleen <[email protected]> wrote:
> Peter Zijlstra <[email protected]> writes:
>>
>> Secondly, we should only enforce this limit if and when there are
>> exclusive events on the system I suppose.
>
> That's the key part really and absolutely needed.
>
Are you talking about the N/2 counters?
If so, then yes, I agree.
> -Andi
>
> --
> [email protected] -- Speaking for myself only
On Wed, May 20, 2015 at 9:25 AM, Jiri Olsa <[email protected]> wrote:
> On Wed, May 20, 2015 at 03:52:09PM +0200, Peter Zijlstra wrote:
>> On Wed, May 20, 2015 at 11:28:42AM +0200, Peter Zijlstra wrote:
>>
>> > ---
>> >
>> > c02cdbf60b51b8d98a49185535f5d527a2965142 is the first bad commit
>> > commit c02cdbf60b51b8d98a49185535f5d527a2965142
>> > Author: Stephane Eranian <[email protected]>
>> > Date: Mon Nov 17 20:07:02 2014 +0100
>> >
>> > perf/x86/intel: Limit to half counters when the HT workaround is enabled, to avoid exclusive mode starvation
>> >
>> > This patch limits the number of counters available to each CPU when
>> > the HT bug workaround is enabled.
>> >
>> > This is necessary to avoid a situation of counter starvation. Such a
>> > situation can arise from a configuration where one HT thread, HT0, is using
>> > all 4 counters with corrupting events which require exclusion on the sibling HT, HT1.
>> >
>> > In such a case, HT1 would not be able to schedule any event until HT0
>> > is done. To mitigate this problem, this patch artificially limits
>> > the number of counters to 2.
>> >
>> > That way, we can guarantee that at least 2 counters are not in exclusive
>> > mode and therefore allow the sibling thread to schedule events of the
>> > same type (system vs. per-thread). The 2 counters are not determined
>> > in advance. We simply set the limit to two events per HT.
>> >
>> > This helps mitigate starvation in case of events with specific counter
>> > constraints such as PREC_DIST.
>> >
>> > Note that this does not eliminate the starvation in all cases. But
>> > it is better than not having it.
>> >
>> > (Solution suggested by Peter Zijlstra.)
>> >
>> > Signed-off-by: Stephane Eranian <[email protected]>
>> > Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
>> > Cc: [email protected]
>> > Cc: [email protected]
>> > Cc: [email protected]
>> > Cc: [email protected]
>> > Link: http://lkml.kernel.org/r/[email protected]
>> > Signed-off-by: Ingo Molnar <[email protected]>
>>
>> OK, so if you have the watchdog enabled, that's 1 event, and having a
>> max of 2 GP events, adding another 2 events is fail.
>>
>> Jiri, did your SNB have the watchdog disabled?
>
> watchdog was enabled
>
But maybe HT was disabled on your snb machine.
> jirka