2019-03-14 13:13:04

by Peter Zijlstra

Subject: [RFC][PATCH 7/8] perf/x86: Optimize x86_schedule_events()

Now that cpuc->event_constraint[] is retained, we can avoid calling
get_event_constraints() over and over again.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
arch/x86/events/core.c | 25 +++++++++++++++++++++----
arch/x86/events/intel/core.c | 3 ++-
2 files changed, 23 insertions(+), 5 deletions(-)

--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -844,6 +844,12 @@ int perf_assign_events(struct event_cons
}
EXPORT_SYMBOL_GPL(perf_assign_events);

+static inline bool is_ht_workaround_active(struct cpu_hw_events *cpuc)
+{
+ return is_ht_workaround_enabled() && !cpuc->is_fake &&
+ READ_ONCE(cpuc->excl_cntrs->exclusive_present);
+}
+
int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
{
struct event_constraint *c;
@@ -858,8 +864,20 @@ int x86_schedule_events(struct cpu_hw_ev
x86_pmu.start_scheduling(cpuc);

for (i = 0, wmin = X86_PMC_IDX_MAX, wmax = 0; i < n; i++) {
- c = x86_pmu.get_event_constraints(cpuc, i, cpuc->event_list[i]);
- cpuc->event_constraint[i] = c;
+ c = cpuc->event_constraint[i];
+
+ /*
+ * Request constraints for new events; or for those events that
+ * have a dynamic constraint due to the HT workaround -- for
+ * those the constraint can change due to scheduling activity
+ * on the other sibling.
+ */
+ if (!c || ((c->flags & PERF_X86_EVENT_DYNAMIC) &&
+ is_ht_workaround_active(cpuc))) {
+
+ c = x86_pmu.get_event_constraints(cpuc, i, cpuc->event_list[i]);
+ cpuc->event_constraint[i] = c;
+ }

wmin = min(wmin, c->weight);
wmax = max(wmax, c->weight);
@@ -903,8 +921,7 @@ int x86_schedule_events(struct cpu_hw_ev
* N/2 counters can be used. This helps with events with
* specific counter constraints.
*/
- if (is_ht_workaround_enabled() && !cpuc->is_fake &&
- READ_ONCE(cpuc->excl_cntrs->exclusive_present))
+ if (is_ht_workaround_active(cpuc))
gpmax /= 2;

unsched = perf_assign_events(cpuc->event_constraint, n, wmin,
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2945,7 +2945,8 @@ intel_get_event_constraints(struct cpu_h
* - dynamic constraint: handled by intel_get_excl_constraints()
*/
c2 = __intel_get_event_constraints(cpuc, idx, event);
- if (c1 && (c1->flags & PERF_X86_EVENT_DYNAMIC)) {
+ if (c1) {
+ WARN_ON_ONCE(!(c1->flags & PERF_X86_EVENT_DYNAMIC));
bitmap_copy(c1->idxmsk, c2->idxmsk, X86_PMC_IDX_MAX);
c1->weight = c2->weight;
c2 = c1;




2019-03-19 23:56:20

by Stephane Eranian

Subject: Re: [RFC][PATCH 7/8] perf/x86: Optimize x86_schedule_events()

On Thu, Mar 14, 2019 at 6:11 AM Peter Zijlstra <[email protected]> wrote:
>
> Now that cpuc->event_constraint[] is retained, we can avoid calling
> get_event_constraints() over and over again.
>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> ---
> arch/x86/events/core.c | 25 +++++++++++++++++++++----
> arch/x86/events/intel/core.c | 3 ++-
> 2 files changed, 23 insertions(+), 5 deletions(-)
>
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -844,6 +844,12 @@ int perf_assign_events(struct event_cons
> }
> EXPORT_SYMBOL_GPL(perf_assign_events);
>
> +static inline bool is_ht_workaround_active(struct cpu_hw_events *cpuc)
> +{
> + return is_ht_workaround_enabled() && !cpuc->is_fake &&
> + READ_ONCE(cpuc->excl_cntrs->exclusive_present);
> +}
> +
> int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
> {
> struct event_constraint *c;
> @@ -858,8 +864,20 @@ int x86_schedule_events(struct cpu_hw_ev
> x86_pmu.start_scheduling(cpuc);
>
> for (i = 0, wmin = X86_PMC_IDX_MAX, wmax = 0; i < n; i++) {
> - c = x86_pmu.get_event_constraints(cpuc, i, cpuc->event_list[i]);
> - cpuc->event_constraint[i] = c;
> + c = cpuc->event_constraint[i];
> +
> + /*
> + * Request constraints for new events; or for those events that
> + * have a dynamic constraint due to the HT workaround -- for
> + * those the constraint can change due to scheduling activity
> + * on the other sibling.
> + */
> + if (!c || ((c->flags & PERF_X86_EVENT_DYNAMIC) &&
> + is_ht_workaround_active(cpuc))) {
> +
> + c = x86_pmu.get_event_constraints(cpuc, i, cpuc->event_list[i]);
> + cpuc->event_constraint[i] = c;
> + }

On this one, I think there may be a problem with events with
shared_regs constraints.
Constraint is dynamic as it depends on other events which share the
same MSR, yet it is not marked as DYNAMIC. But this may be okay because
these other events are all on the same CPU and thus scheduled during
the same ctx_sched_in(). Yet with the swapping in intel_alt_er(), we
need to double-check that we cannot reuse a constraint which could be
stale. I believe this is okay, just double-check.

>
> wmin = min(wmin, c->weight);
> wmax = max(wmax, c->weight);
> @@ -903,8 +921,7 @@ int x86_schedule_events(struct cpu_hw_ev
> * N/2 counters can be used. This helps with events with
> * specific counter constraints.
> */
> - if (is_ht_workaround_enabled() && !cpuc->is_fake &&
> - READ_ONCE(cpuc->excl_cntrs->exclusive_present))
> + if (is_ht_workaround_active(cpuc))
> gpmax /= 2;
>
> unsched = perf_assign_events(cpuc->event_constraint, n, wmin,
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -2945,7 +2945,8 @@ intel_get_event_constraints(struct cpu_h
> * - dynamic constraint: handled by intel_get_excl_constraints()
> */
> c2 = __intel_get_event_constraints(cpuc, idx, event);
> - if (c1 && (c1->flags & PERF_X86_EVENT_DYNAMIC)) {
> + if (c1) {
> + WARN_ON_ONCE(!(c1->flags & PERF_X86_EVENT_DYNAMIC));
> bitmap_copy(c1->idxmsk, c2->idxmsk, X86_PMC_IDX_MAX);
> c1->weight = c2->weight;
> c2 = c1;
>
>

2019-03-20 13:13:02

by Peter Zijlstra

Subject: Re: [RFC][PATCH 7/8] perf/x86: Optimize x86_schedule_events()

On Tue, Mar 19, 2019 at 04:55:16PM -0700, Stephane Eranian wrote:
> On Thu, Mar 14, 2019 at 6:11 AM Peter Zijlstra <[email protected]> wrote:
> > @@ -858,8 +864,20 @@ int x86_schedule_events(struct cpu_hw_ev
> > x86_pmu.start_scheduling(cpuc);
> >
> > for (i = 0, wmin = X86_PMC_IDX_MAX, wmax = 0; i < n; i++) {
> > - c = x86_pmu.get_event_constraints(cpuc, i, cpuc->event_list[i]);
> > - cpuc->event_constraint[i] = c;
> > + c = cpuc->event_constraint[i];
> > +
> > + /*
> > + * Request constraints for new events; or for those events that
> > + * have a dynamic constraint due to the HT workaround -- for
> > + * those the constraint can change due to scheduling activity
> > + * on the other sibling.
> > + */
> > + if (!c || ((c->flags & PERF_X86_EVENT_DYNAMIC) &&
> > + is_ht_workaround_active(cpuc))) {
> > +
> > + c = x86_pmu.get_event_constraints(cpuc, i, cpuc->event_list[i]);
> > + cpuc->event_constraint[i] = c;
> > + }

> On this one, I think there may be a problem with events with
> shared_regs constraints.

Hmm...

> Constraint is dynamic as it depends on other events which share the
> same MSR, yet it is not marked as DYNAMIC.

it returns &emptyconstraint or a table constraint, depending on register
state.

> But this may be okay because these other events are all on the same
> CPU and thus scheduled during the same ctx_sched_in(). Yet with the
> swapping in intel_alt_er(), we need to double-check that we cannot
> reuse a constraint which could be stale.

> I believe this is okay, just double-check.

I'm not sure I see a problem.

So if we're the first event on a shared register, we claim the register
and scheduling succeeds (barring other constraints).

If we're the second event on a shared register (and have conflicting
register state), we get the empty constraint. This _will_ cause
scheduling to fail. We'll not cache the state and punt it back to the
core code.

So no future scheduling pass will come to see a shared reg constraint
that could've changed.

Now, there is indeed the intel_alt_er() thing, which slightly
complicates this; then suppose we schedule an event on RSP0, another on
RSP1, then remove the RSP0 one. Even in that case, the remaining RSP1
event will not change its constraint, since intel_fixup_er() rewrites
the event to be a native RSP1 event.

So that too reduces to the prior case.
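
To make the argument above concrete, here is a minimal user-space sketch
of the first-event/second-event reasoning. It is not the kernel code: the
shreg_* names, the struct layouts and the "weight 0 means empty
constraint" convention are all assumptions made for illustration, and the
intel_alt_er()/intel_fixup_er() swapping is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's definitions. */
struct shreg_constraint {
	int weight;		/* 0 models the empty constraint */
};

struct shreg_reg {
	int ref;		/* number of events holding the register */
	unsigned long config;	/* value programmed by the current holders */
};

static const struct shreg_constraint shreg_empty = { .weight = 0 };
static const struct shreg_constraint shreg_table = { .weight = 4 };

/*
 * Model of the shared-register lookup: the first event claims the
 * register, a later event with the same config shares it, and an event
 * with a conflicting config gets the empty constraint.
 */
static const struct shreg_constraint *
shreg_get_constraint(struct shreg_reg *reg, unsigned long config)
{
	if (reg->ref == 0 || reg->config == config) {
		reg->config = config;
		reg->ref++;
		return &shreg_table;
	}
	return &shreg_empty;
}

/*
 * Model of one scheduling attempt. The constraint is stored in *slot
 * only when scheduling succeeds; on failure the slot is left empty, so
 * a later pass has nothing stale to reuse.
 */
static bool shreg_schedule(struct shreg_reg *reg, unsigned long config,
			   const struct shreg_constraint **slot)
{
	const struct shreg_constraint *c;

	assert(reg != NULL && slot != NULL);

	c = shreg_get_constraint(reg, config);
	if (c->weight == 0) {
		*slot = NULL;	/* failed: do not cache */
		return false;
	}
	*slot = c;
	return true;
}
```

In this model the only constraint that ever ends up cached for a shared
register is the table constraint of an event that actually holds the
register, which is the property the reasoning above relies on.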


That said; I have simplified the above condition to:

@@ -858,8 +858,17 @@ int x86_schedule_events(struct cpu_hw_ev
x86_pmu.start_scheduling(cpuc);

for (i = 0, wmin = X86_PMC_IDX_MAX, wmax = 0; i < n; i++) {
- c = x86_pmu.get_event_constraints(cpuc, i, cpuc->event_list[i]);
- cpuc->event_constraint[i] = c;
+ c = cpuc->event_constraint[i];
+
+ /*
+ * Request constraints for new events; or for those events that
+ * have a dynamic constraint -- for those the constraint can
+ * change due to external factors (sibling state, allow_tfa).
+ */
+ if (!c || (c->flags & PERF_X86_EVENT_DYNAMIC)) {
+ c = x86_pmu.get_event_constraints(cpuc, i, cpuc->event_list[i]);
+ cpuc->event_constraint[i] = c;
+ }

wmin = min(wmin, c->weight);
wmax = max(wmax, c->weight);

Because any dynamic event can change from one moment to the next.
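
As a rough illustration of what the simplified condition buys, the
following user-space sketch counts how many constraint lookups each
scheduling pass performs. Everything named sketch_* is hypothetical and
only models the rule "look up when there is no cached constraint or when
the cached one is dynamic"; the real lookup is
x86_pmu.get_event_constraints() and the real flag is
PERF_X86_EVENT_DYNAMIC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PERF_X86_EVENT_DYNAMIC. */
#define SKETCH_EVENT_DYNAMIC 0x1u

/* Hypothetical stand-in for struct event_constraint. */
struct sketch_constraint {
	int weight;
	unsigned int flags;
};

static struct sketch_constraint sketch_static  = { .weight = 4, .flags = 0 };
static struct sketch_constraint sketch_dynamic = {
	.weight = 2, .flags = SKETCH_EVENT_DYNAMIC,
};

/* Counts how often the (assumed expensive) lookup runs. */
static int sketch_lookups;

/*
 * Models the constraint lookup. Purely for the demo, odd slots get a
 * dynamic constraint and even slots a static one.
 */
static struct sketch_constraint *sketch_get_constraint(int idx)
{
	sketch_lookups++;
	return (idx & 1) ? &sketch_dynamic : &sketch_static;
}

/* The simplified rule: new events and dynamic constraints are looked up. */
static bool sketch_needs_lookup(const struct sketch_constraint *c)
{
	return !c || (c->flags & SKETCH_EVENT_DYNAMIC);
}

/* One pass over n cached slots; returns the lookups done in this pass. */
static int sketch_schedule_pass(struct sketch_constraint **cache, int n)
{
	int before = sketch_lookups;
	int i;

	assert(cache != NULL && n >= 0);

	for (i = 0; i < n; i++) {
		if (sketch_needs_lookup(cache[i]))
			cache[i] = sketch_get_constraint(i);
	}
	return sketch_lookups - before;
}
```

With four initially empty slots, the first pass looks up all four
constraints; later passes only repeat the lookup for the two dynamic
ones, while the static ones are served from the cache.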

2019-03-20 19:33:42

by Stephane Eranian

Subject: Re: [RFC][PATCH 7/8] perf/x86: Optimize x86_schedule_events()

On Wed, Mar 20, 2019 at 6:11 AM Peter Zijlstra <[email protected]> wrote:
>
> On Tue, Mar 19, 2019 at 04:55:16PM -0700, Stephane Eranian wrote:
> > On Thu, Mar 14, 2019 at 6:11 AM Peter Zijlstra <[email protected]> wrote:
> > > @@ -858,8 +864,20 @@ int x86_schedule_events(struct cpu_hw_ev
> > > x86_pmu.start_scheduling(cpuc);
> > >
> > > for (i = 0, wmin = X86_PMC_IDX_MAX, wmax = 0; i < n; i++) {
> > > - c = x86_pmu.get_event_constraints(cpuc, i, cpuc->event_list[i]);
> > > - cpuc->event_constraint[i] = c;
> > > + c = cpuc->event_constraint[i];
> > > +
> > > + /*
> > > + * Request constraints for new events; or for those events that
> > > + * have a dynamic constraint due to the HT workaround -- for
> > > + * those the constraint can change due to scheduling activity
> > > + * on the other sibling.
> > > + */
> > > + if (!c || ((c->flags & PERF_X86_EVENT_DYNAMIC) &&
> > > + is_ht_workaround_active(cpuc))) {
> > > +
> > > + c = x86_pmu.get_event_constraints(cpuc, i, cpuc->event_list[i]);
> > > + cpuc->event_constraint[i] = c;
> > > + }
>
> > On this one, I think there may be a problem with events with
> > shared_regs constraints.
>
> Hmm...
>
> > Constraint is dynamic as it depends on other events which share the
> > same MSR, yet it is not marked as DYNAMIC.
>
> it returns &emptyconstraint or a table constraint, depending on register
> state.
>
> > But this may be okay because these other events are all on the same
> > CPU and thus scheduled during the same ctx_sched_in(). Yet with the
> > swapping in intel_alt_er(), we need to double-check that we cannot
> > reuse a constraint which could be stale.
>
> > I believe this is okay, just double-check.
>
> I'm not sure I see a problem.
>
> So if we're the first event on a shared register, we claim the register
> and scheduling succeeds (barring other constraints).
>
> If we're the second event on a shared register (and have conflicting
> register state), we get the empty constraint. This _will_ cause
> scheduling to fail. We'll not cache the state and punt it back to the
> core code.
>
> So no future scheduling pass will come to see a shared reg constraint
> that could've changed.
>
> Now, there is indeed the intel_alt_er() thing, which slightly
> complicates this; then suppose we schedule an event on RSP0, another on
> RSP1, then remove the RSP0 one. Even in that case, the remaining RSP1
> event will not change its constraint, since intel_fixup_er() rewrites
> the event to be a native RSP1 event.
>
> So that too reduces to the prior case.
>
I came to the same conclusion later yesterday. I think this is okay.

>
> That said; I have simplified the above condition to:
>
> @@ -858,8 +858,17 @@ int x86_schedule_events(struct cpu_hw_ev
> x86_pmu.start_scheduling(cpuc);
>
> for (i = 0, wmin = X86_PMC_IDX_MAX, wmax = 0; i < n; i++) {
> - c = x86_pmu.get_event_constraints(cpuc, i, cpuc->event_list[i]);
> - cpuc->event_constraint[i] = c;
> + c = cpuc->event_constraint[i];
> +
> + /*
> + * Request constraints for new events; or for those events that
> + * have a dynamic constraint -- for those the constraint can
> + * change due to external factors (sibling state, allow_tfa).
> + */
> + if (!c || (c->flags & PERF_X86_EVENT_DYNAMIC)) {
> + c = x86_pmu.get_event_constraints(cpuc, i, cpuc->event_list[i]);
> + cpuc->event_constraint[i] = c;
> + }
>
Right now DYNAMIC is only caused by the HT bug. That could change later
on, but the logic here would remain the same. If the HT workaround is
disabled, then no event is tagged with DYNAMIC.

> wmin = min(wmin, c->weight);
> wmax = max(wmax, c->weight);
>
> Because any dynamic event can change from one moment to the next.
I agree.