2014-10-08 11:07:46

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 10/11] perf/x86/intel: Support task events with Intel CQM

On Wed, Sep 24, 2014 at 03:04:14PM +0100, Matt Fleming wrote:
> From: Matt Fleming <[email protected]>
>
> Add support for task events as well as system-wide events. This change
> has a big impact on the way that we gather LLC occupancy values in
> intel_cqm_event_read().
>
> Currently, for system-wide (per-cpu) events we defer processing to
> userspace which knows how to discard all but one cpu result per package.
>
> Things aren't so simple for task events because we need to do the value
> aggregation ourselves. To do this, we defer updating the LLC occupancy
> value in event->count from intel_cqm_event_read() and do an SMP
> cross-call to read values for all packages in intel_cqm_event_count().
> We need to ensure that we only do this for one task event per cache
> group, otherwise we'll report duplicate values.
>
> If we're a system-wide event we want to fallback to the default
> perf_event_count() implementation. Refactor this into a common function
> so that we don't duplicate the code.

So it looks like these events will be classified as regular HW events,
this means they'll be mixed with the other HW events, and we'll stop
scheduling the moment either one returns a fail.

There are two alternatives;
1) create an extra task context to keep them in
2) pretend to be a software event and do the scheduling yourself

I think my initial proposal was 2, can you clarify why you've changed
that? Lemme go read the next patch though, maybe that'll clarify things
further.


2014-10-08 12:10:48

by Matt Fleming

[permalink] [raw]
Subject: Re: [PATCH 10/11] perf/x86/intel: Support task events with Intel CQM

On Wed, 08 Oct, at 01:07:43PM, Peter Zijlstra wrote:
> On Wed, Sep 24, 2014 at 03:04:14PM +0100, Matt Fleming wrote:
> > From: Matt Fleming <[email protected]>
> >
> > Add support for task events as well as system-wide events. This change
> > has a big impact on the way that we gather LLC occupancy values in
> > intel_cqm_event_read().
> >
> > Currently, for system-wide (per-cpu) events we defer processing to
> > userspace which knows how to discard all but one cpu result per package.
> >
> > Things aren't so simple for task events because we need to do the value
> > aggregation ourselves. To do this, we defer updating the LLC occupancy
> > value in event->count from intel_cqm_event_read() and do an SMP
> > cross-call to read values for all packages in intel_cqm_event_count().
> > We need to ensure that we only do this for one task event per cache
> > group, otherwise we'll report duplicate values.
> >
> > If we're a system-wide event we want to fallback to the default
> > perf_event_count() implementation. Refactor this into a common function
> > so that we don't duplicate the code.
>
> So it looks like these events will be classified as regular HW events,
> this means they'll be mixed with the other HW events, and we'll stop
> scheduling the moment either one returns a fail.
>
> There are two alternatives;
> 1) create an extra task context to keep them in
> 2) pretend to be a software event and do the scheduling yourself
>
> I think my initial proposal was 2, can you clarify why you've changed
> that? Lemme go read the next patch though, maybe that'll clarify things
> further.

Ah, interesting.

I dropped the internal scheduling because I preferred the idea of
"failing fast", in the sense that if we can't schedule multiple events
simultaneously because they conflict, we should report that to the user
at event init time, rather than trying to manage the conflict ourselves,
with the resultant loss of accuracy.

But I wasn't aware of the issue you've brought up, and it sounds like we
need to do the scheduling anyway.

--
Matt Fleming, Intel Open Source Technology Center

2014-10-08 14:50:06

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 10/11] perf/x86/intel: Support task events with Intel CQM

On Wed, Oct 08, 2014 at 01:10:44PM +0100, Matt Fleming wrote:
> Ah, interesting.
>
> I dropped the internal scheduling because I preferred the idea of
> "failing fast", in the sense that if we can't schedule multiple events
> simultaneously because they conflict, we should report that to the user
> at event init time, rather than trying to manage the conflict ourselves,
> with the resultant loss of accuracy.

The thing is, with multiplexing you cannot fail at event creation time
anyhow. The only time where you can 'fail' is when programming the PMU,
when its full its full.

Those that don't fit, get to wait their turn.

2014-10-10 10:54:34

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 10/11] perf/x86/intel: Support task events with Intel CQM

On Wed, Oct 08, 2014 at 01:07:43PM +0200, Peter Zijlstra wrote:
> On Wed, Sep 24, 2014 at 03:04:14PM +0100, Matt Fleming wrote:
> > From: Matt Fleming <[email protected]>
> >
> > Add support for task events as well as system-wide events. This change
> > has a big impact on the way that we gather LLC occupancy values in
> > intel_cqm_event_read().
> >
> > Currently, for system-wide (per-cpu) events we defer processing to
> > userspace which knows how to discard all but one cpu result per package.
> >
> > Things aren't so simple for task events because we need to do the value
> > aggregation ourselves. To do this, we defer updating the LLC occupancy
> > value in event->count from intel_cqm_event_read() and do an SMP
> > cross-call to read values for all packages in intel_cqm_event_count().
> > We need to ensure that we only do this for one task event per cache
> > group, otherwise we'll report duplicate values.
> >
> > If we're a system-wide event we want to fallback to the default
> > perf_event_count() implementation. Refactor this into a common function
> > so that we don't duplicate the code.
>
> So it looks like these events will be classified as regular HW events,
> this means they'll be mixed with the other HW events, and we'll stop
> scheduling the moment either one returns a fail.
>
> There are two alternatives;
> 1) create an extra task context to keep them in
> 2) pretend to be a software event and do the scheduling yourself
>
> I think my initial proposal was 2, can you clarify why you've changed
> that? Lemme go read the next patch though, maybe that'll clarify things
> further.

As you rightly pointed out on IRC, the following (as found in patch 8):

+static struct pmu intel_cqm_pmu = {
+ .attr_groups = intel_cqm_attr_groups,
+ .task_ctx_nr = perf_sw_context,

Does indeed make it a software event, so then need to do all the
scheduling outselves.

2014-10-10 11:10:59

by Matt Fleming

[permalink] [raw]
Subject: Re: [PATCH 10/11] perf/x86/intel: Support task events with Intel CQM

On Fri, 10 Oct, at 12:54:25PM, Peter Zijlstra wrote:
>
> As you rightly pointed out on IRC, the following (as found in patch 8):
>
> +static struct pmu intel_cqm_pmu = {
> + .attr_groups = intel_cqm_attr_groups,
> + .task_ctx_nr = perf_sw_context,
>
> Does indeed make it a software event, so then need to do all the
> scheduling outselves.

OK, I'll take a look at that.

--
Matt Fleming, Intel Open Source Technology Center

2014-10-10 12:02:09

by Matt Fleming

[permalink] [raw]
Subject: Re: [PATCH 10/11] perf/x86/intel: Support task events with Intel CQM

On Wed, 08 Oct, at 04:49:57PM, Peter Zijlstra wrote:
>
> The thing is, with multiplexing you cannot fail at event creation time
> anyhow. The only time where you can 'fail' is when programming the PMU,
> when its full its full.
>
> Those that don't fit, get to wait their turn.

For CQM it's not about "fitting" everything in the PMU but more about
monitoring the same thing (task, cgroup) with different events, i.e. one
thing with two RMIDs. We have the RMID recycling algorithm to make
things fit, but that doesn't help us out here.

An example scenario that isn't supported by this patch series is
monitoring a cgroup while simultaneously monitoring a task that's part
of that cgroup. Whichever event is created second will fail at event
init time.

And that seemed like a fair approach to me. But the more I think about
it, the more I begin to agree that maybe we should allow users the
flexibility to create conflicting events, particularly because there
appears to be precedent in other parts of perf.

Hmm... "rotation" is starting to become my least favourite word.

--
Matt Fleming, Intel Open Source Technology Center

2014-10-10 14:02:35

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 10/11] perf/x86/intel: Support task events with Intel CQM

On Fri, Oct 10, 2014 at 01:02:02PM +0100, Matt Fleming wrote:
> On Wed, 08 Oct, at 04:49:57PM, Peter Zijlstra wrote:
> >
> > The thing is, with multiplexing you cannot fail at event creation time
> > anyhow. The only time where you can 'fail' is when programming the PMU,
> > when its full its full.
> >
> > Those that don't fit, get to wait their turn.
>
> For CQM it's not about "fitting" everything in the PMU but more about
> monitoring the same thing (task, cgroup) with different events, i.e. one
> thing with two RMIDs. We have the RMID recycling algorithm to make
> things fit, but that doesn't help us out here.
>
> An example scenario that isn't supported by this patch series is
> monitoring a cgroup while simultaneously monitoring a task that's part
> of that cgroup. Whichever event is created second will fail at event
> init time.

That was what I had that conflict thing for, ISTR seeing some parts of
that here.

> And that seemed like a fair approach to me. But the more I think about
> it, the more I begin to agree that maybe we should allow users the
> flexibility to create conflicting events, particularly because there
> appears to be precedent in other parts of perf.

Yeah, although typically its not this hard. CQM is 'interesting' because
its so different.

> Hmm... "rotation" is starting to become my least favourite word.

:-)