LinuxLists.cc - [PATCH] perf_events: improve x86 event scheduling

[permalink] [raw]

Subject: Re: [PATCH] perf_events: improve x86 event scheduling

On Tue, 2010-01-12 at 12:50 +0200, Stephane Eranian wrote:
> This patch improves event scheduling by maximizing the use
> of PMU registers regardless of the order in which events are
> created in a group.
>
> The algorithm takes into account the list of counter constraints
> for each event. It assigns events to counters from the most
> constrained, i.e., works on only one counter, to the least
> constrained, i.e., works on any counter.
>
> Intel Fixed counter events and the BTS special event are also
> handled via this algorithm which is designed to be fairly generic.
>
> The patch also updates the validation of an event to use the
> scheduling algorithm. This will cause early failure in
> perf_event_open().
>
> This is the 2nd version of this patch. It follows the model used
> by PPC, by running the scheduling algorithm and the actual
> assignment separately. Actual assignment takes place in
> hw_perf_enable() whereas scheduling is implemented in
> hw_perf_group_sched_in() and x86_pmu_enable().

Looks very good, will have to reread it again in the morning to look for
more details but from an initial reading its fine.

One concern I do have is the loss of error checking on
event_sched_in()'s event->pmu->enable(), that could be another
'hardware' PMU like breakpoints, in which case it could fail.

Not sure what to do with that.. maybe we should not allow mixing
different hardware PMU events, but only 1 hardware + software events, in
which case the below patch should retain some of the
is_software_event()s.

Frederic, Paul?

> Signed-off-by: Stephane Eranian <[email protected]>
> --

> diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
> index 8d9f854..16beb34 100644
> --- a/arch/x86/include/asm/perf_event.h
> +++ b/arch/x86/include/asm/perf_event.h
> @@ -19,6 +19,7 @@
> #define MSR_ARCH_PERFMON_EVENTSEL1 0x187
>
> #define ARCH_PERFMON_EVENTSEL0_ENABLE (1 << 22)
> +#define ARCH_PERFMON_EVENTSEL_ANY (1 << 21)
> #define ARCH_PERFMON_EVENTSEL_INT (1 << 20)
> #define ARCH_PERFMON_EVENTSEL_OS (1 << 17)
> #define ARCH_PERFMON_EVENTSEL_USR (1 << 16)

> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index d616c06..d68c3e5 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -1343,6 +1579,13 @@ intel_pmu_enable_fixed(struct hw_perf_event *hwc, int __idx)
> bits |= 0x2;
> if (hwc->config & ARCH_PERFMON_EVENTSEL_OS)
> bits |= 0x1;
> +
> + /*
> + * ANY bit is supported in v3 and up
> + */
> + if (x86_pmu.version > 2 && hwc->config & ARCH_PERFMON_EVENTSEL_ANY)
> + bits |= 0x4;
> +
> bits <<= (idx * 4);
> mask = 0xfULL << (idx * 4);
>

This looks like a separate patch all by itself.

Also, the below fixes a few style nits and that is_software_event()
usage:

---
Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event.c
@@ -225,7 +225,8 @@ static struct event_constraint intel_cor
EVENT_CONSTRAINT_END
};

-static struct event_constraint intel_nehalem_event_constraints[] = {
+static struct event_constraint intel_nehalem_event_constraints[] =
+{
EVENT_CONSTRAINT(0xc0, (0x3|(1ULL<<32)), INTEL_ARCH_FIXED_MASK), /* INSTRUCTIONS_RETIRED */
EVENT_CONSTRAINT(0x3c, (0x3|(1ULL<<33)), INTEL_ARCH_FIXED_MASK), /* UNHALTED_CORE_CYCLES */
EVENT_CONSTRAINT(0x40, 0x3, INTEL_ARCH_EVENT_MASK), /* L1D_CACHE_LD */
@@ -241,7 +242,8 @@ static struct event_constraint intel_neh
EVENT_CONSTRAINT_END
};

-static struct event_constraint intel_gen_event_constraints[] = {
+static struct event_constraint intel_gen_event_constraints[] =
+{
EVENT_CONSTRAINT(0xc0, (0x3|(1ULL<<32)), INTEL_ARCH_FIXED_MASK), /* INSTRUCTIONS_RETIRED */
EVENT_CONSTRAINT(0x3c, (0x3|(1ULL<<33)), INTEL_ARCH_FIXED_MASK), /* UNHALTED_CORE_CYCLES */
};
@@ -1310,6 +1312,13 @@ skip:
return 0;
}

+static const struct pmu pmu;
+
+static inline bool is_x86_event(struct perf_event *event)
+{
+ return event->pmu == pmu;
+}
+
/*
* dogrp: true if must collect siblings events (group)
* returns total number of events and error code
@@ -1324,7 +1333,7 @@ static int collect_events(struct cpu_hw_
/* current number of events already accepted */
n = cpuc->n_events;

- if (!is_software_event(leader)) {
+ if (is_x86_event(leader)) {
if (n >= max_count)
return -ENOSPC;
cpuc->event_list[n] = leader;
@@ -1334,7 +1343,7 @@ static int collect_events(struct cpu_hw_
return n;

list_for_each_entry(event, &leader->sibling_list, group_entry) {
- if (is_software_event(event) ||
+ if (!is_x86_event(event) ||
event->state == PERF_EVENT_STATE_OFF)
continue;

@@ -2154,7 +2163,7 @@ static void event_sched_in(struct perf_e
event->state = PERF_EVENT_STATE_ACTIVE;
event->oncpu = cpu;
event->tstamp_running += event->ctx->time - event->tstamp_stopped;
- if (is_software_event(event))
+ if (!is_x86_event(event))
event->pmu->enable(event);
}

@@ -2209,7 +2218,7 @@ int hw_perf_group_sched_in(struct perf_e
* 1 means successful and events are active
* This is not quite true because we defer
* actual activation until hw_perf_enable() but
- * this way we* ensure caller won't try to enable
+ * this way we ensure caller won't try to enable
* individual events
*/
return 1;

2010-01-13 09:54:50

[permalink] [raw]

Subject: Re: [PATCH] perf_events: improve x86 event scheduling

On Tue, Jan 12, 2010 at 5:10 PM, Peter Zijlstra <[email protected]> wrote:
> On Tue, 2010-01-12 at 12:50 +0200, Stephane Eranian wrote:
>> This patch improves event scheduling by maximizing the use
>> of PMU registers regardless of the order in which events are
>> created in a group.
>>
>> The algorithm takes into account the list of counter constraints
>> for each event. It assigns events to counters from the most
>> constrained, i.e., works on only one counter, to the least
>> constrained, i.e., works on any counter.
>>
>> Intel Fixed counter events and the BTS special event are also
>> handled via this algorithm which is designed to be fairly generic.
>>
>> The patch also updates the validation of an event to use the
>> scheduling algorithm. This will cause early failure in
>> perf_event_open().
>>
>> This is the 2nd version of this patch. It follows the model used
>> by PPC, by running the scheduling algorithm and the actual
>> assignment separately. Actual assignment takes place in
>> hw_perf_enable() whereas scheduling is implemented in
>> hw_perf_group_sched_in() and x86_pmu_enable().
>
> Looks very good, will have to reread it again in the morning to look for
> more details but from an initial reading its fine.
>
> One concern I do have is the loss of error checking on
> event_sched_in()'s event->pmu->enable(), that could be another
> 'hardware' PMU like breakpoints, in which case it could fail.
>
Well, x86_pmu_enable() does return an error code, so it is up
to the upper layer to handle the error gracefully and in particular
in perf_ctx_adjust_freq().

>> Signed-off-by: Stephane Eranian <[email protected]>
>> --
>
>> diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
>> index 8d9f854..16beb34 100644
>> --- a/arch/x86/include/asm/perf_event.h
>> +++ b/arch/x86/include/asm/perf_event.h
>> @@ -19,6 +19,7 @@
>> #define MSR_ARCH_PERFMON_EVENTSEL1 0x187
>>
>> #define ARCH_PERFMON_EVENTSEL0_ENABLE (1 << 22)
>> +#define ARCH_PERFMON_EVENTSEL_ANY (1 << 21)
>> #define ARCH_PERFMON_EVENTSEL_INT (1 << 20)
>> #define ARCH_PERFMON_EVENTSEL_OS (1 << 17)
>> #define ARCH_PERFMON_EVENTSEL_USR (1 << 16)
>
>
>> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
>> index d616c06..d68c3e5 100644
>> --- a/arch/x86/kernel/cpu/perf_event.c
>> +++ b/arch/x86/kernel/cpu/perf_event.c
>> @@ -1343,6 +1579,13 @@ intel_pmu_enable_fixed(struct hw_perf_event *hwc, int __idx)
>> bits |= 0x2;
>> if (hwc->config & ARCH_PERFMON_EVENTSEL_OS)
>> bits |= 0x1;
>> +
>> + /*
>> + * ANY bit is supported in v3 and up
>> + */
>> + if (x86_pmu.version > 2 && hwc->config & ARCH_PERFMON_EVENTSEL_ANY)
>> + bits |= 0x4;
>> +
>> bits <<= (idx * 4);
>> mask = 0xfULL << (idx * 4);
>>
>
> This looks like a separate patch all by itself.
>
> Also, the below fixes a few style nits and that is_software_event()
> usage:
>
> ---
> Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c
> +++ linux-2.6/arch/x86/kernel/cpu/perf_event.c
> @@ -225,7 +225,8 @@ static struct event_constraint intel_cor
> EVENT_CONSTRAINT_END
> };
>
> -static struct event_constraint intel_nehalem_event_constraints[] = {
> +static struct event_constraint intel_nehalem_event_constraints[] =
> +{
> EVENT_CONSTRAINT(0xc0, (0x3|(1ULL<<32)), INTEL_ARCH_FIXED_MASK), /* INSTRUCTIONS_RETIRED */
> EVENT_CONSTRAINT(0x3c, (0x3|(1ULL<<33)), INTEL_ARCH_FIXED_MASK), /* UNHALTED_CORE_CYCLES */
> EVENT_CONSTRAINT(0x40, 0x3, INTEL_ARCH_EVENT_MASK), /* L1D_CACHE_LD */
> @@ -241,7 +242,8 @@ static struct event_constraint intel_neh
> EVENT_CONSTRAINT_END
> };
>
> -static struct event_constraint intel_gen_event_constraints[] = {
> +static struct event_constraint intel_gen_event_constraints[] =
> +{
> EVENT_CONSTRAINT(0xc0, (0x3|(1ULL<<32)), INTEL_ARCH_FIXED_MASK), /* INSTRUCTIONS_RETIRED */
> EVENT_CONSTRAINT(0x3c, (0x3|(1ULL<<33)), INTEL_ARCH_FIXED_MASK), /* UNHALTED_CORE_CYCLES */
> };
> @@ -1310,6 +1312,13 @@ skip:
> return 0;
> }
>
> +static const struct pmu pmu;
> +
> +static inline bool is_x86_event(struct perf_event *event)
> +{
> + return event->pmu == pmu;
> +}
> +
> /*
> * dogrp: true if must collect siblings events (group)
> * returns total number of events and error code
> @@ -1324,7 +1333,7 @@ static int collect_events(struct cpu_hw_
> /* current number of events already accepted */
> n = cpuc->n_events;
>
> - if (!is_software_event(leader)) {
> + if (is_x86_event(leader)) {
> if (n >= max_count)
> return -ENOSPC;
> cpuc->event_list[n] = leader;
> @@ -1334,7 +1343,7 @@ static int collect_events(struct cpu_hw_
> return n;
>
> list_for_each_entry(event, &leader->sibling_list, group_entry) {
> - if (is_software_event(event) ||
> + if (!is_x86_event(event) ||
> event->state == PERF_EVENT_STATE_OFF)
> continue;
>
> @@ -2154,7 +2163,7 @@ static void event_sched_in(struct perf_e
> event->state = PERF_EVENT_STATE_ACTIVE;
> event->oncpu = cpu;
> event->tstamp_running += event->ctx->time - event->tstamp_stopped;
> - if (is_software_event(event))
> + if (!is_x86_event(event))
> event->pmu->enable(event);
> }
>
> @@ -2209,7 +2218,7 @@ int hw_perf_group_sched_in(struct perf_e
> * 1 means successful and events are active
> * This is not quite true because we defer
> * actual activation until hw_perf_enable() but
> - * this way we* ensure caller won't try to enable
> + * this way we ensure caller won't try to enable
> * individual events
> */
> return 1;
>
>
>
>

--
Stephane Eranian | EMEA Software Engineering
Google France | 38 avenue de l'Opéra | 75002 Paris
Tel : +33 (0) 1 42 68 53 00
This email may be confidential or privileged. If you received this
communication by mistake, please
don't forward it to anyone else, please erase all copies and
attachments, and please let me know that
it went to the wrong person. Thanks

2010-01-13 16:30:14

[permalink] [raw]

Subject: Re: [PATCH] perf_events: improve x86 event scheduling

On Wed, 2010-01-13 at 10:54 +0100, Stephane Eranian wrote:

> > One concern I do have is the loss of error checking on
> > event_sched_in()'s event->pmu->enable(), that could be another
> > 'hardware' PMU like breakpoints, in which case it could fail.
> >
> Well, x86_pmu_enable() does return an error code, so it is up
> to the upper layer to handle the error gracefully and in particular
> in perf_ctx_adjust_freq().

> +static void event_sched_in(struct perf_event *event, int cpu)
> +{
> + event->state = PERF_EVENT_STATE_ACTIVE;
> + event->oncpu = cpu;
> + event->tstamp_running += event->ctx->time - event->tstamp_stopped;
> + if (is_software_event(event))
> + event->pmu->enable(event);
> +}
> +
> +/*
> + * Called to enable a whole group of events.
> + * Returns 1 if the group was enabled, or -EAGAIN if it could not be.
> + * Assumes the caller has disabled interrupts and has
> + * frozen the PMU with hw_perf_save_disable.
> + *
> + * called with PMU disabled. If successful and return value 1,
> + * then guaranteed to call perf_enable() and hw_perf_enable()
> + */
> +int hw_perf_group_sched_in(struct perf_event *leader,
> + struct perf_cpu_context *cpuctx,
> + struct perf_event_context *ctx, int cpu)
> +{
> + struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
> + struct perf_event *sub;
> + int assign[X86_PMC_IDX_MAX];
> + int n, n0, ret;
> +
> + n0 = cpuc->n_events;
> +
> + n = collect_events(cpuc, leader, true);
> + if (n < 0)
> + return n;
> +
> + ret = x86_schedule_events(cpuc, n, assign);
> + if (ret)
> + return ret;
> +
> + /*
> + * copy new assignment, now we know it is possible
> + * will be used by hw_perf_enable()
> + */
> + memcpy(cpuc->assign, assign, n*sizeof(int));
> +
> + cpuc->n_events = n;
> + cpuc->n_added = n - n0;
> +
> + n = 1;
> + event_sched_in(leader, cpu);
> + list_for_each_entry(sub, &leader->sibling_list, group_entry) {
> + if (sub->state != PERF_EVENT_STATE_OFF) {
> + event_sched_in(sub, cpu);
> + ++n;
> + }
> + }
> + ctx->nr_active += n;
> +
> + /*
> + * 1 means successful and events are active
> + * This is not quite true because we defer
> + * actual activation until hw_perf_enable() but
> + * this way we* ensure caller won't try to enable
> + * individual events
> + */
> + return 1;
> +}

That most certainly looses error codes for all !is_x86_event() events in
the group.

So you either need to deal with that event_sched_in() failing, or
guarantee that it always succeeds (forcing software events only for
example).

2010-01-13 16:52:36

[permalink] [raw]

Subject: Re: [PATCH] perf_events: improve x86 event scheduling

On Wed, Jan 13, 2010 at 5:29 PM, Peter Zijlstra <[email protected]> wrote:
> On Wed, 2010-01-13 at 10:54 +0100, Stephane Eranian wrote:
>
>> > One concern I do have is the loss of error checking on
>> > event_sched_in()'s event->pmu->enable(), that could be another
>> > 'hardware' PMU like breakpoints, in which case it could fail.
>> >
>> Well, x86_pmu_enable() does return an error code, so it is up
>> to the upper layer to handle the error gracefully and in particular
>> in perf_ctx_adjust_freq().
>
>> +static void event_sched_in(struct perf_event *event, int cpu)
>> +{
>> + event->state = PERF_EVENT_STATE_ACTIVE;
>> + event->oncpu = cpu;
>> + event->tstamp_running += event->ctx->time - event->tstamp_stopped;
>> + if (is_software_event(event))
>> + event->pmu->enable(event);
>> +}
>> +
>> +/*
>> + * Called to enable a whole group of events.
>> + * Returns 1 if the group was enabled, or -EAGAIN if it could not be.
>> + * Assumes the caller has disabled interrupts and has
>> + * frozen the PMU with hw_perf_save_disable.
>> + *
>> + * called with PMU disabled. If successful and return value 1,
>> + * then guaranteed to call perf_enable() and hw_perf_enable()
>> + */
>> +int hw_perf_group_sched_in(struct perf_event *leader,
>> + struct perf_cpu_context *cpuctx,
>> + struct perf_event_context *ctx, int cpu)
>> +{
>> + struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
>> + struct perf_event *sub;
>> + int assign[X86_PMC_IDX_MAX];
>> + int n, n0, ret;
>> +
>> + n0 = cpuc->n_events;
>> +
>> + n = collect_events(cpuc, leader, true);
>> + if (n < 0)
>> + return n;
>> +
>> + ret = x86_schedule_events(cpuc, n, assign);
>> + if (ret)
>> + return ret;
>> +
>> + /*
>> + * copy new assignment, now we know it is possible
>> + * will be used by hw_perf_enable()
>> + */
>> + memcpy(cpuc->assign, assign, n*sizeof(int));
>> +
>> + cpuc->n_events = n;
>> + cpuc->n_added = n - n0;
>> +
>> + n = 1;
>> + event_sched_in(leader, cpu);
>> + list_for_each_entry(sub, &leader->sibling_list, group_entry) {
>> + if (sub->state != PERF_EVENT_STATE_OFF) {
>> + event_sched_in(sub, cpu);
>> + ++n;
>> + }
>> + }
>> + ctx->nr_active += n;
>> +
>> + /*
>> + * 1 means successful and events are active
>> + * This is not quite true because we defer
>> + * actual activation until hw_perf_enable() but
>> + * this way we* ensure caller won't try to enable
>> + * individual events
>> + */
>> + return 1;
>> +}
>
> That most certainly looses error codes for all !is_x86_event() events in
> the group.
>
> So you either need to deal with that event_sched_in() failing, or
> guarantee that it always succeeds (forcing software events only for
> example).
>
True, but that one can be fixed because it is only called from
hw_perf_group_sched_in() which returns an error code.

The same code would have to be fixed on PPC as well.

>

--
Stephane Eranian | EMEA Software Engineering
Google France | 38 avenue de l'Opéra | 75002 Paris
Tel : +33 (0) 1 42 68 53 00
This email may be confidential or privileged. If you received this
communication by mistake, please
don't forward it to anyone else, please erase all copies and
attachments, and please let me know that
it went to the wrong person. Thanks

2010-01-13 17:23:06

[permalink] [raw]

Subject: Re: [PATCH] perf_events: improve x86 event scheduling

Ok,

Something like that should problably do it:

static void event_sched_out(struct perf_event *event, int cpu)
{
event->state = PERF_EVENT_STATE_INACTIVE;
event->oncpu = -1;
}

hw_perf_group_sched_in()
{
....
n = 1;
list_for_each_entry(sub, &leader->sibling_list, group_entry) {
if (sub->state > PERF_EVENT_STATE_OFF) {
ret = event_sched_in(sub, cpu);
if (ret)
goto undo;
++n;
}
}
/*
* copy new assignment, now we know it is possible
* will be used by hw_perf_enable()
*/
memcpy(cpuc->assign, assign, n*sizeof(int));

cpuc->n_events = n;
cpuc->n_added = n - n0;
ctx->nr_active += n;
/*
* 1 means successful and events are active
* This is not quite true because we defer
* actual activation until hw_perf_enable() but
* this way we* ensure caller won't try to enable
* individual events
*/
return 1;
undo:
event_sched_out(leader, cpu);
n0 = n;
n = 1;
list_for_each_entry(sub, &leader->sibling_list, group_entry) {
if (sub->state == PERF_EVENT_STATE_ACTIVE) {
event_sched_out(sub, cpu);
if (++n == n0)
break;
}
}
return ret;
}

Looking at the generic and PPC code, there are a few points which
are still unclear to me:

1. I believe that the check on attr_exclusive as is done in
perf_event.c:event_sched_in()
is missing from my patch and the PPC equivalent code.

if (event->attr.exclusive)
cpuctx->exclusive = 1;

So is the smp_wmb() and the accounting in cpuctx->active_oncpu.

2. the management of event->tstamp_stopped, tstamp_running

Looks like we may have to undo the update to stamp_running or
defer the update
until we know for sure, event_sched_in() succeeded for all events,
incl. software
events.

On Wed, Jan 13, 2010 at 5:52 PM, Stephane Eranian <[email protected]> wrote:
> On Wed, Jan 13, 2010 at 5:29 PM, Peter Zijlstra <[email protected]> wrote:
>> On Wed, 2010-01-13 at 10:54 +0100, Stephane Eranian wrote:
>>
>>> > One concern I do have is the loss of error checking on
>>> > event_sched_in()'s event->pmu->enable(), that could be another
>>> > 'hardware' PMU like breakpoints, in which case it could fail.
>>> >
>>> Well, x86_pmu_enable() does return an error code, so it is up
>>> to the upper layer to handle the error gracefully and in particular
>>> in perf_ctx_adjust_freq().
>>
>>> +static void event_sched_in(struct perf_event *event, int cpu)
>>> +{
>>> + event->state = PERF_EVENT_STATE_ACTIVE;
>>> + event->oncpu = cpu;
>>> + event->tstamp_running += event->ctx->time - event->tstamp_stopped;
>>> + if (is_software_event(event))
>>> + event->pmu->enable(event);
>>> +}
>>> +
>>> +/*
>>> + * Called to enable a whole group of events.
>>> + * Returns 1 if the group was enabled, or -EAGAIN if it could not be.
>>> + * Assumes the caller has disabled interrupts and has
>>> + * frozen the PMU with hw_perf_save_disable.
>>> + *
>>> + * called with PMU disabled. If successful and return value 1,
>>> + * then guaranteed to call perf_enable() and hw_perf_enable()
>>> + */
>>> +int hw_perf_group_sched_in(struct perf_event *leader,
>>> + struct perf_cpu_context *cpuctx,
>>> + struct perf_event_context *ctx, int cpu)
>>> +{
>>> + struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
>>> + struct perf_event *sub;
>>> + int assign[X86_PMC_IDX_MAX];
>>> + int n, n0, ret;
>>> +
>>> + n0 = cpuc->n_events;
>>> +
>>> + n = collect_events(cpuc, leader, true);
>>> + if (n < 0)
>>> + return n;
>>> +
>>> + ret = x86_schedule_events(cpuc, n, assign);
>>> + if (ret)
>>> + return ret;
>>> +
>>> + /*
>>> + * copy new assignment, now we know it is possible
>>> + * will be used by hw_perf_enable()
>>> + */
>>> + memcpy(cpuc->assign, assign, n*sizeof(int));
>>> +
>>> + cpuc->n_events = n;
>>> + cpuc->n_added = n - n0;
>>> +
>>> + n = 1;
>>> + event_sched_in(leader, cpu);
>>> + list_for_each_entry(sub, &leader->sibling_list, group_entry) {
>>> + if (sub->state != PERF_EVENT_STATE_OFF) {
>>> + event_sched_in(sub, cpu);
>>> + ++n;
>>> + }
>>> + }
>>> + ctx->nr_active += n;
>>> +
>>> + /*
>>> + * 1 means successful and events are active
>>> + * This is not quite true because we defer
>>> + * actual activation until hw_perf_enable() but
>>> + * this way we* ensure caller won't try to enable
>>> + * individual events
>>> + */
>>> + return 1;
>>> +}
>>
>> That most certainly looses error codes for all !is_x86_event() events in
>> the group.
>>
>> So you either need to deal with that event_sched_in() failing, or
>> guarantee that it always succeeds (forcing software events only for
>> example).
>>
> True, but that one can be fixed because it is only called from
> hw_perf_group_sched_in() which returns an error code.
>
> The same code would have to be fixed on PPC as well.
>
>>
>
>
>
> --
> Stephane Eranian | EMEA Software Engineering
> Google France | 38 avenue de l'Opéra | 75002 Paris
> Tel : +33 (0) 1 42 68 53 00
> This email may be confidential or privileged. If you received this
> communication by mistake, please
> don't forward it to anyone else, please erase all copies and
> attachments, and please let me know that
> it went to the wrong person. Thanks
>

--
Stephane Eranian | EMEA Software Engineering
Google France | 38 avenue de l'Opéra | 75002 Paris
Tel : +33 (0) 1 42 68 53 00
This email may be confidential or privileged. If you received this
communication by mistake, please
don't forward it to anyone else, please erase all copies and
attachments, and please let me know that
it went to the wrong person. Thanks

2010-01-17 14:12:42

[permalink] [raw]

Subject: Re: [PATCH] perf_events: improve x86 event scheduling

On Wed, Jan 13, 2010 at 06:22:54PM +0100, Stephane Eranian wrote:
> Ok,
>
> Something like that should problably do it:
>
> static void event_sched_out(struct perf_event *event, int cpu)
> {
> event->state = PERF_EVENT_STATE_INACTIVE;
> event->oncpu = -1;
> }

You need to also call pmu->disable() if it is a software event,
because a breakpoint needs to be unregistered in hardware level
too.

And disable it in x86 level if it is an x86 event?

> hw_perf_group_sched_in()
> {
> ....
> n = 1;
> list_for_each_entry(sub, &leader->sibling_list, group_entry) {
> if (sub->state > PERF_EVENT_STATE_OFF) {
> ret = event_sched_in(sub, cpu);
> if (ret)
> goto undo;

Yeah we indeed really need to check that.

Thanks.

2010-01-17 14:42:24

[permalink] [raw]

Subject: Re: [PATCH] perf_events: improve x86 event scheduling

Frederic,

Here is what I have now in the x86 code.

As for your comment on disabling the x86 event, we don't
need to do this because it is not actually activated yet when
we return from hw_perf_group_sched_in(). Activation occurs
really in hw_perf_enable().

static int x86_event_sched_in(struct perf_event *event,
struct perf_cpu_context *cpuctx, int cpu)
{
int ret = 0;

event->state = PERF_EVENT_STATE_ACTIVE;
event->oncpu = cpu;
event->tstamp_running += event->ctx->time - event->tstamp_stopped;

if (is_software_event(event))
ret = event->pmu->enable(event);

if (!ret && !is_software_event(event))
cpuctx->active_oncpu++;

if (!ret && event->attr.exclusive)
cpuctx->exclusive = 1;

return ret;
}

static void x86_event_sched_out(struct perf_event *event,
struct perf_cpu_context *cpuctx, int cpu)
{
event->state = PERF_EVENT_STATE_INACTIVE;
event->oncpu = -1;

event->tstamp_running -= event->ctx->time - event->tstamp_stopped;

if (is_software_event(event))
event->pmu->disable(event);

if (!is_software_event(event))
cpuctx->active_oncpu--;

if (event->attr.exclusive || !cpuctx->active_oncpu)
cpuctx->exclusive = 0;
}

On Sun, Jan 17, 2010 at 3:12 PM, Frederic Weisbecker <[email protected]> wrote:
> On Wed, Jan 13, 2010 at 06:22:54PM +0100, Stephane Eranian wrote:
>> Ok,
>>
>> Something like that should problably do it:
>>
>> static void event_sched_out(struct perf_event *event, int cpu)
>> {
>> event->state = PERF_EVENT_STATE_INACTIVE;
>> event->oncpu = -1;
>> }
>
>
>
> You need to also call pmu->disable() if it is a software event,
> because a breakpoint needs to be unregistered in hardware level
> too.
>
> And disable it in x86 level if it is an x86 event?
>
>
>
>> hw_perf_group_sched_in()
>> {
>> ....
>> n = 1;
>> list_for_each_entry(sub, &leader->sibling_list, group_entry) {
>> if (sub->state > PERF_EVENT_STATE_OFF) {
>> ret = event_sched_in(sub, cpu);
>> if (ret)
>> goto undo;
>
>
>
> Yeah we indeed really need to check that.
>
> Thanks.
>
>

--
Stephane Eranian | EMEA Software Engineering
Google France | 38 avenue de l'Opéra | 75002 Paris
Tel : +33 (0) 1 42 68 53 00
This email may be confidential or privileged. If you received this
communication by mistake, please
don't forward it to anyone else, please erase all copies and
attachments, and please let me know that
it went to the wrong person. Thanks

2010-01-17 16:19:17

[permalink] [raw]

Subject: Re: [PATCH] perf_events: improve x86 event scheduling

On Sun, Jan 17, 2010 at 03:42:16PM +0100, Stephane Eranian wrote:
> Frederic,
>
>
> Here is what I have now in the x86 code.
>
> As for your comment on disabling the x86 event, we don't
> need to do this because it is not actually activated yet when
> we return from hw_perf_group_sched_in(). Activation occurs
> really in hw_perf_enable().

Ah, indeed.

>
>
> static int x86_event_sched_in(struct perf_event *event,
> struct perf_cpu_context *cpuctx, int cpu)
> {
> int ret = 0;
>
> event->state = PERF_EVENT_STATE_ACTIVE;
> event->oncpu = cpu;
> event->tstamp_running += event->ctx->time - event->tstamp_stopped;
>
> if (is_software_event(event))
> ret = event->pmu->enable(event);
>
> if (!ret && !is_software_event(event))
> cpuctx->active_oncpu++;
>
> if (!ret && event->attr.exclusive)
> cpuctx->exclusive = 1;
>
> return ret;
> }
>
> static void x86_event_sched_out(struct perf_event *event,
> struct perf_cpu_context *cpuctx, int cpu)
> {
> event->state = PERF_EVENT_STATE_INACTIVE;
> event->oncpu = -1;
>
> event->tstamp_running -= event->ctx->time - event->tstamp_stopped;
>
> if (is_software_event(event))
> event->pmu->disable(event);
>
> if (!is_software_event(event))
> cpuctx->active_oncpu--;
>
> if (event->attr.exclusive || !cpuctx->active_oncpu)
> cpuctx->exclusive = 0;
> }

Yeah looks good.

Thanks.

2010-01-17 21:53:53

[permalink] [raw]

Subject: Re: [PATCH] perf_events: improve x86 event scheduling

hi,

Will repost a new version of the patch with these changes.

On Sun, Jan 17, 2010 at 5:19 PM, Frederic Weisbecker <[email protected]> wrote:
> On Sun, Jan 17, 2010 at 03:42:16PM +0100, Stephane Eranian wrote:
>> Frederic,
>>
>>
>> Here is what I have now in the x86 code.
>>
>> As for your comment on disabling the x86 event, we don't
>> need to do this because it is not actually activated yet when
>> we return from hw_perf_group_sched_in(). Activation occurs
>> really in hw_perf_enable().
>
>
> Ah, indeed.
>
>
>>
>>
>> static int x86_event_sched_in(struct perf_event *event,
>> struct perf_cpu_context *cpuctx, int cpu)
>> {
>> int ret = 0;
>>
>> event->state = PERF_EVENT_STATE_ACTIVE;
>> event->oncpu = cpu;
>> event->tstamp_running += event->ctx->time - event->tstamp_stopped;
>>
>> if (is_software_event(event))
>> ret = event->pmu->enable(event);
>>
>> if (!ret && !is_software_event(event))
>> cpuctx->active_oncpu++;
>>
>> if (!ret && event->attr.exclusive)
>> cpuctx->exclusive = 1;
>>
>> return ret;
>> }
>>
>> static void x86_event_sched_out(struct perf_event *event,
>> struct perf_cpu_context *cpuctx, int cpu)
>> {
>> event->state = PERF_EVENT_STATE_INACTIVE;
>> event->oncpu = -1;
>>
>> event->tstamp_running -= event->ctx->time - event->tstamp_stopped;
>>
>> if (is_software_event(event))
>> event->pmu->disable(event);
>>
>> if (!is_software_event(event))
>> cpuctx->active_oncpu--;
>>
>> if (event->attr.exclusive || !cpuctx->active_oncpu)
>> cpuctx->exclusive = 0;
>> }
>
>
>
> Yeah looks good.
>
> Thanks.
>
>

--
Stephane Eranian | EMEA Software Engineering
Google France | 38 avenue de l'Opéra | 75002 Paris
Tel : +33 (0) 1 42 68 53 00
This email may be confidential or privileged. If you received this
communication by mistake, please
don't forward it to anyone else, please erase all copies and
attachments, and please let me know that
it went to the wrong person. Thanks

2010-01-18 11:13:40

[permalink] [raw]

Subject: Re: [PATCH] perf_events: improve x86 event scheduling

On Sun, 2010-01-17 at 15:12 +0100, Frederic Weisbecker wrote:

> You need to also call pmu->disable() if it is a software event,
> because a breakpoint needs to be unregistered in hardware level
> too.

breakpoint isn't a software pmu. But yeah, enable and disable need to
match.

2010-01-18 11:54:01

[permalink] [raw]

Subject: [PATCH] perf: fix the is_software_event() definition

On Mon, 2010-01-18 at 12:13 +0100, Peter Zijlstra wrote:
> On Sun, 2010-01-17 at 15:12 +0100, Frederic Weisbecker wrote:
>
> > You need to also call pmu->disable() if it is a software event,
> > because a breakpoint needs to be unregistered in hardware level
> > too.
>
> breakpoint isn't a software pmu. But yeah, enable and disable need to
> match.

That is, it shouldn't be a software pmu, because we assume software
events can always be scheduled, whereas that's definitely not so for the
breakpoint one.

Which seems to suggest the following

---
Subject: perf: fix the is_software_event() definition

When adding the breakpoint pmu Frederic forgot to exclude it from being
a software event. While we're at it, make it an inclusive expression.

Signed-off-by: Peter Zijlstra <[email protected]>
---
include/linux/perf_event.h | 10 +++++++---
1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index c66b34f..835ba26 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -814,9 +814,13 @@ extern int perf_event_overflow(struct perf_event *event, int nmi,
*/
static inline int is_software_event(struct perf_event *event)
{
- return (event->attr.type != PERF_TYPE_RAW) &&
- (event->attr.type != PERF_TYPE_HARDWARE) &&
- (event->attr.type != PERF_TYPE_HW_CACHE);
+ switch (event->attr.type) {
+ case PERF_TYPE_SOFTWARE:
+ case PERF_TYPE_TRACEPOINT:
+ case PERF_TYPE_HW_CACHE:
+ return 1;
+ }
+ return 0;
}

extern atomic_t perf_swevent_enabled[PERF_COUNT_SW_MAX];

2010-01-18 12:07:55

[permalink] [raw]

Subject: Re: [PATCH] perf: fix the is_software_event() definition

On Mon, Jan 18, 2010 at 12:53:36PM +0100, Peter Zijlstra wrote:
> On Mon, 2010-01-18 at 12:13 +0100, Peter Zijlstra wrote:
> > On Sun, 2010-01-17 at 15:12 +0100, Frederic Weisbecker wrote:
> >
> > > You need to also call pmu->disable() if it is a software event,
> > > because a breakpoint needs to be unregistered in hardware level
> > > too.
> >
> > breakpoint isn't a software pmu. But yeah, enable and disable need to
> > match.
>
> That is, it shouldn't be a software pmu, because we assume software
> events can always be scheduled, whereas that's definitely not so for the
> breakpoint one.
>
> Which seems to suggest the following
>
> ---
> Subject: perf: fix the is_software_event() definition
>
> When adding the breakpoint pmu Frederic forgot to exclude it from being
> a software event. While we're at it, make it an inclusive expression.
>
> Signed-off-by: Peter Zijlstra <[email protected]>

Agreed.

But then Stephane will need to update his patch and use
something else than is_software_event() to guess if an event
needs its pmu->enable/disable to be called.

A kind of helper that can tell: I am not handled by
hw_perf_group_sched_in()

But I suck too much in naming to propose something sane :)

2010-01-18 12:19:40

[permalink] [raw]

Subject: Re: [PATCH] perf: fix the is_software_event() definition

On Mon, 2010-01-18 at 13:07 +0100, Frederic Weisbecker wrote:

> Agreed.
>
> But then Stephane will need to update his patch and use
> something else than is_software_event() to guess if an event
> needs its pmu->enable/disable to be called.

Yes, that's what I told him before and even send a patch for, the name I
chose was is_x86_event().

http://lkml.org/lkml/2010/1/12/154

2010-01-18 12:53:45

[permalink] [raw]

Subject: Re: [perfmon2] [PATCH] perf: fix the is_software_event() definition

On Mon, Jan 18, 2010 at 12:53 PM, Peter Zijlstra <[email protected]> wrote:
> On Mon, 2010-01-18 at 12:13 +0100, Peter Zijlstra wrote:
>> On Sun, 2010-01-17 at 15:12 +0100, Frederic Weisbecker wrote:
>>
>> > You need to also call pmu->disable() if it is a software event,
>> > because a breakpoint needs to be unregistered in hardware level
>> > too.
>>
>> breakpoint isn't a software pmu. But yeah, enable and disable need to
>> match.
>
> That is, it shouldn't be a software pmu, because we assume software
> events can always be scheduled, whereas that's definitely not so for the
> breakpoint one.
>
> Which seems to suggest the following
>
> ---
> Subject: perf: fix the is_software_event() definition
>
> When adding the breakpoint pmu Frederic forgot to exclude it from being
> a software event. While we're at it, make it an inclusive expression.
>
> Signed-off-by: Peter Zijlstra <[email protected]>
> ---
> include/linux/perf_event.h | 10 +++++++---
> 1 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index c66b34f..835ba26 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -814,9 +814,13 @@ extern int perf_event_overflow(struct perf_event *event, int nmi,
> */
> static inline int is_software_event(struct perf_event *event)
> {
> - return (event->attr.type != PERF_TYPE_RAW) &&
> - (event->attr.type != PERF_TYPE_HARDWARE) &&
> - (event->attr.type != PERF_TYPE_HW_CACHE);
> + switch (event->attr.type) {
> + case PERF_TYPE_SOFTWARE:
> + case PERF_TYPE_TRACEPOINT:
> + case PERF_TYPE_HW_CACHE:
> + return 1;
> + }
> + return 0;
> }
>
> extern atomic_t perf_swevent_enabled[PERF_COUNT_SW_MAX];
>
PERF_TYPE_HW_CACHE is a hardware PMU event.

>
>
> ------------------------------------------------------------------------------
> Throughout its 18-year history, RSA Conference consistently attracts the
> world's best and brightest in the field, creating opportunities for Conference
> attendees to learn about information security's most important issues through
> interactions with peers, luminaries and emerging and established companies.
> http://p.sf.net/sfu/rsaconf-dev2dev
> _______________________________________________
> perfmon2-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
>

2010-01-18 12:57:36

[permalink] [raw]

Subject: Re: [perfmon2] [PATCH] perf: fix the is_software_event() definition

On Mon, Jan 18, 2010 at 1:07 PM, Frederic Weisbecker <[email protected]> wrote:
> On Mon, Jan 18, 2010 at 12:53:36PM +0100, Peter Zijlstra wrote:
>> On Mon, 2010-01-18 at 12:13 +0100, Peter Zijlstra wrote:
>> > On Sun, 2010-01-17 at 15:12 +0100, Frederic Weisbecker wrote:
>> >
>> > > You need to also call pmu->disable() if it is a software event,
>> > > because a breakpoint needs to be unregistered in hardware level
>> > > too.
>> >
>> > breakpoint isn't a software pmu. But yeah, enable and disable need to
>> > match.
>>
>> That is, it shouldn't be a software pmu, because we assume software
>> events can always be scheduled, whereas that's definitely not so for the
>> breakpoint one.
>>
>> Which seems to suggest the following
>>
>> ---
>> Subject: perf: fix the is_software_event() definition
>>
>> When adding the breakpoint pmu Frederic forgot to exclude it from being
>> a software event. While we're at it, make it an inclusive expression.
>>
>> Signed-off-by: Peter Zijlstra <[email protected]>
>
>
>
> Agreed.
>
> But then Stephane will need to update his patch and use
> something else than is_software_event() to guess if an event
> needs its pmu->enable/disable to be called.
>
> A kind of helper that can tell: I am not handled by
> hw_perf_group_sched_in()
>
Then, we should use something that checks if the event
is handled by the X86 PMU layer:

int is_x86_hw_event(struct perf_event *event)
{
return event->pmu == x86_pmu;
}

2010-01-18 13:01:17

[permalink] [raw]

Subject: Re: [perfmon2] [PATCH] perf: fix the is_software_event() definition

On Mon, 2010-01-18 at 13:53 +0100, stephane eranian wrote:

> PERF_TYPE_HW_CACHE is a hardware PMU event.

D'0h!

2010-01-18 13:06:34

[permalink] [raw]

Subject: Re: [perfmon2] [PATCH] perf: fix the is_software_event() definition

2010/1/18 stephane eranian <[email protected]>:
> On Mon, Jan 18, 2010 at 1:07 PM, Frederic Weisbecker <[email protected]> wrote:
>> On Mon, Jan 18, 2010 at 12:53:36PM +0100, Peter Zijlstra wrote:
>>> On Mon, 2010-01-18 at 12:13 +0100, Peter Zijlstra wrote:
>>> > On Sun, 2010-01-17 at 15:12 +0100, Frederic Weisbecker wrote:
>>> >
>>> > > You need to also call pmu->disable() if it is a software event,
>>> > > because a breakpoint needs to be unregistered in hardware level
>>> > > too.
>>> >
>>> > breakpoint isn't a software pmu. But yeah, enable and disable need to
>>> > match.
>>>
>>> That is, it shouldn't be a software pmu, because we assume software
>>> events can always be scheduled, whereas that's definitely not so for the
>>> breakpoint one.
>>>
>>> Which seems to suggest the following
>>>
>>> ---
>>> Subject: perf: fix the is_software_event() definition
>>>
>>> When adding the breakpoint pmu Frederic forgot to exclude it from being
>>> a software event. While we're at it, make it an inclusive expression.
>>>
>>> Signed-off-by: Peter Zijlstra <[email protected]>
>>
>>
>>
>> Agreed.
>>
>> But then Stephane will need to update his patch and use
>> something else than is_software_event() to guess if an event
>> needs its pmu->enable/disable to be called.
>>
>> A kind of helper that can tell: I am not handled by
>> hw_perf_group_sched_in()
>>
> Then, we should use something that checks if the event
> is handled by the X86 PMU layer:
>
> int is_x86_hw_event(struct perf_event *event)
> {
> ? return event->pmu == x86_pmu;
> }
>

Yeah. I missed this patch from Peter in its answer. Looks good.

2010-01-18 13:46:25