2009-10-06 14:43:22

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH 0/2] perf_events: correct event assignments on Intel processors

This series of patches corrects the assignment of events to counters
for the Intel X86 processors.

Not all events can be measured on any counter. Not all filters can
be used on fixed counters.

The current implementation silently returns bogus counts when an event
is programmed into the wrong counter.

Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/include/asm/perf_event.h | 11 ++++
arch/x86/kernel/cpu/perf_event.c | 112 +++++++++++++++++++++++++++++++++++-
2 files changed, 119 insertions(+), 4 deletions(-)


2009-10-06 14:42:53

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH 1/2] perf_events: check for filters on fixed counter events

Intel fixed counters do not support all the filters
possible with a generic counter. Thus, if a fixed counter
event is passed but with certain filters set, then the
fixed_mode_idx() function must fail and the event must be
measured in a generic counter instead.

Rejected filters are: inv, edge, cnt-mask.
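
For reference, a tiny stand-alone sketch (not part of the patch; the helper
names are only for the sketch) showing how the new mask decomposes into the
rejected event-select filter bits, using the architectural PERFEVTSEL bit
positions:

#include <stdio.h>

/* event-select filter bits rejected for fixed counters (sketch-local names) */
#define EVTSEL_EDGE	(1ULL << 18)	/* edge detect			*/
#define EVTSEL_INV	(1ULL << 23)	/* invert counter mask		*/
#define EVTSEL_CMASK	(0xffULL << 24)	/* counter mask (cnt-mask)	*/
/* the any-thread bit (21) is not in the mask: allowed from v3 on	*/

int main(void)
{
	unsigned long long mask = EVTSEL_EDGE | EVTSEL_INV | EVTSEL_CMASK;

	/* prints ff840000, i.e. ARCH_PERFMON_EVENT_FILTER_MASK */
	printf("%llx\n", mask);
	return 0;
}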

Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/include/asm/perf_event.h | 11 +++++++++++
arch/x86/kernel/cpu/perf_event.c | 6 ++++++
2 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index ad7ce3f..719273b 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -28,6 +28,17 @@
*/
#define ARCH_PERFMON_EVENT_MASK 0xffff

+/*
+ * filter mask to validate fixed counter events.
+ * the following filters disqualify for fixed counters:
+ * - inv
+ * - edge
+ * - cnt-mask
+ * The other filters are supported by fixed counters.
+ * The any-thread option is supported starting with v3.
+ */
+#define ARCH_PERFMON_EVENT_FILTER_MASK 0xff840000
+
#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL 0x3c
#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK (0x00 << 8)
#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX 0
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index b5801c3..1d16bd6 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1349,6 +1349,12 @@ fixed_mode_idx(struct perf_event *event, struct hw_perf_event *hwc)
if (!x86_pmu.num_events_fixed)
return -1;

+ /*
+ * fixed counters do not take all possible filters
+ */
+ if (hwc->config & ARCH_PERFMON_EVENT_FILTER_MASK)
+ return -1;
+
if (unlikely(hw_event == x86_pmu.event_map(PERF_COUNT_HW_INSTRUCTIONS)))
return X86_PMC_IDX_FIXED_INSTRUCTIONS;
if (unlikely(hw_event == x86_pmu.event_map(PERF_COUNT_HW_CPU_CYCLES)))
--
1.5.4.3

2009-10-06 14:42:58

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH 2/2] perf_events: add event constraints support for Intel processors

On some Intel processors, not all events can be measured in
all counters. Some events can only be measured in one particular
counter, for instance. Assigning an event to the wrong counter
does not crash the machine, but it yields bogus counts, i.e., a
silent error.

This patch changes the event to counter assignment logic to take
into account event constraints for Intel P6, Core and Nehalem
processors. There are no constraints on Intel Atom. There are
constraints on Intel Yonah (Core Duo), but they are not provided
in this patch given that this processor is not yet supported by
perf_events.

As a result of the constraints, it is possible for some event groups
to never actually be loaded onto the PMU if they contain two events
which can only be measured on a single counter. That situation can be
detected with the scaling information extracted with read().

Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/kernel/cpu/perf_event.c | 106 ++++++++++++++++++++++++++++++++++++--
1 files changed, 102 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 1d16bd6..06ca1b2 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -77,6 +77,18 @@ struct cpu_hw_events {
struct debug_store *ds;
};

+struct evt_cstr {
+ unsigned long idxmsk[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
+ int code;
+};
+
+#define EVT_CSTR0(c, m) { .code = (c), .idxmsk[0] = (m) }
+#define EVT_CSTR_END { .code = 0, .idxmsk[0] = 0 }
+
+#define for_each_evt_cstr(e, c) \
+ for((e) = (c); (e)->idxmsk[0]; (e)++)
+
+
/*
* struct x86_pmu - generic x86 pmu
*/
@@ -102,6 +114,7 @@ struct x86_pmu {
u64 intel_ctrl;
void (*enable_bts)(u64 config);
void (*disable_bts)(void);
+ int (*get_event_idx)(struct hw_perf_event *hwc);
};

static struct x86_pmu x86_pmu __read_mostly;
@@ -110,6 +123,8 @@ static DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
.enabled = 1,
};

+static const struct evt_cstr *evt_cstr;
+
/*
* Not sure about some of these
*/
@@ -155,6 +170,15 @@ static u64 p6_pmu_raw_event(u64 hw_event)
return hw_event & P6_EVNTSEL_MASK;
}

+static const struct evt_cstr intel_p6_evt[]={
+ EVT_CSTR0(0xc1, 0x1), /* FLOPS */
+ EVT_CSTR0(0x10, 0x1), /* FP_COMP_OPS_EXE */
+ EVT_CSTR0(0x11, 0x1), /* FP_ASSIST */
+ EVT_CSTR0(0x12, 0x2), /* MUL */
+ EVT_CSTR0(0x13, 0x2), /* DIV */
+ EVT_CSTR0(0x14, 0x1), /* CYCLES_DIV_BUSY */
+ EVT_CSTR_END
+};

/*
* Intel PerfMon v3. Used on Core2 and later.
@@ -170,6 +194,33 @@ static const u64 intel_perfmon_event_map[] =
[PERF_COUNT_HW_BUS_CYCLES] = 0x013c,
};

+static const struct evt_cstr intel_core_evt[]={
+ EVT_CSTR0(0x10, 0x1), /* FP_COMP_OPS_EXE */
+ EVT_CSTR0(0x11, 0x2), /* FP_ASSIST */
+ EVT_CSTR0(0x12, 0x2), /* MUL */
+ EVT_CSTR0(0x13, 0x2), /* DIV */
+ EVT_CSTR0(0x14, 0x1), /* CYCLES_DIV_BUSY */
+ EVT_CSTR0(0x18, 0x1), /* IDLE_DURING_DIV */
+ EVT_CSTR0(0x19, 0x2), /* DELAYED_BYPASS */
+ EVT_CSTR0(0xa1, 0x1), /* RS_UOPS_DISPATCH_CYCLES */
+ EVT_CSTR0(0xcb, 0x1), /* MEM_LOAD_RETIRED */
+ EVT_CSTR_END
+};
+
+static const struct evt_cstr intel_nhm_evt[]={
+ EVT_CSTR0(0x40, 0x3), /* L1D_CACHE_LD */
+ EVT_CSTR0(0x41, 0x3), /* L1D_CACHE_ST */
+ EVT_CSTR0(0x42, 0x3), /* L1D_CACHE_LOCK */
+ EVT_CSTR0(0x43, 0x3), /* L1D_ALL_REF */
+ EVT_CSTR0(0x4e, 0x3), /* L1D_PREFETCH */
+ EVT_CSTR0(0x4c, 0x3), /* LOAD_HIT_PRE */
+ EVT_CSTR0(0x51, 0x3), /* L1D */
+ EVT_CSTR0(0x52, 0x3), /* L1D_CACHE_PREFETCH_LOCK_FB_HIT */
+ EVT_CSTR0(0x53, 0x3), /* L1D_CACHE_LOCK_FB_HIT */
+ EVT_CSTR0(0xc5, 0x3), /* CACHE_LOCK_CYCLES */
+ EVT_CSTR_END
+};
+
static u64 intel_pmu_event_map(int hw_event)
{
return intel_perfmon_event_map[hw_event];
@@ -932,6 +983,8 @@ static int __hw_perf_event_init(struct perf_event *event)
*/
hwc->config = ARCH_PERFMON_EVENTSEL_INT;

+ hwc->idx = -1;
+
/*
* Count user and OS events unless requested not to.
*/
@@ -1366,6 +1419,45 @@ fixed_mode_idx(struct perf_event *event, struct hw_perf_event *hwc)
}

/*
+ * generic counter allocator: get next free counter
+ */
+static int gen_get_event_idx(struct hw_perf_event *hwc)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+ int idx;
+
+ idx = find_first_zero_bit(cpuc->used_mask, x86_pmu.num_events);
+ return idx == x86_pmu.num_events ? -1 : idx;
+}
+
+/*
+ * intel-specific counter allocator: check event constraints
+ */
+static int intel_get_event_idx(struct hw_perf_event *hwc)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+ const struct evt_cstr *evt;
+ int i, code;
+
+ if (!evt_cstr)
+ goto skip;
+
+ code = hwc->config & 0xff;
+
+ for_each_evt_cstr(evt, evt_cstr) {
+ if (code == evt->code) {
+ for_each_bit(i, evt->idxmsk, X86_PMC_IDX_MAX) {
+ if (!test_and_set_bit(i, cpuc->used_mask))
+ return i;
+ }
+ return -1;
+ }
+ }
+skip:
+ return gen_get_event_idx(hwc);
+}
+
+/*
* Find a PMC slot for the freshly enabled / scheduled in event:
*/
static int x86_pmu_enable(struct perf_event *event)
@@ -1402,11 +1494,10 @@ static int x86_pmu_enable(struct perf_event *event)
} else {
idx = hwc->idx;
/* Try to get the previous generic event again */
- if (test_and_set_bit(idx, cpuc->used_mask)) {
+ if (idx == -1 || test_and_set_bit(idx, cpuc->used_mask)) {
try_generic:
- idx = find_first_zero_bit(cpuc->used_mask,
- x86_pmu.num_events);
- if (idx == x86_pmu.num_events)
+ idx = x86_pmu.get_event_idx(hwc);
+ if (idx == -1)
return -EAGAIN;

set_bit(idx, cpuc->used_mask);
@@ -1883,6 +1974,7 @@ static struct x86_pmu p6_pmu = {
*/
.event_bits = 32,
.event_mask = (1ULL << 32) - 1,
+ .get_event_idx = intel_get_event_idx,
};

static struct x86_pmu intel_pmu = {
@@ -1906,6 +1998,7 @@ static struct x86_pmu intel_pmu = {
.max_period = (1ULL << 31) - 1,
.enable_bts = intel_pmu_enable_bts,
.disable_bts = intel_pmu_disable_bts,
+ .get_event_idx = intel_get_event_idx,
};

static struct x86_pmu amd_pmu = {
@@ -1926,6 +2019,7 @@ static struct x86_pmu amd_pmu = {
.apic = 1,
/* use highest bit to detect overflow */
.max_period = (1ULL << 47) - 1,
+ .get_event_idx = gen_get_event_idx,
};

static int p6_pmu_init(void)
@@ -1938,10 +2032,12 @@ static int p6_pmu_init(void)
case 7:
case 8:
case 11: /* Pentium III */
+ evt_cstr = intel_p6_evt;
break;
case 9:
case 13:
/* Pentium M */
+ evt_cstr = intel_p6_evt;
break;
default:
pr_cont("unsupported p6 CPU model %d ",
@@ -2013,12 +2109,14 @@ static int intel_pmu_init(void)
sizeof(hw_cache_event_ids));

pr_cont("Core2 events, ");
+ evt_cstr = intel_core_evt;
break;
default:
case 26:
memcpy(hw_cache_event_ids, nehalem_hw_cache_event_ids,
sizeof(hw_cache_event_ids));

+ evt_cstr = intel_nhm_evt;
pr_cont("Nehalem/Corei7 events, ");
break;
case 28:
--
1.5.4.3

2009-10-06 16:29:38

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2/2] perf_events: add event constraints support for Intel processors

On Tue, 2009-10-06 at 16:42 +0200, Stephane Eranian wrote:

> This patch changes the event to counter assignment logic to take
> into account event constraints for Intel P6, Core and Nehalem
> processors. There are no constraints on Intel Atom. There are
> constraints on Intel Yonah (Core Duo) but they are not provided
> in this patch given that this processor is not yet supported by
> perf_events.

I don't think there's much missing for that, right?

I don't actually have that hardware, so I can't test it.

> As a result of the constraints, it is possible for some event groups
> to never actually be loaded onto the PMU if they contain two events
> which can only be measured on a single counter. That situation can be
> detected with the scaling information extracted with read().

Right, that's a pre existing bug in the x86 code (we can create groups
larger than the PMU) and should be fixed.

Patch looks nice though.

2009-10-06 17:26:45

by Stephane Eranian

[permalink] [raw]
Subject: Re: [PATCH 2/2] perf_events: add event constraints support for Intel processors

On Tue, Oct 6, 2009 at 6:29 PM, Peter Zijlstra <[email protected]> wrote:
> On Tue, 2009-10-06 at 16:42 +0200, Stephane Eranian wrote:
>
>>       This patch changes the event to counter assignment logic to take
>>       into account event constraints for Intel P6, Core and Nehalem
>>       processors. There is no contraints on Intel Atom. There are
>>       constraints on Intel Yonah (Core Duo) but they are not provided
>>       in this patch given that this processor is not yet supported by
>>       perf_events.
>
> I don't think there's much missing for that, right?
>
Yonah implements architectural perfmon v1. It has two generic
counters but no fixed counters. You already handled that part.
But what it does not have is all the GLOBAL_* MSRs. You could
consider it as P6, but the difference is that the two counters can
be started and stopped independently. Given the organization of
the code, Yonah presents a hybrid configuration. That is where
the problem is. So either:

- you go the architected PMU path, but you protect all accesses
to GLOBAL_*

- you go the P6 path, and you make the counters independent.
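
For reference, here is a small user-level sketch (not kernel code) of where
the difference shows up: CPUID leaf 0xa reports the perfmon version and the
generic counters, while fixed counters and the GLOBAL_* MSRs only exist from
v2 on. On Yonah this should report v1 with two generic counters:

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(0xa, &eax, &ebx, &ecx, &edx) || (eax & 0xff) == 0) {
		fprintf(stderr, "no architectural perfmon\n");
		return 1;
	}

	printf("perfmon version : %u\n", eax & 0xff);
	printf("generic counters: %u, width %u bits\n",
	       (eax >> 8) & 0xff, (eax >> 16) & 0xff);

	if ((eax & 0xff) >= 2)
		printf("fixed counters  : %u\n", edx & 0x1f);
	else
		printf("fixed counters  : none (v1, and no GLOBAL_* MSRs either)\n");

	return 0;
}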


> I don't actually have that hardware, so I can't test it.
>
I don't have that either but I can find people who have that.

>>       As a result of the constraints, it is possible for some event groups
>>       to never actually be loaded onto the PMU if they contain two events
>>       which can only be measured on a single counter. That situation can be
>>       detected with the scaling information extracted with read().
>
> Right, that's a pre existing bug in the x86 code (we can create groups
> larger than the PMU) and should be fixed.
>
Nope, this happens even if you specify only two events on a two-counter
PMU. For instance, if I want to break down the number of multiplications
between user and kernel to compute a ratio, I would measure MUL twice:

perf stat -e MUL:u,MUL:k

Assuming that with perf you could express that you want those events grouped,
this would always yield zero. I am not using all the counters, but my two
events compete for the same counter, here counter 0.

The problem is that it is hard for the tool to print a meaningful
message to help the user. You can detect that something is wrong with
the group because time_enabled will be zero. But there are multiple
reasons why this can happen.
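
For what it's worth, here is a minimal user-level sketch (not part of the
patches; the event choice and the missing workload are just placeholders) of
how a tool reads the scaling information; perf_event_open has no glibc
wrapper, so it goes through syscall():

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

struct read_format {
	unsigned long long value;
	unsigned long long time_enabled;
	unsigned long long time_running;
};

int main(void)
{
	struct perf_event_attr attr;
	struct read_format rf;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_INSTRUCTIONS;	/* placeholder event */
	attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
			   PERF_FORMAT_TOTAL_TIME_RUNNING;

	/* measure the current task on any CPU */
	fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	/* ... the measured workload would run here ... */

	if (read(fd, &rf, sizeof(rf)) != sizeof(rf)) {
		perror("read");
		return 1;
	}

	if (rf.time_enabled == 0)
		printf("never scheduled (e.g. a group that cannot fit)\n");
	else if (rf.time_running < rf.time_enabled)
		printf("count %llu, scaled: only ran %llu of %llu ns\n",
		       rf.value, rf.time_running, rf.time_enabled);
	else
		printf("count %llu\n", rf.value);

	close(fd);
	return 0;
}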

But what is more problematic is if I build a group with an event without
a constraint and one with a constraint. For instance, I want to measure
BACLEARS and MUL in the same group. If I make BACLEARS the group leader,
then the group will never be loaded. That's because the assignment code
looks at each event individually. Thus it will assign BACLEARS to the
first available counter, i.e., counter 0. Then it will try to assign MUL,
which can only run on counter 0, and it will always fail. The assignment
code needs to look at groups, not individual events. Then it will be able
to find a working assignment: MUL -> counter 0, BACLEARS -> counter 1.
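
To make this concrete, here is a small stand-alone sketch (not kernel code;
the counter masks are made up to match the example above, with MUL tied to
counter 0 and BACLEARS unconstrained) of an assignment that looks at the
whole group and backtracks instead of placing events one by one:

#include <stdio.h>

#define NUM_COUNTERS 2

struct evt {
	const char *name;
	unsigned int idxmsk;	/* allowed counters, bit i = counter i */
};

/* try to place events[i..n-1] given the counters already used */
static int assign(const struct evt *events, int n, int i,
		  unsigned int used, int *slot)
{
	int c;

	if (i == n)
		return 1;

	for (c = 0; c < NUM_COUNTERS; c++) {
		if (!(events[i].idxmsk & (1U << c)) || (used & (1U << c)))
			continue;
		slot[i] = c;
		if (assign(events, n, i + 1, used | (1U << c), slot))
			return 1;
		/* backtrack and try the next counter */
	}
	return 0;
}

int main(void)
{
	const struct evt group[] = {
		{ "BACLEARS", 0x3 },	/* leader, no constraint */
		{ "MUL",      0x1 },	/* only counter 0 in this example */
	};
	int slot[2];

	if (assign(group, 2, 0, 0, slot))
		printf("%s -> counter %d, %s -> counter %d\n",
		       group[0].name, slot[0], group[1].name, slot[1]);
	else
		printf("group can never be loaded\n");
	return 0;
}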

By design of this API, the user should never be concerned about ordering
the events in a group a certain way to get a successful assignment to
counters. This should all be handled by the kernel.

> Patch looks nice though.
>

Thanks.

2009-10-06 19:05:57

by Vince Weaver

[permalink] [raw]
Subject: Re: [perfmon2] [PATCH 2/2] perf_events: add event constraints support for Intel processors

On Tue, 6 Oct 2009, stephane eranian wrote:

> On Tue, Oct 6, 2009 at 6:29 PM, Peter Zijlstra <[email protected]> wrote:
> > On Tue, 2009-10-06 at 16:42 +0200, Stephane Eranian wrote:
>
> > I don't actually have that hardware, so I can't test it.
> >
> I don't have that either but I can find people who have that.

I have Yonah hardware and would be glad to run tests.

For what it's worth, the perfmon2 Yonah support (at least on the laptop I
have) stopped working after 2.6.23. It looked like a problem with
interrupts not being delivered properly. I never had time to track down
if it was a perfmon2 issue or a generic kernel issue. It was still broken
as of 2.6.29.

Vince

2009-10-07 10:28:13

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2/2] perf_events: add event constraints support for Intel processors

On Tue, 2009-10-06 at 19:26 +0200, stephane eranian wrote:
> >> As a result of the constraints, it is possible for some event groups
> >> to never actually be loaded onto the PMU if they contain two events
> >> which can only be measured on a single counter. That situation can be
> >> detected with the scaling information extracted with read().
> >
> > Right, that's a pre existing bug in the x86 code (we can create groups
> > larger than the PMU) and should be fixed.
> >
> Nope, this happens even if you specify only two events on a two-counter
> PMU. For instance, if I want to break down the number of multiplications
> between user and kernel to compute a ratio, I would measure MUL twice:
>
> perf stat -e MUL:u,MUL:k
>
> Assuming that with perf you could express that you want those events grouped,
> this would always yield zero. I am not using all the counters, but my two
> events compete for the same counter, here counter 0.

Right, so what I meant was, we could specify unschedulable groups by
adding more events than there are counters; these constraints make that
worse.

> The problem is that it is hard for the tool to print a meaningful
> message to help the user. You can detect that something is wrong with
> the group because time_enabled will be zero. But there are multiple
> reasons why this can happen.

Something like the below patch (on top of your two patches, compile
tested only) would give better feedback by returning -ENOSPC when a
group doesn't fit (still relies on the counter order).

> But what is more problematic is if I build a group with an event without
> a constraint and one with a constraint. For instance, I want to measure
> BACLEARS and MUL in the same group. If I make BACLEARS the group leader,
> then the group will never be loaded. That's because the assignment code
> looks at each event individually. Thus it will assign BACLEARS to the
> first available counter, i.e., counter 0. Then it will try to assign MUL,
> which can only run on counter 0, and it will always fail. The assignment
> code needs to look at groups, not individual events. Then it will be able
> to find a working assignment: MUL -> counter 0, BACLEARS -> counter 1.
>
> By design of this API, the user should never be concerned about ordering
> the events in a group a certain way to get a successful assignment to
> counters. This should all be handled by the kernel.

Agreed, the POWER implementation actually does this quite nicely, maybe
we should borrow some of its code for scheduling groups.


---
Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6/arch/x86/kernel/cpu/perf_event.c
@@ -114,7 +114,8 @@ struct x86_pmu {
u64 intel_ctrl;
void (*enable_bts)(u64 config);
void (*disable_bts)(void);
- int (*get_event_idx)(struct hw_perf_event *hwc);
+ int (*get_event_idx)(struct cpu_hw_events *cpuc,
+ struct hw_perf_event *hwc);
};

static struct x86_pmu x86_pmu __read_mostly;
@@ -520,7 +521,7 @@ static u64 intel_pmu_raw_event(u64 hw_ev
#define CORE_EVNTSEL_UNIT_MASK 0x0000FF00ULL
#define CORE_EVNTSEL_EDGE_MASK 0x00040000ULL
#define CORE_EVNTSEL_INV_MASK 0x00800000ULL
-#define CORE_EVNTSEL_REG_MASK 0xFF000000ULL
+#define CORE_EVNTSEL_REG_MASK 0xFF000000ULL

#define CORE_EVNTSEL_MASK \
(CORE_EVNTSEL_EVENT_MASK | \
@@ -1387,8 +1388,7 @@ static void amd_pmu_enable_event(struct
x86_pmu_enable_event(hwc, idx);
}

-static int
-fixed_mode_idx(struct perf_event *event, struct hw_perf_event *hwc)
+static int fixed_mode_idx(struct hw_perf_event *hwc)
{
unsigned int hw_event;

@@ -1421,9 +1421,9 @@ fixed_mode_idx(struct perf_event *event,
/*
* generic counter allocator: get next free counter
*/
-static int gen_get_event_idx(struct hw_perf_event *hwc)
+static int
+gen_get_event_idx(struct cpu_hw_events *cpuc, struct hw_perf_event *hwc)
{
- struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
int idx;

idx = find_first_zero_bit(cpuc->used_mask, x86_pmu.num_events);
@@ -1433,16 +1433,16 @@ static int gen_get_event_idx(struct hw_p
/*
* intel-specific counter allocator: check event constraints
*/
-static int intel_get_event_idx(struct hw_perf_event *hwc)
+static int
+intel_get_event_idx(struct cpu_hw_events *cpuc, struct hw_perf_event *hwc)
{
- struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
const struct evt_cstr *evt;
int i, code;

if (!evt_cstr)
goto skip;

- code = hwc->config & 0xff;
+ code = hwc->config & CORE_EVNTSEL_EVENT_MASK;

for_each_evt_cstr(evt, evt_cstr) {
if (code == evt->code) {
@@ -1454,26 +1454,22 @@ static int intel_get_event_idx(struct hw
}
}
skip:
- return gen_get_event_idx(hwc);
+ return gen_get_event_idx(cpuc, hwc);
}

-/*
- * Find a PMC slot for the freshly enabled / scheduled in event:
- */
-static int x86_pmu_enable(struct perf_event *event)
+static int
+x86_schedule_event(struct cpu_hw_events *cpuc, struct hw_perf_event *hwc)
{
- struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
- struct hw_perf_event *hwc = &event->hw;
int idx;

- idx = fixed_mode_idx(event, hwc);
+ idx = fixed_mode_idx(hwc);
if (idx == X86_PMC_IDX_FIXED_BTS) {
/* BTS is already occupied. */
if (test_and_set_bit(idx, cpuc->used_mask))
return -EAGAIN;

hwc->config_base = 0;
- hwc->event_base = 0;
+ hwc->event_base = 0;
hwc->idx = idx;
} else if (idx >= 0) {
/*
@@ -1496,17 +1492,33 @@ static int x86_pmu_enable(struct perf_ev
/* Try to get the previous generic event again */
if (idx == -1 || test_and_set_bit(idx, cpuc->used_mask)) {
try_generic:
- idx = x86_pmu.get_event_idx(hwc);
+ idx = x86_pmu.get_event_idx(cpuc, hwc);
if (idx == -1)
return -EAGAIN;

set_bit(idx, cpuc->used_mask);
hwc->idx = idx;
}
- hwc->config_base = x86_pmu.eventsel;
- hwc->event_base = x86_pmu.perfctr;
+ hwc->config_base = x86_pmu.eventsel;
+ hwc->event_base = x86_pmu.perfctr;
}

+ return idx;
+}
+
+/*
+ * Find a PMC slot for the freshly enabled / scheduled in event:
+ */
+static int x86_pmu_enable(struct perf_event *event)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+ struct hw_perf_event *hwc = &event->hw;
+ int idx;
+
+ idx = x86_schedule_event(cpuc, hwc);
+ if (idx < 0)
+ return idx;
+
perf_events_lapic_init();

x86_pmu.disable(hwc, idx);
@@ -2209,11 +2221,47 @@ static const struct pmu pmu = {
.unthrottle = x86_pmu_unthrottle,
};

+static int
+validate_event(struct cpu_hw_events *cpuc, struct perf_event *event)
+{
+ struct hw_perf_event fake_event = event->hw;
+
+ if (event->pmu != &pmu)
+ return 0;
+
+ return x86_schedule_event(cpuc, &fake_event);
+}
+
+static int validate_group(struct perf_event *event)
+{
+ struct perf_event *sibling, *leader = event->group_leader;
+ struct cpu_hw_events fake_pmu;
+
+ memset(&fake_pmu, 0, sizeof(fake_pmu));
+
+ if (!validate_event(&fake_pmu, leader))
+ return -ENOSPC;
+
+ list_for_each_entry(sibling, &leader->sibling_list, group_entry) {
+ if (!validate_event(&fake_pmu, sibling))
+ return -ENOSPC;
+ }
+
+ if (!validate_event(&fake_pmu, event))
+ return -ENOSPC;
+
+ return 0;
+}
+
const struct pmu *hw_perf_event_init(struct perf_event *event)
{
int err;

err = __hw_perf_event_init(event);
+ if (!err) {
+ if (event->group_leader != event)
+ err = validate_group(event);
+ }
if (err) {
if (event->destroy)
event->destroy(event);

2009-10-07 11:15:53

by Paul Mackerras

[permalink] [raw]
Subject: Re: [PATCH 2/2] perf_events: add event constraints support for Intel processors

Peter Zijlstra writes:

> > By design of this API, the user should never be concerned about
> > ordering the events
> > in a group a certain way to get a successful assignment to counters.
> > This should all
> > be handled by the kernel.
>
> Agreed, the POWER implementation actually does this quite nicely, maybe
> we should borrow some of its code for scheduling groups.

Yeah, I'm quite pleased with how that code turned out, and I'd be
happy to help adapt it for other architectures. The one design
handles all the POWER PMUs from POWER4 with multiple layers of event
multiplexers feeding an event bus (and some events available through
more than one multiplexer) through to the much simpler and more
straightforward POWER7.

Paul.

2009-10-07 12:32:38

by Stephane Eranian

[permalink] [raw]
Subject: Re: [PATCH 2/2] perf_events: add event constraints support for Intel processors

Paul,

On Wed, Oct 7, 2009 at 1:15 PM, Paul Mackerras <[email protected]> wrote:
> Peter Zijlstra writes:
>
>> > By design of this API, the user should never be concerned about
>> > ordering the events
>> > in a group a certain way to get a successful assignment to counters.
>> > This should all
>> > be handled by the kernel.
>>
>> Agreed, the POWER implementation actually does this quite nicely, maybe
>> we should borrow some of its code for scheduling groups.
>
> Yeah, I'm quite pleased with how that code turned out, and I'd be
> happy to help adapt it for other architectures.  The one design
> handles all the POWER PMUs from POWER4 with multiple layers of event
> multiplexers feeding an event bus (and some events available through
> more than one multiplexer) through to the much simpler and more
> straightforward POWER7.
>
I am not an expert on PPC PMU register constraints, but I took a quick look
at the code, and in particular at hw_perf_enable(), where the action seems
to be.

Given that in kernel/perf_events.c the PMU-specific layer is invoked on a
per-event basis in event_sched_in(), you need to have a way to look at the
registers you have already assigned. I think this is what PPC does: it stops
the PMU and re-runs the assignment code. But for that it needs to maintain a
per-cpu structure which has the current event -> counter assignment.

What PPC does is probably the only way to do this given the interface between
generic and machine-specific code. The one advantage I see is that it works
inside an event group but also across event groups, because that code does not
look at group boundaries; it only looks at the events and the number of
available registers. The downside is that you duplicate state.

Did I get this right, Paul?

2009-10-07 20:46:33

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 2/2] perf_events: add event constraints support for Intel processors

From: stephane eranian <[email protected]>
Date: Wed, 7 Oct 2009 14:31:58 +0200

> What PPC does is probably the only way to do this given the interface between
> generic and machine-specific code. The one advantage I see is that it works
> inside an event group but also across event groups because that code does not
> look at group boundary, it only looks at the events and the number of available
> registers. The downside is that you duplicate state.
>
> Did I get this right, Paul?

That's basically how his code works, yes. I intend on duplicating
it to some extent on sparc64 since I'm operating in a similar
problem space.

So if at least some of this engine went to a generic place, there'd
be at least a 3rd user :-)

2009-10-07 21:30:58

by Stephane Eranian

[permalink] [raw]
Subject: Re: [PATCH 2/2] perf_events: add event constraints support for Intel processors

On Wed, Oct 7, 2009 at 10:46 PM, David Miller <[email protected]> wrote:
> From: stephane eranian <[email protected]>
> Date: Wed, 7 Oct 2009 14:31:58 +0200
>
>> What PPC does is probably the only way to do this given the interface between
>> generic and machine-specific code. The one advantage I see is that it works
>> inside an event group but also across event groups because that code does not
>> look at group boundary, it only looks at the events and the number of available
>> registers. The downside is that you duplicate state.
>>
>> Did I get this right, Paul?
>
> That's basically how his code works, yes.  I intend on duplicating
> it to some extent on sparc64 since I'm operating in a similar
> problem space.
>
> So if at least some of this engine went to a generic place, there'd
> be at least a 3rd user :-)

Based on my experience, the assignment problem is best handled
in the architecture or model specific code. On some PMU models,
it is way more complicated than just two events competing for the
same counter. That's the case on Itanium, for instance. And that
is also the case with AMD64 Northbridge events.

Something I forgot to mention earlier is that when you re-run the
assignment code for the new event, no already-assigned event can be
kicked out in favor of the new event. At worst, an existing event can
be moved to another counter. Otherwise, you would evict an event which
the generic layer thinks is currently running.




2009-10-08 20:09:35

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 2/2] perf_events: add event constraints support for Intel processors


* David Miller <[email protected]> wrote:

> From: stephane eranian <[email protected]>
> Date: Wed, 7 Oct 2009 14:31:58 +0200
>
> > What PPC does is probably the only way to do this given the interface between
> > generic and machine-specific code. The one advantage I see is that it works
> > inside an event group but also across event groups because that code does not
> > look at group boundary, it only looks at the events and the number of available
> > registers. The downside is that you duplicate state.
> >
> > Did I get this right, Paul?
>
> That's basically how his code works, yes. I intend on duplicating it
> to some extent on sparc64 since I'm operating in a similar problem
> space.
>
> So if at least some of this engine went to a generic place, there'd be
> at least a 3rd user :-)

Yeah, i'd definitely suggest to generalize this. We've missed updating
PowerPC lowlevel details a couple of times in perf core updates, just
because it's in a non-obvious place. Even if it's used by just a single
arch, generic code is much more visible.

PowerPC really has this somewhat weird track record of
privatizing generic facilities and smugly keeping it to themselves as a
competitive advantage ;-) Reminds me of the old semaphore code which was
the best on PowerPC, for years. Lets not go there again :)

Ingo

2009-10-08 20:29:10

by Stephane Eranian

[permalink] [raw]
Subject: Re: [PATCH 2/2] perf_events: add event constraints support for Intel processors

On Thu, Oct 8, 2009 at 10:08 PM, Ingo Molnar <[email protected]> wrote:
>
> * David Miller <[email protected]> wrote:
>
>> From: stephane eranian <[email protected]>
>> Date: Wed, 7 Oct 2009 14:31:58 +0200
>>
>> > What PPC does is probably the only way to do this given the interface between
>> > generic and machine-specific code. The one advantage I see is that it works
>> > inside an event group but also across event groups because that code does not
>> > look at group boundary, it only looks at the events and the number of available
>> > registers. The downside is that you duplicate state.
>> >
>> > Did I get this right, Paul?
>>
>> That's basically how his code works, yes.  I intend on duplicating it
>> to some extent on sparc64 since I'm operating in a similar problem
>> space.
>>
>> So if at least some of this engine went to a generic place, there'd be
>> at least a 3rd user :-)
>
> Yeah, i'd definitely suggest to generalize this. We've missed updating
> PowerPC lowlevel details a couple of times in perf core updates, just
> because it's in a non-obvious place. Even if it's used by just a single
> arch, generic code is much more visible.
>

It is not clear how you can do this without creating a monster.
As I said the constraints can be far more difficult than just
event X => allowed_counter_bitmask.

2009-10-08 23:19:01

by Paul Mackerras

[permalink] [raw]
Subject: Re: [PATCH 2/2] perf_events: add event constraints support for Intel processors

stephane eranian writes:

> I am not an expert on PPC PMU register constraints but I took a quick look
> at the code and in particular hw_perf_enable() where the action seems to be.
>
> Given that in kernel/perf_events.c, the PMU specific layer is invoked on a per
> event basis in event_sched_in(), you need to have a way to look at the registers
> you have already assigned. I think this is what PPC does. it stops the PMU and
> re-runs the assignment code. But for that it needs to maintains a
> per-cpu structure
> which has the current event -> counter assignment.

The idea is that when switching contexts, the core code does
hw_perf_disable, then calls hw_perf_group_sched_in for each group that
it wants to have on the PMU, then calls hw_perf_enable. So what the
powerpc code does is to defer the actual assignment of perf_events to
hardware counters until the hw_perf_enable call.

As each group is added, I do the constraint checking to ensure that
the group can go on, but I don't do the assignment of perf_events to
hardware counters or the computation of PMU control register values.
I have a way of encoding all the constraints into a pair of 64-bit
values for each event such that I can tell very quickly (using only
some quick integer arithmetic) whether it's possible to add a given
event to the set that are on the PMU without violating any
constraints.

There is a bit of extra complexity that comes in because there are
sometimes alternative event codes for the same event. So as each
event is added to the set to go on the PMU, if the initial constraint
check indicates that it can't go on, I then go and do a search over
the space of alternative codes (for all of the events currently in the
set plus the one I want to add) to see if there's a way to get
everything on using alternative codes for some events. That sounds
expensive but it turns out not to be because only a few events have
alternative codes, and there are generally only a couple of
alternative codes for those events.

The event codes that I use encode settings for the various
multiplexers plus an indication of what set of counters the event can
be counted on. If an event can be counted on all or some subset of
counters with the same settings for all the relevant multiplexers,
then I use a single code for it. If an event can be counted for
example on hardware counter 1 with selector code 0xf0, or hardware
counter 2 with selector code 0x12, then I use two alternative event
codes for that event.

So this all means that I can map an event code into two 64-bit
values -- a value/mask pair. That mapping is processor-specific, but
the code that checks whether a set of events is feasible is generic.
The idea is that the 64-bit value/mask pair is divided into bitfields,
each of which describes one constraint. The big comment at the end of
arch/powerpc/include/asm/perf_event.h describes the three different
types of constraints that can be represented and how that works as a
bitfield. It turns out that this is very powerful and very fast,
since the constraint checking is just a few adds, ands and ors, done
on the whole 64-bit value/mask pairs (there is no need to iterate over
individual bitfields).
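
If it helps, here is a much-simplified stand-alone illustration of the
bitfield idea (this is not the actual powerpc code, and it only shows the
"counting" kind of field; the fields, capacities and event encodings are
invented for the example):

#include <stdio.h>
#include <stdint.h>

/* one constraint field per PMU resource, packed into a 64-bit word */
static const struct field {
	int shift, width;		/* width leaves headroom so sums never carry */
	unsigned int capacity;		/* how many events the resource can serve */
} fields[] = {
	{ 0, 4, 2 },			/* generic counters: at most 2 events */
	{ 4, 4, 1 },			/* MUL-capable counter: at most 1 event */
};

static int feasible(const uint64_t *value, int n)
{
	uint64_t sum = 0;
	unsigned int f, used;
	int i;

	for (i = 0; i < n; i++)
		sum += value[i];	/* one add per event */

	for (f = 0; f < sizeof(fields) / sizeof(fields[0]); f++) {
		used = (sum >> fields[f].shift) & ((1u << fields[f].width) - 1);
		if (used > fields[f].capacity)
			return 0;
	}
	return 1;
}

int main(void)
{
	/* invented encodings: every event needs one generic counter,
	 * MUL additionally needs the MUL-capable one */
	uint64_t baclears = 0x01;
	uint64_t mul	  = 0x11;

	uint64_t ok[]  = { baclears, mul };
	uint64_t bad[] = { mul, mul, baclears };

	printf("{BACLEARS, MUL}      %s\n", feasible(ok, 2)  ? "fits" : "does not fit");
	printf("{MUL, MUL, BACLEARS} %s\n", feasible(bad, 3) ? "fits" : "does not fit");
	return 0;
}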

I hope this makes it a bit clearer. Let me know if I need to expand
further.

Paul.

2009-10-09 13:56:43

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 2/2] perf_events: add event constraints support for Intel processors


* Stephane Eranian <[email protected]> wrote:

> +struct evt_cstr {
> + unsigned long idxmsk[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
> + int code;
> +};
> +
> +#define EVT_CSTR0(c, m) { .code = (c), .idxmsk[0] = (m) }
> +#define EVT_CSTR_END { .code = 0, .idxmsk[0] = 0 }
> +
> +#define for_each_evt_cstr(e, c) \
> + for((e) = (c); (e)->idxmsk[0]; (e)++)

Nice patch - but the naming here absolutely sucked, so i changed
evt_cstr, idxmsk, CSTR, etc. to something more palatable. Field names
and abstractions in Linux code really need to be meaningful, and the
code has to be readable and understandable. Wdntusabbrntslkthtinlnx :)

Ingo

2009-10-09 14:23:16

by Stephane Eranian

[permalink] [raw]
Subject: [tip:perf/core] perf_events: Check for filters on fixed counter events

Commit-ID: 04a705df47d1ea27ca2b066f24b1951c51792d0d
Gitweb: http://git.kernel.org/tip/04a705df47d1ea27ca2b066f24b1951c51792d0d
Author: Stephane Eranian <[email protected]>
AuthorDate: Tue, 6 Oct 2009 16:42:08 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 9 Oct 2009 15:56:10 +0200

perf_events: Check for filters on fixed counter events

Intel fixed counters do not support all the filters possible with a
generic counter. Thus, if a fixed counter event is passed but with
certain filters set, then the fixed_mode_idx() function must fail
and the event must be measured in a generic counter instead.

Rejected filters are: inv, edge, cnt-mask.

Signed-off-by: Stephane Eranian <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/perf_event.h | 13 ++++++++++++-
arch/x86/kernel/cpu/perf_event.c | 6 ++++++
2 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index ad7ce3f..8d9f854 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -28,9 +28,20 @@
*/
#define ARCH_PERFMON_EVENT_MASK 0xffff

+/*
+ * filter mask to validate fixed counter events.
+ * the following filters disqualify for fixed counters:
+ * - inv
+ * - edge
+ * - cnt-mask
+ * The other filters are supported by fixed counters.
+ * The any-thread option is supported starting with v3.
+ */
+#define ARCH_PERFMON_EVENT_FILTER_MASK 0xff840000
+
#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL 0x3c
#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK (0x00 << 8)
-#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX 0
+#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX 0
#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_PRESENT \
(1 << (ARCH_PERFMON_UNHALTED_CORE_CYCLES_INDEX))

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index b5801c3..1d16bd6 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1349,6 +1349,12 @@ fixed_mode_idx(struct perf_event *event, struct hw_perf_event *hwc)
if (!x86_pmu.num_events_fixed)
return -1;

+ /*
+ * fixed counters do not take all possible filters
+ */
+ if (hwc->config & ARCH_PERFMON_EVENT_FILTER_MASK)
+ return -1;
+
if (unlikely(hw_event == x86_pmu.event_map(PERF_COUNT_HW_INSTRUCTIONS)))
return X86_PMC_IDX_FIXED_INSTRUCTIONS;
if (unlikely(hw_event == x86_pmu.event_map(PERF_COUNT_HW_CPU_CYCLES)))

2009-10-09 14:23:32

by Stephane Eranian

[permalink] [raw]
Subject: [tip:perf/core] perf_events: Add event constraints support for Intel processors

Commit-ID: b690081d4d3f6a23541493f1682835c3cd5c54a1
Gitweb: http://git.kernel.org/tip/b690081d4d3f6a23541493f1682835c3cd5c54a1
Author: Stephane Eranian <[email protected]>
AuthorDate: Tue, 6 Oct 2009 16:42:09 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 9 Oct 2009 15:56:12 +0200

perf_events: Add event constraints support for Intel processors

On some Intel processors, not all events can be measured in all
counters. Some events can only be measured in one particular
counter, for instance. Assigning an event to the wrong counter does
not crash the machine, but it yields bogus counts, i.e., a silent
error.

This patch changes the event to counter assignment logic to take
into account event constraints for Intel P6, Core and Nehalem
processors. There are no constraints on Intel Atom. There are
constraints on Intel Yonah (Core Duo), but they are not provided in
this patch given that this processor is not yet supported by
perf_events.

As a result of the constraints, it is possible for some event
groups to never actually be loaded onto the PMU if they contain two
events which can only be measured on a single counter. That
situation can be detected with the scaling information extracted
with read().

Signed-off-by: Stephane Eranian <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/cpu/perf_event.c | 109 ++++++++++++++++++++++++++++++++++++--
1 files changed, 105 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 1d16bd6..9c75854 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -77,6 +77,18 @@ struct cpu_hw_events {
struct debug_store *ds;
};

+struct event_constraint {
+ unsigned long idxmsk[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
+ int code;
+};
+
+#define EVENT_CONSTRAINT(c, m) { .code = (c), .idxmsk[0] = (m) }
+#define EVENT_CONSTRAINT_END { .code = 0, .idxmsk[0] = 0 }
+
+#define for_each_event_constraint(e, c) \
+ for ((e) = (c); (e)->idxmsk[0]; (e)++)
+
+
/*
* struct x86_pmu - generic x86 pmu
*/
@@ -102,6 +114,7 @@ struct x86_pmu {
u64 intel_ctrl;
void (*enable_bts)(u64 config);
void (*disable_bts)(void);
+ int (*get_event_idx)(struct hw_perf_event *hwc);
};

static struct x86_pmu x86_pmu __read_mostly;
@@ -110,6 +123,8 @@ static DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
.enabled = 1,
};

+static const struct event_constraint *event_constraint;
+
/*
* Not sure about some of these
*/
@@ -155,6 +170,16 @@ static u64 p6_pmu_raw_event(u64 hw_event)
return hw_event & P6_EVNTSEL_MASK;
}

+static const struct event_constraint intel_p6_event_constraints[] =
+{
+ EVENT_CONSTRAINT(0xc1, 0x1), /* FLOPS */
+ EVENT_CONSTRAINT(0x10, 0x1), /* FP_COMP_OPS_EXE */
+ EVENT_CONSTRAINT(0x11, 0x1), /* FP_ASSIST */
+ EVENT_CONSTRAINT(0x12, 0x2), /* MUL */
+ EVENT_CONSTRAINT(0x13, 0x2), /* DIV */
+ EVENT_CONSTRAINT(0x14, 0x1), /* CYCLES_DIV_BUSY */
+ EVENT_CONSTRAINT_END
+};

/*
* Intel PerfMon v3. Used on Core2 and later.
@@ -170,6 +195,35 @@ static const u64 intel_perfmon_event_map[] =
[PERF_COUNT_HW_BUS_CYCLES] = 0x013c,
};

+static const struct event_constraint intel_core_event_constraints[] =
+{
+ EVENT_CONSTRAINT(0x10, 0x1), /* FP_COMP_OPS_EXE */
+ EVENT_CONSTRAINT(0x11, 0x2), /* FP_ASSIST */
+ EVENT_CONSTRAINT(0x12, 0x2), /* MUL */
+ EVENT_CONSTRAINT(0x13, 0x2), /* DIV */
+ EVENT_CONSTRAINT(0x14, 0x1), /* CYCLES_DIV_BUSY */
+ EVENT_CONSTRAINT(0x18, 0x1), /* IDLE_DURING_DIV */
+ EVENT_CONSTRAINT(0x19, 0x2), /* DELAYED_BYPASS */
+ EVENT_CONSTRAINT(0xa1, 0x1), /* RS_UOPS_DISPATCH_CYCLES */
+ EVENT_CONSTRAINT(0xcb, 0x1), /* MEM_LOAD_RETIRED */
+ EVENT_CONSTRAINT_END
+};
+
+static const struct event_constraint intel_nehalem_event_constraints[] =
+{
+ EVENT_CONSTRAINT(0x40, 0x3), /* L1D_CACHE_LD */
+ EVENT_CONSTRAINT(0x41, 0x3), /* L1D_CACHE_ST */
+ EVENT_CONSTRAINT(0x42, 0x3), /* L1D_CACHE_LOCK */
+ EVENT_CONSTRAINT(0x43, 0x3), /* L1D_ALL_REF */
+ EVENT_CONSTRAINT(0x4e, 0x3), /* L1D_PREFETCH */
+ EVENT_CONSTRAINT(0x4c, 0x3), /* LOAD_HIT_PRE */
+ EVENT_CONSTRAINT(0x51, 0x3), /* L1D */
+ EVENT_CONSTRAINT(0x52, 0x3), /* L1D_CACHE_PREFETCH_LOCK_FB_HIT */
+ EVENT_CONSTRAINT(0x53, 0x3), /* L1D_CACHE_LOCK_FB_HIT */
+ EVENT_CONSTRAINT(0xc5, 0x3), /* CACHE_LOCK_CYCLES */
+ EVENT_CONSTRAINT_END
+};
+
static u64 intel_pmu_event_map(int hw_event)
{
return intel_perfmon_event_map[hw_event];
@@ -932,6 +986,8 @@ static int __hw_perf_event_init(struct perf_event *event)
*/
hwc->config = ARCH_PERFMON_EVENTSEL_INT;

+ hwc->idx = -1;
+
/*
* Count user and OS events unless requested not to.
*/
@@ -1366,6 +1422,45 @@ fixed_mode_idx(struct perf_event *event, struct hw_perf_event *hwc)
}

/*
+ * generic counter allocator: get next free counter
+ */
+static int gen_get_event_idx(struct hw_perf_event *hwc)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+ int idx;
+
+ idx = find_first_zero_bit(cpuc->used_mask, x86_pmu.num_events);
+ return idx == x86_pmu.num_events ? -1 : idx;
+}
+
+/*
+ * intel-specific counter allocator: check event constraints
+ */
+static int intel_get_event_idx(struct hw_perf_event *hwc)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+ const struct event_constraint *event_constraint;
+ int i, code;
+
+ if (!event_constraint)
+ goto skip;
+
+ code = hwc->config & 0xff;
+
+ for_each_event_constraint(event_constraint, event_constraint) {
+ if (code == event_constraint->code) {
+ for_each_bit(i, event_constraint->idxmsk, X86_PMC_IDX_MAX) {
+ if (!test_and_set_bit(i, cpuc->used_mask))
+ return i;
+ }
+ return -1;
+ }
+ }
+skip:
+ return gen_get_event_idx(hwc);
+}
+
+/*
* Find a PMC slot for the freshly enabled / scheduled in event:
*/
static int x86_pmu_enable(struct perf_event *event)
@@ -1402,11 +1497,10 @@ static int x86_pmu_enable(struct perf_event *event)
} else {
idx = hwc->idx;
/* Try to get the previous generic event again */
- if (test_and_set_bit(idx, cpuc->used_mask)) {
+ if (idx == -1 || test_and_set_bit(idx, cpuc->used_mask)) {
try_generic:
- idx = find_first_zero_bit(cpuc->used_mask,
- x86_pmu.num_events);
- if (idx == x86_pmu.num_events)
+ idx = x86_pmu.get_event_idx(hwc);
+ if (idx == -1)
return -EAGAIN;

set_bit(idx, cpuc->used_mask);
@@ -1883,6 +1977,7 @@ static struct x86_pmu p6_pmu = {
*/
.event_bits = 32,
.event_mask = (1ULL << 32) - 1,
+ .get_event_idx = intel_get_event_idx,
};

static struct x86_pmu intel_pmu = {
@@ -1906,6 +2001,7 @@ static struct x86_pmu intel_pmu = {
.max_period = (1ULL << 31) - 1,
.enable_bts = intel_pmu_enable_bts,
.disable_bts = intel_pmu_disable_bts,
+ .get_event_idx = intel_get_event_idx,
};

static struct x86_pmu amd_pmu = {
@@ -1926,6 +2022,7 @@ static struct x86_pmu amd_pmu = {
.apic = 1,
/* use highest bit to detect overflow */
.max_period = (1ULL << 47) - 1,
+ .get_event_idx = gen_get_event_idx,
};

static int p6_pmu_init(void)
@@ -1938,10 +2035,12 @@ static int p6_pmu_init(void)
case 7:
case 8:
case 11: /* Pentium III */
+ event_constraint = intel_p6_event_constraints;
break;
case 9:
case 13:
/* Pentium M */
+ event_constraint = intel_p6_event_constraints;
break;
default:
pr_cont("unsupported p6 CPU model %d ",
@@ -2013,12 +2112,14 @@ static int intel_pmu_init(void)
sizeof(hw_cache_event_ids));

pr_cont("Core2 events, ");
+ event_constraint = intel_core_event_constraints;
break;
default:
case 26:
memcpy(hw_cache_event_ids, nehalem_hw_cache_event_ids,
sizeof(hw_cache_event_ids));

+ event_constraint = intel_nehalem_event_constraints;
pr_cont("Nehalem/Corei7 events, ");
break;
case 28:

2009-10-09 14:23:39

by Peter Zijlstra

[permalink] [raw]
Subject: [tip:perf/core] perf, x86: Add simple group validation

Commit-ID: fe9081cc9bdabb0be953a39ad977cea14e35bce5
Gitweb: http://git.kernel.org/tip/fe9081cc9bdabb0be953a39ad977cea14e35bce5
Author: Peter Zijlstra <[email protected]>
AuthorDate: Thu, 8 Oct 2009 11:56:07 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 9 Oct 2009 15:56:14 +0200

perf, x86: Add simple group validation

Refuse to add events when the group wouldn't fit onto the PMU
anymore.

Naive implementation.

Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
LKML-Reference: <1254911461.26976.239.camel@twins>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/cpu/perf_event.c | 90 +++++++++++++++++++++++++++++---------
1 files changed, 69 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 9c75854..9961d84 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -114,7 +114,8 @@ struct x86_pmu {
u64 intel_ctrl;
void (*enable_bts)(u64 config);
void (*disable_bts)(void);
- int (*get_event_idx)(struct hw_perf_event *hwc);
+ int (*get_event_idx)(struct cpu_hw_events *cpuc,
+ struct hw_perf_event *hwc);
};

static struct x86_pmu x86_pmu __read_mostly;
@@ -523,7 +524,7 @@ static u64 intel_pmu_raw_event(u64 hw_event)
#define CORE_EVNTSEL_UNIT_MASK 0x0000FF00ULL
#define CORE_EVNTSEL_EDGE_MASK 0x00040000ULL
#define CORE_EVNTSEL_INV_MASK 0x00800000ULL
-#define CORE_EVNTSEL_REG_MASK 0xFF000000ULL
+#define CORE_EVNTSEL_REG_MASK 0xFF000000ULL

#define CORE_EVNTSEL_MASK \
(CORE_EVNTSEL_EVENT_MASK | \
@@ -1390,8 +1391,7 @@ static void amd_pmu_enable_event(struct hw_perf_event *hwc, int idx)
x86_pmu_enable_event(hwc, idx);
}

-static int
-fixed_mode_idx(struct perf_event *event, struct hw_perf_event *hwc)
+static int fixed_mode_idx(struct hw_perf_event *hwc)
{
unsigned int hw_event;

@@ -1424,9 +1424,9 @@ fixed_mode_idx(struct perf_event *event, struct hw_perf_event *hwc)
/*
* generic counter allocator: get next free counter
*/
-static int gen_get_event_idx(struct hw_perf_event *hwc)
+static int
+gen_get_event_idx(struct cpu_hw_events *cpuc, struct hw_perf_event *hwc)
{
- struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
int idx;

idx = find_first_zero_bit(cpuc->used_mask, x86_pmu.num_events);
@@ -1436,16 +1436,16 @@ static int gen_get_event_idx(struct hw_perf_event *hwc)
/*
* intel-specific counter allocator: check event constraints
*/
-static int intel_get_event_idx(struct hw_perf_event *hwc)
+static int
+intel_get_event_idx(struct cpu_hw_events *cpuc, struct hw_perf_event *hwc)
{
- struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
const struct event_constraint *event_constraint;
int i, code;

if (!event_constraint)
goto skip;

- code = hwc->config & 0xff;
+ code = hwc->config & CORE_EVNTSEL_EVENT_MASK;

for_each_event_constraint(event_constraint, event_constraint) {
if (code == event_constraint->code) {
@@ -1457,26 +1457,22 @@ static int intel_get_event_idx(struct hw_perf_event *hwc)
}
}
skip:
- return gen_get_event_idx(hwc);
+ return gen_get_event_idx(cpuc, hwc);
}

-/*
- * Find a PMC slot for the freshly enabled / scheduled in event:
- */
-static int x86_pmu_enable(struct perf_event *event)
+static int
+x86_schedule_event(struct cpu_hw_events *cpuc, struct hw_perf_event *hwc)
{
- struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
- struct hw_perf_event *hwc = &event->hw;
int idx;

- idx = fixed_mode_idx(event, hwc);
+ idx = fixed_mode_idx(hwc);
if (idx == X86_PMC_IDX_FIXED_BTS) {
/* BTS is already occupied. */
if (test_and_set_bit(idx, cpuc->used_mask))
return -EAGAIN;

hwc->config_base = 0;
- hwc->event_base = 0;
+ hwc->event_base = 0;
hwc->idx = idx;
} else if (idx >= 0) {
/*
@@ -1499,17 +1495,33 @@ static int x86_pmu_enable(struct perf_event *event)
/* Try to get the previous generic event again */
if (idx == -1 || test_and_set_bit(idx, cpuc->used_mask)) {
try_generic:
- idx = x86_pmu.get_event_idx(hwc);
+ idx = x86_pmu.get_event_idx(cpuc, hwc);
if (idx == -1)
return -EAGAIN;

set_bit(idx, cpuc->used_mask);
hwc->idx = idx;
}
- hwc->config_base = x86_pmu.eventsel;
- hwc->event_base = x86_pmu.perfctr;
+ hwc->config_base = x86_pmu.eventsel;
+ hwc->event_base = x86_pmu.perfctr;
}

+ return idx;
+}
+
+/*
+ * Find a PMC slot for the freshly enabled / scheduled in event:
+ */
+static int x86_pmu_enable(struct perf_event *event)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+ struct hw_perf_event *hwc = &event->hw;
+ int idx;
+
+ idx = x86_schedule_event(cpuc, hwc);
+ if (idx < 0)
+ return idx;
+
perf_events_lapic_init();

x86_pmu.disable(hwc, idx);
@@ -2212,11 +2224,47 @@ static const struct pmu pmu = {
.unthrottle = x86_pmu_unthrottle,
};

+static int
+validate_event(struct cpu_hw_events *cpuc, struct perf_event *event)
+{
+ struct hw_perf_event fake_event = event->hw;
+
+ if (event->pmu != &pmu)
+ return 0;
+
+ return x86_schedule_event(cpuc, &fake_event);
+}
+
+static int validate_group(struct perf_event *event)
+{
+ struct perf_event *sibling, *leader = event->group_leader;
+ struct cpu_hw_events fake_pmu;
+
+ memset(&fake_pmu, 0, sizeof(fake_pmu));
+
+ if (!validate_event(&fake_pmu, leader))
+ return -ENOSPC;
+
+ list_for_each_entry(sibling, &leader->sibling_list, group_entry) {
+ if (!validate_event(&fake_pmu, sibling))
+ return -ENOSPC;
+ }
+
+ if (!validate_event(&fake_pmu, event))
+ return -ENOSPC;
+
+ return 0;
+}
+
const struct pmu *hw_perf_event_init(struct perf_event *event)
{
int err;

err = __hw_perf_event_init(event);
+ if (!err) {
+ if (event->group_leader != event)
+ err = validate_group(event);
+ }
if (err) {
if (event->destroy)
event->destroy(event);

2009-10-12 09:06:04

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 2/2] perf_events: add event constraints support for Intel processors


* stephane eranian <[email protected]> wrote:

> On Thu, Oct 8, 2009 at 10:08 PM, Ingo Molnar <[email protected]> wrote:
> >
> > * David Miller <[email protected]> wrote:
> >
> >> From: stephane eranian <[email protected]>
> >> Date: Wed, 7 Oct 2009 14:31:58 +0200
> >>
> >> > What PPC does is probably the only way to do this given the interface between
> >> > generic and machine-specific code. The one advantage I see is that it works
> >> > inside an event group but also across event groups because that code does not
> >> > look at group boundary, it only looks at the events and the number of available
> >> > registers. The downside is that you duplicate state.
> >> >
> >> > Did I get this right, Paul?
> >>
> >> That's basically how his code works, yes. I intend on duplicating it
> >> to some extent on sparc64 since I'm operating in a similar problem
> >> space.
> >>
> >> So if at least some of this engine went to a generic place, there'd be
> >> at least a 3rd user :-)
> >
> > Yeah, i'd definitely suggest to generalize this. We've missed updating
> > PowerPC lowlevel details a couple of times in perf core updates, just
> > because it's in a non-obvious place. Even if it's used by just a single
> > arch, generic code is much more visible.
> >
>
> It is not clear how you can do this without creating a monster. As I
> said the constraints can be far more difficult than just event X =>
> allowed_counter_bitmask.

The event constraints we are interested in come from the physics of
CPUs, not from inherent properties of particular architectures.

If you check the various constraints you'll see there's many repeating
patterns and many of those will repeat between architectures.

Arbitrary, random constraints (that stem from design stupidity/laziness)
can be kept at arch level, as a quirk in essence.

Spreading them all out into architecture code is the far worse solution,
it creates a fragile distributed monster with repeating patterns -
instead we want a manageable central monster ;-) [We are also quite good
at controlling and shrinking monsters in the core kernel.]

So we could start all this by factoring out the sane looking bits of
PowerPC and x86 constraints into generic helpers, and go step by step
starting from that point.

Would you be interested in trying that, to finish what you started with
'struct event_constraint' in arch/x86/kernel/cpu/perf_event.c?

Thanks,

Ingo

2009-10-13 07:18:06

by Stephane Eranian

[permalink] [raw]
Subject: Re: [PATCH 2/2] perf_events: add event constraints support for Intel processors

On Mon, Oct 12, 2009 at 11:05 AM, Ingo Molnar <[email protected]> wrote:
>
> The event constraints we are interested in come from the physics of
> CPUs, not from inherent properties of particular architectures.
>
I don't understand this statement.

> If you check the various constraints you'll see there's many repeating
> patterns and many of those will repeat between architectures.
>
Some are similar, and I have mentioned them, but many are also specific.

> Arbitrary, random constraints (that stem from design stupidity/laziness)
> can be kept at arch level, as a quirk in essence.
>
We had a discussion about event constraints early on. I think you and I
have a different appreciation of why they exist. I don't think it would be
fruitful to restart this. Constraints exist, and they will most likely
always be there. You can choose to ignore them, drop the constrained
features, or add the code to deal with them when the feature is worth it.

> Spreading them all out into architecture code is the far worse solution,
> it creates a fragile distributed monster with repeating patterns -
> instead we want a manageable central monster ;-) [We are also quite good
> at controlling and shrinking monsters in the core kernel.]
>
I don't understand this either.
Why would architecture-specific code be more fragile?

> So we could start all this by factoring out the sane looking bits of
> PowerPC and x86 constraints into generic helpers, and go step by step
> starting from that point.
>
> Would you be interested in trying that, to finish what you started with
> 'struct event_constraint' in arch/x86/kernel/cpu/perf_event.c?

I have already started this effort because what is there now, though correct,
is still not satisfactory. But I have decided to first try to implement it in
the x86-specific code to gauge what is actually needed. Then we may be able to
promote some code to the generic layer.

2009-10-13 07:29:48

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 2/2] perf_events: add event constraints support for Intel processors


* stephane eranian <[email protected]> wrote:

> > Spreading them all out into architecture code is the far worse
> > solution, it creates a fragile distributed monster with repeating
> > patterns - instead we want a manageable central monster ;-) [We are
> > also quite good at controlling and shrinking monsters in the core
> > kernel.]
>
> I don't understand this either.
> Why would architecture specific code be more fragile ?

Because similar code spread out and partly duplicated in 22
architectures is an order of magnitude less maintainable than
a core library.

Ingo