Subject: [PATCH 00/10] perf, x86: Add northbridge counter support for AMD family 15h

This patch adds northbridge counter support for AMD family 15h cpus.

The NB counter implementation and usage works the same way as on
family 10h. Thus an NB event can now be selected like any other
performance counter event. As on family 10h, the kernel selects only
one NB PMC per node by using the NB constraint handler.

The main part of this patch set reworks the current code so that bit
masks can be used to describe the available counters. Also, Intel's
fixed counters have been moved to Intel-only code. This is necessary
because AMD NB counters start at index 32, which leads to holes in the
counter mask and conflicts with the fixed counters.
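
For reference, a minimal sketch of the counter index layout that
results on family 15h after this series (constants taken from the
patches below, the snippet itself is only an illustration):

	/* core counters: idx 0..5, NB counters: idx 32..35, hole in between */
	u64 counters_mask64 =
		((1ULL << AMD64_NUM_COUNTERS_CORE) - 1) |		      /* 0x000000000000003f */
		(((1ULL << AMD64_NUM_COUNTERS_NB) - 1) << AMD64_PMC_IDX_NB);  /* 0x0000000f00000000 */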

Another major change is the unification of AMD PMUs and, where
possible, a family-independent feature check based on cpuid.

It should also be mentioned that nb perfctrs do not support all bits
in the config value, see patch #10.

-Robert



Robert Richter (10):
perf, amd: Rework northbridge event constraints handler
perf, x86: Rework counter reservation code
perf, x86: Use bitmasks for generic counters
perf, x86: Rename Intel specific macros
perf, x86: Move Intel specific code to intel_pmu_init()
perf, amd: Unify AMD's generic and family 15h pmus
perf, amd: Generalize northbridge constraints code for family 15h
perf, amd: Enable northbridge counters on family 15h
perf, x86: Improve debug output in check_hw_exists()
perf, amd: Check northbridge event config value

arch/x86/include/asm/cpufeature.h | 2 +
arch/x86/include/asm/kvm_host.h | 4 +-
arch/x86/include/asm/perf_event.h | 26 ++-
arch/x86/kernel/cpu/perf_event.c | 129 +++++------
arch/x86/kernel/cpu/perf_event.h | 7 +
arch/x86/kernel/cpu/perf_event_amd.c | 368 +++++++++++++++++------------
arch/x86/kernel/cpu/perf_event_intel.c | 65 +++++-
arch/x86/kernel/cpu/perf_event_intel_ds.c | 4 +-
arch/x86/kernel/cpu/perf_event_p4.c | 8 +-
arch/x86/kvm/pmu.c | 22 +-
arch/x86/oprofile/op_model_amd.c | 4 +-
11 files changed, 374 insertions(+), 265 deletions(-)

--
1.7.8.4


Subject: [PATCH 09/10] perf, x86: Improve debug output in check_hw_exists()

It might be of interest which perfctr MSR failed, so print the failing
MSR and the value read back.

Signed-off-by: Robert Richter <[email protected]>
---
arch/x86/kernel/cpu/perf_event.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 1d59cac..8ab4e5a 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -175,7 +175,7 @@ static void release_pmc_hardware(void) {}

static bool check_hw_exists(void)
{
- u64 val, val_new = 0;
+ u64 val, val_new = ~0;
int i, reg, ret = 0;

/*
@@ -226,6 +226,7 @@ bios_fail:

msr_fail:
printk(KERN_CONT "Broken PMU hardware detected, using software events only.\n");
+ printk(KERN_ERR "Failed to access perfctr msr (MSR %x is %Lx)\n", reg, val_new);

return false;
}
--
1.7.8.4

Subject: [PATCH 02/10] perf, x86: Rework counter reservation code

We will have non-contiguous counter masks with the AMD family 15h PMU.
Rework the code so that counter masks can be used later.

No functional changes.
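
A short illustration of the reworked unwind path (hypothetical trace,
not part of the patch), assuming the eventsel reservation of counter 2
fails:

	reserve perfctr 0, eventsel 0		-> ok
	reserve perfctr 1, eventsel 1		-> ok
	reserve perfctr 2			-> ok
	reserve eventsel 2			-> fails
	eventsel_fail:	release perfctr 2
	perfctr_fail:	release eventsel 0, perfctr 0,
			release eventsel 1, perfctr 1
	return false;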

Signed-off-by: Robert Richter <[email protected]>
---
arch/x86/kernel/cpu/perf_event.c | 25 +++++++++++--------------
1 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index e7540c8..ac1cb32 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -132,29 +132,26 @@ static DEFINE_MUTEX(pmc_reserve_mutex);

static bool reserve_pmc_hardware(void)
{
- int i;
+ int idx1, idx2;

- for (i = 0; i < x86_pmu.num_counters; i++) {
- if (!reserve_perfctr_nmi(x86_pmu_event_addr(i)))
+ for (idx1 = 0; idx1 < x86_pmu.num_counters; idx1++) {
+ if (!reserve_perfctr_nmi(x86_pmu_event_addr(idx1)))
goto perfctr_fail;
- }
-
- for (i = 0; i < x86_pmu.num_counters; i++) {
- if (!reserve_evntsel_nmi(x86_pmu_config_addr(i)))
+ if (!reserve_evntsel_nmi(x86_pmu_config_addr(idx1)))
goto eventsel_fail;
}

return true;

eventsel_fail:
- for (i--; i >= 0; i--)
- release_evntsel_nmi(x86_pmu_config_addr(i));
-
- i = x86_pmu.num_counters;
-
+ release_perfctr_nmi(x86_pmu_event_addr(idx1));
perfctr_fail:
- for (i--; i >= 0; i--)
- release_perfctr_nmi(x86_pmu_event_addr(i));
+ for (idx2 = 0; idx2 < x86_pmu.num_counters; idx2++) {
+ if (idx2 >= idx1)
+ break;
+ release_evntsel_nmi(x86_pmu_config_addr(idx2));
+ release_perfctr_nmi(x86_pmu_event_addr(idx2));
+ }

return false;
}
--
1.7.8.4

Subject: [PATCH 07/10] perf, amd: Generalize northbridge constraints code for family 15h

Generalize the northbridge constraints code for family 15h and use it
there too. The main difference is the use of a counter mask for
northbridge counters. Families other than 15h use the unconstrained
mask (any generic counter), while family 15h has 4 separate NB
counters. The NB counter bit mask is taken from the constraint that is
returned for an NB event. The NB counters are not yet enabled; see the
next patch for that.
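
A rough sketch of the resulting layering (simplified, not part of the
patch):

	amd_get_event_constraints():		/* pre-15h */
		amd_get_nb_event_constraints(cpuc, event, &unconstrained);

	amd_get_event_constraints_f15h():	/* family 15h */
		c = __amd_get_event_constraints_f15h(cpuc, event);
		amd_get_nb_event_constraints(cpuc, event, c);

	amd_get_nb_event_constraints():
		non-NB event: return the base constraint unchanged;
		NB event:     __amd_get_nb_event_constraints() picks one NB
			      counter per node out of the base counter mask.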

Signed-off-by: Robert Richter <[email protected]>
---
arch/x86/kernel/cpu/perf_event_amd.c | 59 +++++++++++++++++++++++-----------
1 files changed, 40 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 27b2806..e538512 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -188,20 +188,13 @@ static inline int amd_has_nb(struct cpu_hw_events *cpuc)
return nb && nb->nb_id != -1;
}

-static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
- struct perf_event *event)
+static void amd_put_nb_event_constraints(struct cpu_hw_events *cpuc,
+ struct perf_event *event)
{
- struct hw_perf_event *hwc = &event->hw;
struct amd_nb *nb = cpuc->amd_nb;
int i;

/*
- * only care about NB events
- */
- if (!(amd_has_nb(cpuc) && amd_is_nb_event(hwc)))
- return;
-
- /*
* need to scan whole list because event may not have
* been assigned during scheduling
*
@@ -247,12 +240,13 @@ static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
*
* Given that resources are allocated (cmpxchg), they must be
* eventually freed for others to use. This is accomplished by
- * calling amd_put_event_constraints().
+ * calling amd_put_nb_event_constraints().
*
* Non NB events are not impacted by this restriction.
*/
static struct event_constraint *
-amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
+__amd_get_nb_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event,
+ struct event_constraint *c)
{
struct hw_perf_event *hwc = &event->hw;
struct amd_nb *nb = cpuc->amd_nb;
@@ -260,12 +254,6 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
int idx, new = -1;

/*
- * if not NB event or no NB, then no constraints
- */
- if (!(amd_has_nb(cpuc) && amd_is_nb_event(hwc)))
- return &unconstrained;
-
- /*
* detect if already present, if so reuse
*
* cannot merge with actual allocation
@@ -275,7 +263,7 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
* because of successive calls to x86_schedule_events() from
* hw_perf_group_sched_in() without hw_perf_enable()
*/
- for_each_generic_counter(idx) {
+ for_each_set_bit(idx, c->idxmsk, X86_PMC_IDX_MAX) {
if (new == -1 || hwc->idx == idx)
/* assign free slot, prefer hwc->idx */
old = cmpxchg(nb->owners + idx, NULL, event);
@@ -481,7 +469,7 @@ static struct event_constraint amd_f15_PMC50 = EVENT_CONSTRAINT(0, 0x3F, 0);
static struct event_constraint amd_f15_PMC53 = EVENT_CONSTRAINT(0, 0x38, 0);

static struct event_constraint *
-amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *event)
+__amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *event)
{
struct hw_perf_event *hwc = &event->hw;
unsigned int event_code = amd_get_event_code(hwc);
@@ -552,6 +540,39 @@ amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *ev
}
}

+static struct event_constraint *
+amd_get_nb_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event,
+ struct event_constraint *c)
+{
+ /*
+ * only care about NB events
+ */
+ if (!(amd_has_nb(cpuc) && amd_is_nb_event(&event->hw)))
+ return c;
+
+ return __amd_get_nb_event_constraints(cpuc, event, c);
+}
+
+static struct event_constraint *
+amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
+{
+ return amd_get_nb_event_constraints(cpuc, event, &unconstrained);
+}
+
+static struct event_constraint *
+amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *event)
+{
+ struct event_constraint *c = __amd_get_event_constraints_f15h(cpuc, event);
+ return amd_get_nb_event_constraints(cpuc, event, c);
+}
+
+static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
+ struct perf_event *event)
+{
+ if (amd_has_nb(cpuc) && amd_is_nb_event(&event->hw))
+ amd_put_nb_event_constraints(cpuc, event);
+}
+
static __initconst const struct x86_pmu amd_pmu = {
.name = "AMD",
.handle_irq = x86_pmu_handle_irq,
--
1.7.8.4

Subject: [PATCH 10/10] perf, amd: Check northbridge event config value

The northbridge counter configuration registers do not support certain
bits that the core performance counters do. The unsupported bits are:

* Host/Guest Only
* Counter Mask
* Invert Comparison
* Edge Detect
* Operating-System Mode
* User Mode

Do not set up northbridge events with any of those bits set; the bit
positions are sketched below.
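
For reference, a sketch of the eventsel bits involved (bit positions
are assumed from the existing eventsel layout, the sketch is not part
of the patch):

	/* bits an NB event config may keep after this patch: */
	AMD64_EVENTSEL_EVENT		/* event select, bits 0-7 and 32-35 */
	ARCH_PERFMON_EVENTSEL_UMASK	/* unit mask,    bits 8-15          */
	ARCH_PERFMON_EVENTSEL_INT	/* APIC int,     bit 20             */

	/* USR/OS are cleared to their default, the rest is rejected: */
	ARCH_PERFMON_EVENTSEL_USR	/* bit 16     */
	ARCH_PERFMON_EVENTSEL_OS	/* bit 17     */
	ARCH_PERFMON_EVENTSEL_EDGE	/* bit 18     */
	ARCH_PERFMON_EVENTSEL_INV	/* bit 23     */
	ARCH_PERFMON_EVENTSEL_CMASK	/* bits 24-31 */
	AMD_PERFMON_EVENTSEL_GUESTONLY	/* bit 40     */
	AMD_PERFMON_EVENTSEL_HOSTONLY	/* bit 41     */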

Signed-off-by: Robert Richter <[email protected]>
---
arch/x86/include/asm/perf_event.h | 3 +
arch/x86/kernel/cpu/perf_event_amd.c | 93 +++++++++++++++++++++++----------
2 files changed, 68 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 637a72b7..dc7f5b1 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -48,6 +48,9 @@
#define AMD64_RAW_EVENT_MASK \
(X86_RAW_EVENT_MASK | \
AMD64_EVENTSEL_EVENT)
+#define AMD64_NB_EVENT_MASK \
+ (AMD64_EVENTSEL_EVENT | \
+ ARCH_PERFMON_EVENTSEL_UMASK)
#define AMD64_NUM_COUNTERS 4
#define AMD64_NUM_COUNTERS_CORE 6
#define AMD64_NUM_COUNTERS_NB 4
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 7a870d2..03cd13e 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -116,6 +116,65 @@ static __initconst const u64 amd_hw_cache_event_ids
};

/*
+ * AMD64 events are detected based on their event codes.
+ */
+static inline unsigned int amd_get_event_code(struct hw_perf_event *hwc)
+{
+ return ((hwc->config >> 24) & 0x0f00) | (hwc->config & 0x00ff);
+}
+
+static inline int amd_is_nb_event(struct hw_perf_event *hwc)
+{
+ return (hwc->config & 0xe0) == 0xe0;
+}
+
+static inline int amd_is_nb_perfctr_event(struct hw_perf_event *hwc)
+{
+ return amd_is_nb_event(hwc)
+ && (x86_pmu.counters_mask64 & AMD64_COUNTERS_MASK_NB);
+}
+
+static inline int amd_has_nb(struct cpu_hw_events *cpuc)
+{
+ struct amd_nb *nb = cpuc->amd_nb;
+
+ return nb && nb->nb_id != -1;
+}
+
+/*
+ * AMD NB counters (MSRs 0xc0010240 etc.) do not support the following
+ * flags:
+ *
+ * Host/Guest Only
+ * Counter Mask
+ * Invert Comparison
+ * Edge Detect
+ * Operating-System Mode
+ * User Mode
+ *
+ * Try to fix the config for default settings, otherwise fail.
+ */
+static int amd_nb_event_config(struct perf_event *event)
+{
+ if (!amd_is_nb_perfctr_event(&event->hw))
+ return 0;
+
+ if (event->attr.exclude_host || event->attr.exclude_guest
+ || event->attr.exclude_user || event->attr.exclude_kernel)
+ goto fail;
+
+ event->hw.config &= ~(ARCH_PERFMON_EVENTSEL_USR | ARCH_PERFMON_EVENTSEL_OS);
+
+ if (event->hw.config & ~(AMD64_NB_EVENT_MASK | ARCH_PERFMON_EVENTSEL_INT))
+ goto fail;
+
+ return 0;
+fail:
+ pr_debug("Invalid nb counter config value: %016Lx\n", event->hw.config);
+ return -EINVAL;
+}
+
+/*
* AMD Performance Monitor K7 and later.
*/
static const u64 amd_perfmon_event_map[] =
@@ -143,13 +202,13 @@ static int amd_pmu_hw_config(struct perf_event *event)
if (event->attr.precise_ip && get_ibs_caps())
return -ENOENT;

+ if (has_branch_stack(event))
+ return -EOPNOTSUPP;
+
ret = x86_pmu_hw_config(event);
if (ret)
return ret;

- if (has_branch_stack(event))
- return -EOPNOTSUPP;
-
if (event->attr.exclude_host && event->attr.exclude_guest)
/*
* When HO == GO == 1 the hardware treats that as GO == HO == 0
@@ -163,32 +222,10 @@ static int amd_pmu_hw_config(struct perf_event *event)
else if (event->attr.exclude_guest)
event->hw.config |= AMD_PERFMON_EVENTSEL_HOSTONLY;

- if (event->attr.type != PERF_TYPE_RAW)
- return 0;
-
- event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;
-
- return 0;
-}
-
-/*
- * AMD64 events are detected based on their event codes.
- */
-static inline unsigned int amd_get_event_code(struct hw_perf_event *hwc)
-{
- return ((hwc->config >> 24) & 0x0f00) | (hwc->config & 0x00ff);
-}
-
-static inline int amd_is_nb_event(struct hw_perf_event *hwc)
-{
- return (hwc->config & 0xe0) == 0xe0;
-}
-
-static inline int amd_has_nb(struct cpu_hw_events *cpuc)
-{
- struct amd_nb *nb = cpuc->amd_nb;
+ if (event->attr.type == PERF_TYPE_RAW)
+ event->hw.config |= event->attr.config & AMD64_RAW_EVENT_MASK;

- return nb && nb->nb_id != -1;
+ return amd_nb_event_config(event);
}

static void amd_put_nb_event_constraints(struct cpu_hw_events *cpuc,
--
1.7.8.4

Subject: [PATCH 06/10] perf, amd: Unify AMD's generic and family 15h pmus

Enable the amd_{get,put}_event_constraints() functions also for family
15h. We enable the setup of NB counters for family 15h now, but do not
yet enable the use of NB counters. This will be done in a later patch.

Keeping separate pmu structs no longer makes sense now. We can use a
single pmu struct for all AMD cpus. This patch introduces functions to
set up the pmu to enable core performance counters or counter
constraints.

Also, use cpuid checks instead of family checks where possible. This
enables the code independently of the cpu family if the feature flag
is set.

Signed-off-by: Robert Richter <[email protected]>
---
arch/x86/include/asm/perf_event.h | 3 +-
arch/x86/kernel/cpu/perf_event_amd.c | 104 +++++++++++++++------------------
arch/x86/oprofile/op_model_amd.c | 4 +-
3 files changed, 50 insertions(+), 61 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 3b31248..ffdf5e0 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -47,8 +47,7 @@
(X86_RAW_EVENT_MASK | \
AMD64_EVENTSEL_EVENT)
#define AMD64_NUM_COUNTERS 4
-#define AMD64_NUM_COUNTERS_F15H 6
-#define AMD64_NUM_COUNTERS_MAX AMD64_NUM_COUNTERS_F15H
+#define AMD64_NUM_COUNTERS_CORE 6

#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL 0x3c
#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK (0x00 << 8)
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index b5ed2e1..27b2806 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -350,7 +350,7 @@ static void amd_pmu_cpu_starting(int cpu)

cpuc->perf_ctr_virt_mask = AMD_PERFMON_EVENTSEL_HOSTONLY;

- if (boot_cpu_data.x86_max_cores < 2 || boot_cpu_data.x86 == 0x15)
+ if (boot_cpu_data.x86_max_cores < 2)
return;

nb_id = amd_get_nb_id(cpu);
@@ -406,35 +406,6 @@ static struct attribute *amd_format_attr[] = {
NULL,
};

-static __initconst const struct x86_pmu amd_pmu = {
- .name = "AMD",
- .handle_irq = x86_pmu_handle_irq,
- .disable_all = x86_pmu_disable_all,
- .enable_all = x86_pmu_enable_all,
- .enable = x86_pmu_enable_event,
- .disable = x86_pmu_disable_event,
- .hw_config = amd_pmu_hw_config,
- .schedule_events = x86_schedule_events,
- .eventsel = MSR_K7_EVNTSEL0,
- .perfctr = MSR_K7_PERFCTR0,
- .event_map = amd_pmu_event_map,
- .max_events = ARRAY_SIZE(amd_perfmon_event_map),
- .num_counters = AMD64_NUM_COUNTERS,
- .cntval_bits = 48,
- .cntval_mask = (1ULL << 48) - 1,
- .apic = 1,
- /* use highest bit to detect overflow */
- .max_period = (1ULL << 47) - 1,
- .get_event_constraints = amd_get_event_constraints,
- .put_event_constraints = amd_put_event_constraints,
-
- .format_attrs = amd_format_attr,
-
- .cpu_prepare = amd_pmu_cpu_prepare,
- .cpu_starting = amd_pmu_cpu_starting,
- .cpu_dead = amd_pmu_cpu_dead,
-};
-
/* AMD Family 15h */

#define AMD_EVENT_TYPE_MASK 0x000000F0ULL
@@ -581,8 +552,8 @@ amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *ev
}
}

-static __initconst const struct x86_pmu amd_pmu_f15h = {
- .name = "AMD Family 15h",
+static __initconst const struct x86_pmu amd_pmu = {
+ .name = "AMD",
.handle_irq = x86_pmu_handle_irq,
.disable_all = x86_pmu_disable_all,
.enable_all = x86_pmu_enable_all,
@@ -590,50 +561,69 @@ static __initconst const struct x86_pmu amd_pmu_f15h = {
.disable = x86_pmu_disable_event,
.hw_config = amd_pmu_hw_config,
.schedule_events = x86_schedule_events,
- .eventsel = MSR_F15H_PERF_CTL,
- .perfctr = MSR_F15H_PERF_CTR,
+ .eventsel = MSR_K7_EVNTSEL0,
+ .perfctr = MSR_K7_PERFCTR0,
.event_map = amd_pmu_event_map,
.max_events = ARRAY_SIZE(amd_perfmon_event_map),
- .num_counters = AMD64_NUM_COUNTERS_F15H,
+ .num_counters = AMD64_NUM_COUNTERS,
.cntval_bits = 48,
.cntval_mask = (1ULL << 48) - 1,
.apic = 1,
/* use highest bit to detect overflow */
.max_period = (1ULL << 47) - 1,
- .get_event_constraints = amd_get_event_constraints_f15h,
- /* nortbridge counters not yet implemented: */
-#if 0
+ .get_event_constraints = amd_get_event_constraints,
.put_event_constraints = amd_put_event_constraints,

+ .format_attrs = amd_format_attr,
+
.cpu_prepare = amd_pmu_cpu_prepare,
- .cpu_dead = amd_pmu_cpu_dead,
-#endif
.cpu_starting = amd_pmu_cpu_starting,
- .format_attrs = amd_format_attr,
+ .cpu_dead = amd_pmu_cpu_dead,
};

+static int setup_event_constraints(void)
+{
+ if (boot_cpu_data.x86 >= 0x15)
+ x86_pmu.get_event_constraints = amd_get_event_constraints_f15h;
+ return 0;
+}
+
+static int setup_perfctr_core(void)
+{
+ if (!cpu_has_perfctr_core) {
+ WARN(x86_pmu.get_event_constraints == amd_get_event_constraints_f15h,
+ KERN_ERR "Odd, counter constraints enabled but no core perfctrs detected!");
+ return -ENODEV;
+ }
+
+ WARN(x86_pmu.get_event_constraints == amd_get_event_constraints,
+ KERN_ERR "hw perf events core counters need constraints handler!");
+
+ /*
+ * If core performance counter extensions exists, we must use
+ * MSR_F15H_PERF_CTL/MSR_F15H_PERF_CTR msrs. See also
+ * x86_pmu_addr_offset().
+ */
+ x86_pmu.eventsel = MSR_F15H_PERF_CTL;
+ x86_pmu.perfctr = MSR_F15H_PERF_CTR;
+ x86_pmu.num_counters = AMD64_NUM_COUNTERS_CORE;
+ x86_pmu.counters_mask64 = (1ULL << AMD64_NUM_COUNTERS_CORE) - 1;
+
+ printk(KERN_INFO "perf: AMD core performance counters detected\n");
+
+ return 0;
+}
+
__init int amd_pmu_init(void)
{
/* Performance-monitoring supported from K7 and later: */
if (boot_cpu_data.x86 < 6)
return -ENODEV;

- /*
- * If core performance counter extensions exists, it must be
- * family 15h, otherwise fail. See x86_pmu_addr_offset().
- */
- switch (boot_cpu_data.x86) {
- case 0x15:
- if (!cpu_has_perfctr_core)
- return -ENODEV;
- x86_pmu = amd_pmu_f15h;
- break;
- default:
- if (cpu_has_perfctr_core)
- return -ENODEV;
- x86_pmu = amd_pmu;
- break;
- }
+ x86_pmu = amd_pmu;
+
+ setup_event_constraints();
+ setup_perfctr_core();

/* Events are common for all AMDs */
memcpy(hw_cache_event_ids, amd_hw_cache_event_ids,
diff --git a/arch/x86/oprofile/op_model_amd.c b/arch/x86/oprofile/op_model_amd.c
index 303f086..b2b9443 100644
--- a/arch/x86/oprofile/op_model_amd.c
+++ b/arch/x86/oprofile/op_model_amd.c
@@ -312,7 +312,7 @@ static int op_amd_fill_in_addresses(struct op_msrs * const msrs)
goto fail;
}
/* both registers must be reserved */
- if (num_counters == AMD64_NUM_COUNTERS_F15H) {
+ if (num_counters == AMD64_NUM_COUNTERS_CORE) {
msrs->counters[i].addr = MSR_F15H_PERF_CTR + (i << 1);
msrs->controls[i].addr = MSR_F15H_PERF_CTL + (i << 1);
} else {
@@ -514,7 +514,7 @@ static int op_amd_init(struct oprofile_operations *ops)
ops->create_files = setup_ibs_files;

if (boot_cpu_data.x86 == 0x15) {
- num_counters = AMD64_NUM_COUNTERS_F15H;
+ num_counters = AMD64_NUM_COUNTERS_CORE;
} else {
num_counters = AMD64_NUM_COUNTERS;
}
--
1.7.8.4

Subject: [PATCH 08/10] perf, amd: Enable northbridge counters on family 15h

This patch enables northbridge counter support for family 15h cpus.
Northbridge counters on family 15h have a separate counter bit mask
and constraints. NB counters are enabled if the nb performance counter
extensions cpuid flag is set.
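
As a small illustration of the rdpmc index mapping added below
(constants from this series, the snippet itself is just an example):

	/* AMD64_PMC_IDX_NB == 32, AMD64_NUM_COUNTERS_CORE == 6 */
	x86_pmu_rdpmc_index(3);		/* core counter:     3               */
	x86_pmu_rdpmc_index(32);	/* first NB counter: 32 - 32 + 6 = 6 */
	x86_pmu_rdpmc_index(35);	/* last NB counter:  35 - 32 + 6 = 9 */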

Signed-off-by: Robert Richter <[email protected]>
---
arch/x86/include/asm/cpufeature.h | 2 +
arch/x86/include/asm/perf_event.h | 3 ++
arch/x86/kernel/cpu/perf_event.c | 21 +++++++++++++++-
arch/x86/kernel/cpu/perf_event_amd.c | 42 +++++++++++++++++++++++++++------
4 files changed, 58 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 340ee49..404bb64 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -164,6 +164,7 @@
#define X86_FEATURE_TBM (6*32+21) /* trailing bit manipulations */
#define X86_FEATURE_TOPOEXT (6*32+22) /* topology extensions CPUID leafs */
#define X86_FEATURE_PERFCTR_CORE (6*32+23) /* core performance counter extensions */
+#define X86_FEATURE_PERFCTR_NB (6*32+24) /* nb performance counter extensions */

/*
* Auxiliary flags: Linux defined - For features scattered in various
@@ -301,6 +302,7 @@ extern const char * const x86_power_flags[32];
#define cpu_has_hypervisor boot_cpu_has(X86_FEATURE_HYPERVISOR)
#define cpu_has_pclmulqdq boot_cpu_has(X86_FEATURE_PCLMULQDQ)
#define cpu_has_perfctr_core boot_cpu_has(X86_FEATURE_PERFCTR_CORE)
+#define cpu_has_perfctr_nb boot_cpu_has(X86_FEATURE_PERFCTR_NB)
#define cpu_has_cx8 boot_cpu_has(X86_FEATURE_CX8)
#define cpu_has_cx16 boot_cpu_has(X86_FEATURE_CX16)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index ffdf5e0..637a72b7 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -9,6 +9,8 @@
#define INTEL_PMC_MAX_FIXED 3
#define INTEL_PMC_IDX_FIXED 32

+#define AMD64_PMC_IDX_NB 32
+
#define X86_PMC_IDX_MAX 64

#define MSR_ARCH_PERFMON_PERFCTR0 0xc1
@@ -48,6 +50,7 @@
AMD64_EVENTSEL_EVENT)
#define AMD64_NUM_COUNTERS 4
#define AMD64_NUM_COUNTERS_CORE 6
+#define AMD64_NUM_COUNTERS_NB 4

#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_SEL 0x3c
#define ARCH_PERFMON_UNHALTED_CORE_CYCLES_UMASK (0x00 << 8)
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 9fbf70a..1d59cac 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -805,6 +805,13 @@ static int collect_events(struct cpu_hw_events *cpuc, struct perf_event *leader,
return n;
}

+static inline int x86_pmu_rdpmc_index(int index)
+{
+ if (cpu_has_perfctr_nb && (index >= AMD64_PMC_IDX_NB))
+ return index - AMD64_PMC_IDX_NB + AMD64_NUM_COUNTERS_CORE;
+ return index;
+}
+
static inline void x86_assign_hw_event(struct perf_event *event,
struct cpu_hw_events *cpuc, int i)
{
@@ -824,7 +831,7 @@ static inline void x86_assign_hw_event(struct perf_event *event,
} else {
hwc->config_base = x86_pmu_config_addr(hwc->idx);
hwc->event_base = x86_pmu_event_addr(hwc->idx);
- hwc->event_base_rdpmc = hwc->idx;
+ hwc->event_base_rdpmc = x86_pmu_rdpmc_index(hwc->idx);
}
}

@@ -1194,7 +1201,17 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
* might still deliver spurious interrupts still
* in flight. Catch them:
*/
- if (__test_and_clear_bit(idx, cpuc->running))
+ if (idx >= AMD64_PMC_IDX_NB && cpuc->amd_nb && cpuc->amd_nb->owners[idx]) {
+ /*
+ * AMD NB counters send nmis to all
+ * cpus on the node. Fix this later:
+ * This swallows all nmis on the node
+ * if a nb counter is active and even
+ * if there was no overflow.
+ */
+ handled++;
+ __set_bit(idx, cpuc->running);
+ } else if (__test_and_clear_bit(idx, cpuc->running))
handled++;
continue;
}
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index e538512..7a870d2 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -7,6 +7,9 @@

#include "perf_event.h"

+#define AMD64_COUNTERS_MASK_NB \
+ (((1ULL << AMD64_NUM_COUNTERS_NB) - 1) << AMD64_PMC_IDX_NB)
+
static __initconst const u64 amd_hw_cache_event_ids
[PERF_COUNT_HW_CACHE_MAX]
[PERF_COUNT_HW_CACHE_OP_MAX]
@@ -461,12 +464,13 @@ static struct attribute *amd_format_attr[] = {
* (**) only one unitmask enabled at a time
*/

-static struct event_constraint amd_f15_PMC0 = EVENT_CONSTRAINT(0, 0x01, 0);
-static struct event_constraint amd_f15_PMC20 = EVENT_CONSTRAINT(0, 0x07, 0);
-static struct event_constraint amd_f15_PMC3 = EVENT_CONSTRAINT(0, 0x08, 0);
-static struct event_constraint amd_f15_PMC30 = EVENT_CONSTRAINT_OVERLAP(0, 0x09, 0);
-static struct event_constraint amd_f15_PMC50 = EVENT_CONSTRAINT(0, 0x3F, 0);
-static struct event_constraint amd_f15_PMC53 = EVENT_CONSTRAINT(0, 0x38, 0);
+static struct event_constraint amd_f15_PMC0 = EVENT_CONSTRAINT(0, 0x01, 0);
+static struct event_constraint amd_f15_PMC20 = EVENT_CONSTRAINT(0, 0x07, 0);
+static struct event_constraint amd_f15_PMC3 = EVENT_CONSTRAINT(0, 0x08, 0);
+static struct event_constraint amd_f15_PMC30 = EVENT_CONSTRAINT_OVERLAP(0, 0x09, 0);
+static struct event_constraint amd_f15_PMC50 = EVENT_CONSTRAINT(0, 0x3F, 0);
+static struct event_constraint amd_f15_PMC53 = EVENT_CONSTRAINT(0, 0x38, 0);
+static struct event_constraint amd_f15_NBPMC30 = EVENT_CONSTRAINT(0, AMD64_COUNTERS_MASK_NB, 0);

static struct event_constraint *
__amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *event)
@@ -533,8 +537,7 @@ __amd_get_event_constraints_f15h(struct cpu_hw_events *cpuc, struct perf_event *
return &amd_f15_PMC20;
}
case AMD_EVENT_NB:
- /* not yet implemented */
- return &emptyconstraint;
+ return &amd_f15_NBPMC30;
default:
return &emptyconstraint;
}
@@ -635,6 +638,28 @@ static int setup_perfctr_core(void)
return 0;
}

+static int setup_perfctr_nb(void)
+{
+ if (!cpu_has_perfctr_nb)
+ return -ENODEV;
+
+ /* requires core perfctrs initialized first */
+ if (x86_pmu.eventsel != MSR_F15H_PERF_CTL)
+ return -ENODEV;
+
+ if (x86_pmu.get_event_constraints == amd_get_event_constraints) {
+ WARN(1, KERN_ERR "hw perf events nb counter needs constraints handler!");
+ return -ENODEV;
+ }
+
+ x86_pmu.num_counters += AMD64_NUM_COUNTERS_NB;
+ x86_pmu.counters_mask64 |= AMD64_COUNTERS_MASK_NB;
+
+ printk(KERN_INFO "perf: AMD northbridge performance counters detected\n");
+
+ return 0;
+}
+
__init int amd_pmu_init(void)
{
/* Performance-monitoring supported from K7 and later: */
@@ -645,6 +670,7 @@ __init int amd_pmu_init(void)

setup_event_constraints();
setup_perfctr_core();
+ setup_perfctr_nb();

/* Events are common for all AMDs */
memcpy(hw_cache_event_ids, amd_hw_cache_event_ids,
--
1.7.8.4

Subject: [PATCH 03/10] perf, x86: Use bitmasks for generic counters

We will have non-contiguous counter masks with the AMD family 15h PMU.
Introduce bitmasks for generic counters to support this.

This patch introduces the for_each_generic_counter(idx) macro to
iterate over all existing generic counters.

All the code is updated to use that macro. Thus an x86 PMU can be
configured (via the struct members counters_mask/counters_mask64) to
have a counter bit mask with holes in it.

The maximum number of generic counters is now expanded to 64. If a PMU
has fixed counters (Intel), the limit remains 32. Conflicts between
generic counters and X86_PMC_IDX_FIXED_BTS are possible, but no
current PMU is affected. Moving the BTS code to Intel-only code could
be the subject of another patch set.
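
As an illustration (not part of this patch), with the family 15h setup
added later in this series the mask has a hole between the core and NB
counters:

	x86_pmu.counters_mask64 = 0x0000000f0000003fULL;

	for_each_generic_counter(idx)
		pr_info("generic counter idx %d\n", idx);

This visits idx 0..5 and 32..35 and skips indexes 6..31.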

Signed-off-by: Robert Richter <[email protected]>
---
arch/x86/kernel/cpu/perf_event.c | 39 +++++++++++++++++++++----------
arch/x86/kernel/cpu/perf_event.h | 7 +++++
arch/x86/kernel/cpu/perf_event_amd.c | 6 ++--
arch/x86/kernel/cpu/perf_event_intel.c | 6 ++--
arch/x86/kernel/cpu/perf_event_p4.c | 6 ++--
5 files changed, 42 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index ac1cb32..7edea06 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -134,7 +134,7 @@ static bool reserve_pmc_hardware(void)
{
int idx1, idx2;

- for (idx1 = 0; idx1 < x86_pmu.num_counters; idx1++) {
+ for_each_generic_counter(idx1) {
if (!reserve_perfctr_nmi(x86_pmu_event_addr(idx1)))
goto perfctr_fail;
if (!reserve_evntsel_nmi(x86_pmu_config_addr(idx1)))
@@ -146,7 +146,7 @@ static bool reserve_pmc_hardware(void)
eventsel_fail:
release_perfctr_nmi(x86_pmu_event_addr(idx1));
perfctr_fail:
- for (idx2 = 0; idx2 < x86_pmu.num_counters; idx2++) {
+ for_each_generic_counter(idx2) {
if (idx2 >= idx1)
break;
release_evntsel_nmi(x86_pmu_config_addr(idx2));
@@ -160,7 +160,7 @@ static void release_pmc_hardware(void)
{
int i;

- for (i = 0; i < x86_pmu.num_counters; i++) {
+ for_each_generic_counter(i) {
release_perfctr_nmi(x86_pmu_event_addr(i));
release_evntsel_nmi(x86_pmu_config_addr(i));
}
@@ -182,7 +182,7 @@ static bool check_hw_exists(void)
* Check to see if the BIOS enabled any of the counters, if so
* complain and bail.
*/
- for (i = 0; i < x86_pmu.num_counters; i++) {
+ for_each_generic_counter(i) {
reg = x86_pmu_config_addr(i);
ret = rdmsrl_safe(reg, &val);
if (ret)
@@ -480,7 +480,7 @@ void x86_pmu_disable_all(void)
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
int idx;

- for (idx = 0; idx < x86_pmu.num_counters; idx++) {
+ for_each_generic_counter(idx) {
u64 val;

if (!test_bit(idx, cpuc->active_mask))
@@ -515,7 +515,7 @@ void x86_pmu_enable_all(int added)
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
int idx;

- for (idx = 0; idx < x86_pmu.num_counters; idx++) {
+ for_each_generic_counter(idx) {
struct hw_perf_event *hwc = &cpuc->events[idx]->hw;

if (!test_bit(idx, cpuc->active_mask))
@@ -612,7 +612,7 @@ static bool perf_sched_restore_state(struct perf_sched *sched)
static bool __perf_sched_find_counter(struct perf_sched *sched)
{
struct event_constraint *c;
- int idx;
+ int idx, max;

if (!sched->state.unassigned)
return false;
@@ -629,10 +629,14 @@ static bool __perf_sched_find_counter(struct perf_sched *sched)
if (!__test_and_set_bit(idx, sched->state.used))
goto done;
}
+ max = X86_PMC_IDX_FIXED;
+ } else {
+ max = X86_PMC_IDX_MAX;
}
+
/* Grab the first unused counter starting with idx */
idx = sched->state.counter;
- for_each_set_bit_from(idx, c->idxmsk, X86_PMC_IDX_FIXED) {
+ for_each_set_bit_from(idx, c->idxmsk, max) {
if (!__test_and_set_bit(idx, sched->state.used))
goto done;
}
@@ -813,7 +817,7 @@ static inline void x86_assign_hw_event(struct perf_event *event,
if (hwc->idx == X86_PMC_IDX_FIXED_BTS) {
hwc->config_base = 0;
hwc->event_base = 0;
- } else if (hwc->idx >= X86_PMC_IDX_FIXED) {
+ } else if (x86_pmu.num_counters_fixed && hwc->idx >= X86_PMC_IDX_FIXED) {
hwc->config_base = MSR_ARCH_PERFMON_FIXED_CTR_CTRL;
hwc->event_base = MSR_ARCH_PERFMON_FIXED_CTR0 + (hwc->idx - X86_PMC_IDX_FIXED);
hwc->event_base_rdpmc = (hwc->idx - X86_PMC_IDX_FIXED) | 1<<30;
@@ -1088,7 +1092,7 @@ void perf_event_print_debug(void)
}
pr_info("CPU#%d: active: %016llx\n", cpu, *(u64 *)cpuc->active_mask);

- for (idx = 0; idx < x86_pmu.num_counters; idx++) {
+ for_each_generic_counter(idx) {
rdmsrl(x86_pmu_config_addr(idx), pmc_ctrl);
rdmsrl(x86_pmu_event_addr(idx), pmc_count);

@@ -1183,7 +1187,7 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
*/
apic_write(APIC_LVTPC, APIC_DM_NMI);

- for (idx = 0; idx < x86_pmu.num_counters; idx++) {
+ for_each_generic_counter(idx) {
if (!test_bit(idx, cpuc->active_mask)) {
/*
* Though we deactivated the counter some cpus
@@ -1351,11 +1355,19 @@ static int __init init_hw_perf_events(void)
x86_pmu.intel_ctrl |=
((1LL << x86_pmu.num_counters_fixed)-1) << X86_PMC_IDX_FIXED;

+ if (!x86_pmu.counters_mask64)
+ x86_pmu.counters_mask64 = (1ULL << x86_pmu.num_counters) - 1;
+ if (x86_pmu.num_counters != hweight64(x86_pmu.counters_mask64)) {
+ WARN(1, KERN_ERR "hw perf events counter mask and number don't match: 0x%016Lx/%d!",
+ x86_pmu.counters_mask64, x86_pmu.num_counters);
+ x86_pmu.num_counters = hweight64(x86_pmu.counters_mask64);
+ }
+
perf_events_lapic_init();
register_nmi_handler(NMI_LOCAL, perf_event_nmi_handler, 0, "PMI");

unconstrained = (struct event_constraint)
- __EVENT_CONSTRAINT(0, (1ULL << x86_pmu.num_counters) - 1,
+ __EVENT_CONSTRAINT(0, x86_pmu.counters_mask64,
0, x86_pmu.num_counters, 0);

if (x86_pmu.event_constraints) {
@@ -1383,7 +1395,8 @@ static int __init init_hw_perf_events(void)
pr_info("... value mask: %016Lx\n", x86_pmu.cntval_mask);
pr_info("... max period: %016Lx\n", x86_pmu.max_period);
pr_info("... fixed-purpose events: %d\n", x86_pmu.num_counters_fixed);
- pr_info("... event mask: %016Lx\n", x86_pmu.intel_ctrl);
+ pr_info("... event mask: %016Lx\n",
+ x86_pmu.intel_ctrl ? x86_pmu.intel_ctrl : x86_pmu.counters_mask64);

perf_pmu_register(&pmu, "cpu", PERF_TYPE_RAW);
perf_cpu_notifier(x86_pmu_notifier);
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 3df3de9..eb14b76 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -237,6 +237,9 @@ struct cpu_hw_events {
#define for_each_event_constraint(e, c) \
for ((e) = (c); (e)->weight; (e)++)

+#define for_each_generic_counter(idx) \
+ for_each_set_bit((idx), x86_pmu.counters_mask, X86_PMC_IDX_MAX)
+
/*
* Extra registers for specific events.
*
@@ -327,6 +330,10 @@ struct x86_pmu {
unsigned perfctr;
u64 (*event_map)(int);
int max_events;
+ union { /* generic counter mask, no fixed counters: */
+ unsigned long counters_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
+ u64 counters_mask64;
+ };
int num_counters;
int num_counters_fixed;
int cntval_bits;
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 194ca0b..b5ed2e1 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -209,7 +209,7 @@ static void amd_put_event_constraints(struct cpu_hw_events *cpuc,
* be removed on one CPU at a time AND PMU is disabled
* when we come here
*/
- for (i = 0; i < x86_pmu.num_counters; i++) {
+ for_each_generic_counter(i) {
if (cmpxchg(nb->owners + i, event, NULL) == event)
break;
}
@@ -275,7 +275,7 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
* because of successive calls to x86_schedule_events() from
* hw_perf_group_sched_in() without hw_perf_enable()
*/
- for (idx = 0; idx < x86_pmu.num_counters; idx++) {
+ for_each_generic_counter(idx) {
if (new == -1 || hwc->idx == idx)
/* assign free slot, prefer hwc->idx */
old = cmpxchg(nb->owners + idx, NULL, event);
@@ -319,7 +319,7 @@ static struct amd_nb *amd_alloc_nb(int cpu)
/*
* initialize all possible NB constraints
*/
- for (i = 0; i < x86_pmu.num_counters; i++) {
+ for_each_generic_counter(i) {
__set_bit(i, nb->event_constraints[i].idxmsk);
nb->event_constraints[i].weight = 1;
}
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index e23e71f..e8c2eae 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1002,7 +1002,7 @@ static void intel_pmu_reset(void)

printk("clearing PMU state on CPU#%d\n", smp_processor_id());

- for (idx = 0; idx < x86_pmu.num_counters; idx++) {
+ for_each_generic_counter(idx) {
checking_wrmsrl(x86_pmu_config_addr(idx), 0ull);
checking_wrmsrl(x86_pmu_event_addr(idx), 0ull);
}
@@ -1453,7 +1453,7 @@ static struct perf_guest_switch_msr *core_guest_get_msrs(int *nr)
struct perf_guest_switch_msr *arr = cpuc->guest_switch_msrs;
int idx;

- for (idx = 0; idx < x86_pmu.num_counters; idx++) {
+ for_each_generic_counter(idx) {
struct perf_event *event = cpuc->events[idx];

arr[idx].msr = x86_pmu_config_addr(idx);
@@ -1486,7 +1486,7 @@ static void core_pmu_enable_all(int added)
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
int idx;

- for (idx = 0; idx < x86_pmu.num_counters; idx++) {
+ for_each_generic_counter(idx) {
struct hw_perf_event *hwc = &cpuc->events[idx]->hw;

if (!test_bit(idx, cpuc->active_mask) ||
diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index 47124a7..1019049 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -919,7 +919,7 @@ static void p4_pmu_disable_all(void)
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
int idx;

- for (idx = 0; idx < x86_pmu.num_counters; idx++) {
+ for_each_generic_counter(idx) {
struct perf_event *event = cpuc->events[idx];
if (!test_bit(idx, cpuc->active_mask))
continue;
@@ -988,7 +988,7 @@ static void p4_pmu_enable_all(int added)
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
int idx;

- for (idx = 0; idx < x86_pmu.num_counters; idx++) {
+ for_each_generic_counter(idx) {
struct perf_event *event = cpuc->events[idx];
if (!test_bit(idx, cpuc->active_mask))
continue;
@@ -1007,7 +1007,7 @@ static int p4_pmu_handle_irq(struct pt_regs *regs)

cpuc = &__get_cpu_var(cpu_hw_events);

- for (idx = 0; idx < x86_pmu.num_counters; idx++) {
+ for_each_generic_counter(idx) {
int overflow;

if (!test_bit(idx, cpuc->active_mask)) {
--
1.7.8.4

Subject: [PATCH 05/10] perf, x86: Move Intel specific code to intel_pmu_init()

There is some Intel-specific code in the generic x86 path. Move it to
intel_pmu_init().

Signed-off-by: Robert Richter <[email protected]>
---
arch/x86/kernel/cpu/perf_event.c | 33 -----------------------
arch/x86/kernel/cpu/perf_event_intel.c | 45 +++++++++++++++++++++++++++++++-
2 files changed, 44 insertions(+), 34 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 2874059..9fbf70a 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1308,7 +1308,6 @@ static struct attribute_group x86_pmu_format_group = {
static int __init init_hw_perf_events(void)
{
struct x86_pmu_quirk *quirk;
- struct event_constraint *c;
int err;

pr_info("Performance Events: ");
@@ -1339,22 +1338,6 @@ static int __init init_hw_perf_events(void)
for (quirk = x86_pmu.quirks; quirk; quirk = quirk->next)
quirk->func();

- if (x86_pmu.num_counters > INTEL_PMC_MAX_GENERIC) {
- WARN(1, KERN_ERR "hw perf events %d > max(%d), clipping!",
- x86_pmu.num_counters, INTEL_PMC_MAX_GENERIC);
- x86_pmu.num_counters = INTEL_PMC_MAX_GENERIC;
- }
- x86_pmu.intel_ctrl = (1 << x86_pmu.num_counters) - 1;
-
- if (x86_pmu.num_counters_fixed > INTEL_PMC_MAX_FIXED) {
- WARN(1, KERN_ERR "hw perf events fixed %d > max(%d), clipping!",
- x86_pmu.num_counters_fixed, INTEL_PMC_MAX_FIXED);
- x86_pmu.num_counters_fixed = INTEL_PMC_MAX_FIXED;
- }
-
- x86_pmu.intel_ctrl |=
- ((1LL << x86_pmu.num_counters_fixed)-1) << INTEL_PMC_IDX_FIXED;
-
if (!x86_pmu.counters_mask64)
x86_pmu.counters_mask64 = (1ULL << x86_pmu.num_counters) - 1;
if (x86_pmu.num_counters != hweight64(x86_pmu.counters_mask64)) {
@@ -1370,22 +1353,6 @@ static int __init init_hw_perf_events(void)
__EVENT_CONSTRAINT(0, x86_pmu.counters_mask64,
0, x86_pmu.num_counters, 0);

- if (x86_pmu.event_constraints) {
- /*
- * event on fixed counter2 (REF_CYCLES) only works on this
- * counter, so do not extend mask to generic counters
- */
- for_each_event_constraint(c, x86_pmu.event_constraints) {
- if (c->cmask != X86_RAW_EVENT_MASK
- || c->idxmsk64 == INTEL_PMC_MSK_FIXED_REF_CYCLES) {
- continue;
- }
-
- c->idxmsk64 |= (1ULL << x86_pmu.num_counters) - 1;
- c->weight += x86_pmu.num_counters;
- }
- }
-
x86_pmu.attr_rdpmc = 1; /* enable userspace RDPMC usage by default */
x86_pmu_format_group.attrs = x86_pmu.format_attrs;

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 1eb9f00..90d7097 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1760,7 +1760,7 @@ static __init void intel_nehalem_quirk(void)
}
}

-__init int intel_pmu_init(void)
+static __init int __intel_pmu_init(void)
{
union cpuid10_edx edx;
union cpuid10_eax eax;
@@ -1955,3 +1955,46 @@ __init int intel_pmu_init(void)

return 0;
}
+
+__init int intel_pmu_init(void)
+{
+ struct event_constraint *c;
+ int ret = __intel_pmu_init();
+
+ if (ret)
+ return ret;
+
+ if (x86_pmu.num_counters > INTEL_PMC_MAX_GENERIC) {
+ WARN(1, KERN_ERR "hw perf events %d > max(%d), clipping!",
+ x86_pmu.num_counters, INTEL_PMC_MAX_GENERIC);
+ x86_pmu.num_counters = INTEL_PMC_MAX_GENERIC;
+ }
+ x86_pmu.intel_ctrl = (1 << x86_pmu.num_counters) - 1;
+
+ if (x86_pmu.num_counters_fixed > INTEL_PMC_MAX_FIXED) {
+ WARN(1, KERN_ERR "hw perf events fixed %d > max(%d), clipping!",
+ x86_pmu.num_counters_fixed, INTEL_PMC_MAX_FIXED);
+ x86_pmu.num_counters_fixed = INTEL_PMC_MAX_FIXED;
+ }
+
+ x86_pmu.intel_ctrl |=
+ ((1LL << x86_pmu.num_counters_fixed)-1) << INTEL_PMC_IDX_FIXED;
+
+ if (x86_pmu.event_constraints) {
+ /*
+ * event on fixed counter2 (REF_CYCLES) only works on this
+ * counter, so do not extend mask to generic counters
+ */
+ for_each_event_constraint(c, x86_pmu.event_constraints) {
+ if (c->cmask != X86_RAW_EVENT_MASK
+ || c->idxmsk64 == INTEL_PMC_MSK_FIXED_REF_CYCLES) {
+ continue;
+ }
+
+ c->idxmsk64 |= (1ULL << x86_pmu.num_counters) - 1;
+ c->weight += x86_pmu.num_counters;
+ }
+ }
+
+ return 0;
+}
--
1.7.8.4

Subject: [PATCH 04/10] perf, x86: Rename Intel specific macros

There are macros that are Intel-specific rather than x86-generic.
Rename them to INTEL_*.

This patch removes X86_PMC_IDX_GENERIC and does:

$ sed -i -e 's/X86_PMC_MAX_/INTEL_PMC_MAX_/g' \
arch/x86/include/asm/kvm_host.h \
arch/x86/include/asm/perf_event.h \
arch/x86/kernel/cpu/perf_event.c \
arch/x86/kernel/cpu/perf_event_p4.c \
arch/x86/kvm/pmu.c
$ sed -i -e 's/X86_PMC_IDX_FIXED/INTEL_PMC_IDX_FIXED/g' \
arch/x86/include/asm/perf_event.h \
arch/x86/kernel/cpu/perf_event.c \
arch/x86/kernel/cpu/perf_event_intel.c \
arch/x86/kernel/cpu/perf_event_intel_ds.c \
arch/x86/kvm/pmu.c
$ sed -i -e 's/X86_PMC_MSK_/INTEL_PMC_MSK_/g' \
arch/x86/include/asm/perf_event.h \
arch/x86/kernel/cpu/perf_event.c

Signed-off-by: Robert Richter <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 4 +-
arch/x86/include/asm/perf_event.h | 17 ++++++-------
arch/x86/kernel/cpu/perf_event.c | 36 ++++++++++++++--------------
arch/x86/kernel/cpu/perf_event_intel.c | 14 +++++-----
arch/x86/kernel/cpu/perf_event_intel_ds.c | 4 +-
arch/x86/kernel/cpu/perf_event_p4.c | 2 +-
arch/x86/kvm/pmu.c | 22 +++++++++---------
7 files changed, 49 insertions(+), 50 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index db7c1f2..2da88c0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -313,8 +313,8 @@ struct kvm_pmu {
u64 counter_bitmask[2];
u64 global_ctrl_mask;
u8 version;
- struct kvm_pmc gp_counters[X86_PMC_MAX_GENERIC];
- struct kvm_pmc fixed_counters[X86_PMC_MAX_FIXED];
+ struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC];
+ struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED];
struct irq_work irq_work;
u64 reprogram_pmi;
};
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 588f52e..3b31248 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -5,11 +5,10 @@
* Performance event hw details:
*/

-#define X86_PMC_MAX_GENERIC 32
-#define X86_PMC_MAX_FIXED 3
+#define INTEL_PMC_MAX_GENERIC 32
+#define INTEL_PMC_MAX_FIXED 3
+#define INTEL_PMC_IDX_FIXED 32

-#define X86_PMC_IDX_GENERIC 0
-#define X86_PMC_IDX_FIXED 32
#define X86_PMC_IDX_MAX 64

#define MSR_ARCH_PERFMON_PERFCTR0 0xc1
@@ -121,16 +120,16 @@ struct x86_pmu_capability {

/* Instr_Retired.Any: */
#define MSR_ARCH_PERFMON_FIXED_CTR0 0x309
-#define X86_PMC_IDX_FIXED_INSTRUCTIONS (X86_PMC_IDX_FIXED + 0)
+#define INTEL_PMC_IDX_FIXED_INSTRUCTIONS (INTEL_PMC_IDX_FIXED + 0)

/* CPU_CLK_Unhalted.Core: */
#define MSR_ARCH_PERFMON_FIXED_CTR1 0x30a
-#define X86_PMC_IDX_FIXED_CPU_CYCLES (X86_PMC_IDX_FIXED + 1)
+#define INTEL_PMC_IDX_FIXED_CPU_CYCLES (INTEL_PMC_IDX_FIXED + 1)

/* CPU_CLK_Unhalted.Ref: */
#define MSR_ARCH_PERFMON_FIXED_CTR2 0x30b
-#define X86_PMC_IDX_FIXED_REF_CYCLES (X86_PMC_IDX_FIXED + 2)
-#define X86_PMC_MSK_FIXED_REF_CYCLES (1ULL << X86_PMC_IDX_FIXED_REF_CYCLES)
+#define INTEL_PMC_IDX_FIXED_REF_CYCLES (INTEL_PMC_IDX_FIXED + 2)
+#define INTEL_PMC_MSK_FIXED_REF_CYCLES (1ULL << INTEL_PMC_IDX_FIXED_REF_CYCLES)

/*
* We model BTS tracing as another fixed-mode PMC.
@@ -139,7 +138,7 @@ struct x86_pmu_capability {
* values are used by actual fixed events and higher values are used
* to indicate other overflow conditions in the PERF_GLOBAL_STATUS msr.
*/
-#define X86_PMC_IDX_FIXED_BTS (X86_PMC_IDX_FIXED + 16)
+#define INTEL_PMC_IDX_FIXED_BTS (INTEL_PMC_IDX_FIXED + 16)

/*
* IBS cpuid feature detection
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 7edea06..2874059 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -63,7 +63,7 @@ u64 x86_perf_event_update(struct perf_event *event)
int idx = hwc->idx;
s64 delta;

- if (idx == X86_PMC_IDX_FIXED_BTS)
+ if (idx == INTEL_PMC_IDX_FIXED_BTS)
return 0;

/*
@@ -624,12 +624,12 @@ static bool __perf_sched_find_counter(struct perf_sched *sched)

/* Prefer fixed purpose counters */
if (x86_pmu.num_counters_fixed) {
- idx = X86_PMC_IDX_FIXED;
+ idx = INTEL_PMC_IDX_FIXED;
for_each_set_bit_from(idx, c->idxmsk, X86_PMC_IDX_MAX) {
if (!__test_and_set_bit(idx, sched->state.used))
goto done;
}
- max = X86_PMC_IDX_FIXED;
+ max = INTEL_PMC_IDX_FIXED;
} else {
max = X86_PMC_IDX_MAX;
}
@@ -814,13 +814,13 @@ static inline void x86_assign_hw_event(struct perf_event *event,
hwc->last_cpu = smp_processor_id();
hwc->last_tag = ++cpuc->tags[i];

- if (hwc->idx == X86_PMC_IDX_FIXED_BTS) {
+ if (hwc->idx == INTEL_PMC_IDX_FIXED_BTS) {
hwc->config_base = 0;
hwc->event_base = 0;
- } else if (x86_pmu.num_counters_fixed && hwc->idx >= X86_PMC_IDX_FIXED) {
+ } else if (x86_pmu.num_counters_fixed && hwc->idx >= INTEL_PMC_IDX_FIXED) {
hwc->config_base = MSR_ARCH_PERFMON_FIXED_CTR_CTRL;
- hwc->event_base = MSR_ARCH_PERFMON_FIXED_CTR0 + (hwc->idx - X86_PMC_IDX_FIXED);
- hwc->event_base_rdpmc = (hwc->idx - X86_PMC_IDX_FIXED) | 1<<30;
+ hwc->event_base = MSR_ARCH_PERFMON_FIXED_CTR0 + (hwc->idx - INTEL_PMC_IDX_FIXED);
+ hwc->event_base_rdpmc = (hwc->idx - INTEL_PMC_IDX_FIXED) | 1<<30;
} else {
hwc->config_base = x86_pmu_config_addr(hwc->idx);
hwc->event_base = x86_pmu_event_addr(hwc->idx);
@@ -922,7 +922,7 @@ int x86_perf_event_set_period(struct perf_event *event)
s64 period = hwc->sample_period;
int ret = 0, idx = hwc->idx;

- if (idx == X86_PMC_IDX_FIXED_BTS)
+ if (idx == INTEL_PMC_IDX_FIXED_BTS)
return 0;

/*
@@ -1339,21 +1339,21 @@ static int __init init_hw_perf_events(void)
for (quirk = x86_pmu.quirks; quirk; quirk = quirk->next)
quirk->func();

- if (x86_pmu.num_counters > X86_PMC_MAX_GENERIC) {
+ if (x86_pmu.num_counters > INTEL_PMC_MAX_GENERIC) {
WARN(1, KERN_ERR "hw perf events %d > max(%d), clipping!",
- x86_pmu.num_counters, X86_PMC_MAX_GENERIC);
- x86_pmu.num_counters = X86_PMC_MAX_GENERIC;
+ x86_pmu.num_counters, INTEL_PMC_MAX_GENERIC);
+ x86_pmu.num_counters = INTEL_PMC_MAX_GENERIC;
}
x86_pmu.intel_ctrl = (1 << x86_pmu.num_counters) - 1;

- if (x86_pmu.num_counters_fixed > X86_PMC_MAX_FIXED) {
+ if (x86_pmu.num_counters_fixed > INTEL_PMC_MAX_FIXED) {
WARN(1, KERN_ERR "hw perf events fixed %d > max(%d), clipping!",
- x86_pmu.num_counters_fixed, X86_PMC_MAX_FIXED);
- x86_pmu.num_counters_fixed = X86_PMC_MAX_FIXED;
+ x86_pmu.num_counters_fixed, INTEL_PMC_MAX_FIXED);
+ x86_pmu.num_counters_fixed = INTEL_PMC_MAX_FIXED;
}

x86_pmu.intel_ctrl |=
- ((1LL << x86_pmu.num_counters_fixed)-1) << X86_PMC_IDX_FIXED;
+ ((1LL << x86_pmu.num_counters_fixed)-1) << INTEL_PMC_IDX_FIXED;

if (!x86_pmu.counters_mask64)
x86_pmu.counters_mask64 = (1ULL << x86_pmu.num_counters) - 1;
@@ -1377,7 +1377,7 @@ static int __init init_hw_perf_events(void)
*/
for_each_event_constraint(c, x86_pmu.event_constraints) {
if (c->cmask != X86_RAW_EVENT_MASK
- || c->idxmsk64 == X86_PMC_MSK_FIXED_REF_CYCLES) {
+ || c->idxmsk64 == INTEL_PMC_MSK_FIXED_REF_CYCLES) {
continue;
}

@@ -1621,8 +1621,8 @@ static int x86_pmu_event_idx(struct perf_event *event)
if (!x86_pmu.attr_rdpmc)
return 0;

- if (x86_pmu.num_counters_fixed && idx >= X86_PMC_IDX_FIXED) {
- idx -= X86_PMC_IDX_FIXED;
+ if (x86_pmu.num_counters_fixed && idx >= INTEL_PMC_IDX_FIXED) {
+ idx -= INTEL_PMC_IDX_FIXED;
idx |= 1 << 30;
}

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index e8c2eae..1eb9f00 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -747,7 +747,7 @@ static void intel_pmu_disable_all(void)

wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);

- if (test_bit(X86_PMC_IDX_FIXED_BTS, cpuc->active_mask))
+ if (test_bit(INTEL_PMC_IDX_FIXED_BTS, cpuc->active_mask))
intel_pmu_disable_bts();

intel_pmu_pebs_disable_all();
@@ -763,9 +763,9 @@ static void intel_pmu_enable_all(int added)
wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL,
x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask);

- if (test_bit(X86_PMC_IDX_FIXED_BTS, cpuc->active_mask)) {
+ if (test_bit(INTEL_PMC_IDX_FIXED_BTS, cpuc->active_mask)) {
struct perf_event *event =
- cpuc->events[X86_PMC_IDX_FIXED_BTS];
+ cpuc->events[INTEL_PMC_IDX_FIXED_BTS];

if (WARN_ON_ONCE(!event))
return;
@@ -871,7 +871,7 @@ static inline void intel_pmu_ack_status(u64 ack)

static void intel_pmu_disable_fixed(struct hw_perf_event *hwc)
{
- int idx = hwc->idx - X86_PMC_IDX_FIXED;
+ int idx = hwc->idx - INTEL_PMC_IDX_FIXED;
u64 ctrl_val, mask;

mask = 0xfULL << (idx * 4);
@@ -886,7 +886,7 @@ static void intel_pmu_disable_event(struct perf_event *event)
struct hw_perf_event *hwc = &event->hw;
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);

- if (unlikely(hwc->idx == X86_PMC_IDX_FIXED_BTS)) {
+ if (unlikely(hwc->idx == INTEL_PMC_IDX_FIXED_BTS)) {
intel_pmu_disable_bts();
intel_pmu_drain_bts_buffer();
return;
@@ -915,7 +915,7 @@ static void intel_pmu_disable_event(struct perf_event *event)

static void intel_pmu_enable_fixed(struct hw_perf_event *hwc)
{
- int idx = hwc->idx - X86_PMC_IDX_FIXED;
+ int idx = hwc->idx - INTEL_PMC_IDX_FIXED;
u64 ctrl_val, bits, mask;

/*
@@ -949,7 +949,7 @@ static void intel_pmu_enable_event(struct perf_event *event)
struct hw_perf_event *hwc = &event->hw;
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);

- if (unlikely(hwc->idx == X86_PMC_IDX_FIXED_BTS)) {
+ if (unlikely(hwc->idx == INTEL_PMC_IDX_FIXED_BTS)) {
if (!__this_cpu_read(cpu_hw_events.enabled))
return;

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 026373e..629ae0b 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -248,7 +248,7 @@ void reserve_ds_buffers(void)
*/

struct event_constraint bts_constraint =
- EVENT_CONSTRAINT(0, 1ULL << X86_PMC_IDX_FIXED_BTS, 0);
+ EVENT_CONSTRAINT(0, 1ULL << INTEL_PMC_IDX_FIXED_BTS, 0);

void intel_pmu_enable_bts(u64 config)
{
@@ -295,7 +295,7 @@ int intel_pmu_drain_bts_buffer(void)
u64 to;
u64 flags;
};
- struct perf_event *event = cpuc->events[X86_PMC_IDX_FIXED_BTS];
+ struct perf_event *event = cpuc->events[INTEL_PMC_IDX_FIXED_BTS];
struct bts_record *at, *top;
struct perf_output_handle handle;
struct perf_event_header header;
diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index 1019049..f3e73b8 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -1325,7 +1325,7 @@ __init int p4_pmu_init(void)
unsigned int low, high;

/* If we get stripped -- indexing fails */
- BUILD_BUG_ON(ARCH_P4_MAX_CCCR > X86_PMC_MAX_GENERIC);
+ BUILD_BUG_ON(ARCH_P4_MAX_CCCR > INTEL_PMC_MAX_GENERIC);

rdmsr(MSR_IA32_MISC_ENABLE, low, high);
if (!(low & (1 << 7))) {
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 2e88438..9b7ec11 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -80,10 +80,10 @@ static inline struct kvm_pmc *get_fixed_pmc_idx(struct kvm_pmu *pmu, int idx)

static struct kvm_pmc *global_idx_to_pmc(struct kvm_pmu *pmu, int idx)
{
- if (idx < X86_PMC_IDX_FIXED)
+ if (idx < INTEL_PMC_IDX_FIXED)
return get_gp_pmc(pmu, MSR_P6_EVNTSEL0 + idx, MSR_P6_EVNTSEL0);
else
- return get_fixed_pmc_idx(pmu, idx - X86_PMC_IDX_FIXED);
+ return get_fixed_pmc_idx(pmu, idx - INTEL_PMC_IDX_FIXED);
}

void kvm_deliver_pmi(struct kvm_vcpu *vcpu)
@@ -291,7 +291,7 @@ static void reprogram_idx(struct kvm_pmu *pmu, int idx)
if (pmc_is_gp(pmc))
reprogram_gp_counter(pmc, pmc->eventsel);
else {
- int fidx = idx - X86_PMC_IDX_FIXED;
+ int fidx = idx - INTEL_PMC_IDX_FIXED;
reprogram_fixed_counter(pmc,
fixed_en_pmi(pmu->fixed_ctr_ctrl, fidx), fidx);
}
@@ -452,7 +452,7 @@ void kvm_pmu_cpuid_update(struct kvm_vcpu *vcpu)
return;

pmu->nr_arch_gp_counters = min((int)(entry->eax >> 8) & 0xff,
- X86_PMC_MAX_GENERIC);
+ INTEL_PMC_MAX_GENERIC);
pmu->counter_bitmask[KVM_PMC_GP] =
((u64)1 << ((entry->eax >> 16) & 0xff)) - 1;
bitmap_len = (entry->eax >> 24) & 0xff;
@@ -462,13 +462,13 @@ void kvm_pmu_cpuid_update(struct kvm_vcpu *vcpu)
pmu->nr_arch_fixed_counters = 0;
} else {
pmu->nr_arch_fixed_counters = min((int)(entry->edx & 0x1f),
- X86_PMC_MAX_FIXED);
+ INTEL_PMC_MAX_FIXED);
pmu->counter_bitmask[KVM_PMC_FIXED] =
((u64)1 << ((entry->edx >> 5) & 0xff)) - 1;
}

pmu->global_ctrl = ((1 << pmu->nr_arch_gp_counters) - 1) |
- (((1ull << pmu->nr_arch_fixed_counters) - 1) << X86_PMC_IDX_FIXED);
+ (((1ull << pmu->nr_arch_fixed_counters) - 1) << INTEL_PMC_IDX_FIXED);
pmu->global_ctrl_mask = ~pmu->global_ctrl;
}

@@ -478,15 +478,15 @@ void kvm_pmu_init(struct kvm_vcpu *vcpu)
struct kvm_pmu *pmu = &vcpu->arch.pmu;

memset(pmu, 0, sizeof(*pmu));
- for (i = 0; i < X86_PMC_MAX_GENERIC; i++) {
+ for (i = 0; i < INTEL_PMC_MAX_GENERIC; i++) {
pmu->gp_counters[i].type = KVM_PMC_GP;
pmu->gp_counters[i].vcpu = vcpu;
pmu->gp_counters[i].idx = i;
}
- for (i = 0; i < X86_PMC_MAX_FIXED; i++) {
+ for (i = 0; i < INTEL_PMC_MAX_FIXED; i++) {
pmu->fixed_counters[i].type = KVM_PMC_FIXED;
pmu->fixed_counters[i].vcpu = vcpu;
- pmu->fixed_counters[i].idx = i + X86_PMC_IDX_FIXED;
+ pmu->fixed_counters[i].idx = i + INTEL_PMC_IDX_FIXED;
}
init_irq_work(&pmu->irq_work, trigger_pmi);
kvm_pmu_cpuid_update(vcpu);
@@ -498,13 +498,13 @@ void kvm_pmu_reset(struct kvm_vcpu *vcpu)
int i;

irq_work_sync(&pmu->irq_work);
- for (i = 0; i < X86_PMC_MAX_GENERIC; i++) {
+ for (i = 0; i < INTEL_PMC_MAX_GENERIC; i++) {
struct kvm_pmc *pmc = &pmu->gp_counters[i];
stop_counter(pmc);
pmc->counter = pmc->eventsel = 0;
}

- for (i = 0; i < X86_PMC_MAX_FIXED; i++)
+ for (i = 0; i < INTEL_PMC_MAX_FIXED; i++)
stop_counter(&pmu->fixed_counters[i]);

pmu->fixed_ctr_ctrl = pmu->global_ctrl = pmu->global_status =
--
1.7.8.4

Subject: [PATCH 01/10] perf, amd: Rework northbridge event constraints handler

This is in preparation for the family 15h northbridge event
implementation. The current code relies on contiguous northbridge
counter masks with no holes. On family 15h the northbridge counters
are separate from the other counters and thus only a subset of all
generic counters. This patch changes the code so that it can later
work with counter masks.

No functional changes.
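
For illustration (hypothetical trace, not part of the patch), assume 4
NB counters, counter 0 free, and an event with a previous assignment
hwc->idx == 2:

	idx 0:	new == -1, cmpxchg() claims counter 0	-> new = 0
	idx 1:	not hwc->idx and not owned		-> skipped
	idx 2:	idx == hwc->idx, cmpxchg() claims it,
		counter 0 is given back again		-> new = 2
	idx 3:	skipped

	return &nb->event_constraints[2];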

Signed-off-by: Robert Richter <[email protected]>
---
arch/x86/kernel/cpu/perf_event_amd.c | 68 +++++++++++++---------------------
1 files changed, 26 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 11a4eb9..194ca0b 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -256,9 +256,8 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
{
struct hw_perf_event *hwc = &event->hw;
struct amd_nb *nb = cpuc->amd_nb;
- struct perf_event *old = NULL;
- int max = x86_pmu.num_counters;
- int i, j, k = -1;
+ struct perf_event *old;
+ int idx, new = -1;

/*
* if not NB event or no NB, then no constraints
@@ -276,48 +275,33 @@ amd_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
* because of successive calls to x86_schedule_events() from
* hw_perf_group_sched_in() without hw_perf_enable()
*/
- for (i = 0; i < max; i++) {
- /*
- * keep track of first free slot
- */
- if (k == -1 && !nb->owners[i])
- k = i;
+ for (idx = 0; idx < x86_pmu.num_counters; idx++) {
+ if (new == -1 || hwc->idx == idx)
+ /* assign free slot, prefer hwc->idx */
+ old = cmpxchg(nb->owners + idx, NULL, event);
+ else if (nb->owners[idx] == event)
+ /* event already present */
+ old = event;
+ else
+ continue;
+
+ if (old && old != event)
+ continue;
+
+ /* reassign to this slot */
+ if (new != -1)
+ cmpxchg(nb->owners + new, event, NULL);
+ new = idx;

/* already present, reuse */
- if (nb->owners[i] == event)
- goto done;
- }
- /*
- * not present, so grab a new slot
- * starting either at:
- */
- if (hwc->idx != -1) {
- /* previous assignment */
- i = hwc->idx;
- } else if (k != -1) {
- /* start from free slot found */
- i = k;
- } else {
- /*
- * event not found, no slot found in
- * first pass, try again from the
- * beginning
- */
- i = 0;
- }
- j = i;
- do {
- old = cmpxchg(nb->owners+i, NULL, event);
- if (!old)
+ if (old == event)
break;
- if (++i == max)
- i = 0;
- } while (i != j);
-done:
- if (!old)
- return &nb->event_constraints[i];
-
- return &emptyconstraint;
+ }
+
+ if (new == -1)
+ return &emptyconstraint;
+
+ return &nb->event_constraints[new];
}

static struct amd_nb *amd_alloc_nb(int cpu)
--
1.7.8.4

2012-06-20 08:36:36

by Stephane Eranian

Subject: Re: [PATCH 00/10] perf, x86: Add northbridge counter support for AMD family 15h

On Tue, Jun 19, 2012 at 8:10 PM, Robert Richter <[email protected]> wrote:
>
> This patch adds northbridge counter support for AMD family 15h cpus.
>
> The NB counter implementation and usage is in the same way as for
> family 10h. Thus a nb event can now be selected as any other
> performance counter event. As for family 10h the kernel selects only
> one NB PMC per node by using the nb constraint handler.
>
> Main part of this patch set is to rework current code in a way that
> bit masks for counters can be used. Also, Intel's fixed counters have
> been moved to Intel only code. This is since AMD nb counters start at
> index 32 which leads to holes in the counter mask and causes conflicts
> with fixed counters.
>

I don't quite understand the design choice here. In Fam15h, there is a
clean design for the uncore PMU: it has its own distinct set of 4
counters, unlike Fam10h, where you program core counters to access the
NB counters. So why not, as with the Intel uncore, create a separate
NB PMU which would advertise its characteristics? That does not
preclude re-using the existing AMD-specific routines wherever
possible. I think the advantage is that muxing or starting/stopping of
the core PMU would not affect the uncore and vice versa, for instance.
Wouldn't this also alleviate the problems with assigning indexes to
uncore PMU counters?

> Another major change is the unification of AMD pmus and, where
> possible, a family independent feature check based on cpuid.
>
> It should also be mentioned that nb perfctrs do not support all bits
> in the config value, see patch #10.
>
> -Robert
>
>
>
> Robert Richter (10):
>  perf, amd: Rework northbridge event constraints handler
>  perf, x86: Rework counter reservation code
>  perf, x86: Use bitmasks for generic counters
>  perf, x86: Rename Intel specific macros
>  perf, x86: Move Intel specific code to intel_pmu_init()
>  perf, amd: Unify AMD's generic and family 15h pmus
>  perf, amd: Generalize northbridge constraints code for family 15h
>  perf, amd: Enable northbridge counters on family 15h
>  perf, x86: Improve debug output in check_hw_exists()
>  perf, amd: Check northbridge event config value
>
>  arch/x86/include/asm/cpufeature.h         |    2 +
>  arch/x86/include/asm/kvm_host.h           |    4 +-
>  arch/x86/include/asm/perf_event.h         |   26 ++-
>  arch/x86/kernel/cpu/perf_event.c          |  129 +++++------
>  arch/x86/kernel/cpu/perf_event.h          |    7 +
>  arch/x86/kernel/cpu/perf_event_amd.c      |  368
> +++++++++++++++++------------
>  arch/x86/kernel/cpu/perf_event_intel.c    |   65 +++++-
>  arch/x86/kernel/cpu/perf_event_intel_ds.c |    4 +-
>  arch/x86/kernel/cpu/perf_event_p4.c       |    8 +-
>  arch/x86/kvm/pmu.c                        |   22 +-
>  arch/x86/oprofile/op_model_amd.c          |    4 +-
>  11 files changed, 374 insertions(+), 265 deletions(-)
>
> --
> 1.7.8.4
>
>

2012-06-20 08:55:05

by Peter Zijlstra

Subject: Re: [PATCH 00/10] perf, x86: Add northbridge counter support for AMD family 15h

On Wed, 2012-06-20 at 10:36 +0200, Stephane Eranian wrote:
>
> I dont' quite understand the design choice here. In Fam15h, there is a clean
> design for the uncore PMU. It has its own distinct set of 4 counters. Unlike
> Fam10h, where you program core counters to access the NB counters. So
> why not like with Intel uncore, create a separate NB PMU which would
> advertise its characteristics? That does not preclude re-using the existing
> AMD-specific routines wherever possible. I think the advantage is that
> muxing or starting/stopping of the core PMU would not affect uncore and
> vice-versa for instance. Wouldn't this also alleviate the problems with
> assigning indexes to uncore PMU counters?

Quite agreed, it also avoids making a trainwreck of the counter rotation
on overload.

Subject: Re: [PATCH 00/10] perf, x86: Add northbridge counter support for AMD family 15h

Stephane,

On 20.06.12 10:23:48, Stephane Eranian wrote:
> I dont' quite understand the design choice here. In Fam15h, there is
> a clean design for the uncore PMU. It has its own distinct set of 4
> counters. Unlike Fam10h, where you program core counters to access
> the NB counters. So why not like with Intel uncore, create a
> separate NB PMU which would advertise its characteristics? That does
> not preclude re-using the existing AMD-specific routines wherever
> possible. I think the advantage is that muxing or starting/stopping
> of the core PMU would not affect uncore and vice-versa for
> instance. Wouldn't this also alleviate the problems with assigning
> indexes to uncore PMU counters?

I was thinking of creating a separate pmu too. There were two
fundamental problems with it. First, the implementation would have
been different from the family 10h implementation, although besides
the new counter msr range and the per-node counter msrs there is not
much difference between the perfctrs of family 10h and family 15h.
NB events would then have been programmed differently for the two
families.

Second, since nb perfctrs are implemented the same way as core
counters, the same code would have been used. Thus multiple (two) x86
pmus (struct x86_pmu) would reside in parallel in the kernel. The
current implementation is not designed for this. A complete rework of
the x86 perf implementation, with impact on more code than this
implementation touches, was the main reason against that approach and
for choosing this design.

The main problem with my approach was the introduction of counter
masks and the conflicts with Intel's fixed counters. I think my
patches address this in a clean way, which also led to a bit of code
cleanup. Another advantage I see is the unification of the AMD pmus
and straightforward AMD setup code that is not spread over multiple
pmus. The setup code uses cpuid and is family independent.
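
To make the counter-mask idea concrete, here is a minimal sketch (not
the patch code; the mask values are assumptions for the example, the
real code derives them from cpuid, and a 64-bit kernel is assumed):

#include <linux/bitops.h>
#include <linux/printk.h>

/*
 * Sketch only: a counter mask with a hole.  Core counters are assumed
 * at indexes 0-5, NB counters at indexes 32-35.
 */
static unsigned long amd_counter_mask =
        0x3fUL |                /* core counters, index 0..5 */
        (0xfUL << 32);          /* NB counters, index 32..35 */

static void walk_counters(void)
{
        int idx;

        /* visits indexes 0..5 and then 32..35; the hole is skipped */
        for_each_set_bit(idx, &amd_counter_mask, 64)
                pr_debug("counter index %d\n", idx);
}

Iterating a mask like this replaces the plain 0..num_counters loops
that assume the counter indexes are contiguous.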

I could generally imagine switching the AMD NB implementation to
uncore-like counter support. But then I would prefer a homogeneous
implementation across all AMD families. This would be a separate patch
set that is independent of the family 15h nb counter implementation.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center

2012-06-20 09:36:51

by Peter Zijlstra

Subject: Re: [PATCH 05/10] perf, x86: Move Intel specific code to intel_pmu_init()

On Tue, 2012-06-19 at 20:10 +0200, Robert Richter wrote:
> There is some Intel specific code in the generic x86 path. Move it to
> intel_pmu_init().
>
> Signed-off-by: Robert Richter <[email protected]>
> ---

> diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
> index 1eb9f00..90d7097 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel.c
> @@ -1760,7 +1760,7 @@ static __init void intel_nehalem_quirk(void)
> }
> }
>
> -__init int intel_pmu_init(void)
> +static __init int __intel_pmu_init(void)
> {
> union cpuid10_edx edx;
> union cpuid10_eax eax;
> @@ -1955,3 +1955,46 @@ __init int intel_pmu_init(void)
>
> return 0;
> }
> +
> +__init int intel_pmu_init(void)
> +{
> + struct event_constraint *c;
> + int ret = __intel_pmu_init();


This seems like a nice enough cleanup all on its own, but why make it
two functions?

2012-06-20 09:38:15

by Peter Zijlstra

Subject: Re: [PATCH 00/10] perf, x86: Add northbridge counter support for AMD family 15h

On Wed, 2012-06-20 at 11:29 +0200, Robert Richter wrote:
> Second, since nb perfctr are implemented the same way as core
> counters, the same code would have been used. Thus multiple (two) x86
> pmus (struct x86_pmu) would reside in parallel in the kernel.

Well, no. I take it the uncore counters are nb wide, thus you need
special goo to make counter rotation work properly; x86_pmu is unsuited
for that.

2012-06-20 09:42:09

by Peter Zijlstra

Subject: Re: [PATCH 00/10] perf, x86: Add northbridge counter support for AMD family 15h

On Wed, 2012-06-20 at 11:29 +0200, Robert Richter wrote:
> But then I would prefer a homogeneous
> implementation over all AMD families. This would be a separate patch
> set that is independent from family 15h nb counter implementation.

I realize you would prefer that, but I would really like fam15h to do
the right thing, and maybe see if it's possible to rework the fam10h
code to approach that, instead of the other way around.

Fam10h really is somewhat ugly; it would be a shame to carry that
ugliness into fam15h, which actually did the right thing here.

Subject: Re: [PATCH 00/10] perf, x86: Add northbridge counter support for AMD family 15h

On 20.06.12 11:38:04, Peter Zijlstra wrote:
> On Wed, 2012-06-20 at 11:29 +0200, Robert Richter wrote:
> > Second, since nb perfctr are implemented the same way as core
> > counters, the same code would have been used. Thus multiple (two) x86
> > pmus (struct x86_pmu) would reside in parallel in the kernel.
>
> Well, no. The I take it the uncore counters are nb wide, thus you need
> special goo to make counter rotation work properly, x86_pmu is unsuited
> for that.

The code for nb and core counters is identical. There would be the
same nmi handler, the same code to set up the event, the same code to
start/stop cpus. The only difference is the per-node msrs; even the
msr offset calculation is the same as for core counters on family 15h.
It would not make sense to duplicate all this code. And, as said, the
current design is not made to run multiple x86_pmus in parallel or to
easily reuse the x86 functions. Separating the nb counters would make
as much sense as implementing a separate pmu for the fixed counters.

And wrt counter rotation, this only affects code to assign counters.
You don't need a separate pmu for this.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center

2012-06-20 10:16:19

by Peter Zijlstra

Subject: Re: [PATCH 00/10] perf, x86: Add northbridge counter support for AMD family 15h

On Wed, 2012-06-20 at 12:00 +0200, Robert Richter wrote:
>
> And wrt counter rotation, this only affects code to assign counters.
> You don't need a separate pmu for this.

It makes it easier though; on unplug you would otherwise need to
filter out which events are uncore and migrate those to another online
cpu of the same nb, instead of migrating all events.

Also, what cpu on the NB receives the PMI?

Sure it can be done, just not pretty. Combine that with all the other
special casing, like patches 3 and 10, and one really starts to wonder
if it's all worth it.

2012-06-20 10:46:24

by Stephane Eranian

Subject: Re: [PATCH 00/10] perf, x86: Add northbridge counter support for AMD family 15h

On Wed, Jun 20, 2012 at 12:00 PM, Robert Richter <[email protected]> wrote:
> On 20.06.12 11:38:04, Peter Zijlstra wrote:
>> On Wed, 2012-06-20 at 11:29 +0200, Robert Richter wrote:
>> > Second, since nb perfctr are implemented the same way as core
>> > counters, the same code would have been used. Thus multiple (two) x86
>> > pmus (struct x86_pmu) would reside in parallel in the kernel.
>>
>> Well, no. The I take it the uncore counters are nb wide, thus you need
>> special goo to make counter rotation work properly, x86_pmu is unsuited
>> for that.
>
> The code for nb and core counters is identical. There would be the
> same nmi handler, same code to setup the event, same code to
> start/stop cpus. The only difference are per-node msrs, even the msr
> offset calculation is the same as for core counters on family 15h. It
> would not make sense to duplicate all this code. And, as said, current
> design does not fit to use x86_pmus in parallel or to easy reuse x86
> functions. Separating nb counters would make the same sense as
> implementing a separate pmu for fixed counters.
>
Being identical does not necessarily mean you have to copy the code;
you can also simply call it.

I don't see the explanation for the non-contiguous counter indexes.
What's that about? With a separate PMU, would you have that problem?
I see the uncore CTL base at MSRC001_0240, the next at MSRC001_0242,
and so on. But that's already the case with core counters on Fam15h.

As Peter said, having your own PMU would alleviate the need for
Patch 10. Those filters would simply not be visible to tools via
sysfs.
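
For illustration only, a sketch of what the sysfs format entries of
such a separate NB pmu could look like, using the PMU_FORMAT_ATTR
helper; the bit ranges below are placeholders, not taken from the
BKDG:

PMU_FORMAT_ATTR(event, "config:0-7,32-35");
PMU_FORMAT_ATTR(umask, "config:8-15");

static struct attribute *amd_nb_format_attr[] = {
        &format_attr_event.attr,
        &format_attr_umask.attr,
        NULL,
};

static struct attribute_group amd_nb_format_group = {
        .name   = "format",
        .attrs  = amd_nb_format_attr,
};

Config bits that the NB perfctrs do not decode would then simply never
be advertised to tools.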

> And wrt counter rotation, this only affects code to assign counters.
> You don't need a separate pmu for this.
>
> -Robert
>
> --
> Advanced Micro Devices, Inc.
> Operating System Research Center
>

Subject: Re: [PATCH 00/10] perf, x86: Add northbridge counter support for AMD family 15h

On 20.06.12 12:16:13, Peter Zijlstra wrote:
> Sure it can be done, just not pretty. Combine that with all the other
> special casing like patches 3 and 10 and one really starts to wonder if
> its all worth it.

I actually started writing the code by implementing a different pmu.
It turned out to be the wrong direction. The pmus would be almost
identical, just with some different config values and a bit of
nb-related special code. But you can't really reuse the functions on a
second running pmu; there are hard-wired functions in the x86 pmu code
and the x86_pmu ops do not fit such a split. It would mean a complete
rework of the x86 perf code. Really, I tried that already. And all
this effort just to implement nb counters? If someone is willing to
help here this would be ok, but I guess I would have to do all this on
my own. And to be fair, this effort was also not made for fixed
counters, pebs, bts, etc. Maybe the uncore implementation is different
here, but today is the first day the uncore patches are in tip.

I also do not see the advantage of a separate pmu. Just to have a
different msr base, to avoid the use of counter masks and to get some
optimized pmu ops? Masks are widely used in the kernel, and on x86 the
bsf instruction costs hardly more than an increment. And the switches
in the code paths to special nb code are not more expensive than the
switches for other special-case code.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center

Subject: Re: [PATCH 00/10] perf, x86: Add northbridge counter support for AMD family 15h

On 20.06.12 12:46:21, Stephane Eranian wrote:
> On Wed, Jun 20, 2012 at 12:00 PM, Robert Richter <[email protected]> wrote:
> > On 20.06.12 11:38:04, Peter Zijlstra wrote:
> >> On Wed, 2012-06-20 at 11:29 +0200, Robert Richter wrote:
> >> > Second, since nb perfctr are implemented the same way as core
> >> > counters, the same code would have been used. Thus multiple (two) x86
> >> > pmus (struct x86_pmu) would reside in parallel in the kernel.
> >>
> >> Well, no. The I take it the uncore counters are nb wide, thus you need
> >> special goo to make counter rotation work properly, x86_pmu is unsuited
> >> for that.
> >
> > The code for nb and core counters is identical. There would be the
> > same nmi handler, same code to setup the event, same code to
> > start/stop cpus. The only difference are per-node msrs, even the msr
> > offset calculation is the same as for core counters on family 15h. It
> > would not make sense to duplicate all this code. And, as said, current
> > design does not fit to use x86_pmus in parallel or to easy reuse x86
> > functions. Separating nb counters would make the same sense as
> > implementing a separate pmu for fixed counters.
> >
> Being identical does not necessarily mean you have to copy the code,
> you can also simply call it.

You can't use two x86_pmus in parallel in the kernel. The code is not
designed for this, and the effort of changing it to support this would
be very high.

> I don't see the explanation for the non-contiguous counter indexes.
> What's that about? With a separate PMU, would you have that problem.
> I see uncore CTL base MSRC001_0240, next is 0242, and so on. But
> that's already the case with core counters on Fam15h.

The counters reside in msrs MSRC001_0200 to MSRC001_027F, with two
msrs per counter. This gives room for 64 counters. NB counters start
at index 32, which is MSRC001_0240.
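
A sketch of the arithmetic only (not the patch code), assuming the
usual MSR_F15H_PERF_CTL/MSR_F15H_PERF_CTR base of 0xc0010200/0xc0010201
and two msrs per counter:

static inline unsigned int f15h_eventsel_msr(int index)
{
        /* index 0 -> 0xc0010200, index 32 -> 0xc0010240 (first NB counter) */
        return MSR_F15H_PERF_CTL + 2 * index;
}

static inline unsigned int f15h_perfctr_msr(int index)
{
        /* index 0 -> 0xc0010201, index 32 -> 0xc0010241 */
        return MSR_F15H_PERF_CTR + 2 * index;
}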

> As Peter said, having your own PMU would alleviate the need for
> Patch 10. Those filters would simply not be visible to tools via
> sysfs.

That's what I explained in an earlier thread about pmu descriptions in
sysfs. It is not possible to describe a complex pmu in sysfs. My
preference at the time was to use pmu ops in userland rather than a
single generic pmu that is configured via sysfs.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center

Subject: Re: [PATCH 05/10] perf, x86: Move Intel specific code to intel_pmu_init()

On 20.06.12 11:36:42, Peter Zijlstra wrote:
> On Tue, 2012-06-19 at 20:10 +0200, Robert Richter wrote:
> > There is some Intel specific code in the generic x86 path. Move it to
> > intel_pmu_init().
> >
> > Signed-off-by: Robert Richter <[email protected]>
> > ---
>
> > diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
> > index 1eb9f00..90d7097 100644
> > --- a/arch/x86/kernel/cpu/perf_event_intel.c
> > +++ b/arch/x86/kernel/cpu/perf_event_intel.c
> > @@ -1760,7 +1760,7 @@ static __init void intel_nehalem_quirk(void)
> > }
> > }
> >
> > -__init int intel_pmu_init(void)
> > +static __init int __intel_pmu_init(void)
> > {
> > union cpuid10_edx edx;
> > union cpuid10_eax eax;
> > @@ -1955,3 +1955,46 @@ __init int intel_pmu_init(void)
> >
> > return 0;
> > }
> > +
> > +__init int intel_pmu_init(void)
> > +{
> > + struct event_constraint *c;
> > + int ret = __intel_pmu_init();
>
>
> This seems like a nice enough cleanup all on its own, but why make it
> two functions?

I didn't know whether the checks are necessary after p6_pmu_init() and
p4_pmu_init(), and I didn't want to touch the switch/case path
containing p6_pmu_init() and p4_pmu_init(). But it seems the p4 and p6
pmus don't support fixed counters and have fixed num_counters values.
Thus we can skip the checks that are moved from init_hw_perf_events()
in that case and leave intel_pmu_init() early.
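
Roughly, and just as a sketch (not the actual patch), the split then
looks like this, with the clipping checks moved out of
init_hw_perf_events():

__init int intel_pmu_init(void)
{
        int ret = __intel_pmu_init();

        if (ret)
                return ret;

        /* checks formerly done in init_hw_perf_events() */
        if (x86_pmu.num_counters > INTEL_PMC_MAX_GENERIC) {
                WARN(1, "hw perf events %d > max(%d), clipping!",
                     x86_pmu.num_counters, INTEL_PMC_MAX_GENERIC);
                x86_pmu.num_counters = INTEL_PMC_MAX_GENERIC;
        }

        if (x86_pmu.num_counters_fixed > INTEL_PMC_MAX_FIXED) {
                WARN(1, "hw perf events fixed %d > max(%d), clipping!",
                     x86_pmu.num_counters_fixed, INTEL_PMC_MAX_FIXED);
                x86_pmu.num_counters_fixed = INTEL_PMC_MAX_FIXED;
        }

        return 0;
}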

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center

2012-06-20 15:54:38

by Peter Zijlstra

Subject: Re: [PATCH 00/10] perf, x86: Add northbridge counter support for AMD family 15h

On Wed, 2012-06-20 at 14:29 +0200, Robert Richter wrote:
> On 20.06.12 12:16:13, Peter Zijlstra wrote:
> > Sure it can be done, just not pretty. Combine that with all the other
> > special casing like patches 3 and 10 and one really starts to wonder if
> > its all worth it.
>
> I actually started writing the code by implementing a different pmu.
> It turned out to be the wrong direction. The pmus would be almost
> identical, just some different config values and a bit nb related
> special code. But you can't really reuse the functions on a 2nd
> running pmu, there are hard wired functions in the x86 pmu code and
> x86_pmu ops do not fit for such a split. It would mean a complete
> rework of x86 perf code. Really, I tried that already. And all this
> effort just to implement nb counters? If someone is willing to help
> here this would be ok, but I guess I would have to do all this on my
> own. And to be fair, this effort was also not make for fixed counters,
> pebs, bts, etc. Maybe the uncore implementation is different here, but
> today is the first day the uncore patches are in tip.

Yeah, the Intel uncore implements an entirely new pmu. The code is a
little over the top because Intel went there and decided it was a good
thing to have numerous uncore pmus instead of one, some in PCI space,
some in MSR space.

Still, their programming is similar to the core ones -- just like for
AMD.

Yeah, there's a little bit of 'duplicated' code, but that's
unavoidable.

> I also do not see the advantage of a separate pmu. Just to have a
> different msr base to avoid the use of counter masks and some
> optimized pmu ops? Masks are wide spread used in the kernel and on x86
> the bsf instruction takes not more than an increment. And switches in
> the code paths to special nb code are not more expensive than other
> switches for other special code.

Well, as it stands this thing is almost certainly doing things wrong.
An uncore pmu wants to put all events for the same NB on the same cpu,
not on whatever cpu they happen to be registered on, otherwise event
rotation doesn't work right.

It also wants to migrate events to another cpu if the designated cpu
gets unplugged but there are still active cpus on the NB.
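
For reference, a sketch of that kind of migration, assuming the
perf_pmu_migrate_context() helper that came in with the Intel uncore
code; amd_nb_pmu and the use of topology_core_cpumask() as the node
mask are placeholders for the example:

#include <linux/perf_event.h>
#include <linux/topology.h>

static struct pmu amd_nb_pmu;   /* hypothetical separate NB pmu */

static void amd_nb_cpu_down(int cpu)
{
        /* pick another online cpu on the same node, if any */
        int target = cpumask_any_but(topology_core_cpumask(cpu), cpu);

        if (target >= nr_cpu_ids)
                return;         /* last cpu of this NB, nothing to do */

        perf_pmu_migrate_context(&amd_nb_pmu, cpu, target);
}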

Furthermore, if the uncore does PMI, you want PMI steering; if it
doesn't do PMIs, you want to poll the thing to avoid overflowing the
counter.

/me rummages on the interwebs to find the BKDG for Fam15h...

OK, it looks like it does do PMI, and it broadcasts interrupts to the
entire NB. So that wants special magic too -- you might even want to
disallow sampling on the thing until someone has a good use case for
that -- but you still need the PMI to deal with the counter overflow
stuff.


2012-06-20 16:09:07

by Peter Zijlstra

Subject: Re: [PATCH 00/10] perf, x86: Add northbridge counter support for AMD family 15h

On Wed, 2012-06-20 at 17:54 +0200, Peter Zijlstra wrote:
> Yeah, there's a little bit of 'duplicated' code, but that's
> unavoidable.

Also, you don't need to replicate the entire x86_pmu thing; most of
that is trying to share stuff between the various x86 core pmu things,
like p6, p4, intel, amd, etc. If you only support the one amd uncore,
things become a lot easier.

2012-06-20 16:21:56

by Stephane Eranian

Subject: Re: [PATCH 00/10] perf, x86: Add northbridge counter support for AMD family 15h

On Wed, Jun 20, 2012 at 5:54 PM, Peter Zijlstra <[email protected]> wrote:
> On Wed, 2012-06-20 at 14:29 +0200, Robert Richter wrote:
>> On 20.06.12 12:16:13, Peter Zijlstra wrote:
>> > Sure it can be done, just not pretty. Combine that with all the other
>> > special casing like patches 3 and 10 and one really starts to wonder if
>> > its all worth it.
>>
>> I actually started writing the code by implementing a different pmu.
>> It turned out to be the wrong direction. The pmus would be almost
>> identical, just some different config values and a bit nb related
>> special code. But you can't really reuse the functions on a 2nd
>> running pmu, there are hard wired functions in the x86 pmu code and
>> x86_pmu ops do not fit for such a split. It would mean a complete
>> rework of x86 perf code. Really, I tried that already. And all this
>> effort just to implement nb counters? If someone is willing to help
>> here this would be ok, but I guess I would have to do all this on my
>> own. And to be fair, this effort was also not make for fixed counters,
>> pebs, bts, etc. Maybe the uncore implementation is different here, but
>> today is the first day the uncore patches are in tip.
>
> Yeah, the Intel uncore implements an entire new pmu. The code is a
> little over the top because Intel went there and decided it was a good
> thing to have numerous uncore pmus instead of 1, some in PCI space some
> in MSR space.
>
> Still their programming is similar to the core ones -- just like for
> AMD.
>
> Yeah, there's a little bit of 'duplicated' code, but that's unavoidable.
>
>> I also do not see the advantage of a separate pmu. Just to have a
>> different msr base to avoid the use of counter masks and some
>> optimized pmu ops? Masks are wide spread used in the kernel and on x86
>> the bsf instruction takes not more than an increment. And switches in
>> the code paths to special nb code are not more expensive than other
>> switches for other special code.
>
> Well, as it stands this thing is almost certainly doing things wrong. An
> uncore pmu wants to put all events for the same NB on the same cpu, not
> on whatever cpu they are registered, otherwise event rotation doesn't
> work right.
>
> It also wants to migrate events to another cpu if the designated cpu
> gets unplugged but there's still active cpus on the NB.
>
> Furthermore, if the uncore does PMI, you want PMI steering, if it
> doesn't do PMIs you want to poll the thing to avoid overflowing the
> counter.
>
> /me rummages on the interwebs to find the BKDG for Fam15h..
>
> OK, it looks like it does do PMI and it broadcast interrupts to the
> entire NB.. ok so that wants special magic too -- you might even want to
> disallow sampling on the thing until someone has a good use-case for
> that -- but you still need the PMI to deal with the counter overflow
> stuff.
>
I do have a good use case for the broadcast interrupt, especially
if the uncore is capable of counting some form of cycles. That
interrupt can be used to provide a unique vantage point across
all the CPUs. We could relatively easily discover what each CPU
is doing at any one time, with very good synchronization.
Of course, a lot of plumbing would be needed to gather the IPs
from all the CPUs into a single sampling buffer, or maybe one
per CPU. If a CPU knows it is a possible target of the uncore PMI,
it should not discard the interrupt; it should process it.

The other thing I don't know about the AMD uncore is whether
or not it delivers the PMI when the core(s) are halted.
Robert, any info on this in particular?