2013-03-21 20:00:06

by Andi Kleen

Subject: Basic perf PMU support for Haswell v10

This is based on v7 of the full Haswell PMU support,
rebased, and stripped down to the bare bones.

Most interesting new features are not in this patchkit
(full version is git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git hsw/pmu5)

Contains support for:
- Basic Haswell PMU and PEBS support
- Late unmasking of the PMI
- Basic LBRv4 support

v2: Addressed Stephane's feedback. See individual patches for details.
v3: now even more bite-sized. Qualifier constraints merged earlier.
v4: Rename some variables, add some comments and other minor changes.
Add some Reviewed/Tested-bys.
v5: Address some minor review feedback. Port to latest perf/core
v6: Adjust some variable names, add comments, edit descriptions, some
more testing, rebased to latest perf/core
v7: Expand comment
v8: Rename structure field.
v9: No wide counters, but add basic LBRs. Add some more
constraints. Rebase to 3.9rc1
v10: Change some whitespace. Rebase to 3.9rc3

-Andi


2013-03-21 20:00:07

by Andi Kleen

Subject: [PATCH 3/5] perf, x86: Basic Haswell PEBS support v4

From: Andi Kleen <[email protected]>

Add basic PEBS support for Haswell.
The constraints are similar to SandyBridge with a few new events.
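
For reference, in the table below the low byte of the first macro
argument is the event select, the next byte the umask, and the second
argument is a bitmask of allowed generic counters. A simplified
illustration, reading two entries of the table (macros as in
arch/x86/kernel/cpu/perf_event.h):

	/* event 0xc0, umask 0x01 (INST_RETIRED.PREC_DIST); counter mask
	 * 0x2 restricts it to generic counter 1 only: */
	INTEL_UEVENT_CONSTRAINT(0x01c0, 0x2),
	/* event 0xc5, umask 0x01 (BR_MISP_RETIRED.CONDITIONAL); mask 0xf
	 * allows any of the generic counters 0-3: */
	INTEL_UEVENT_CONSTRAINT(0x01c5, 0xf),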

v2: Readd missing pebs_aliases
v3: Readd missing hunk. Fix some constraints.
v4: Fix typo in PEBS event table (Stephane Eranian)
Reviewed-by: Stephane Eranian <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/kernel/cpu/perf_event.h | 2 +
arch/x86/kernel/cpu/perf_event_intel.c | 6 +++-
arch/x86/kernel/cpu/perf_event_intel_ds.c | 34 +++++++++++++++++++++++++++++
3 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index a356350..4f6b97d 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -591,6 +591,8 @@ extern struct event_constraint intel_snb_pebs_event_constraints[];

extern struct event_constraint intel_ivb_pebs_event_constraints[];

+extern struct event_constraint intel_hsw_pebs_event_constraints[];
+
struct event_constraint *intel_pebs_constraints(struct perf_event *event);

void intel_pmu_pebs_enable(struct perf_event *event);
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 5ecaef0..c5262ab 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -853,7 +853,8 @@ static inline bool intel_pmu_needs_lbr_smpl(struct perf_event *event)
return true;

/* implicit branch sampling to correct PEBS skid */
- if (x86_pmu.intel_cap.pebs_trap && event->attr.precise_ip > 1)
+ if (x86_pmu.intel_cap.pebs_trap && event->attr.precise_ip > 1 &&
+ x86_pmu.intel_cap.pebs_format < 2)
return true;

return false;
@@ -2200,8 +2201,9 @@ __init int intel_pmu_init(void)
intel_pmu_lbr_init_snb();

x86_pmu.event_constraints = intel_hsw_event_constraints;
-
+ x86_pmu.pebs_constraints = intel_hsw_pebs_event_constraints;
x86_pmu.extra_regs = intel_snb_extra_regs;
+ x86_pmu.pebs_aliases = intel_pebs_aliases_snb;
/* all extra regs are per-cpu when HT is on */
x86_pmu.er_flags |= ERF_HAS_RSP_1;
x86_pmu.er_flags |= ERF_NO_HT_SHARING;
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index a7ab4db..aa4361b 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -437,6 +437,40 @@ struct event_constraint intel_ivb_pebs_event_constraints[] = {
EVENT_CONSTRAINT_END
};

+struct event_constraint intel_hsw_pebs_event_constraints[] = {
+ INTEL_UEVENT_CONSTRAINT(0x01c0, 0x2), /* INST_RETIRED.PREC_DIST */
+ INTEL_UEVENT_CONSTRAINT(0x01c2, 0xf), /* UOPS_RETIRED.ALL */
+ INTEL_UEVENT_CONSTRAINT(0x02c2, 0xf), /* UOPS_RETIRED.RETIRE_SLOTS */
+ INTEL_EVENT_CONSTRAINT(0xc4, 0xf), /* BR_INST_RETIRED.* */
+ INTEL_UEVENT_CONSTRAINT(0x01c5, 0xf), /* BR_MISP_RETIRED.CONDITIONAL */
+ INTEL_UEVENT_CONSTRAINT(0x04c5, 0xf), /* BR_MISP_RETIRED.ALL_BRANCHES */
+ INTEL_UEVENT_CONSTRAINT(0x20c5, 0xf), /* BR_MISP_RETIRED.NEAR_TAKEN */
+ INTEL_EVENT_CONSTRAINT(0xcd, 0x8), /* MEM_TRANS_RETIRED.* */
+ /* MEM_UOPS_RETIRED.STLB_MISS_LOADS */
+ INTEL_UEVENT_CONSTRAINT(0x11d0, 0xf),
+ /* MEM_UOPS_RETIRED.STLB_MISS_STORES */
+ INTEL_UEVENT_CONSTRAINT(0x12d0, 0xf),
+ INTEL_UEVENT_CONSTRAINT(0x21d0, 0xf), /* MEM_UOPS_RETIRED.LOCK_LOADS */
+ INTEL_UEVENT_CONSTRAINT(0x41d0, 0xf), /* MEM_UOPS_RETIRED.SPLIT_LOADS */
+ INTEL_UEVENT_CONSTRAINT(0x42d0, 0xf), /* MEM_UOPS_RETIRED.SPLIT_STORES */
+ INTEL_UEVENT_CONSTRAINT(0x81d0, 0xf), /* MEM_UOPS_RETIRED.ALL_LOADS */
+ INTEL_UEVENT_CONSTRAINT(0x82d0, 0xf), /* MEM_UOPS_RETIRED.ALL_STORES */
+ INTEL_UEVENT_CONSTRAINT(0x01d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.L1_HIT */
+ INTEL_UEVENT_CONSTRAINT(0x02d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.L2_HIT */
+ INTEL_UEVENT_CONSTRAINT(0x04d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.L3_HIT */
+ INTEL_UEVENT_CONSTRAINT(0x40d1, 0xf), /* MEM_LOAD_UOPS_RETIRED.HIT_LFB */
+ /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_MISS */
+ INTEL_UEVENT_CONSTRAINT(0x01d2, 0xf),
+ /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT */
+ INTEL_UEVENT_CONSTRAINT(0x02d2, 0xf),
+ /* MEM_LOAD_UOPS_LLC_MISS_RETIRED.LOCAL_DRAM */
+ INTEL_UEVENT_CONSTRAINT(0x01d3, 0xf),
+ INTEL_UEVENT_CONSTRAINT(0x04c8, 0xf), /* HLE_RETIRED.ABORTED */
+ INTEL_UEVENT_CONSTRAINT(0x04c9, 0xf), /* RTM_RETIRED.ABORTED */
+
+ EVENT_CONSTRAINT_END
+};
+
struct event_constraint *intel_pebs_constraints(struct perf_event *event)
{
struct event_constraint *c;
--
1.7.7.6

2013-03-21 20:00:33

by Andi Kleen

Subject: [PATCH 4/5] perf, x86: Move NMI clearing to end of PMI handler after the counter registers are reset

From: Andi Kleen <[email protected]>

This avoids some problems with spurious PMIs on Haswell.
Haswell seems to behave more like P4 in this regard. Do
the same thing as the P4 perf handler and unmask
the NMI only at the end. This shouldn't make any difference
for earlier family 6 cores.
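
The resulting flow, as a simplified sketch (helper names as in
perf_event_intel.c; the status handling loop is elided, and
pmi_flow_sketch is just an illustrative name):

	static int pmi_flow_sketch(struct pt_regs *regs)
	{
		int handled;

		intel_pmu_disable_all();	/* stop the PMU first */
		handled = intel_pmu_drain_bts_buffer();
		/* ... read and ack GLOBAL_STATUS, process overflowed counters ... */
		intel_pmu_enable_all(0);	/* re-arm the counters */
		/* unmask the NMI only now, after the overflow state is cleared */
		apic_write(APIC_LVTPC, APIC_DM_NMI);
		return handled;
	}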

Tested on Haswell, IvyBridge, Westmere, Saltwell (Atom).

Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel.c | 16 ++++++----------
1 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index c5262ab..d440a4f 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1148,16 +1148,6 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)

cpuc = &__get_cpu_var(cpu_hw_events);

- /*
- * Some chipsets need to unmask the LVTPC in a particular spot
- * inside the nmi handler. As a result, the unmasking was pushed
- * into all the nmi handlers.
- *
- * This handler doesn't seem to have any issues with the unmasking
- * so it was left at the top.
- */
- apic_write(APIC_LVTPC, APIC_DM_NMI);
-
intel_pmu_disable_all();
handled = intel_pmu_drain_bts_buffer();
status = intel_pmu_get_status();
@@ -1217,6 +1207,12 @@ again:

done:
intel_pmu_enable_all(0);
+ /*
+ * Only unmask the NMI after the overflow counters
+ * have been reset. This avoids spurious NMIs on
+ * Haswell CPUs.
+ */
+ apic_write(APIC_LVTPC, APIC_DM_NMI);
return handled;
}

--
1.7.7.6

2013-03-21 20:00:04

by Andi Kleen

Subject: [PATCH 1/5] perf, x86: Add Haswell PEBS record support v5

From: Andi Kleen <[email protected]>

Add support for the Haswell extended (fmt2) PEBS format.

It has a superset of the nhm (fmt1) PEBS fields, but a longer record, so
we need to adjust the code paths.

The main advantage is the new "EventingRip" support, which directly
gives the address of the eventing instruction rather than the off-by-one
address of the next instruction. So with precise == 2 we use it directly
and don't try to use LBRs and basic-block walking. This lowers the
overhead of precise sampling significantly.
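
From user space nothing changes; a minimal sketch of requesting precise
samples (open_precise_cycles is a hypothetical helper, standard
perf_event_open usage):

	#include <linux/perf_event.h>
	#include <string.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	static int open_precise_cycles(void)
	{
		struct perf_event_attr attr;

		memset(&attr, 0, sizeof(attr));
		attr.size = sizeof(attr);
		attr.type = PERF_TYPE_HARDWARE;
		attr.config = PERF_COUNT_HW_CPU_CYCLES;
		attr.sample_period = 100003;
		attr.precise_ip = 2;	/* exact IP: comes from real_ip on fmt2 PEBS */
		attr.exclude_kernel = 1;

		return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	}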

Some other features are added in later patches.

Reviewed-by: Stephane Eranian <[email protected]>
v2: Rename various identifiers. Add more comments. Get rid of a cast.
v3: fmt2->hsw rename
v4: ip_of_the_event->real_ip rename
v5: use pr_cont. white space changes.
Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/kernel/cpu/perf_event.c | 3 +-
arch/x86/kernel/cpu/perf_event_intel_ds.c | 114 +++++++++++++++++++++++------
2 files changed, 93 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index bf0f01a..758f3fd 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -397,7 +397,8 @@ int x86_pmu_hw_config(struct perf_event *event)
* check that PEBS LBR correction does not conflict with
* whatever the user is asking with attr->branch_sample_type
*/
- if (event->attr.precise_ip > 1) {
+ if (event->attr.precise_ip > 1 &&
+ x86_pmu.intel_cap.pebs_format < 2) {
u64 *br_type = &event->attr.branch_sample_type;

if (has_branch_stack(event)) {
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index b05a575..a7ab4db 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -41,6 +41,22 @@ struct pebs_record_nhm {
u64 status, dla, dse, lat;
};

+/*
+ * Same as pebs_record_nhm, with two additional fields.
+ */
+struct pebs_record_hsw {
+ struct pebs_record_nhm nhm;
+ /*
+ * Real IP of the event. In the Intel documentation this
+ * is called eventingrip.
+ */
+ u64 real_ip;
+ /*
+ * TSX tuning information field: abort cycles and abort flags.
+ */
+ u64 tsx_tuning;
+};
+
void init_debug_store_on_cpu(int cpu)
{
struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
@@ -559,11 +575,11 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
{
/*
* We cast to pebs_record_core since that is a subset of
- * both formats and we don't use the other fields in this
- * routine.
+ * all formats.
*/
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
struct pebs_record_core *pebs = __pebs;
+ struct pebs_record_hsw *pebs_hsw = __pebs;
struct perf_sample_data data;
struct pt_regs regs;

@@ -588,7 +604,10 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
regs.bp = pebs->bp;
regs.sp = pebs->sp;

- if (event->attr.precise_ip > 1 && intel_pmu_pebs_fixup_ip(&regs))
+ if (event->attr.precise_ip > 1 && x86_pmu.intel_cap.pebs_format >= 2) {
+ regs.ip = pebs_hsw->real_ip;
+ regs.flags |= PERF_EFLAGS_EXACT;
+ } else if (event->attr.precise_ip > 1 && intel_pmu_pebs_fixup_ip(&regs))
regs.flags |= PERF_EFLAGS_EXACT;
else
regs.flags &= ~PERF_EFLAGS_EXACT;
@@ -641,35 +660,22 @@ static void intel_pmu_drain_pebs_core(struct pt_regs *iregs)
__intel_pmu_pebs_event(event, iregs, at);
}

-static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
+static void __intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, void *at,
+ void *top)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
struct debug_store *ds = cpuc->ds;
- struct pebs_record_nhm *at, *top;
struct perf_event *event = NULL;
u64 status = 0;
- int bit, n;
-
- if (!x86_pmu.pebs_active)
- return;
-
- at = (struct pebs_record_nhm *)(unsigned long)ds->pebs_buffer_base;
- top = (struct pebs_record_nhm *)(unsigned long)ds->pebs_index;
+ int bit;

ds->pebs_index = ds->pebs_buffer_base;

- n = top - at;
- if (n <= 0)
- return;
-
- /*
- * Should not happen, we program the threshold at 1 and do not
- * set a reset value.
- */
- WARN_ONCE(n > x86_pmu.max_pebs_events, "Unexpected number of pebs records %d\n", n);
+ for ( ; at < top; at += x86_pmu.pebs_record_size) {
+ struct pebs_record_nhm *p = at;

- for ( ; at < top; at++) {
- for_each_set_bit(bit, (unsigned long *)&at->status, x86_pmu.max_pebs_events) {
+ for_each_set_bit(bit, (unsigned long *)&p->status,
+ x86_pmu.max_pebs_events) {
event = cpuc->events[bit];
if (!test_bit(bit, cpuc->active_mask))
continue;
@@ -692,6 +698,61 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
}
}

+static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+ struct debug_store *ds = cpuc->ds;
+ struct pebs_record_nhm *at, *top;
+ int n;
+
+ if (!x86_pmu.pebs_active)
+ return;
+
+ at = (struct pebs_record_nhm *)(unsigned long)ds->pebs_buffer_base;
+ top = (struct pebs_record_nhm *)(unsigned long)ds->pebs_index;
+
+ ds->pebs_index = ds->pebs_buffer_base;
+
+ n = top - at;
+ if (n <= 0)
+ return;
+
+ /*
+ * Should not happen, we program the threshold at 1 and do not
+ * set a reset value.
+ */
+ WARN_ONCE(n > x86_pmu.max_pebs_events,
+ "Unexpected number of pebs records %d\n", n);
+
+ return __intel_pmu_drain_pebs_nhm(iregs, at, top);
+}
+
+static void intel_pmu_drain_pebs_hsw(struct pt_regs *iregs)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+ struct debug_store *ds = cpuc->ds;
+ struct pebs_record_hsw *at, *top;
+ int n;
+
+ if (!x86_pmu.pebs_active)
+ return;
+
+ at = (struct pebs_record_hsw *)(unsigned long)ds->pebs_buffer_base;
+ top = (struct pebs_record_hsw *)(unsigned long)ds->pebs_index;
+
+ n = top - at;
+ if (n <= 0)
+ return;
+ /*
+ * Should not happen, we program the threshold at 1 and do not
+ * set a reset value.
+ */
+ WARN_ONCE(n > x86_pmu.max_pebs_events,
+ "Unexpected number of pebs records %d\n", n);
+
+ return __intel_pmu_drain_pebs_nhm(iregs, at, top);
+}
+
/*
* BTS, PEBS probe and setup
*/
@@ -723,6 +784,13 @@ void intel_ds_init(void)
x86_pmu.drain_pebs = intel_pmu_drain_pebs_nhm;
break;

+ case 2:
+ pr_cont("PEBS fmt2%c, ", pebs_type);
+ x86_pmu.pebs_record_size =
+ sizeof(struct pebs_record_hsw);
+ x86_pmu.drain_pebs = intel_pmu_drain_pebs_hsw;
+ break;
+
default:
printk(KERN_CONT "no PEBS fmt%d%c, ", format, pebs_type);
x86_pmu.pebs = 0;
--
1.7.7.6

2013-03-21 20:01:14

by Andi Kleen

Subject: [PATCH 2/5] perf, x86: Basic Haswell PMU support v7

From: Andi Kleen <[email protected]>

Add basic Haswell PMU support.

Similar to SandyBridge, but with a few new events and two
new counter bits.

There are some new counter flags that need to be prevented
from being set on fixed counters, and allowed to be set
for generic counters.

We also add support for the counter 2 constraint to handle
all raw events.
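
To illustrate how the new bits are meant to be used from user space
(a sketch, not part of this patch; setup_intx_event is a hypothetical
helper, bit positions as defined in the patch):

	#include <linux/perf_event.h>
	#include <string.h>

	#define HSW_INTX		(1ULL << 32)	/* count only inside a transaction */
	#define HSW_INTX_CHECKPOINTED	(1ULL << 33)	/* count is rolled back on abort */

	static void setup_intx_event(struct perf_event_attr *attr)
	{
		memset(attr, 0, sizeof(*attr));
		attr->size = sizeof(*attr);
		attr->type = PERF_TYPE_RAW;
		/* raw event 0xc2, umask 0x01 (UOPS_RETIRED.ALL), counted only
		 * while inside a transaction; hsw_hw_config() rejects these
		 * bits combined with precise_ip or any-thread mode */
		attr->config = 0x01c2 | HSW_INTX;
	}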

Contains fixes from Stephane Eranian

v2: Folded TSX bits into standard FIXED_EVENT_CONSTRAINTS
v3: Use SNB LBR init code. Comment fix (Stephane Eranian)
v4: Add the counter2 constraints. Fix comment in the right place.
v5: Expand comment
v6: Add CYCLE_ACTIVITY.* to counter constraints
v7: Follow Linux style, not perf style
Reviewed-by: Stephane Eranian <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/include/asm/perf_event.h | 3 +
arch/x86/kernel/cpu/perf_event.h | 5 ++-
arch/x86/kernel/cpu/perf_event_intel.c | 79 ++++++++++++++++++++++++++++++++
3 files changed, 86 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 57cb634..b79b6eb 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -29,6 +29,9 @@
#define ARCH_PERFMON_EVENTSEL_INV (1ULL << 23)
#define ARCH_PERFMON_EVENTSEL_CMASK 0xFF000000ULL

+#define HSW_INTX (1ULL << 32)
+#define HSW_INTX_CHECKPOINTED (1ULL << 33)
+
#define AMD64_EVENTSEL_INT_CORE_ENABLE (1ULL << 36)
#define AMD64_EVENTSEL_GUESTONLY (1ULL << 40)
#define AMD64_EVENTSEL_HOSTONLY (1ULL << 41)
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 7f5c75c..a356350 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -219,11 +219,14 @@ struct cpu_hw_events {
* - inv
* - edge
* - cnt-mask
+ * - intx
+ * - intx_checkpointed
* The other filters are supported by fixed counters.
* The any-thread option is supported starting with v3.
*/
+#define FIXED_EVENT_FLAGS (X86_RAW_EVENT_MASK|HSW_INTX|HSW_INTX_CHECKPOINTED)
#define FIXED_EVENT_CONSTRAINT(c, n) \
- EVENT_CONSTRAINT(c, (1ULL << (32+n)), X86_RAW_EVENT_MASK)
+ EVENT_CONSTRAINT(c, (1ULL << (32+n)), FIXED_EVENT_FLAGS)

/*
* Constraint on the Event code + UMask
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 529c893..5ecaef0 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -13,6 +13,7 @@
#include <linux/slab.h>
#include <linux/export.h>

+#include <asm/cpufeature.h>
#include <asm/hardirq.h>
#include <asm/apic.h>

@@ -154,6 +155,22 @@ static struct extra_reg intel_snb_extra_regs[] __read_mostly = {
EVENT_EXTRA_END
};

+static struct event_constraint intel_hsw_event_constraints[] = {
+ FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */
+ FIXED_EVENT_CONSTRAINT(0x003c, 1), /* CPU_CLK_UNHALTED.CORE */
+ FIXED_EVENT_CONSTRAINT(0x0300, 2), /* CPU_CLK_UNHALTED.REF */
+ INTEL_EVENT_CONSTRAINT(0x48, 0x4), /* L1D_PEND_MISS.* */
+ INTEL_UEVENT_CONSTRAINT(0x01c0, 0x2), /* INST_RETIRED.PREC_DIST */
+ INTEL_EVENT_CONSTRAINT(0xcd, 0x8), /* MEM_TRANS_RETIRED.LOAD_LATENCY */
+ /* CYCLE_ACTIVITY.CYCLES_L1D_PENDING */
+ INTEL_EVENT_CONSTRAINT(0x08a3, 0x4),
+ /* CYCLE_ACTIVITY.STALLS_L1D_PENDING */
+ INTEL_EVENT_CONSTRAINT(0x0ca3, 0x4),
+ /* CYCLE_ACTIVITY.CYCLES_NO_EXECUTE */
+ INTEL_EVENT_CONSTRAINT(0x04a3, 0xf),
+ EVENT_CONSTRAINT_END
+};
+
static u64 intel_pmu_event_map(int hw_event)
{
return intel_perfmon_event_map[hw_event];
@@ -1606,6 +1623,48 @@ static void core_pmu_enable_all(int added)
}
}

+static int hsw_hw_config(struct perf_event *event)
+{
+ int ret = intel_pmu_hw_config(event);
+
+ if (ret)
+ return ret;
+ if (!boot_cpu_has(X86_FEATURE_RTM) && !boot_cpu_has(X86_FEATURE_HLE))
+ return 0;
+ event->hw.config |= event->attr.config &
+ (HSW_INTX|HSW_INTX_CHECKPOINTED);
+
+ /*
+ * INTX/INTX-CP filters are not supported by the Haswell PMU with
+ * PEBS or in ANY thread mode. Since the results are nonsensical, forbid
+ * this combination.
+ */
+ if ((event->hw.config & (HSW_INTX|HSW_INTX_CHECKPOINTED)) &&
+ ((event->hw.config & ARCH_PERFMON_EVENTSEL_ANY) ||
+ event->attr.precise_ip > 0))
+ return -EOPNOTSUPP;
+
+ return 0;
+}
+
+static struct event_constraint counter2_constraint =
+ EVENT_CONSTRAINT(0, 0x4, 0);
+
+static struct event_constraint *
+hsw_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
+{
+ struct event_constraint *c = intel_get_event_constraints(cpuc, event);
+
+ /* Handle special quirk on intx_checkpointed only in counter 2 */
+ if (event->hw.config & HSW_INTX_CHECKPOINTED) {
+ if (c->idxmsk64 & (1U << 2))
+ return &counter2_constraint;
+ return &emptyconstraint;
+ }
+
+ return c;
+}
+
PMU_FORMAT_ATTR(event, "config:0-7" );
PMU_FORMAT_ATTR(umask, "config:8-15" );
PMU_FORMAT_ATTR(edge, "config:18" );
@@ -2132,6 +2191,26 @@ __init int intel_pmu_init(void)
break;


+ case 60: /* Haswell Client */
+ case 70:
+ case 71:
+ memcpy(hw_cache_event_ids, snb_hw_cache_event_ids,
+ sizeof(hw_cache_event_ids));
+
+ intel_pmu_lbr_init_snb();
+
+ x86_pmu.event_constraints = intel_hsw_event_constraints;
+
+ x86_pmu.extra_regs = intel_snb_extra_regs;
+ /* all extra regs are per-cpu when HT is on */
+ x86_pmu.er_flags |= ERF_HAS_RSP_1;
+ x86_pmu.er_flags |= ERF_NO_HT_SHARING;
+
+ x86_pmu.hw_config = hsw_hw_config;
+ x86_pmu.get_event_constraints = hsw_get_event_constraints;
+ pr_cont("Haswell events, ");
+ break;
+
default:
switch (x86_pmu.version) {
case 1:
--
1.7.7.6

2013-03-21 20:01:32

by Andi Kleen

Subject: [PATCH 5/5] perf, x86: Support Haswell v4 LBR format

From: Andi Kleen <[email protected]>

Haswell has two additional flags in the LBR FROM field for TSX: intx and
abort, implemented as a new v4 version of the LBR format.

Handle them and adjust the sign-extension code so that it still extends
correctly. The flags are exported in the LBR record similarly to the
existing misprediction flag.
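
The v4 FROM field packs three flags into the top bits; a kernel-style
sketch of the decode (lbr_v4_from_addr is an illustrative name; the
arithmetic shift re-extends bit 60 over the stripped flag bits so
canonical addresses stay canonical):

	/* bit 63 = mispred, bit 62 = intx, bit 61 = abort, bits 60:0 = address */
	static u64 lbr_v4_from_addr(u64 from, int *intx, int *abort)
	{
		*intx  = !!(from & (1ULL << 62));
		*abort = !!(from & (1ULL << 61));
		/* drop the three flag bits, sign-extend bit 60 back up to bit 63 */
		return (u64)(((s64)from << 3) >> 3);
	}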

Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel_lbr.c | 18 +++++++++++++++---
include/linux/perf_event.h | 7 ++++++-
2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
index da02e9c..2af6695b 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
@@ -12,6 +12,7 @@ enum {
LBR_FORMAT_LIP = 0x01,
LBR_FORMAT_EIP = 0x02,
LBR_FORMAT_EIP_FLAGS = 0x03,
+ LBR_FORMAT_EIP_FLAGS2 = 0x04,
};

/*
@@ -56,6 +57,8 @@ enum {
LBR_FAR)

#define LBR_FROM_FLAG_MISPRED (1ULL << 63)
+#define LBR_FROM_FLAG_INTX (1ULL << 62)
+#define LBR_FROM_FLAG_ABORT (1ULL << 61)

#define for_each_branch_sample_type(x) \
for ((x) = PERF_SAMPLE_BRANCH_USER; \
@@ -270,21 +273,30 @@ static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc)

for (i = 0; i < x86_pmu.lbr_nr; i++) {
unsigned long lbr_idx = (tos - i) & mask;
- u64 from, to, mis = 0, pred = 0;
+ u64 from, to, mis = 0, pred = 0, intx = 0, abort = 0;

rdmsrl(x86_pmu.lbr_from + lbr_idx, from);
rdmsrl(x86_pmu.lbr_to + lbr_idx, to);

- if (lbr_format == LBR_FORMAT_EIP_FLAGS) {
+ if (lbr_format == LBR_FORMAT_EIP_FLAGS ||
+ lbr_format == LBR_FORMAT_EIP_FLAGS2) {
mis = !!(from & LBR_FROM_FLAG_MISPRED);
pred = !mis;
- from = (u64)((((s64)from) << 1) >> 1);
+ if (lbr_format == LBR_FORMAT_EIP_FLAGS)
+ from = (u64)((((s64)from) << 1) >> 1);
+ else if (lbr_format == LBR_FORMAT_EIP_FLAGS2) {
+ intx = !!(from & LBR_FROM_FLAG_INTX);
+ abort = !!(from & LBR_FROM_FLAG_ABORT);
+ from = (u64)((((s64)from) << 3) >> 3);
+ }
}

cpuc->lbr_entries[i].from = from;
cpuc->lbr_entries[i].to = to;
cpuc->lbr_entries[i].mispred = mis;
cpuc->lbr_entries[i].predicted = pred;
+ cpuc->lbr_entries[i].intx = intx;
+ cpuc->lbr_entries[i].abort = abort;
cpuc->lbr_entries[i].reserved = 0;
}
cpuc->lbr_stack.nr = i;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 1d795df..e1b975a 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -74,13 +74,18 @@ struct perf_raw_record {
*
* support for mispred, predicted is optional. In case it
* is not supported mispred = predicted = 0.
+ *
+ * intx: running in a hardware transaction
+ * abort: aborting a hardware transaction
*/
struct perf_branch_entry {
__u64 from;
__u64 to;
__u64 mispred:1, /* target mispredicted */
predicted:1,/* target predicted */
- reserved:62;
+ intx:1, /* in transaction */
+ abort:1, /* transaction abort */
+ reserved:60;
};

/*
--
1.7.7.6

2013-03-28 15:13:15

by Stephane Eranian

Subject: Re: Basic perf PMU support for Haswell v10

On Thu, Mar 21, 2013 at 8:59 PM, Andi Kleen <[email protected]> wrote:
> This is based on v7 of the full Haswell PMU support,
> rebased, and stripped down to the bare bones
>
> Most interesting new features are not in this patchkit
> (full version is git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git hsw/pmu5)
>
> Contains support for:
> - Basic Haswell PMU and PEBS support
> - Late unmasking of the PMI
> - Basic LBRv4 support
>
> v2: Addressed Stephane's feedback. See individual patches for details.
> v3: now even more bite-sized. Qualifier constraints merged earlier.
> v4: Rename some variables, add some comments and other minor changes.
> Add some Reviewed/Tested-bys.
> v5: Address some minor review feedback. Port to latest perf/core
> v6: Add just some variable names, add comments, edit descriptions, some
> more testing, rebased to latest perf/core
> v7: Expand comment
> v8: Rename structure field.
> v9: No wide counters, but add basic LBRs. Add some more
> constraints. Rebase to 3.9rc1
> v10: Change some whitespace. Rebase to 3.9rc3
>
I have not seen any movement on this series.
What's wrong with it this time?

2013-04-05 06:47:00

by Ingo Molnar

Subject: Re: Basic perf PMU support for Haswell v10


* Andi Kleen <[email protected]> wrote:

> This is based on v7 of the full Haswell PMU support,

This series looks mostly good - I've got two requests:

- please rename intx -> in_tx, INTX -> IN_TX, as 'intx' is confusing

- please port to the latest tip:master (or perf/core), as the underlying
code changed meanwhile, creating conflicts.

Thanks,

Ingo

2013-04-15 10:36:44

by Ingo Molnar

Subject: Re: Basic perf PMU support for Haswell v10


* Ingo Molnar <[email protected]> wrote:

>
> * Andi Kleen <[email protected]> wrote:
>
> > This is based on v7 of the full Haswell PMU support,
>
> This series looks mostly good - I've got two requests:
>
> - please rename intx -> in_tx, INTX -> IN_TX, as 'intx' is confusing
>
> - please port to the latest tip:master (or perf/core), as the underlying
> code changed meanwhile, creating conflicts.

Any update here? If not done this week then you'll possibly miss v3.10 as
well.

Thanks,

Ingo

2013-05-27 13:01:19

by Stephane Eranian

Subject: Re: Basic perf PMU support for Haswell v10

On Mon, Apr 15, 2013 at 12:36 PM, Ingo Molnar <[email protected]> wrote:
>
> * Ingo Molnar <[email protected]> wrote:
>
>>
>> * Andi Kleen <[email protected]> wrote:
>>
>> > This is based on v7 of the full Haswell PMU support,
>>
>> This series looks mostly good - I've got two requests:
>>
>> - please rename intx -> in_tx, INTX -> IN_TX, as 'intx' is confusing
>>
>> - please port to the latest tip:master (or perf/core), as the underlying
>> code changed meanwhile, creating conflicts.
>
> Any update here? If not done this week then you'll possibly miss v3.10 as
> well.
>
Over a month and no update on this. This is becoming ridiculous.
This needs to go in for 3.10. Shall I do the rebase myself?

2013-05-27 19:18:36

by Andi Kleen

Subject: Re: Basic perf PMU support for Haswell v10

> Over a month and no update on this.

Ah, I posted on May 22, but it looks like the mails got stuck in a mail
queue. Thanks for the reminder. Will repost.

Also always available at
git://git.kernel.org/pub/scm/linux/kernel/people/ak/linux-misc hsw/pmu6

-Andi

2013-05-31 13:02:37

by Stephane Eranian

Subject: Re: [PATCH 1/5] perf, x86: Add Haswell PEBS record support v5

Hi,

So I looked at this patch again. There is nothing wrong with the code.
But there is something bothering me with the usage model. I think
the choice of having pebs->ip for precise=1 or pebs->real_ip for
precise=2 is too restrictive.

There are situations where you want BOTH the real_ip AND the
off-by-one ip. This is when you're sampling call branches.
The real_ip gives you the call site, the off-by-one gives you
the target of the branch. This is very handy because you do
not need to use the LBR to get this, unlike with SandyBridge.
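
Concretely (hypothetical addresses): if you sample a retired call

	401000:	call 0x402000
	401005:	...

then real_ip = 0x401000 is the call site, while the off-by-one
pebs->ip = 0x402000 is the first instruction of the callee, so a
single sample gives you the caller/callee pair.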

Yet in the patch, I don't see a way to get this by simply tweaking the
precise= parameter. And I want this feature because I need it for
function value profiling support.

So we need to find an extension or a way to return both IPs
without invoking LBR. Easiest would be to add another
PERF_SAMPLE_*.

Any better idea?


On Thu, Mar 21, 2013 at 8:59 PM, Andi Kleen <[email protected]> wrote:
> From: Andi Kleen <[email protected]>
>
> Add support for the Haswell extended (fmt2) PEBS format.
>
> It has a superset of the nhm (fmt1) PEBS fields, but has a longer record so
> we need to adjust the code paths.
>
> The main advantage is the new "EventingRip" support which directly
> gives the instruction, not off-by-one instruction. So with precise == 2
> we use that directly and don't try to use LBRs and walking basic blocks.
> This lowers the overhead of using precise significantly.
>
> Some other features are added in later patches.
>
> Reviewed-by: Stephane Eranian <[email protected]>
> v2: Rename various identifiers. Add more comments. Get rid of a cast.
> v3: fmt2->hsw rename
> v4: ip_of_the_event->real_ip rename
> v5: use pr_cont. white space changes.
> Signed-off-by: Andi Kleen <[email protected]>
> ---
> arch/x86/kernel/cpu/perf_event.c | 3 +-
> arch/x86/kernel/cpu/perf_event_intel_ds.c | 114 +++++++++++++++++++++++------
> 2 files changed, 93 insertions(+), 24 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index bf0f01a..758f3fd 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -397,7 +397,8 @@ int x86_pmu_hw_config(struct perf_event *event)
> * check that PEBS LBR correction does not conflict with
> * whatever the user is asking with attr->branch_sample_type
> */
> - if (event->attr.precise_ip > 1) {
> + if (event->attr.precise_ip > 1 &&
> + x86_pmu.intel_cap.pebs_format < 2) {
> u64 *br_type = &event->attr.branch_sample_type;
>
> if (has_branch_stack(event)) {
> diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
> index b05a575..a7ab4db 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
> @@ -41,6 +41,22 @@ struct pebs_record_nhm {
> u64 status, dla, dse, lat;
> };
>
> +/*
> + * Same as pebs_record_nhm, with two additional fields.
> + */
> +struct pebs_record_hsw {
> + struct pebs_record_nhm nhm;
> + /*
> + * Real IP of the event. In the Intel documentation this
> + * is called eventingrip.
> + */
> + u64 real_ip;
> + /*
> + * TSX tuning information field: abort cycles and abort flags.
> + */
> + u64 tsx_tuning;
> +};
> +
> void init_debug_store_on_cpu(int cpu)
> {
> struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds;
> @@ -559,11 +575,11 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
> {
> /*
> * We cast to pebs_record_core since that is a subset of
> - * both formats and we don't use the other fields in this
> - * routine.
> + * all formats.
> */
> struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> struct pebs_record_core *pebs = __pebs;
> + struct pebs_record_hsw *pebs_hsw = __pebs;
> struct perf_sample_data data;
> struct pt_regs regs;
>
> @@ -588,7 +604,10 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
> regs.bp = pebs->bp;
> regs.sp = pebs->sp;
>
> - if (event->attr.precise_ip > 1 && intel_pmu_pebs_fixup_ip(&regs))
> + if (event->attr.precise_ip > 1 && x86_pmu.intel_cap.pebs_format >= 2) {
> + regs.ip = pebs_hsw->real_ip;
> + regs.flags |= PERF_EFLAGS_EXACT;
> + } else if (event->attr.precise_ip > 1 && intel_pmu_pebs_fixup_ip(&regs))
> regs.flags |= PERF_EFLAGS_EXACT;
> else
> regs.flags &= ~PERF_EFLAGS_EXACT;
> @@ -641,35 +660,22 @@ static void intel_pmu_drain_pebs_core(struct pt_regs *iregs)
> __intel_pmu_pebs_event(event, iregs, at);
> }
>
> -static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
> +static void __intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, void *at,
> + void *top)
> {
> struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> struct debug_store *ds = cpuc->ds;
> - struct pebs_record_nhm *at, *top;
> struct perf_event *event = NULL;
> u64 status = 0;
> - int bit, n;
> -
> - if (!x86_pmu.pebs_active)
> - return;
> -
> - at = (struct pebs_record_nhm *)(unsigned long)ds->pebs_buffer_base;
> - top = (struct pebs_record_nhm *)(unsigned long)ds->pebs_index;
> + int bit;
>
> ds->pebs_index = ds->pebs_buffer_base;
>
> - n = top - at;
> - if (n <= 0)
> - return;
> -
> - /*
> - * Should not happen, we program the threshold at 1 and do not
> - * set a reset value.
> - */
> - WARN_ONCE(n > x86_pmu.max_pebs_events, "Unexpected number of pebs records %d\n", n);
> + for ( ; at < top; at += x86_pmu.pebs_record_size) {
> + struct pebs_record_nhm *p = at;
>
> - for ( ; at < top; at++) {
> - for_each_set_bit(bit, (unsigned long *)&at->status, x86_pmu.max_pebs_events) {
> + for_each_set_bit(bit, (unsigned long *)&p->status,
> + x86_pmu.max_pebs_events) {
> event = cpuc->events[bit];
> if (!test_bit(bit, cpuc->active_mask))
> continue;
> @@ -692,6 +698,61 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
> }
> }
>
> +static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
> +{
> + struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> + struct debug_store *ds = cpuc->ds;
> + struct pebs_record_nhm *at, *top;
> + int n;
> +
> + if (!x86_pmu.pebs_active)
> + return;
> +
> + at = (struct pebs_record_nhm *)(unsigned long)ds->pebs_buffer_base;
> + top = (struct pebs_record_nhm *)(unsigned long)ds->pebs_index;
> +
> + ds->pebs_index = ds->pebs_buffer_base;
> +
> + n = top - at;
> + if (n <= 0)
> + return;
> +
> + /*
> + * Should not happen, we program the threshold at 1 and do not
> + * set a reset value.
> + */
> + WARN_ONCE(n > x86_pmu.max_pebs_events,
> + "Unexpected number of pebs records %d\n", n);
> +
> + return __intel_pmu_drain_pebs_nhm(iregs, at, top);
> +}
> +
> +static void intel_pmu_drain_pebs_hsw(struct pt_regs *iregs)
> +{
> + struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> + struct debug_store *ds = cpuc->ds;
> + struct pebs_record_hsw *at, *top;
> + int n;
> +
> + if (!x86_pmu.pebs_active)
> + return;
> +
> + at = (struct pebs_record_hsw *)(unsigned long)ds->pebs_buffer_base;
> + top = (struct pebs_record_hsw *)(unsigned long)ds->pebs_index;
> +
> + n = top - at;
> + if (n <= 0)
> + return;
> + /*
> + * Should not happen, we program the threshold at 1 and do not
> + * set a reset value.
> + */
> + WARN_ONCE(n > x86_pmu.max_pebs_events,
> + "Unexpected number of pebs records %d\n", n);
> +
> + return __intel_pmu_drain_pebs_nhm(iregs, at, top);
> +}
> +
> /*
> * BTS, PEBS probe and setup
> */
> @@ -723,6 +784,13 @@ void intel_ds_init(void)
> x86_pmu.drain_pebs = intel_pmu_drain_pebs_nhm;
> break;
>
> + case 2:
> + pr_cont("PEBS fmt2%c, ", pebs_type);
> + x86_pmu.pebs_record_size =
> + sizeof(struct pebs_record_hsw);
> + x86_pmu.drain_pebs = intel_pmu_drain_pebs_hsw;
> + break;
> +
> default:
> printk(KERN_CONT "no PEBS fmt%d%c, ", format, pebs_type);
> x86_pmu.pebs = 0;
> --
> 1.7.7.6
>

2013-05-31 16:23:29

by Andi Kleen

Subject: Re: [PATCH 1/5] perf, x86: Add Haswell PEBS record support v5

On Fri, May 31, 2013 at 03:02:29PM +0200, Stephane Eranian wrote:
> Hi,
>
> So looked at this patch again. There is nothing wrong with the code.
> But there is something bothering me with the usage model. I think
> the choice of having pebs->ip for precise=1 or pebs->real_ip for
> precise=2 is too restrictive.
>
> There are situations where you want BOTH the real_ip AND the
> off-by-one ip. This is when you're sampling call branches.
> The real_ip gives you the call site, the off-by-one gives you
> the target of the branch. This is very handy because you do
> not need to use the LBR to get this, unlike with SandyBridge.

It's also useful for TSX: critical section versus abort point
inside a transaction.

> So we need to find an extension or a way to return both IPs
> without invoking LBR. Easiest would be to add another
> PERF_SAMPLE_*.
>
> Any better idea?

For TSX I just use two different events. It works.

I'm not disagreeing with it, but I think it's an orthogonal issue to my
patchkit. Adding a new SAMPLE type seems reasonable to me.

-Andi