2012-11-21 12:51:29

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH v3 00/16] perf: add memory access sampling support

This patch series had a new feature to the kernel perf_events
interface and corresponding user level tool, perf.

With this patch, it is possible to sample (not trace) memory
accesses (load, store). For loads, the instruction and data
addresses are captured along with the latency and data source.
For stores, the instruction and data addresses are capture
along with limited cache and TLB information.

For load data source, the memory hierarchy level, the tlb, snoop
and lock information is captured.

Although the perf_event interface is extended in a generic manner,
sampling memory accesses requires HW support. The current patches
implement the feature on Intel processors starting with Nehalem.
The patches leverage the PEBS Load Latency and Precise Store
mechanisms. Precise Store is present only on Sandy Bridge and
Ivy Bridge based processors.

The perf tool is extended to make capturing and analyzing the
data easier with a new command: perf mem.

$ perf mem -t load rec triad
$ perf mem -t load rep --stdio
# Samples: 19K of event 'cpu/mem-loads/pp'
# Total cost : 1013994
# Sort order : cost,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
#
# Overhead Samples Cost Memory access Symbol Shared Obj Data Symbol Data Object Snoop TLB access Locked
# ........ ........... ....... ............. .......... ........... ...................... ........... ...... ............ ......
#
0.10% 1 986 LFB hit [.] triad triad [.] 0x00007f67dffe8038 [unknown] None L1 or L2 hit No
0.09% 1 890 LFB hit [.] triad triad [.] 0x00007f67df91a750 [unknown] None L1 or L2 hit No
0.08% 1 826 LFB hit [.] triad triad [.] 0x00007f67e288fba8 [unknown] None L1 or L2 hit No
0.08% 1 825 LFB hit [.] triad triad [.] 0x00007f67dea28c80 [unknown] None L1 or L2 hit No
0.08% 1 787 LFB hit [.] triad triad [.] 0x00007f67df055a60 [unknown] None L1 or L2 hit No

The perf mem command is a wrapper around perf record/report. It passes the
right options to the report and record commands. Note that the TUI mode is
supported.

One powerful feature of perf is that users can toy with sort order to display
the information in different format or from a different angle. This is particularly
useful with memory sampling:

$ perf mem -t load rep --sort=mem
# Samples: 19K of event 'cpu/mem-loads/pp'
# Total cost : 1013994
# Sort order : mem
#
# Overhead Samples Memory access
# ........ ........... ........................
#
85.26% 10633 LFB hit
7.35% 8151 L1 hit
3.13% 383 L3 hit
3.09% 195 Local RAM hit
1.16% 259 L2 hit
0.00% 4 Uncached hit

Or if one is interested in the data view:
$ perf mem -t load rep --sort=symbol_daddr,cost
# Samples: 19K of event 'cpu/mem-loads/pp'
# Total cost : 1013994
# Sort order : symbol_daddr,cost
#
# Overhead Samples Data Symbol Cost
# ........ ........... ...................... .......
#
0.10% 1 [.] 0x00007f67dffe8038 986
0.09% 1 [.] 0x00007f67df91a750 890
0.08% 1 [.] 0x00007f67e288fba8 826


One note on the cost displayed: On Intel processors with PEBS Load Latency, as described
in the SDM, the cost encompasses the number of cycles from dispatch to Globally Observable
(GO) state. That means, that it includes OOO execution. It is not usual to see L1D Hits
with a cost of > 100 cycles. Always look at the memory level for an approximation of the
access penalty, then interpret the cost value accordingly.

Data symbolization is working for initialized global variables. Dynamically allocated
data and bss symbolization is currently non-functional.

There is no cost associated with stores.

In v2, we leverage some of Andi Kleen's Haswell patches, namely the weighted
samples and perf tool event parser fixes. We also introduce PERF_RECORD_MISC_DATA_MMAP
to tag mmaps for data vs. code. This helps the perf tool distinguish data. vs. code
mmaps (and therefore symbols). We have also integrated the feedback from v1. Note that in
v2 data symbol resolution is not yet fully operational, but there is a slight improvement.

In v3, we rebased the patch on 3.7.0-rc6 which includes certain of Nahyung's patches.

Andi Kleen (2):
perf, x86: Support CPU specific sysfs events
perf, core: Add a concept of a weightened sample

Namhyung Kim (3):
perf tools: Synthesize data mmap events for threads
perf tools: Ignore ABS symbols when loading data maps
perf tools: Fix output of symbol_daddr offset

Stephane Eranian (14):
perf/x86: improve sysfs event mapping with event string
perf/x86: add flags to event constraints
perf: add minimal support for PERF_SAMPLE_WEIGHT
perf: add support for PERF_SAMPLE_ADDR in dump_sampple()
perf: add generic memory sampling interface
perf/x86: add memory profiling via PEBS Load Latency
perf/x86: export PEBS load latency threshold register to sysfs
perf/x86: add support for PEBS Precise Store
perf tools: add mem access sampling core support
perf report: add support for mem access profiling
perf record: add support for mem access profiling
perf tools: add new mem command for memory access profiling
perf: add PERF_RECORD_MISC_MMAP_DATA to RECORD_MMAP
perf tools: detect data vs. text mappings

arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/perf_event.c | 63 +++--
arch/x86/kernel/cpu/perf_event.h | 62 ++++-
arch/x86/kernel/cpu/perf_event_intel.c | 34 ++-
arch/x86/kernel/cpu/perf_event_intel_ds.c | 182 +++++++++++++-
arch/x86/kernel/cpu/perf_event_intel_uncore.c | 2 +-
include/linux/perf_event.h | 5 +
include/uapi/linux/perf_event.h | 74 +++++-
kernel/events/core.c | 15 ++
tools/perf/Makefile | 1 +
tools/perf/builtin-mem.c | 238 ++++++++++++++++++
tools/perf/builtin-record.c | 2 +
tools/perf/builtin-report.c | 131 +++++++++-
tools/perf/builtin.h | 1 +
tools/perf/command-list.txt | 1 +
tools/perf/perf.c | 1 +
tools/perf/perf.h | 1 +
tools/perf/util/event.c | 78 +++---
tools/perf/util/event.h | 2 +
tools/perf/util/evsel.c | 15 ++
tools/perf/util/hist.c | 66 ++++-
tools/perf/util/hist.h | 13 +
tools/perf/util/machine.c | 10 +-
tools/perf/util/session.c | 44 ++++
tools/perf/util/session.h | 4 +
tools/perf/util/sort.c | 324 ++++++++++++++++++++++++-
tools/perf/util/sort.h | 10 +-
tools/perf/util/symbol-elf.c | 3 +
tools/perf/util/symbol.h | 7 +
29 files changed, 1298 insertions(+), 92 deletions(-)
create mode 100644 tools/perf/builtin-mem.c

--
1.7.9.5


2012-11-21 12:51:30

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH v3 01/19] perf, x86: Support CPU specific sysfs events

From: Andi Kleen <[email protected]>

Add a way for the CPU initialization code to register additional events,
and merge them into the events attribute directory. Used in the next
patch.

Signed-off-by: Andi Kleen <[email protected]>
---
arch/x86/kernel/cpu/perf_event.c | 29 +++++++++++++++++++++++++++++
arch/x86/kernel/cpu/perf_event.h | 1 +
2 files changed, 30 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 4428fd1..b116788 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1341,6 +1341,30 @@ static void __init filter_events(struct attribute **attrs)
}
}

+/* Merge two pointer arrays */
+static __init struct attribute **merge_attr(struct attribute **a,
+ struct attribute **b)
+{
+ struct attribute **new;
+ int j, i;
+
+ for (j = 0; a[j]; j++)
+ ;
+ for (i = 0; b[i]; i++)
+ j++;
+ j++;
+ new = kmalloc(sizeof(struct attribute *) * j, GFP_KERNEL);
+ if (!new)
+ return a;
+ j = 0;
+ for (i = 0; a[i]; i++)
+ new[j++] = a[i];
+ for (i = 0; b[i]; i++)
+ new[j++] = b[i];
+ new[j] = NULL;
+ return new;
+}
+
static ssize_t events_sysfs_show(struct device *dev, struct device_attribute *attr,
char *page)
{
@@ -1482,6 +1506,11 @@ static int __init init_hw_perf_events(void)
else
filter_events(x86_pmu_events_group.attrs);

+ if (x86_pmu.cpu_events)
+ x86_pmu_events_group.attrs =
+ merge_attr(x86_pmu_events_group.attrs,
+ x86_pmu.cpu_events);
+
pr_info("... version: %d\n", x86_pmu.version);
pr_info("... bit width: %d\n", x86_pmu.cntval_bits);
pr_info("... generic registers: %d\n", x86_pmu.num_counters);
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 115c1ea..4170043 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -355,6 +355,7 @@ struct x86_pmu {
struct attribute **format_attrs;

ssize_t (*events_sysfs_show)(char *page, u64 config);
+ struct attribute **cpu_events;

/*
* CPU Hotplug hooks
--
1.7.9.5

2012-11-21 12:51:34

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH v3 02/19] perf/x86: improve sysfs event mapping with event string

This patch extends Jiri's changes to make generic
events mapping visible via sysfs. The patch extends
the mechanism to non-generic events by allowing
the mappings to be hardcoded in strings.

This mechanism will be used by the PEBS-LL patch
later on.

Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/kernel/cpu/perf_event.c | 27 ++++++++++++---------------
arch/x86/kernel/cpu/perf_event.h | 23 +++++++++++++++++++++++
2 files changed, 35 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index b116788..198d74f 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1316,20 +1316,22 @@ static struct attribute_group x86_pmu_format_group = {
.attrs = NULL,
};

-struct perf_pmu_events_attr {
- struct device_attribute attr;
- u64 id;
-};
-
/*
* Remove all undefined events (x86_pmu.event_map(id) == 0)
* out of events_attr attributes.
*/
static void __init filter_events(struct attribute **attrs)
{
+ struct device_attribute *d;
+ struct perf_pmu_events_attr *pmu_attr;
int i, j;

for (i = 0; attrs[i]; i++) {
+ d = (struct device_attribute *)attrs[i];
+ pmu_attr = container_of(d, struct perf_pmu_events_attr, attr);
+ /* str trumps id */
+ if (pmu_attr->event_str)
+ continue;
if (x86_pmu.event_map(i))
continue;

@@ -1370,19 +1372,14 @@ static ssize_t events_sysfs_show(struct device *dev, struct device_attribute *at
{
struct perf_pmu_events_attr *pmu_attr = \
container_of(attr, struct perf_pmu_events_attr, attr);
-
u64 config = x86_pmu.event_map(pmu_attr->id);
- return x86_pmu.events_sysfs_show(page, config);
-}

-#define EVENT_VAR(_id) event_attr_##_id
-#define EVENT_PTR(_id) &event_attr_##_id.attr.attr
+ /* string trumps id */
+ if (pmu_attr->event_str)
+ return sprintf(page, "%s", pmu_attr->event_str);

-#define EVENT_ATTR(_name, _id) \
-static struct perf_pmu_events_attr EVENT_VAR(_id) = { \
- .attr = __ATTR(_name, 0444, events_sysfs_show, NULL), \
- .id = PERF_COUNT_HW_##_id, \
-};
+ return x86_pmu.events_sysfs_show(page, config);
+}

EVENT_ATTR(cpu-cycles, CPU_CYCLES );
EVENT_ATTR(instructions, INSTRUCTIONS );
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 4170043..3f4380c 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -420,6 +420,29 @@ do { \
#define ERF_NO_HT_SHARING 1
#define ERF_HAS_RSP_1 2

+#define EVENT_VAR(_id) event_attr_##_id
+#define EVENT_PTR(_id) &event_attr_##_id.attr.attr
+
+#define EVENT_ATTR(_name, _id) \
+static struct perf_pmu_events_attr EVENT_VAR(_id) = { \
+ .attr = __ATTR(_name, 0444, events_sysfs_show, NULL), \
+ .id = PERF_COUNT_HW_##_id, \
+ .event_str = NULL, \
+};
+
+#define EVENT_ATTR_STR(_name, v, str) \
+static struct perf_pmu_events_attr event_attr_##v = { \
+ .attr = __ATTR(_name, 0444, events_sysfs_show, NULL),\
+ .id = 0, \
+ .event_str = str, \
+};
+
+struct perf_pmu_events_attr {
+ struct device_attribute attr;
+ u64 id;
+ const char *event_str;
+};
+
extern struct x86_pmu x86_pmu __read_mostly;

DECLARE_PER_CPU(struct cpu_hw_events, cpu_hw_events);
--
1.7.9.5

2012-11-21 12:51:37

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH v3 03/19] perf/x86: add flags to event constraints

This patch adds a flags field to each event constraint.
It can be used to store event specific features which can
then later be used by scheduling code or low-level x86 code.

The flags are propagated into event->hw.flags during the
get_event_constraint() call. They are cleared during the
put_event_constraint() call.

This mechanism is going to be used by the PEBS-LL patches.
It avoids defining yet another table to hold event specific
information.

Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/kernel/cpu/perf_event.c | 2 +-
arch/x86/kernel/cpu/perf_event.h | 8 +++++---
arch/x86/kernel/cpu/perf_event_intel.c | 5 ++++-
arch/x86/kernel/cpu/perf_event_intel_ds.c | 4 +++-
arch/x86/kernel/cpu/perf_event_intel_uncore.c | 2 +-
include/linux/perf_event.h | 1 +
6 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 198d74f..1dbd9b1 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1493,7 +1493,7 @@ static int __init init_hw_perf_events(void)

unconstrained = (struct event_constraint)
__EVENT_CONSTRAINT(0, (1ULL << x86_pmu.num_counters) - 1,
- 0, x86_pmu.num_counters, 0);
+ 0, x86_pmu.num_counters, 0, 0);

x86_pmu.attr_rdpmc = 1; /* enable userspace RDPMC usage by default */
x86_pmu_format_group.attrs = x86_pmu.format_attrs;
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 3f4380c..3f10cfe 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -59,6 +59,7 @@ struct event_constraint {
u64 cmask;
int weight;
int overlap;
+ int flags;
};

struct amd_nb {
@@ -170,16 +171,17 @@ struct cpu_hw_events {
void *kfree_on_online;
};

-#define __EVENT_CONSTRAINT(c, n, m, w, o) {\
+#define __EVENT_CONSTRAINT(c, n, m, w, o, f) {\
{ .idxmsk64 = (n) }, \
.code = (c), \
.cmask = (m), \
.weight = (w), \
.overlap = (o), \
+ .flags = f, \
}

#define EVENT_CONSTRAINT(c, n, m) \
- __EVENT_CONSTRAINT(c, n, m, HWEIGHT(n), 0)
+ __EVENT_CONSTRAINT(c, n, m, HWEIGHT(n), 0, 0)

/*
* The overlap flag marks event constraints with overlapping counter
@@ -203,7 +205,7 @@ struct cpu_hw_events {
* and its counter masks must be kept at a minimum.
*/
#define EVENT_CONSTRAINT_OVERLAP(c, n, m) \
- __EVENT_CONSTRAINT(c, n, m, HWEIGHT(n), 1)
+ __EVENT_CONSTRAINT(c, n, m, HWEIGHT(n), 1, 0)

/*
* Constraint on the Event code.
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 93b9e11..57d6527 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1367,8 +1367,10 @@ x86_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)

if (x86_pmu.event_constraints) {
for_each_event_constraint(c, x86_pmu.event_constraints) {
- if ((event->hw.config & c->cmask) == c->code)
+ if ((event->hw.config & c->cmask) == c->code) {
+ event->hw.flags |= c->flags;
return c;
+ }
}
}

@@ -1413,6 +1415,7 @@ intel_put_shared_regs_event_constraints(struct cpu_hw_events *cpuc,
static void intel_put_event_constraints(struct cpu_hw_events *cpuc,
struct perf_event *event)
{
+ event->hw.flags = 0;
intel_put_shared_regs_event_constraints(cpuc, event);
}

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 826054a..f30d85b 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -430,8 +430,10 @@ struct event_constraint *intel_pebs_constraints(struct perf_event *event)

if (x86_pmu.pebs_constraints) {
for_each_event_constraint(c, x86_pmu.pebs_constraints) {
- if ((event->hw.config & c->cmask) == c->code)
+ if ((event->hw.config & c->cmask) == c->code) {
+ event->hw.flags |= c->flags;
return c;
+ }
}
}

diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
index 3cf3d97..cc39e64 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -2438,7 +2438,7 @@ static int __init uncore_type_init(struct intel_uncore_type *type)

type->unconstrainted = (struct event_constraint)
__EVENT_CONSTRAINT(0, (1ULL << type->num_counters) - 1,
- 0, type->num_counters, 0);
+ 0, type->num_counters, 0, 0);

for (i = 0; i < type->num_boxes; i++) {
pmus[i].func_id = -1;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 6bfb2faa..484cfbc 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -128,6 +128,7 @@ struct hw_perf_event {
int event_base_rdpmc;
int idx;
int last_cpu;
+ int flags;

struct hw_perf_event_extra extra_reg;
struct hw_perf_event_extra branch_reg;
--
1.7.9.5

2012-11-21 12:51:47

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH v3 07/19] perf: add generic memory sampling interface

This patch adds PERF_SAMPLE_COST and PERF_SAMPLE_DSRC.
The first collects a cost associated with the sampled
event. In case of memory access, the cost would be
the latency of the load, otherwise it defaults to
the sampling period.

PERF_SAMPLE_DSRC collects the data source, i.e., where
did the data associated with the sampled instruction
come from. Information is stored in a perf_mem_dsrc
structure. It contains opcode, mem level, tlb, snoop,
lock information, subject to availability in hardware.

Signed-off-by: Stephane Eranian <[email protected]>
---
include/linux/perf_event.h | 2 ++
include/uapi/linux/perf_event.h | 69 +++++++++++++++++++++++++++++++++++++--
kernel/events/core.c | 6 ++++
3 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index bb2429d..8fe4610 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -579,6 +579,7 @@ struct perf_sample_data {
u32 reserved;
} cpu_entry;
u64 period;
+ union perf_mem_dsrc dsrc;
struct perf_callchain_entry *callchain;
struct perf_raw_record *raw;
struct perf_branch_stack *br_stack;
@@ -599,6 +600,7 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
data->regs_user.regs = NULL;
data->stack_user_size = 0;
data->weight = 0;
+ data->dsrc.val = 0;
}

extern void perf_output_sample(struct perf_output_handle *handle,
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index c52caab..b2a78bb 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -133,9 +133,9 @@ enum perf_event_sample_format {
PERF_SAMPLE_REGS_USER = 1U << 12,
PERF_SAMPLE_STACK_USER = 1U << 13,
PERF_SAMPLE_WEIGHT = 1U << 14,
+ PERF_SAMPLE_DSRC = 1U << 15,

- PERF_SAMPLE_MAX = 1U << 15, /* non-ABI */
-
+ PERF_SAMPLE_MAX = 1U << 16, /* non-ABI */
};

/*
@@ -591,6 +591,8 @@ enum perf_event_type {
* { u64 size;
* char data[size];
* u64 dyn_size; } && PERF_SAMPLE_STACK_USER
+ *
+ * { u64 dsrc; } && PERF_SAMPLE_DSRC
* };
*/
PERF_RECORD_SAMPLE = 9,
@@ -616,4 +618,67 @@ enum perf_callchain_context {
#define PERF_FLAG_FD_OUTPUT (1U << 1)
#define PERF_FLAG_PID_CGROUP (1U << 2) /* pid=cgroup id, per-cpu mode only */

+union perf_mem_dsrc {
+ __u64 val;
+ struct {
+ __u64 mem_op:5, /* type of opcode */
+ mem_lvl:14, /* memory hierarchy level */
+ mem_snoop:5, /* snoop mode */
+ mem_lock:2, /* lock instr */
+ mem_dtlb:7, /* tlb access */
+ mem_rsvd:31;
+ };
+};
+
+/* type of opcode (load/store/prefetch,code) */
+#define PERF_MEM_OP_NA 0x01 /* not available */
+#define PERF_MEM_OP_LOAD 0x02 /* load instruction */
+#define PERF_MEM_OP_STORE 0x04 /* store instruction */
+#define PERF_MEM_OP_PFETCH 0x08 /* prefetch */
+#define PERF_MEM_OP_EXEC 0x10 /* code (execution) */
+#define PERF_MEM_OP_SHIFT 0
+
+/* memory hierarchy (memory level, hit or miss) */
+#define PERF_MEM_LVL_NA 0x01 /* not available */
+#define PERF_MEM_LVL_HIT 0x02 /* hit level */
+#define PERF_MEM_LVL_MISS 0x04 /* miss level */
+#define PERF_MEM_LVL_L1 0x08 /* L1 */
+#define PERF_MEM_LVL_LFB 0x10 /* Line Fill Buffer */
+#define PERF_MEM_LVL_L2 0x20 /* L2 hit */
+#define PERF_MEM_LVL_L3 0x40 /* L3 hit */
+#define PERF_MEM_LVL_LOC_RAM 0x80 /* Local DRAM */
+#define PERF_MEM_LVL_REM_RAM1 0x100 /* Remote DRAM (1 hop) */
+#define PERF_MEM_LVL_REM_RAM2 0x200 /* Remote DRAM (2 hops) */
+#define PERF_MEM_LVL_REM_CCE1 0x400 /* Remote Cache (1 hop) */
+#define PERF_MEM_LVL_REM_CCE2 0x800 /* Remote Cache (2 hops) */
+#define PERF_MEM_LVL_IO 0x1000 /* I/O memory */
+#define PERF_MEM_LVL_UNC 0x2000 /* Uncached memory */
+#define PERF_MEM_LVL_SHIFT 5
+
+/* snoop mode */
+#define PERF_MEM_SNOOP_NA 0x01 /* not available */
+#define PERF_MEM_SNOOP_NONE 0x02 /* no snoop */
+#define PERF_MEM_SNOOP_HIT 0x04 /* snoop hit */
+#define PERF_MEM_SNOOP_MISS 0x08 /* snoop miss */
+#define PERF_MEM_SNOOP_HITM 0x10 /* snoop hit modified */
+#define PERF_MEM_SNOOP_SHIFT 19
+
+/* locked instruction */
+#define PERF_MEM_LOCK_NA 0x01 /* not available */
+#define PERF_MEM_LOCK_LOCKED 0x02 /* locked transaction */
+#define PERF_MEM_LOCK_SHIFT 24
+
+/* TLB access */
+#define PERF_MEM_TLB_NA 0x01 /* not available */
+#define PERF_MEM_TLB_HIT 0x02 /* hit level */
+#define PERF_MEM_TLB_MISS 0x04 /* miss level */
+#define PERF_MEM_TLB_L1 0x08 /* L1 */
+#define PERF_MEM_TLB_L2 0x10 /* L2 */
+#define PERF_MEM_TLB_WK 0x20 /* Hardware Walker*/
+#define PERF_MEM_TLB_OS 0x40 /* OS fault handler */
+#define PERF_MEM_TLB_SHIFT 26
+
+#define PERF_MEM_S(a, s) \
+ (((u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)
+
#endif /* _UAPI_LINUX_PERF_EVENT_H */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d633581..94b2c70 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -958,6 +958,9 @@ static void perf_event__header_size(struct perf_event *event)
if (sample_type & PERF_SAMPLE_READ)
size += event->read_size;

+ if (sample_type & PERF_SAMPLE_DSRC)
+ size += sizeof(data->dsrc.val);
+
event->header_size = size;
}

@@ -4175,6 +4178,9 @@ void perf_output_sample(struct perf_output_handle *handle,
perf_output_sample_ustack(handle,
data->stack_user_size,
data->regs_user.regs);
+
+ if (sample_type & PERF_SAMPLE_DSRC)
+ perf_output_put(handle, data->dsrc.val);
}

void perf_prepare_sample(struct perf_event_header *header,
--
1.7.9.5

2012-11-21 12:51:42

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH v3 05/19] perf: add minimal support for PERF_SAMPLE_WEIGHT

Ensure we grab the weight from raw sample struct
and that we can dump it via perf report -D.

Signed-off-by: Stephane Eranian <[email protected]>
---
tools/perf/util/event.h | 1 +
tools/perf/util/evsel.c | 5 +++++
tools/perf/util/session.c | 3 +++
3 files changed, 9 insertions(+)

diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 0d573ff..cf52977 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -90,6 +90,7 @@ struct perf_sample {
u64 period;
u32 cpu;
u32 raw_size;
+ u64 weight;
void *raw_data;
struct ip_callchain *callchain;
struct branch_stack *branch_stack;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 618d411..b6f3697 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1020,6 +1020,11 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
}
}

+ if (type & PERF_SAMPLE_WEIGHT) {
+ data->weight= *array;
+ array++;
+ }
+
return 0;
}

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 15abe40..962c04f 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1006,6 +1006,9 @@ static void dump_sample(struct perf_evsel *evsel, union perf_event *event,

if (sample_type & PERF_SAMPLE_STACK_USER)
stack_user__printf(&sample->user_stack);
+
+ if (sample_type & PERF_SAMPLE_WEIGHT)
+ printf(" ... weight: %"PRIu64"\n", sample->weight);
}

static struct machine *
--
1.7.9.5

2012-11-21 12:51:53

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH v3 08/19] perf/x86: add memory profiling via PEBS Load Latency

This patch adds support for memory profiling using the
PEBS Load Latency facility.

Load accesses are sampled by HW and the instruction
address, data address, load latency, data source, tlb,
locked information can be saved in the sampling buffer
if using the PERF_SAMPLE_COST (for latency),
PERF_SAMPLE_ADDR, PERF_SAMPLE_DSRC types.

To enable PEBS Load Latency, users have to use the
model specific event:
- on NHM/WSM: MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD
- on SNB/IVB: MEM_TRANS_RETIRED:LATENCY_ABOVE_THRESHOLD

To make things easier, this patch also exports a generic
alias via sysfs: mem-loads. It export the right event
encoding based on the host CPU and can be used directly
by the perf tool.

Loosely based on Intel's Lin Ming patch posted on LKML
in July 2011.

Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/perf_event.c | 5 +-
arch/x86/kernel/cpu/perf_event.h | 25 +++++-
arch/x86/kernel/cpu/perf_event_intel.c | 24 ++++++
arch/x86/kernel/cpu/perf_event_intel_ds.c | 133 +++++++++++++++++++++++++++--
5 files changed, 178 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index e400cdb..135b673 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -66,6 +66,7 @@
#define MSR_IA32_PEBS_ENABLE 0x000003f1
#define MSR_IA32_DS_AREA 0x00000600
#define MSR_IA32_PERF_CAPABILITIES 0x00000345
+#define MSR_PEBS_LD_LAT_THRESHOLD 0x000003f6

#define MSR_MTRRfix64K_00000 0x00000250
#define MSR_MTRRfix16K_80000 0x00000258
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 1dbd9b1..c69317a 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1367,7 +1367,7 @@ static __init struct attribute **merge_attr(struct attribute **a,
return new;
}

-static ssize_t events_sysfs_show(struct device *dev, struct device_attribute *attr,
+ssize_t events_sysfs_show(struct device *dev, struct device_attribute *attr,
char *page)
{
struct perf_pmu_events_attr *pmu_attr = \
@@ -1498,6 +1498,9 @@ static int __init init_hw_perf_events(void)
x86_pmu.attr_rdpmc = 1; /* enable userspace RDPMC usage by default */
x86_pmu_format_group.attrs = x86_pmu.format_attrs;

+ if (x86_pmu.event_attrs)
+ x86_pmu_events_group.attrs = x86_pmu.event_attrs;
+
if (!x86_pmu.events_sysfs_show)
x86_pmu_events_group.attrs = &empty_attrs;
else
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 3f10cfe..3f91411 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -46,6 +46,7 @@ enum extra_reg_type {
EXTRA_REG_RSP_0 = 0, /* offcore_response_0 */
EXTRA_REG_RSP_1 = 1, /* offcore_response_1 */
EXTRA_REG_LBR = 2, /* lbr_select */
+ EXTRA_REG_LDLAT = 3, /* ld_lat_threshold */

EXTRA_REG_MAX /* number of entries needed */
};
@@ -61,6 +62,10 @@ struct event_constraint {
int overlap;
int flags;
};
+/*
+ * struct event_constraint flags
+ */
+#define PERF_X86_EVENT_PEBS_LDLAT 0x1 /* ld+ldlat data address sampling */

struct amd_nb {
int nb_id; /* NorthBridge id */
@@ -233,6 +238,10 @@ struct cpu_hw_events {
#define INTEL_UEVENT_CONSTRAINT(c, n) \
EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK)

+#define INTEL_PLD_CONSTRAINT(c, n) \
+ __EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK, \
+ HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_LDLAT)
+
#define EVENT_CONSTRAINT_END \
EVENT_CONSTRAINT(0, 0, 0)

@@ -262,12 +271,22 @@ struct extra_reg {
.msr = (ms), \
.config_mask = (m), \
.valid_mask = (vm), \
- .idx = EXTRA_REG_##i \
+ .idx = EXTRA_REG_##i, \
}

#define INTEL_EVENT_EXTRA_REG(event, msr, vm, idx) \
EVENT_EXTRA_REG(event, msr, ARCH_PERFMON_EVENTSEL_EVENT, vm, idx)

+#define INTEL_UEVENT_EXTRA_REG(event, msr, vm, idx) \
+ EVENT_EXTRA_REG(event, msr, ARCH_PERFMON_EVENTSEL_EVENT | \
+ ARCH_PERFMON_EVENTSEL_UMASK, vm, idx)
+
+#define INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(c) \
+ INTEL_UEVENT_EXTRA_REG(c, \
+ MSR_PEBS_LD_LAT_THRESHOLD, \
+ 0xffff, \
+ LDLAT)
+
#define EVENT_EXTRA_END EVENT_EXTRA_REG(0, 0, 0, 0, RSP_0)

union perf_capabilities {
@@ -355,6 +374,7 @@ struct x86_pmu {
*/
int attr_rdpmc;
struct attribute **format_attrs;
+ struct attribute **event_attrs;

ssize_t (*events_sysfs_show)(char *page, u64 config);
struct attribute **cpu_events;
@@ -659,6 +679,9 @@ int p6_pmu_init(void);

int knc_pmu_init(void);

+ssize_t events_sysfs_show(struct device *dev, struct device_attribute *attr,
+ char *page);
+
#else /* CONFIG_CPU_SUP_INTEL */

static inline void reserve_ds_buffers(void)
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 57d6527..1d2a7ca 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -81,6 +81,7 @@ static struct event_constraint intel_nehalem_event_constraints[] __read_mostly =
static struct extra_reg intel_nehalem_extra_regs[] __read_mostly =
{
INTEL_EVENT_EXTRA_REG(0xb7, MSR_OFFCORE_RSP_0, 0xffff, RSP_0),
+ INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(0x100b),
EVENT_EXTRA_END
};

@@ -111,6 +112,7 @@ static struct extra_reg intel_westmere_extra_regs[] __read_mostly =
{
INTEL_EVENT_EXTRA_REG(0xb7, MSR_OFFCORE_RSP_0, 0xffff, RSP_0),
INTEL_EVENT_EXTRA_REG(0xbb, MSR_OFFCORE_RSP_1, 0xffff, RSP_1),
+ INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(0x100b),
EVENT_EXTRA_END
};

@@ -130,9 +132,23 @@ static struct event_constraint intel_gen_event_constraints[] __read_mostly =
static struct extra_reg intel_snb_extra_regs[] __read_mostly = {
INTEL_EVENT_EXTRA_REG(0xb7, MSR_OFFCORE_RSP_0, 0x3fffffffffull, RSP_0),
INTEL_EVENT_EXTRA_REG(0xbb, MSR_OFFCORE_RSP_1, 0x3fffffffffull, RSP_1),
+ INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(0x01cd),
EVENT_EXTRA_END
};

+EVENT_ATTR_STR(mem-loads, mem_ld_nhm, "event=0x0b,umask=0x10,ldlat=3");
+EVENT_ATTR_STR(mem-loads, mem_ld_snb, "event=0xcd,umask=0x1,ldlat=3");
+
+struct attribute *nhm_events_attrs[] = {
+ EVENT_PTR(mem_ld_nhm),
+ NULL,
+};
+
+struct attribute *snb_events_attrs[] = {
+ EVENT_PTR(mem_ld_snb),
+ NULL,
+};
+
static u64 intel_pmu_event_map(int hw_event)
{
return intel_perfmon_event_map[hw_event];
@@ -2009,6 +2025,8 @@ __init int intel_pmu_init(void)
x86_pmu.enable_all = intel_pmu_nhm_enable_all;
x86_pmu.extra_regs = intel_nehalem_extra_regs;

+ x86_pmu.cpu_events = nhm_events_attrs;
+
/* UOPS_ISSUED.STALLED_CYCLES */
intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =
X86_CONFIG(.event=0x0e, .umask=0x01, .inv=1, .cmask=1);
@@ -2049,6 +2067,8 @@ __init int intel_pmu_init(void)
x86_pmu.extra_regs = intel_westmere_extra_regs;
x86_pmu.er_flags |= ERF_HAS_RSP_1;

+ x86_pmu.cpu_events = nhm_events_attrs;
+
/* UOPS_ISSUED.STALLED_CYCLES */
intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =
X86_CONFIG(.event=0x0e, .umask=0x01, .inv=1, .cmask=1);
@@ -2077,6 +2097,8 @@ __init int intel_pmu_init(void)
x86_pmu.er_flags |= ERF_HAS_RSP_1;
x86_pmu.er_flags |= ERF_NO_HT_SHARING;

+ x86_pmu.cpu_events = snb_events_attrs;
+
/* UOPS_ISSUED.ANY,c=1,i=1 to count stall cycles */
intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =
X86_CONFIG(.event=0x0e, .umask=0x01, .inv=1, .cmask=1);
@@ -2102,6 +2124,8 @@ __init int intel_pmu_init(void)
x86_pmu.er_flags |= ERF_HAS_RSP_1;
x86_pmu.er_flags |= ERF_NO_HT_SHARING;

+ x86_pmu.cpu_events = snb_events_attrs;
+
/* UOPS_ISSUED.ANY,c=1,i=1 to count stall cycles */
intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =
X86_CONFIG(.event=0x0e, .umask=0x01, .inv=1, .cmask=1);
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index f30d85b..759b0c0 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -24,6 +24,92 @@ struct pebs_record_32 {

*/

+union intel_x86_pebs_dse {
+ u64 val;
+ struct {
+ unsigned int ld_dse:4;
+ unsigned int ld_stlb_miss:1;
+ unsigned int ld_locked:1;
+ unsigned int ld_reserved:26;
+ };
+ struct {
+ unsigned int st_l1d_hit:1;
+ unsigned int st_reserved1:3;
+ unsigned int st_stlb_miss:1;
+ unsigned int st_locked:1;
+ unsigned int st_reserved2:26;
+ };
+};
+
+
+/*
+ * Map PEBS Load Latency Data Source encodings to generic
+ * memory data source information
+ */
+#define P(a, b) PERF_MEM_S(a, b)
+#define OP_LH (P(OP, LOAD) | P(LVL, HIT))
+#define SNOOP_NONE_MISS (P(SNOOP, NONE) | P(SNOOP, MISS))
+
+static const u64 pebs_data_source[] = {
+ P(OP, LOAD) | P(LVL, MISS) | P(LVL, L3) | P(SNOOP, NA),/* 0x00:ukn L3 */
+ OP_LH | P(LVL, L1) | P(SNOOP, NONE), /* 0x01: L1 local */
+ OP_LH | P(LVL, LFB)| P(SNOOP, NONE), /* 0x02: LFB hit */
+ OP_LH | P(LVL, L2) | P(SNOOP, NONE), /* 0x03: L2 hit */
+ OP_LH | P(LVL, L3) | P(SNOOP, NONE), /* 0x04: L3 hit */
+ OP_LH | P(LVL, L3) | P(SNOOP, MISS), /* 0x05: L3 hit, snoop miss */
+ OP_LH | P(LVL, L3) | P(SNOOP, HIT), /* 0x06: L3 hit, snoop hit */
+ OP_LH | P(LVL, L3) | P(SNOOP, HITM), /* 0x07: L3 hit, snoop hitm */
+ OP_LH | P(LVL, REM_CCE1) | P(SNOOP, HIT), /* 0x08: L3 miss snoop hit */
+ OP_LH | P(LVL, REM_CCE1) | P(SNOOP, HITM), /* 0x09: L3 miss snoop hitm*/
+ OP_LH | P(LVL, LOC_RAM) | P(SNOOP, HIT), /* 0x0a: L3 miss, shared */
+ OP_LH | P(LVL, REM_RAM1) | P(SNOOP, HIT), /* 0x0b: L3 miss, shared */
+ OP_LH | P(LVL, LOC_RAM) | SNOOP_NONE_MISS,/* 0x0c: L3 miss, excl */
+ OP_LH | P(LVL, REM_RAM1) | SNOOP_NONE_MISS,/* 0x0d: L3 miss, excl */
+ OP_LH | P(LVL, IO) | P(SNOOP, NONE), /* 0x0e: I/O */
+ OP_LH | P(LVL,UNC) | P(SNOOP, NONE), /* 0x0f: uncached */
+};
+
+static u64 load_latency_data(u64 status)
+{
+ union intel_x86_pebs_dse dse;
+ u64 val;
+ int model = boot_cpu_data.x86_model;
+ int fam = boot_cpu_data.x86;
+
+ dse.val = status;
+
+ /*
+ * use the mapping table for bit 0-3
+ */
+ val = pebs_data_source[dse.ld_dse];
+
+ /*
+ * Nehalem models do not support TLB, Lock infos
+ */
+ if (fam == 0x6 && (model == 26 || model == 30
+ || model == 31 || model == 46)) {
+ val |= P(TLB, NA) | P(LOCK, NA);
+ return val;
+ }
+ /*
+ * bit 4: TLB access
+ * 0 = did not miss 2nd level TLB
+ * 1 = missed 2nd level TLB
+ */
+ if (dse.ld_stlb_miss)
+ val |= P(TLB, MISS) | P(TLB, L2);
+ else
+ val |= P(TLB, HIT) | P(TLB,L1) | P(TLB, L2);
+
+ /*
+ * bit 5: locked prefix
+ */
+ if (dse.ld_locked)
+ val |= P(LOCK, LOCKED);
+
+ return val;
+}
+
struct pebs_record_core {
u64 flags, ip;
u64 ax, bx, cx, dx;
@@ -364,7 +450,7 @@ struct event_constraint intel_atom_pebs_event_constraints[] = {
};

struct event_constraint intel_nehalem_pebs_event_constraints[] = {
- INTEL_EVENT_CONSTRAINT(0x0b, 0xf), /* MEM_INST_RETIRED.* */
+ INTEL_PLD_CONSTRAINT(0x100b, 0xf), /* MEM_INST_RETIRED.* */
INTEL_EVENT_CONSTRAINT(0x0f, 0xf), /* MEM_UNCORE_RETIRED.* */
INTEL_UEVENT_CONSTRAINT(0x010c, 0xf), /* MEM_STORE_RETIRED.DTLB_MISS */
INTEL_EVENT_CONSTRAINT(0xc0, 0xf), /* INST_RETIRED.ANY */
@@ -379,7 +465,7 @@ struct event_constraint intel_nehalem_pebs_event_constraints[] = {
};

struct event_constraint intel_westmere_pebs_event_constraints[] = {
- INTEL_EVENT_CONSTRAINT(0x0b, 0xf), /* MEM_INST_RETIRED.* */
+ INTEL_PLD_CONSTRAINT(0x100b, 0xf), /* MEM_INST_RETIRED.* */
INTEL_EVENT_CONSTRAINT(0x0f, 0xf), /* MEM_UNCORE_RETIRED.* */
INTEL_UEVENT_CONSTRAINT(0x010c, 0xf), /* MEM_STORE_RETIRED.DTLB_MISS */
INTEL_EVENT_CONSTRAINT(0xc0, 0xf), /* INSTR_RETIRED.* */
@@ -399,7 +485,7 @@ struct event_constraint intel_snb_pebs_event_constraints[] = {
INTEL_UEVENT_CONSTRAINT(0x02c2, 0xf), /* UOPS_RETIRED.RETIRE_SLOTS */
INTEL_EVENT_CONSTRAINT(0xc4, 0xf), /* BR_INST_RETIRED.* */
INTEL_EVENT_CONSTRAINT(0xc5, 0xf), /* BR_MISP_RETIRED.* */
- INTEL_EVENT_CONSTRAINT(0xcd, 0x8), /* MEM_TRANS_RETIRED.* */
+ INTEL_PLD_CONSTRAINT(0x01cd, 0x8), /* MEM_TRANS_RETIRED.LAT_ABOVE_THR */
INTEL_EVENT_CONSTRAINT(0xd0, 0xf), /* MEM_UOP_RETIRED.* */
INTEL_EVENT_CONSTRAINT(0xd1, 0xf), /* MEM_LOAD_UOPS_RETIRED.* */
INTEL_EVENT_CONSTRAINT(0xd2, 0xf), /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
@@ -413,7 +499,7 @@ struct event_constraint intel_ivb_pebs_event_constraints[] = {
INTEL_UEVENT_CONSTRAINT(0x02c2, 0xf), /* UOPS_RETIRED.RETIRE_SLOTS */
INTEL_EVENT_CONSTRAINT(0xc4, 0xf), /* BR_INST_RETIRED.* */
INTEL_EVENT_CONSTRAINT(0xc5, 0xf), /* BR_MISP_RETIRED.* */
- INTEL_EVENT_CONSTRAINT(0xcd, 0x8), /* MEM_TRANS_RETIRED.* */
+ INTEL_PLD_CONSTRAINT(0x01cd, 0x8), /* MEM_TRANS_RETIRED.LAT_ABOVE_THR */
INTEL_EVENT_CONSTRAINT(0xd0, 0xf), /* MEM_UOP_RETIRED.* */
INTEL_EVENT_CONSTRAINT(0xd1, 0xf), /* MEM_LOAD_UOPS_RETIRED.* */
INTEL_EVENT_CONSTRAINT(0xd2, 0xf), /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
@@ -448,6 +534,9 @@ void intel_pmu_pebs_enable(struct perf_event *event)
hwc->config &= ~ARCH_PERFMON_EVENTSEL_INT;

cpuc->pebs_enabled |= 1ULL << hwc->idx;
+
+ if (event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT)
+ cpuc->pebs_enabled |= 1ULL << (hwc->idx + 32);
}

void intel_pmu_pebs_disable(struct perf_event *event)
@@ -560,20 +649,48 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
struct pt_regs *iregs, void *__pebs)
{
/*
- * We cast to pebs_record_core since that is a subset of
- * both formats and we don't use the other fields in this
- * routine.
+ * We cast to pebs_record_nhm to get the load latency data
+ * if extra_reg MSR_PEBS_LD_LAT_THRESHOLD used
*/
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
- struct pebs_record_core *pebs = __pebs;
+ struct pebs_record_nhm *pebs = __pebs;
struct perf_sample_data data;
struct pt_regs regs;
+ u64 sample_type;
+ int fll;

if (!intel_pmu_save_and_restart(event))
return;

+ fll = event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT;
+
perf_sample_data_init(&data, 0, event->hw.last_period);

+ data.period = event->hw.last_period;
+ sample_type = event->attr.sample_type;
+
+ /*
+ * if PEBS-LL or PreciseStore
+ */
+ if (fll) {
+ if (sample_type & PERF_SAMPLE_ADDR)
+ data.addr = pebs->dla;
+
+ /*
+ * Use latency for weight (only avail with PEBS-LL)
+ */
+ if (fll && (sample_type & PERF_SAMPLE_WEIGHT))
+ data.weight = pebs->lat;
+
+ /*
+ * data.dsrc encodes the data source
+ */
+ if (sample_type & PERF_SAMPLE_DSRC) {
+ if (fll)
+ data.dsrc.val = load_latency_data(pebs->dse);
+ }
+ }
+
/*
* We use the interrupt regs as a base because the PEBS record
* does not contain a full regs set, specifically it seems to
--
1.7.9.5

2012-11-21 12:52:05

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH v3 13/19] perf record: add support for mem access profiling

Add the -l option to perf record to enable sampling
access cost sampling.

Data address sampling is obtained via the -d option.

Signed-off-by: Stephane Eranian <[email protected]>
---
tools/perf/builtin-record.c | 2 ++
tools/perf/perf.h | 1 +
tools/perf/util/evsel.c | 6 ++++++
3 files changed, 9 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 5783c32..8243405 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1041,6 +1041,8 @@ const struct option record_options[] = {
OPT_CALLBACK('j', "branch-filter", &record.opts.branch_stack,
"branch filter mask", "branch stack filter modes",
parse_branch_stack),
+ OPT_BOOLEAN('l', "cost", &record.opts.weight,
+ "event cost"),
OPT_END()
};

diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 054182e..3b79dcf 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -250,6 +250,7 @@ struct perf_record_opts {
bool sample_id_all_missing;
bool exclude_guest_missing;
bool period;
+ bool weight;
unsigned int freq;
unsigned int mmap_pages;
unsigned int user_freq;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 951d4d1..65ca661 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -474,6 +474,12 @@ void perf_evsel__config(struct perf_evsel *evsel, struct perf_record_opts *opts,
attr->sample_type |= PERF_SAMPLE_CPU;
}

+ if (opts->weight)
+ attr->sample_type |= PERF_SAMPLE_WEIGHT;
+
+ if (opts->sample_address)
+ attr->sample_type |= PERF_SAMPLE_DSRC;
+
if (opts->no_delay) {
attr->watermark = 0;
attr->wakeup_events = 1;
--
1.7.9.5

2012-11-21 12:51:59

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH v3 10/19] perf/x86: add support for PEBS Precise Store

This patch adds support for PEBS Precise Store
which is available on Intel Sandy Bridge and
Ivy Bridge processors.

To use Precise store, the proper PEBS event
must be used: mem_trans_retired:precise_stores.
For the perf tool, the generic mem-stores event
exported via sysfs can be used directly.

Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/kernel/cpu/perf_event.h | 5 +++
arch/x86/kernel/cpu/perf_event_intel.c | 2 ++
arch/x86/kernel/cpu/perf_event_intel_ds.c | 49 +++++++++++++++++++++++++++--
3 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 3f91411..645b864 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -66,6 +66,7 @@ struct event_constraint {
* struct event_constraint flags
*/
#define PERF_X86_EVENT_PEBS_LDLAT 0x1 /* ld+ldlat data address sampling */
+#define PERF_X86_EVENT_PEBS_ST 0x2 /* st data address sampling */

struct amd_nb {
int nb_id; /* NorthBridge id */
@@ -242,6 +243,10 @@ struct cpu_hw_events {
__EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK, \
HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_LDLAT)

+#define INTEL_PST_CONSTRAINT(c, n) \
+ __EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK, \
+ HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_ST)
+
#define EVENT_CONSTRAINT_END \
EVENT_CONSTRAINT(0, 0, 0)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 8eced4c..665e26c 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -138,6 +138,7 @@ static struct extra_reg intel_snb_extra_regs[] __read_mostly = {

EVENT_ATTR_STR(mem-loads, mem_ld_nhm, "event=0x0b,umask=0x10,ldlat=3");
EVENT_ATTR_STR(mem-loads, mem_ld_snb, "event=0xcd,umask=0x1,ldlat=3");
+EVENT_ATTR_STR(mem-stores, mem_st_snb, "event=0xcd,umask=0x2");

struct attribute *nhm_events_attrs[] = {
EVENT_PTR(mem_ld_nhm),
@@ -146,6 +147,7 @@ struct attribute *nhm_events_attrs[] = {

struct attribute *snb_events_attrs[] = {
EVENT_PTR(mem_ld_snb),
+ EVENT_PTR(mem_st_snb),
NULL,
};

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 759b0c0..ed0ca3e 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -69,6 +69,44 @@ static const u64 pebs_data_source[] = {
OP_LH | P(LVL,UNC) | P(SNOOP, NONE), /* 0x0f: uncached */
};

+static u64 precise_store_data(u64 status)
+{
+ union intel_x86_pebs_dse dse;
+ u64 val = P(OP, STORE) | P(SNOOP, NA) | P(LVL, L1) | P(TLB, L2);
+
+ dse.val = status;
+
+ /*
+ * bit 4: TLB access
+ * 1 = stored missed 2nd level TLB
+ *
+ * so it either hit the walker or the OS
+ * otherwise hit 2nd level TLB
+ */
+ if (dse.st_stlb_miss)
+ val |= P(TLB, MISS);
+ else
+ val |= P(TLB, HIT);
+
+ /*
+ * bit 0: hit L1 data cache
+ * if not set, then all we know is that
+ * it missed L1D
+ */
+ if (dse.st_l1d_hit)
+ val |= P(LVL, HIT);
+ else
+ val |= P(LVL, MISS);
+
+ /*
+ * bit 5: Locked prefix
+ */
+ if (dse.st_locked)
+ val |= P(LOCK, LOCKED);
+
+ return val;
+}
+
static u64 load_latency_data(u64 status)
{
union intel_x86_pebs_dse dse;
@@ -486,6 +524,7 @@ struct event_constraint intel_snb_pebs_event_constraints[] = {
INTEL_EVENT_CONSTRAINT(0xc4, 0xf), /* BR_INST_RETIRED.* */
INTEL_EVENT_CONSTRAINT(0xc5, 0xf), /* BR_MISP_RETIRED.* */
INTEL_PLD_CONSTRAINT(0x01cd, 0x8), /* MEM_TRANS_RETIRED.LAT_ABOVE_THR */
+ INTEL_PST_CONSTRAINT(0x02cd, 0x8), /* MEM_TRANS_RETIRED.PRECISE_STORES */
INTEL_EVENT_CONSTRAINT(0xd0, 0xf), /* MEM_UOP_RETIRED.* */
INTEL_EVENT_CONSTRAINT(0xd1, 0xf), /* MEM_LOAD_UOPS_RETIRED.* */
INTEL_EVENT_CONSTRAINT(0xd2, 0xf), /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
@@ -500,6 +539,7 @@ struct event_constraint intel_ivb_pebs_event_constraints[] = {
INTEL_EVENT_CONSTRAINT(0xc4, 0xf), /* BR_INST_RETIRED.* */
INTEL_EVENT_CONSTRAINT(0xc5, 0xf), /* BR_MISP_RETIRED.* */
INTEL_PLD_CONSTRAINT(0x01cd, 0x8), /* MEM_TRANS_RETIRED.LAT_ABOVE_THR */
+ INTEL_PST_CONSTRAINT(0x02cd, 0x8), /* MEM_TRANS_RETIRED.PRECISE_STORES */
INTEL_EVENT_CONSTRAINT(0xd0, 0xf), /* MEM_UOP_RETIRED.* */
INTEL_EVENT_CONSTRAINT(0xd1, 0xf), /* MEM_LOAD_UOPS_RETIRED.* */
INTEL_EVENT_CONSTRAINT(0xd2, 0xf), /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
@@ -537,6 +577,8 @@ void intel_pmu_pebs_enable(struct perf_event *event)

if (event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT)
cpuc->pebs_enabled |= 1ULL << (hwc->idx + 32);
+ else if (event->hw.flags & PERF_X86_EVENT_PEBS_ST)
+ cpuc->pebs_enabled |= 1ULL << 63;
}

void intel_pmu_pebs_disable(struct perf_event *event)
@@ -657,12 +699,13 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
struct perf_sample_data data;
struct pt_regs regs;
u64 sample_type;
- int fll;
+ int fll, fst;

if (!intel_pmu_save_and_restart(event))
return;

fll = event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT;
+ fst = event->hw.flags & PERF_X86_EVENT_PEBS_ST;

perf_sample_data_init(&data, 0, event->hw.last_period);

@@ -672,7 +715,7 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
/*
* if PEBS-LL or PreciseStore
*/
- if (fll) {
+ if (fll || fst) {
if (sample_type & PERF_SAMPLE_ADDR)
data.addr = pebs->dla;

@@ -688,6 +731,8 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
if (sample_type & PERF_SAMPLE_DSRC) {
if (fll)
data.dsrc.val = load_latency_data(pebs->dse);
+ else
+ data.dsrc.val = precise_store_data(pebs->dse);
}
}

--
1.7.9.5

2012-11-21 12:52:10

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH v3 14/19] perf tools: add new mem command for memory access profiling

This new command is a wrapper on top of perf record and
perf report to make it easier to configure for memory
access profiling.

To record loads:
$ perf mem -t load rec .....

To record stores:
$ perf mem -t store rec .....

To get the report:
$ perf mem -t load rep

Signed-off-by: Stephane Eranian <[email protected]>
---
tools/perf/Makefile | 1 +
tools/perf/builtin-mem.c | 238 +++++++++++++++++++++++++++++++++++++++++++
tools/perf/builtin.h | 1 +
tools/perf/command-list.txt | 1 +
tools/perf/perf.c | 1 +
5 files changed, 242 insertions(+)
create mode 100644 tools/perf/builtin-mem.c

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index cca5bb8..39695cb 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -463,6 +463,7 @@ BUILTIN_OBJS += $(OUTPUT)builtin-lock.o
BUILTIN_OBJS += $(OUTPUT)builtin-kvm.o
BUILTIN_OBJS += $(OUTPUT)builtin-inject.o
BUILTIN_OBJS += $(OUTPUT)tests/builtin-test.o
+BUILTIN_OBJS += $(OUTPUT)builtin-mem.o

PERFLIBS = $(LIB_FILE) $(LIBTRACEEVENT)

diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
new file mode 100644
index 0000000..c2980b8
--- /dev/null
+++ b/tools/perf/builtin-mem.c
@@ -0,0 +1,238 @@
+#include "builtin.h"
+#include "perf.h"
+
+#include "util/parse-options.h"
+#include "util/trace-event.h"
+#include "util/tool.h"
+#include "util/session.h"
+
+#define MEM_OPERATION_LOAD "load"
+#define MEM_OPERATION_STORE "store"
+
+static const char *mem_operation = MEM_OPERATION_LOAD;
+
+struct perf_mem {
+ struct perf_tool tool;
+ char const *input_name;
+ symbol_filter_t annotate_init;
+ bool hide_unresolved;
+ bool dump_raw;
+ const char *cpu_list;
+ DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
+};
+
+static const char * const mem_usage[] = {
+ "perf mem [<options>] {record <command> |report}",
+ NULL
+};
+
+static int __cmd_record(int argc, const char **argv)
+{
+ int rec_argc, i = 0, j;
+ const char **rec_argv;
+ char event[64];
+ int ret;
+
+ rec_argc = argc + 4;
+ rec_argv = calloc(rec_argc + 1, sizeof(char *));
+ if (!rec_argv)
+ return -1;
+
+ rec_argv[i++] = strdup("record");
+ if (!strcmp(mem_operation, MEM_OPERATION_LOAD))
+ rec_argv[i++] = strdup("-l");
+ rec_argv[i++] = strdup("-d");
+ rec_argv[i++] = strdup("-e");
+
+ if (strcmp(mem_operation, MEM_OPERATION_LOAD))
+ sprintf(event, "cpu/mem-stores/pp");
+ else
+ sprintf(event, "cpu/mem-loads/pp");
+
+ rec_argv[i++] = strdup(event);
+ for (j = 1; j < argc; j++, i++)
+ rec_argv[i] = argv[j];
+
+ ret = cmd_record(i, rec_argv, NULL);
+ free(rec_argv);
+ return ret;
+}
+
+static int
+dump_raw_samples(struct perf_tool *tool,
+ union perf_event *event,
+ struct perf_sample *sample,
+ struct perf_evsel *evsel __maybe_unused,
+ struct machine *machine)
+{
+ struct perf_mem *mem = container_of(tool, struct perf_mem, tool);
+ struct addr_location al;
+ const char *fmt;
+
+ if (perf_event__preprocess_sample(event, machine, &al, sample,
+ mem->annotate_init) < 0) {
+ fprintf(stderr, "problem processing %d event, skipping it.\n",
+ event->header.type);
+ return -1;
+ }
+
+ if (al.filtered || (mem->hide_unresolved && al.sym == NULL))
+ return 0;
+
+ if (al.map != NULL)
+ al.map->dso->hit = 1;
+
+ if (symbol_conf.field_sep) {
+ fmt = "%d%s%d%s0x%"PRIx64"%s0x%"PRIx64"%s%"PRIu64
+ "%s0x%"PRIx64"%s%s:%s\n";
+ } else {
+ fmt = "%5d%s%5d%s0x%016"PRIx64"%s0x016%"PRIx64
+ "%s%5"PRIu64"%s0x%06"PRIx64"%s%s:%s\n";
+ symbol_conf.field_sep = " ";
+ }
+
+ printf( fmt,
+ sample->pid,
+ symbol_conf.field_sep,
+ sample->tid,
+ symbol_conf.field_sep,
+ event->ip.ip,
+ symbol_conf.field_sep,
+ sample->addr,
+ symbol_conf.field_sep,
+ sample->weight,
+ symbol_conf.field_sep,
+ sample->dsrc,
+ symbol_conf.field_sep,
+ al.map ? (al.map->dso ? al.map->dso->long_name : "???") : "???",
+ al.sym ? al.sym->name : "???");
+
+ return 0;
+}
+
+static int process_sample_event(struct perf_tool *tool,
+ union perf_event *event,
+ struct perf_sample *sample,
+ struct perf_evsel *evsel,
+ struct machine *machine)
+{
+ return dump_raw_samples(tool, event, sample, evsel, machine);
+}
+
+static int report_raw_events(struct perf_mem *mem)
+{
+ int err = -EINVAL;
+ int ret;
+ struct perf_session *session = perf_session__new(input_name, O_RDONLY,
+ 0, false, &mem->tool);
+
+ if (session == NULL)
+ return -ENOMEM;
+
+ if (mem->cpu_list) {
+ ret = perf_session__cpu_bitmap(session, mem->cpu_list,
+ mem->cpu_bitmap);
+ if (ret)
+ goto out_delete;
+ }
+
+ if (symbol__init() < 0)
+ return -1;
+
+ printf("# PID, TID, IP, ADDR, COST, DSRC, SYMBOL\n");
+
+ err = perf_session__process_events(session, &mem->tool);
+ if (err)
+ return err;
+
+ return 0;
+
+out_delete:
+ perf_session__delete(session);
+ return err;
+}
+
+static int report_events(int argc, const char **argv, struct perf_mem *mem)
+{
+ const char **rep_argv;
+ int ret, i = 0, j, rep_argc;
+
+ if (mem->dump_raw)
+ return report_raw_events(mem);
+
+ rep_argc = argc + 3;
+ rep_argv = calloc(rep_argc + 1, sizeof(char *));
+ if (!rep_argv)
+ return -1;
+
+ rep_argv[i++] = strdup("report");
+ rep_argv[i++] = strdup("--mem-mode");
+ rep_argv[i++] = strdup("-n"); /* display number of samples */
+
+ /*
+ * there is no cost associated with stores, so don't print
+ * the column
+ */
+ if (strcmp(mem_operation, MEM_OPERATION_LOAD))
+ rep_argv[i++] = strdup("--sort=mem,sym,dso,symbol_daddr,dso_daddr,tlb,locked");
+
+ for (j = 1; j < argc; j++, i++)
+ rep_argv[i] = argv[j];
+
+ ret = cmd_report(i, rep_argv, NULL);
+ free(rep_argv);
+ return ret;
+}
+
+int cmd_mem(int argc, const char **argv, const char *prefix __maybe_unused)
+{
+ struct stat st;
+ struct perf_mem mem = {
+ .tool = {
+ .sample = process_sample_event,
+ .mmap = perf_event__process_mmap,
+ .comm = perf_event__process_comm,
+ .lost = perf_event__process_lost,
+ .fork = perf_event__process_fork,
+ .build_id = perf_event__process_build_id,
+ .ordered_samples= true,
+ },
+ .input_name = "perf.data",
+ };
+ const struct option mem_options[] = {
+ OPT_STRING('t', "type", &mem_operation,
+ "type", "memory operations(load/store)"),
+ OPT_BOOLEAN('D', "dump-raw-samples", &mem.dump_raw,
+ "dump raw samples in ASCII"),
+ OPT_BOOLEAN('U', "hide-unresolved", &mem.hide_unresolved,
+ "Only display entries resolved to a symbol"),
+ OPT_STRING('i', "input", &input_name, "file", "input file name"),
+ OPT_STRING('C', "cpu", &mem.cpu_list, "cpu", "list of cpus to profile"),
+ OPT_STRING('x', "field-separator", &symbol_conf.field_sep, "separator",
+ "separator for columns, no spaces will be added between "
+ "columns '.' is reserved."),
+ OPT_END()
+ };
+
+ argc = parse_options(argc, argv, mem_options, mem_usage,
+ PARSE_OPT_STOP_AT_NON_OPTION);
+
+ if (!argc || !(strncmp(argv[0], "rec", 3) || mem_operation))
+ usage_with_options(mem_usage, mem_options);
+
+ if (!mem.input_name || !strlen(mem.input_name)) {
+ if (!fstat(STDIN_FILENO, &st) && S_ISFIFO(st.st_mode))
+ mem.input_name = "-";
+ else
+ mem.input_name = "perf.data";
+ }
+
+ if (!strncmp(argv[0], "rec", 3))
+ return __cmd_record(argc, argv);
+ else if (!strncmp(argv[0], "rep", 3))
+ return report_events(argc, argv, &mem);
+ else
+ usage_with_options(mem_usage, mem_options);
+
+ return 0;
+}
diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h
index 08143bd..b210d62 100644
--- a/tools/perf/builtin.h
+++ b/tools/perf/builtin.h
@@ -36,6 +36,7 @@ extern int cmd_kvm(int argc, const char **argv, const char *prefix);
extern int cmd_test(int argc, const char **argv, const char *prefix);
extern int cmd_trace(int argc, const char **argv, const char *prefix);
extern int cmd_inject(int argc, const char **argv, const char *prefix);
+extern int cmd_mem(int argc, const char **argv, const char *prefix);

extern int find_scripts(char **scripts_array, char **scripts_path_array);
#endif
diff --git a/tools/perf/command-list.txt b/tools/perf/command-list.txt
index 3e86bbd..2c5b621 100644
--- a/tools/perf/command-list.txt
+++ b/tools/perf/command-list.txt
@@ -24,3 +24,4 @@ perf-kmem mainporcelain common
perf-lock mainporcelain common
perf-kvm mainporcelain common
perf-test mainporcelain common
+perf-mem mainporcelain common
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index a0ae290..96b62f7 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -60,6 +60,7 @@ static struct cmd_struct commands[] = {
{ "trace", cmd_trace, 0 },
#endif
{ "inject", cmd_inject, 0 },
+ { "mem", cmd_mem, 0 },
};

struct pager_config {
--
1.7.9.5

2012-11-21 12:52:16

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH v3 16/19] perf tools: detect data vs. text mappings

Leverages the PERF_RECORD_MISC_MMAP_DATA bit in
the RECORD_MMAP record header. When the bit is set
then the mapping type is set to MAP__VARIABLE.

Signed-off-by: Stephane Eranian <[email protected]>
---
tools/perf/util/machine.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 1f09d05..d1c3e48 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -379,6 +379,7 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event
u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
struct thread *thread;
struct map *map;
+ enum map_type type;
int ret = 0;

if (dump_trace)
@@ -395,10 +396,17 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event
thread = machine__findnew_thread(machine, event->mmap.pid);
if (thread == NULL)
goto out_problem;
+
+ if (event->header.misc & PERF_RECORD_MISC_MMAP_DATA)
+ type = MAP__VARIABLE;
+ else
+ type = MAP__FUNCTION;
+
map = map__new(&machine->user_dsos, event->mmap.start,
event->mmap.len, event->mmap.pgoff,
event->mmap.pid, event->mmap.filename,
- MAP__FUNCTION);
+ type);
+
if (map == NULL)
goto out_problem;

--
1.7.9.5

2012-11-21 12:52:22

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH v3 19/19] perf tools: Fix output of symbol_daddr offset

From: Namhyung Kim <[email protected]>

The symbol addresses in a dso have relative offsets from the start of
a mapping. So in order to ouput correct offset value from @ip, one of
them should be converted.

Cc: Stephane Eranian <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/sort.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index e2e466d..166480c 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -186,7 +186,7 @@ static int _hist_entry__sym_snprintf(struct map *map, struct symbol *sym,
if (map->type == MAP__VARIABLE) {
ret += repsep_snprintf(bf + ret, size - ret, "%s", sym->name);
ret += repsep_snprintf(bf + ret, size - ret, "+0x%llx",
- ip - sym->start);
+ ip - map->unmap_ip(map, sym->start));
ret += repsep_snprintf(bf + ret, size - ret, "%-*s",
width - ret, "");
} else {
--
1.7.9.5

2012-11-21 12:52:24

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH v3 18/19] perf tools: Ignore ABS symbols when loading data maps

From: Namhyung Kim <[email protected]>

When loading symbols in a data mapping, ABS symbols (which has a value
of SHN_ABS in its st_shndx) failed at elf_getscn(). And it marks the
loading as a failure so already loaded symbols cannot be fixed up.

I'm not sure what should be done. Just ignore them for now. :)

Cc: Stephane Eranian <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/symbol-elf.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index db0cc92..00cf128 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -719,6 +719,9 @@ int dso__load_sym(struct dso *dso, struct map *map,
used_opd = true;
}

+ if (sym.st_shndx == SHN_ABS)
+ continue;
+
sec = elf_getscn(runtime_ss->elf, sym.st_shndx);
if (!sec)
goto out_elf_end;
--
1.7.9.5

2012-11-21 12:52:57

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH v3 17/19] perf tools: Synthesize data mmap events for threads

From: Namhyung Kim <[email protected]>

Current perf_event__synthesize_mmap_events() only deals with
executable mappings. With upcoming memory access sampling,
non-executable data mappings are needed also.

While at it, convert parsing code to use sscanf which makes
the code cleaner IMHO.

Cc: Stephane Eranian <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/event.c | 78 +++++++++++++++++++++--------------------------
1 file changed, 35 insertions(+), 43 deletions(-)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index ca9ca28..068acf6 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -193,55 +193,47 @@ static int perf_event__synthesize_mmap_events(struct perf_tool *tool,
event->header.misc = PERF_RECORD_MISC_USER;

while (1) {
- char bf[BUFSIZ], *pbf = bf;
- int n;
+ char bf[BUFSIZ];
+ char prot[5], *pprot = prot;
size_t size;
+ char exec_name[PATH_MAX], *pexec = exec_name;
+ char anonstr[] = "//anon";
+
if (fgets(bf, sizeof(bf), fp) == NULL)
break;

+ /* ensure null termination since stack will be reused */
+ strcpy(exec_name, "");
+
/* 00400000-0040c000 r-xp 00000000 fd:01 41038 /bin/cat */
- n = hex2u64(pbf, &event->mmap.start);
- if (n < 0)
- continue;
- pbf += n + 1;
- n = hex2u64(pbf, &event->mmap.len);
- if (n < 0)
+ sscanf(bf, "%"PRIx64"-%"PRIx64" %s %"PRIx64" %*x:%*x %*u %s\n",
+ &event->mmap.start, &event->mmap.len, pprot,
+ &event->mmap.pgoff, pexec);
+
+ if (prot[2] == 'x' && !strcmp(exec_name, ""))
+ strcpy(exec_name, anonstr);
+
+ /* ignore non-executable anon mappings */
+ if (!strcmp(exec_name, ""))
continue;
- pbf += n + 3;
- if (*pbf == 'x') { /* vm_exec */
- char anonstr[] = "//anon\n";
- char *execname = strchr(bf, '/');
-
- /* Catch VDSO */
- if (execname == NULL)
- execname = strstr(bf, "[vdso]");
-
- /* Catch anonymous mmaps */
- if ((execname == NULL) && !strstr(bf, "["))
- execname = anonstr;
-
- if (execname == NULL)
- continue;
-
- pbf += 3;
- n = hex2u64(pbf, &event->mmap.pgoff);
-
- size = strlen(execname);
- execname[size - 1] = '\0'; /* Remove \n */
- memcpy(event->mmap.filename, execname, size);
- size = PERF_ALIGN(size, sizeof(u64));
- event->mmap.len -= event->mmap.start;
- event->mmap.header.size = (sizeof(event->mmap) -
- (sizeof(event->mmap.filename) - size));
- memset(event->mmap.filename + size, 0, machine->id_hdr_size);
- event->mmap.header.size += machine->id_hdr_size;
- event->mmap.pid = tgid;
- event->mmap.tid = pid;
-
- if (process(tool, event, &synth_sample, machine) != 0) {
- rc = -1;
- break;
- }
+
+ if (prot[2] != 'x')
+ event->header.misc |= PERF_RECORD_MISC_MMAP_DATA;
+
+ size = strlen(exec_name) + 1;
+ memcpy(event->mmap.filename, exec_name, size);
+ size = PERF_ALIGN(size, sizeof(u64));
+ event->mmap.len -= event->mmap.start;
+ event->mmap.header.size = (sizeof(event->mmap) -
+ (sizeof(event->mmap.filename) - size));
+ memset(event->mmap.filename + size, 0, machine->id_hdr_size);
+ event->mmap.header.size += machine->id_hdr_size;
+ event->mmap.pid = tgid;
+ event->mmap.tid = pid;
+
+ if (process(tool, event, &synth_sample, machine) != 0) {
+ rc = -1;
+ break;
}
}

--
1.7.9.5

2012-11-21 12:53:23

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH v3 15/19] perf: add PERF_RECORD_MISC_MMAP_DATA to RECORD_MMAP

Type of mapping was lost and made it hard for a tool
to distinguish code vs. data mmaps. Perf has the ability
to distinguish the two.

Use a bit in the header->misc bitmask to keep track of
the mmap type. If PERF_RECORD_MISC_MMAP_DATA is set then
the mapping is not executable (!VM_EXEC). If not set, then
the mapping is executable.

Signed-off-by: Stephane Eranian <[email protected]>
---
include/uapi/linux/perf_event.h | 1 +
kernel/events/core.c | 3 +++
2 files changed, 4 insertions(+)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index b2a78bb..0fa3feb 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -446,6 +446,7 @@ struct perf_event_mmap_page {
#define PERF_RECORD_MISC_GUEST_KERNEL (4 << 0)
#define PERF_RECORD_MISC_GUEST_USER (5 << 0)

+#define PERF_RECORD_MISC_MMAP_DATA (1 << 13)
/*
* Indicates that the content of PERF_SAMPLE_IP points to
* the actual instruction that triggered the event. See also
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 94b2c70..e7318a7 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4764,6 +4764,9 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
mmap_event->file_name = name;
mmap_event->file_size = size;

+ if (!(vma->vm_flags & VM_EXEC))
+ mmap_event->event_id.header.misc |= PERF_RECORD_MISC_MMAP_DATA;
+
mmap_event->event_id.header.size = sizeof(mmap_event->event_id) + size;

rcu_read_lock();
--
1.7.9.5

2012-11-21 12:51:57

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH v3 11/19] perf tools: add mem access sampling core support

This patch adds the sorting and histogram support
functions to enable profiling of memory accesses.

The following sorting orders are added:
- symbol_daddr: data address symbol (or raw address)
- dso_daddr: data address shared object
- cost: access cost
- locked: access uses locked transaction
- tlb : TLB access
- mem : memory level of the access (L1, L2, L3, RAM, ...)
- snoop: access snoop mode

Signed-off-by: Stephane Eranian <[email protected]>
---
tools/perf/util/event.h | 1 +
tools/perf/util/evsel.c | 4 +
tools/perf/util/hist.c | 66 ++++++++-
tools/perf/util/hist.h | 13 ++
tools/perf/util/session.c | 37 ++++++
tools/perf/util/session.h | 4 +
tools/perf/util/sort.c | 324 ++++++++++++++++++++++++++++++++++++++++++++-
tools/perf/util/sort.h | 10 +-
tools/perf/util/symbol.h | 7 +
9 files changed, 456 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index cf52977..ad66b44 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -91,6 +91,7 @@ struct perf_sample {
u32 cpu;
u32 raw_size;
u64 weight;
+ u64 dsrc;
void *raw_data;
struct ip_callchain *callchain;
struct branch_stack *branch_stack;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index b6f3697..951d4d1 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1025,6 +1025,10 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
array++;
}

+ if (type & PERF_SAMPLE_DSRC) {
+ data->dsrc = *array;
+ array++;
+ }
return 0;
}

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 7c6e73b..e937670 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -66,12 +66,16 @@ static void hists__set_unres_dso_col_len(struct hists *hists, int dso)
void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
{
const unsigned int unresolved_col_width = BITS_PER_LONG / 4;
+ int symlen;
u16 len;

if (h->ms.sym)
hists__new_col_len(hists, HISTC_SYMBOL, h->ms.sym->namelen + 4);
- else
+ else {
+ symlen = unresolved_col_width + 4 + 2;
+ hists__new_col_len(hists, HISTC_SYMBOL, symlen);
hists__set_unres_dso_col_len(hists, HISTC_DSO);
+ }

len = thread__comm_len(h->thread);
if (hists__new_col_len(hists, HISTC_COMM, len))
@@ -83,7 +87,6 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
}

if (h->branch_info) {
- int symlen;
/*
* +4 accounts for '[x] ' priv level info
* +2 account of 0x prefix on raw addresses
@@ -111,7 +114,32 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
hists__new_col_len(hists, HISTC_SYMBOL_TO, symlen);
hists__set_unres_dso_col_len(hists, HISTC_DSO_TO);
}
+ } else if (h->mem_info) {
+ /*
+ * +4 accounts for '[x] ' priv level info
+ * +2 account of 0x prefix on raw addresses
+ */
+ if (h->mem_info->daddr.sym) {
+ symlen = (int)h->mem_info->daddr.sym->namelen + 4 + unresolved_col_width + 2;
+ hists__new_col_len(hists, HISTC_MEM_DADDR_SYMBOL, symlen);
+ } else {
+ symlen = unresolved_col_width + 4 + 2;
+ hists__new_col_len(hists, HISTC_MEM_DADDR_SYMBOL, symlen);
+ }
+ if (h->mem_info->daddr.map) {
+ symlen = dso__name_len(h->mem_info->daddr.map->dso);
+ hists__new_col_len(hists, HISTC_MEM_DADDR_DSO, symlen);
+ } else {
+ symlen = unresolved_col_width + 4 + 2;
+ hists__set_unres_dso_col_len(hists, HISTC_MEM_DADDR_DSO);
+ }
+ hists__new_col_len(hists, HISTC_MEM_COST, 7);
+ hists__new_col_len(hists, HISTC_MEM_LOCKED, 6);
+ hists__new_col_len(hists, HISTC_MEM_TLB, 22);
+ hists__new_col_len(hists, HISTC_MEM_SNOOP, 12);
+ hists__new_col_len(hists, HISTC_MEM_LVL, 21+3);
}
+
}

void hists__output_recalc_col_len(struct hists *hists, int max_rows)
@@ -235,7 +263,7 @@ void hists__decay_entries_threaded(struct hists *hists,
static struct hist_entry *hist_entry__new(struct hist_entry *template)
{
size_t callchain_size = symbol_conf.use_callchain ? sizeof(struct callchain_root) : 0;
- struct hist_entry *he = malloc(sizeof(*he) + callchain_size);
+ struct hist_entry *he = calloc(1, sizeof(*he) + callchain_size);

if (he != NULL) {
*he = *template;
@@ -323,6 +351,35 @@ static struct hist_entry *add_hist_entry(struct hists *hists,
return he;
}

+struct hist_entry *__hists__add_mem_entry(struct hists *self,
+ struct addr_location *al,
+ struct symbol *sym_parent,
+ struct mem_info *mi,
+ u64 weight)
+{
+ struct hist_entry entry = {
+ .thread = al->thread,
+ .ms = {
+ .map = al->map,
+ .sym = al->sym,
+ },
+ .cpu = al->cpu,
+ .ip = al->addr,
+ .level = al->level,
+ .stat = {
+ .period = weight,
+ .nr_events = 1,
+ },
+ .parent = sym_parent,
+ .filtered = symbol__parent_filter(sym_parent),
+ .hists = self,
+ .mem_info = mi,
+ .branch_info = NULL,
+ };
+
+ return add_hist_entry(self, &entry, al, weight);
+}
+
struct hist_entry *__hists__add_branch_entry(struct hists *self,
struct addr_location *al,
struct symbol *sym_parent,
@@ -346,6 +403,7 @@ struct hist_entry *__hists__add_branch_entry(struct hists *self,
.filtered = symbol__parent_filter(sym_parent),
.branch_info = bi,
.hists = self,
+ .mem_info = NULL,
};

return add_hist_entry(self, &entry, al, period);
@@ -371,6 +429,8 @@ struct hist_entry *__hists__add_entry(struct hists *self,
.parent = sym_parent,
.filtered = symbol__parent_filter(sym_parent),
.hists = self,
+ .branch_info = NULL,
+ .mem_info = NULL,
};

return add_hist_entry(self, &entry, al, period);
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 1278c2c..25b1c16 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -49,6 +49,13 @@ enum hist_column {
HISTC_DSO_FROM,
HISTC_DSO_TO,
HISTC_SRCLINE,
+ HISTC_MEM_DADDR_SYMBOL,
+ HISTC_MEM_DADDR_DSO,
+ HISTC_MEM_COST,
+ HISTC_MEM_LOCKED,
+ HISTC_MEM_TLB,
+ HISTC_MEM_LVL,
+ HISTC_MEM_SNOOP,
HISTC_NR_COLS, /* Last entry */
};

@@ -86,6 +93,12 @@ struct hist_entry *__hists__add_branch_entry(struct hists *self,
struct branch_info *bi,
u64 period);

+struct hist_entry *__hists__add_mem_entry(struct hists *self,
+ struct addr_location *al,
+ struct symbol *sym_parent,
+ struct mem_info *mi,
+ u64 period);
+
void hists__output_resort(struct hists *self);
void hists__output_resort_threaded(struct hists *hists);
void hists__collapse_resort(struct hists *self);
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index b6d7cad..58ca636 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -273,6 +273,41 @@ static void ip__resolve_ams(struct machine *self, struct thread *thread,
ams->map = al.map;
}

+static void ip__resolve_data(struct machine *self, struct thread *thread,
+ u8 m,
+ struct addr_map_symbol *ams,
+ u64 addr)
+{
+ struct addr_location al;
+
+ memset(&al, 0, sizeof(al));
+
+ thread__find_addr_location(thread, self, m, MAP__VARIABLE, addr, &al, NULL);
+ ams->addr = addr;
+ ams->al_addr = al.addr;
+ ams->sym = al.sym;
+ ams->map = al.map;
+}
+
+struct mem_info *machine__resolve_mem(struct machine *self,
+ struct thread *thr,
+ struct perf_sample *sample,
+ u8 cpumode)
+{
+ struct mem_info *mi;
+
+ mi = calloc(1, sizeof(struct mem_info));
+ if (!mi)
+ return NULL;
+
+ ip__resolve_ams(self, thr, &mi->iaddr, sample->ip);
+ ip__resolve_data(self, thr, cpumode, &mi->daddr, sample->addr);
+ mi->cost = sample->weight;
+ mi->dsrc.val = sample->dsrc;
+
+ return mi;
+}
+
struct branch_info *machine__resolve_bstack(struct machine *self,
struct thread *thr,
struct branch_stack *bs)
@@ -1013,6 +1048,8 @@ static void dump_sample(struct perf_evsel *evsel, union perf_event *event,
if (sample_type & PERF_SAMPLE_ADDR)
printf(" ..... data: 0x%"PRIx64"\n", sample->addr);

+ if (sample_type & PERF_SAMPLE_DSRC)
+ printf(" . data_src: 0x%"PRIx64"\n", sample->dsrc);
}

static struct machine *
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 18d1222..9283c96 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -69,6 +69,10 @@ int perf_session__resolve_callchain(struct perf_session *self, struct perf_evsel
struct ip_callchain *chain,
struct symbol **parent);

+struct mem_info *machine__resolve_mem(struct machine *self,
+ struct thread *thread,
+ struct perf_sample *sample, u8 cpumode);
+
bool perf_session__has_traces(struct perf_session *self, const char *msg);

void mem_bswap_64(void *src, int byte_size);
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index cfd1c0f..e2e466d 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -182,11 +182,19 @@ static int _hist_entry__sym_snprintf(struct map *map, struct symbol *sym,
}

ret += repsep_snprintf(bf + ret, size - ret, "[%c] ", level);
- if (sym)
- ret += repsep_snprintf(bf + ret, size - ret, "%-*s",
- width - ret,
- sym->name);
- else {
+ if (sym) {
+ if (map->type == MAP__VARIABLE) {
+ ret += repsep_snprintf(bf + ret, size - ret, "%s", sym->name);
+ ret += repsep_snprintf(bf + ret, size - ret, "+0x%llx",
+ ip - sym->start);
+ ret += repsep_snprintf(bf + ret, size - ret, "%-*s",
+ width - ret, "");
+ } else {
+ ret += repsep_snprintf(bf + ret, size - ret, "%-*s",
+ width - ret,
+ sym->name);
+ }
+ } else {
size_t len = BITS_PER_LONG / 4;
ret += repsep_snprintf(bf + ret, size - ret, "%-#.*llx",
len, ip);
@@ -469,6 +477,238 @@ static int hist_entry__mispredict_snprintf(struct hist_entry *self, char *bf,
return repsep_snprintf(bf, size, "%-*s", width, out);
}

+/* --sort daddr_sym */
+static int64_t
+sort__daddr_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+ struct addr_map_symbol *l = &left->mem_info->daddr;
+ struct addr_map_symbol *r = &right->mem_info->daddr;
+
+ return (int64_t)(r->addr - l->addr);
+}
+
+static int hist_entry__daddr_snprintf(struct hist_entry *self, char *bf,
+ size_t size, unsigned int width)
+{
+ return _hist_entry__sym_snprintf(self->mem_info->daddr.map,
+ self->mem_info->daddr.sym,
+ self->mem_info->daddr.addr,
+ self->level, bf, size, width);
+}
+
+static int64_t
+sort__dso_daddr_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+ return _sort__dso_cmp(left->mem_info->daddr.map, right->mem_info->daddr.map);
+}
+
+static int hist_entry__dso_daddr_snprintf(struct hist_entry *self, char *bf,
+ size_t size, unsigned int width)
+{
+ return _hist_entry__dso_snprintf(self->mem_info->daddr.map, bf, size, width);
+}
+
+static int64_t
+sort__cost_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+ u64 cost_l = left->mem_info->cost;
+ u64 cost_r = right->mem_info->cost;
+
+ return (int64_t)(cost_r - cost_l);
+}
+
+static int hist_entry__cost_snprintf(struct hist_entry *self, char *bf,
+ size_t size, unsigned int width)
+{
+ if (self->mem_info->cost == 0)
+ return repsep_snprintf(bf, size, "%*s", width, "N/A");
+ return repsep_snprintf(bf, size, "%*"PRIu64, width, self->mem_info->cost);
+}
+
+static int64_t
+sort__locked_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+ union perf_mem_dsrc dsrc_l = left->mem_info->dsrc;
+ union perf_mem_dsrc dsrc_r = right->mem_info->dsrc;
+
+ return (int64_t)(dsrc_r.mem_lock - dsrc_l.mem_lock);
+}
+
+static int hist_entry__locked_snprintf(struct hist_entry *self, char *bf,
+ size_t size, unsigned int width)
+{
+ const char *out = "??";
+ u64 mask = self->mem_info->dsrc.mem_lock;
+
+ if (mask & PERF_MEM_LOCK_NA)
+ out = "N/A";
+ else if (mask & PERF_MEM_LOCK_LOCKED)
+ out = "Yes";
+ else
+ out = "No";
+
+ return repsep_snprintf(bf, size, "%-*s", width, out);
+}
+
+static int64_t
+sort__tlb_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+ union perf_mem_dsrc dsrc_l = left->mem_info->dsrc;
+ union perf_mem_dsrc dsrc_r = right->mem_info->dsrc;
+
+ return (int64_t)(dsrc_r.mem_dtlb - dsrc_l.mem_dtlb);
+}
+
+static const char *tlb_access[] = {
+ "N/A",
+ "HIT",
+ "MISS",
+ "L1",
+ "L2",
+ "Walker",
+ "Fault",
+};
+#define NUM_TLB_ACCESS (sizeof(tlb_access)/sizeof(const char *))
+
+static int hist_entry__tlb_snprintf(struct hist_entry *self, char *bf,
+ size_t size, unsigned int width)
+{
+ char out[64];
+ size_t sz = sizeof(out) - 1; /* -1 for null termination */
+ size_t l = 0, i;
+ u64 m = self->mem_info->dsrc.mem_dtlb;
+ u64 hit, miss;
+
+ out[0] = '\0';
+
+ hit = m & PERF_MEM_TLB_HIT;
+ miss = m & PERF_MEM_TLB_MISS;
+
+ /* already taken care of */
+ m &= ~(PERF_MEM_TLB_HIT|PERF_MEM_TLB_MISS);
+
+ for (i = 0; m && i < NUM_TLB_ACCESS; i++, m >>= 1) {
+ if (!(m & 0x1))
+ continue;
+ if (l) {
+ strcat(out, " or ");
+ l += 4;
+ }
+ strncat(out, tlb_access[i], sz - l);
+ l += strlen(tlb_access[i]);
+ }
+ if (hit)
+ strncat(out, " hit", sz - l);
+ if (miss)
+ strncat(out, " miss", sz - l);
+
+ return repsep_snprintf(bf, size, "%-*s", width, out);
+}
+
+static int64_t
+sort__lvl_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+ union perf_mem_dsrc dsrc_l = left->mem_info->dsrc;
+ union perf_mem_dsrc dsrc_r = right->mem_info->dsrc;
+
+ return (int64_t)(dsrc_r.mem_lvl - dsrc_l.mem_lvl);
+}
+
+static const char *mem_lvl[] = {
+ "N/A",
+ "HIT",
+ "MISS",
+ "L1",
+ "LFB",
+ "L2",
+ "L3",
+ "Local RAM",
+ "Remote RAM (1 hop)",
+ "Remote RAM (2 hops)",
+ "Remote Cache (1 hop)",
+ "Remote Cache (2 hops)",
+ "I/O",
+ "Uncached",
+};
+#define NUM_MEM_LVL (sizeof(mem_lvl)/sizeof(const char *))
+
+static int hist_entry__lvl_snprintf(struct hist_entry *self, char *bf,
+ size_t size, unsigned int width)
+{
+ char out[64];
+ size_t sz = sizeof(out) - 1; /* -1 for null termination */
+ size_t i, l = 0;
+ u64 m = self->mem_info->dsrc.mem_lvl;
+ u64 hit, miss;
+
+ out[0] = '\0';
+
+ hit = m & PERF_MEM_LVL_HIT;
+ miss = m & PERF_MEM_LVL_MISS;
+
+ /* already taken care of */
+ m &= ~(PERF_MEM_LVL_HIT|PERF_MEM_LVL_MISS);
+
+ for (i = 0; m && i < NUM_MEM_LVL; i++, m >>= 1) {
+ if (!(m & 0x1))
+ continue;
+ if (l) {
+ strcat(out, " or ");
+ l += 4;
+ }
+ strncat(out, mem_lvl[i], sz - l);
+ l += strlen(mem_lvl[i]);
+ }
+ if (hit)
+ strncat(out, " hit", sz - l);
+ if (miss)
+ strncat(out, " miss", sz - l);
+
+ return repsep_snprintf(bf, size, "%-*s", width, out);
+}
+
+static int64_t
+sort__snoop_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+ union perf_mem_dsrc dsrc_l = left->mem_info->dsrc;
+ union perf_mem_dsrc dsrc_r = right->mem_info->dsrc;
+
+ return (int64_t)(dsrc_r.mem_snoop - dsrc_l.mem_snoop);
+}
+
+static const char *snoop_access[] = {
+ "N/A",
+ "None",
+ "Miss",
+ "Hit",
+ "HitM",
+};
+#define NUM_SNOOP_ACCESS (sizeof(snoop_access)/sizeof(const char *))
+
+static int hist_entry__snoop_snprintf(struct hist_entry *self, char *bf,
+ size_t size, unsigned int width)
+{
+ char out[64];
+ size_t sz = sizeof(out) - 1; /* -1 for null termination */
+ size_t i, l = 0;
+ u64 m = self->mem_info->dsrc.mem_snoop;
+
+ out[0] = '\0';
+
+ for (i = 0; m && i < NUM_SNOOP_ACCESS; i++, m >>= 1) {
+ if (!(m & 0x1))
+ continue;
+ if (l) {
+ strcat(out, " or ");
+ l += 4;
+ }
+ strncat(out, snoop_access[i], sz - l);
+ l += strlen(snoop_access[i]);
+ }
+
+ return repsep_snprintf(bf, size, "%-*s", width, out);
+}
+
struct sort_entry sort_mispredict = {
.se_header = "Branch Mispredicted",
.se_cmp = sort__mispredict_cmp,
@@ -476,6 +716,56 @@ struct sort_entry sort_mispredict = {
.se_width_idx = HISTC_MISPREDICT,
};

+struct sort_entry sort_mem_daddr_sym = {
+ .se_header = "Data Symbol",
+ .se_cmp = sort__daddr_cmp,
+ .se_snprintf = hist_entry__daddr_snprintf,
+ .se_width_idx = HISTC_MEM_DADDR_SYMBOL,
+};
+
+struct sort_entry sort_mem_daddr_dso = {
+ .se_header = "Data Object",
+ .se_cmp = sort__dso_daddr_cmp,
+ .se_snprintf = hist_entry__dso_daddr_snprintf,
+ .se_width_idx = HISTC_MEM_DADDR_SYMBOL,
+};
+
+struct sort_entry sort_mem_cost = {
+ .se_header = "Cost",
+ .se_cmp = sort__cost_cmp,
+ .se_snprintf = hist_entry__cost_snprintf,
+ .se_width_idx = HISTC_MEM_COST,
+};
+
+struct sort_entry sort_mem_locked = {
+ .se_header = "Locked",
+ .se_cmp = sort__locked_cmp,
+ .se_snprintf = hist_entry__locked_snprintf,
+ .se_width_idx = HISTC_MEM_LOCKED,
+};
+
+struct sort_entry sort_mem_tlb = {
+ .se_header = "TLB access",
+ .se_cmp = sort__tlb_cmp,
+ .se_snprintf = hist_entry__tlb_snprintf,
+ .se_width_idx = HISTC_MEM_TLB,
+};
+
+struct sort_entry sort_mem_lvl = {
+ .se_header = "Memory access",
+ .se_cmp = sort__lvl_cmp,
+ .se_snprintf = hist_entry__lvl_snprintf,
+ .se_width_idx = HISTC_MEM_LVL,
+};
+
+struct sort_entry sort_mem_snoop = {
+ .se_header = "Snoop",
+ .se_cmp = sort__snoop_cmp,
+ .se_snprintf = hist_entry__snoop_snprintf,
+ .se_width_idx = HISTC_MEM_SNOOP,
+};
+
+
struct sort_dimension {
const char *name;
struct sort_entry *entry;
@@ -497,6 +787,13 @@ static struct sort_dimension sort_dimensions[] = {
DIM(SORT_CPU, "cpu", sort_cpu),
DIM(SORT_MISPREDICT, "mispredict", sort_mispredict),
DIM(SORT_SRCLINE, "srcline", sort_srcline),
+ DIM(SORT_MEM_DADDR_SYMBOL, "symbol_daddr", sort_mem_daddr_sym),
+ DIM(SORT_MEM_DADDR_DSO, "dso_daddr", sort_mem_daddr_dso),
+ DIM(SORT_MEM_COST, "cost", sort_mem_cost),
+ DIM(SORT_MEM_LOCKED, "locked", sort_mem_locked),
+ DIM(SORT_MEM_TLB, "tlb", sort_mem_tlb),
+ DIM(SORT_MEM_LVL, "mem", sort_mem_lvl),
+ DIM(SORT_MEM_SNOOP, "snoop", sort_mem_snoop),
};

int sort_dimension__add(const char *tok)
@@ -520,7 +817,8 @@ int sort_dimension__add(const char *tok)
sort__has_parent = 1;
} else if (sd->entry == &sort_sym ||
sd->entry == &sort_sym_from ||
- sd->entry == &sort_sym_to) {
+ sd->entry == &sort_sym_to ||
+ sd->entry == &sort_mem_daddr_sym) {
sort__has_sym = 1;
}

@@ -553,6 +851,20 @@ int sort_dimension__add(const char *tok)
sort__first_dimension = SORT_DSO_TO;
else if (!strcmp(sd->name, "mispredict"))
sort__first_dimension = SORT_MISPREDICT;
+ else if (!strcmp(sd->name, "symbol_daddr"))
+ sort__first_dimension = SORT_MEM_DADDR_SYMBOL;
+ else if (!strcmp(sd->name, "dso_daddr"))
+ sort__first_dimension = SORT_MEM_DADDR_DSO;
+ else if (!strcmp(sd->name, "cost"))
+ sort__first_dimension = SORT_MEM_COST;
+ else if (!strcmp(sd->name, "locked"))
+ sort__first_dimension = SORT_MEM_LOCKED;
+ else if (!strcmp(sd->name, "tlb"))
+ sort__first_dimension = SORT_MEM_TLB;
+ else if (!strcmp(sd->name, "mem_lvl"))
+ sort__first_dimension = SORT_MEM_LVL;
+ else if (!strcmp(sd->name, "snoop"))
+ sort__first_dimension = SORT_MEM_SNOOP;
}

list_add_tail(&sd->entry->list, &hist_entry__sort_list);
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index b4e8c3b..adee5eb 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -103,7 +103,8 @@ struct hist_entry {
struct rb_root sorted_chain;
struct branch_info *branch_info;
struct hists *hists;
- struct callchain_root callchain[0];
+ struct mem_info *mem_info;
+ struct callchain_root callchain[0]; /* must be last member */
};

static inline bool hist_entry__has_pairs(struct hist_entry *he)
@@ -137,6 +138,13 @@ enum sort_type {
SORT_SYM_TO,
SORT_MISPREDICT,
SORT_SRCLINE,
+ SORT_MEM_DADDR_SYMBOL,
+ SORT_MEM_DADDR_DSO,
+ SORT_MEM_COST,
+ SORT_MEM_LOCKED,
+ SORT_MEM_TLB,
+ SORT_MEM_LVL,
+ SORT_MEM_SNOOP,
};

/*
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 04ccf29..da3f85c 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -152,6 +152,13 @@ struct branch_info {
struct branch_flags flags;
};

+struct mem_info {
+ struct addr_map_symbol iaddr;
+ struct addr_map_symbol daddr;
+ u64 cost;
+ union perf_mem_dsrc dsrc;
+};
+
struct addr_location {
struct thread *thread;
struct map *map;
--
1.7.9.5

2012-11-21 12:53:57

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH v3 12/19] perf report: add support for mem access profiling

This patch adds the --mem-mode option to perf report.

This mode requires a perf.data file created with memory
access samples.

Signed-off-by: Stephane Eranian <[email protected]>
---
tools/perf/builtin-report.c | 131 +++++++++++++++++++++++++++++++++++++++++--
1 file changed, 127 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index fc25100..231165a 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -46,6 +46,7 @@ struct perf_report {
bool show_full_info;
bool show_threads;
bool inverted_callchain;
+ bool mem_mode;
struct perf_read_values show_threads_values;
const char *pretty_printing_style;
symbol_filter_t annotate_init;
@@ -54,6 +55,96 @@ struct perf_report {
DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
};

+static int perf_report__add_mem_hist_entry(struct perf_tool *tool,
+ struct addr_location *al,
+ struct perf_sample *sample,
+ struct perf_evsel *evsel,
+ struct machine *machine,
+ union perf_event *event)
+{
+ struct perf_report *rep = container_of(tool, struct perf_report, tool);
+ struct symbol *parent = NULL;
+ u8 cpumode = event->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+ int err = 0;
+ struct hist_entry *he;
+ struct mem_info *mi, *mx;
+ uint64_t cost;
+
+ if ((sort__has_parent || symbol_conf.use_callchain)
+ && sample->callchain) {
+ err = machine__resolve_callchain(machine, evsel, al->thread,
+ sample, &parent);
+ if (err)
+ return err;
+ }
+
+ mi = machine__resolve_mem(machine, al->thread, sample, cpumode);
+ if (!mi)
+ return -ENOMEM;
+
+ if (rep->hide_unresolved && !al->sym)
+ return 0;
+
+ cost = mi->cost;
+ if (!cost)
+ cost = 1;
+
+ /*
+ * The report shows the percentage of total branches captured
+ * and not events sampled. Thus we use a pseudo period of 1.
+ * Only in the newt browser we are doing integrated annotation,
+ * so we don't allocated the extra space needed because the stdio
+ * code will not use it.
+ */
+ he = __hists__add_mem_entry(&evsel->hists, al, parent, mi,
+ cost);
+ if (!he)
+ return -ENOMEM;
+
+ if (sort__has_sym && he->ms.sym && use_browser > 0) {
+ struct annotation *notes = symbol__annotation(he->ms.sym);
+
+ assert(evsel != NULL);
+
+ if (notes->src == NULL && symbol__alloc_hist(he->ms.sym) < 0)
+ goto out;
+
+ err = hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
+ if (err)
+ goto out;
+ }
+
+ if (sort__has_sym && he->mem_info->daddr.sym && use_browser > 0) {
+ struct annotation *notes;
+
+ mx = he->mem_info;
+
+ notes = symbol__annotation(mx->daddr.sym);
+ if (!notes->src
+ && symbol__alloc_hist(mx->daddr.sym) < 0)
+ goto out;
+
+ err = symbol__inc_addr_samples(mx->daddr.sym,
+ mx->daddr.map,
+ evsel->idx,
+ mx->daddr.al_addr);
+ if (err)
+ goto out;
+ }
+
+ evsel->hists.stats.total_period += sample->period;
+ hists__inc_nr_events(&evsel->hists, PERF_RECORD_SAMPLE);
+ err = 0;
+
+ if (symbol_conf.use_callchain) {
+ err = callchain_append(he->callchain,
+ &callchain_cursor,
+ sample->period);
+ }
+out:
+ return err;
+}
+
static int perf_report__add_branch_hist_entry(struct perf_tool *tool,
struct addr_location *al,
struct perf_sample *sample,
@@ -209,6 +300,12 @@ static int process_sample_event(struct perf_tool *tool,
pr_debug("problem adding lbr entry, skipping event\n");
return -1;
}
+ } else if (rep->mem_mode == 1) {
+ if (perf_report__add_mem_hist_entry(tool, &al, sample,
+ evsel, machine, event)) {
+ pr_debug("problem adding mem entry, skipping event\n");
+ return -1;
+ }
} else {
if (al.map != NULL)
al.map->dso->hit = 1;
@@ -292,7 +389,8 @@ static void sig_handler(int sig __maybe_unused)
session_done = 1;
}

-static size_t hists__fprintf_nr_sample_events(struct hists *self,
+static size_t hists__fprintf_nr_sample_events(struct perf_report *rep,
+ struct hists *self,
const char *evname, FILE *fp)
{
size_t ret;
@@ -305,7 +403,11 @@ static size_t hists__fprintf_nr_sample_events(struct hists *self,
if (evname != NULL)
ret += fprintf(fp, " of event '%s'", evname);

- ret += fprintf(fp, "\n# Event count (approx.): %" PRIu64, nr_events);
+ if (rep->mem_mode) {
+ ret += fprintf(fp, "\n# Total cost : %" PRIu64, nr_events);
+ ret += fprintf(fp, "\n# Sort order : %s", sort_order);
+ } else
+ ret += fprintf(fp, "\n# Event count (approx.): %" PRIu64, nr_events);
return ret + fprintf(fp, "\n#\n");
}

@@ -319,7 +421,7 @@ static int perf_evlist__tty_browse_hists(struct perf_evlist *evlist,
struct hists *hists = &pos->hists;
const char *evname = perf_evsel__name(pos);

- hists__fprintf_nr_sample_events(hists, evname, stdout);
+ hists__fprintf_nr_sample_events(rep, hists, evname, stdout);
hists__fprintf(hists, true, 0, 0, stdout);
fprintf(stdout, "\n\n");
}
@@ -596,7 +698,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
"Use the stdio interface"),
OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
"sort by key(s): pid, comm, dso, symbol, parent, dso_to,"
- " dso_from, symbol_to, symbol_from, mispredict"),
+ " dso_from, symbol_to, symbol_from, mispredict, mem, cost, symbol_daddr, dso_daddr, tlb, snoop, locked"),
OPT_BOOLEAN(0, "showcpuutilization", &symbol_conf.show_cpu_utilization,
"Show sample percentage for different cpu modes"),
OPT_STRING('p', "parent", &parent_pattern, "regex",
@@ -642,6 +744,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
"use branch records for histogram filling", parse_branch_mode),
OPT_STRING(0, "objdump", &objdump_path, "path",
"objdump binary to use for disassembly and annotations"),
+ OPT_BOOLEAN(0, "mem-mode", &report.mem_mode, "mem access profile"),
OPT_END()
};

@@ -687,6 +790,18 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
"dso_to,symbol_to";

}
+ if (report.mem_mode) {
+ if (sort__branch_mode == 1) {
+ fprintf(stderr, "branch and mem mode incompatible\n");
+ goto error;
+ }
+ /*
+ * if no sort_order is provided, then specify
+ * branch-mode specific order
+ */
+ if (sort_order == default_sort_order)
+ sort_order = "cost,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked";
+ }

if (strcmp(input_name, "-") != 0)
setup_browser(true);
@@ -758,6 +873,14 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
sort_entry__setup_elide(&sort_sym_from, symbol_conf.sym_from_list, "sym_from", stdout);
sort_entry__setup_elide(&sort_sym_to, symbol_conf.sym_to_list, "sym_to", stdout);
} else {
+ if (report.mem_mode) {
+ sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "symbol_daddr", stdout);
+ sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "dso_daddr", stdout);
+ sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "mem", stdout);
+ sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "cost", stdout);
+ sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "tlb", stdout);
+ sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "snoop", stdout);
+ }
sort_entry__setup_elide(&sort_dso, symbol_conf.dso_list, "dso", stdout);
sort_entry__setup_elide(&sort_sym, symbol_conf.sym_list, "symbol", stdout);
}
--
1.7.9.5

2012-11-21 12:54:37

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH v3 09/19] perf/x86: export PEBS load latency threshold register to sysfs

Make the PEBS Load Latency threshold register layout
and encoding visible to user level tools.

Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 1d2a7ca..8eced4c 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1755,6 +1755,8 @@ static void intel_pmu_flush_branch_stack(void)

PMU_FORMAT_ATTR(offcore_rsp, "config1:0-63");

+PMU_FORMAT_ATTR(ldlat, "config1:0-15");
+
static struct attribute *intel_arch3_formats_attr[] = {
&format_attr_event.attr,
&format_attr_umask.attr,
@@ -1765,6 +1767,7 @@ static struct attribute *intel_arch3_formats_attr[] = {
&format_attr_cmask.attr,

&format_attr_offcore_rsp.attr, /* XXX do NHM/WSM + SNB breakout */
+ &format_attr_ldlat.attr, /* PEBS load latency */
NULL,
};

--
1.7.9.5

2012-11-21 12:55:01

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH v3 06/19] perf: add support for PERF_SAMPLE_ADDR in dump_sampple()

Was missing from current code yet PERF_SAMPLE_ADDR has
been present for a long time. Needed for PEBS-LL mode.

Signed-off-by: Stephane Eranian <[email protected]>
---
tools/perf/util/session.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 962c04f..b6d7cad 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1009,6 +1009,10 @@ static void dump_sample(struct perf_evsel *evsel, union perf_event *event,

if (sample_type & PERF_SAMPLE_WEIGHT)
printf(" ... weight: %"PRIu64"\n", sample->weight);
+
+ if (sample_type & PERF_SAMPLE_ADDR)
+ printf(" ..... data: 0x%"PRIx64"\n", sample->addr);
+
}

static struct machine *
--
1.7.9.5

2012-11-21 12:55:31

by Stephane Eranian

[permalink] [raw]
Subject: [PATCH v3 04/19] perf, core: Add a concept of a weightened sample

From: Andi Kleen <[email protected]>

For some events it's useful to weight sample with a hardware
provided number. This expresses how expensive the action the
sample represent was. This allows the profiler to scale
the samples to be more informative to the programmer.

There is already the period which is used similarly, but it means
something different, so I chose to not overload it. Instead
a new sample type for WEIGHT is added.

Can be used for multiple things. Initially it is used for TSX abort costs
and profiling by memory latencies (so to make expensive load appear higher
up in the histograms) The concept is quite generic and can be extended
to many other kinds of events or architectures, as long as the hardware
provides suitable auxillary values. In principle it could be also
used for software tracpoints.

This adds the generic glue. A new optional sample format for a 64bit
weight value.

Signed-off-by: Andi Kleen <[email protected]>
---
include/linux/perf_event.h | 2 ++
include/uapi/linux/perf_event.h | 8 ++++++--
kernel/events/core.c | 6 ++++++
3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 484cfbc..bb2429d 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -584,6 +584,7 @@ struct perf_sample_data {
struct perf_branch_stack *br_stack;
struct perf_regs_user regs_user;
u64 stack_user_size;
+ u64 weight;
};

static inline void perf_sample_data_init(struct perf_sample_data *data,
@@ -597,6 +598,7 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
data->regs_user.abi = PERF_SAMPLE_REGS_ABI_NONE;
data->regs_user.regs = NULL;
data->stack_user_size = 0;
+ data->weight = 0;
}

extern void perf_output_sample(struct perf_output_handle *handle,
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 4f63c05..c52caab 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -132,8 +132,10 @@ enum perf_event_sample_format {
PERF_SAMPLE_BRANCH_STACK = 1U << 11,
PERF_SAMPLE_REGS_USER = 1U << 12,
PERF_SAMPLE_STACK_USER = 1U << 13,
+ PERF_SAMPLE_WEIGHT = 1U << 14,
+
+ PERF_SAMPLE_MAX = 1U << 15, /* non-ABI */

- PERF_SAMPLE_MAX = 1U << 14, /* non-ABI */
};

/*
@@ -198,8 +200,9 @@ enum perf_event_read_format {
PERF_FORMAT_TOTAL_TIME_RUNNING = 1U << 1,
PERF_FORMAT_ID = 1U << 2,
PERF_FORMAT_GROUP = 1U << 3,
+ PERF_FORMAT_WEIGHT = 1U << 4,

- PERF_FORMAT_MAX = 1U << 4, /* non-ABI */
+ PERF_FORMAT_MAX = 1U << 5, /* non-ABI */
};

#define PERF_ATTR_SIZE_VER0 64 /* sizeof first published struct */
@@ -559,6 +562,7 @@ enum perf_event_type {
* { u64 stream_id;} && PERF_SAMPLE_STREAM_ID
* { u32 cpu, res; } && PERF_SAMPLE_CPU
* { u64 period; } && PERF_SAMPLE_PERIOD
+ * { u64 weight; } && PERF_SAMPLE_WEIGHT
*
* { struct read_format values; } && PERF_SAMPLE_READ
*
diff --git a/kernel/events/core.c b/kernel/events/core.c
index dbccf83..d633581 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -952,6 +952,9 @@ static void perf_event__header_size(struct perf_event *event)
if (sample_type & PERF_SAMPLE_PERIOD)
size += sizeof(data->period);

+ if (sample_type & PERF_SAMPLE_WEIGHT)
+ size += sizeof(data->weight);
+
if (sample_type & PERF_SAMPLE_READ)
size += event->read_size;

@@ -4080,6 +4083,9 @@ void perf_output_sample(struct perf_output_handle *handle,
if (sample_type & PERF_SAMPLE_PERIOD)
perf_output_put(handle, data->period);

+ if (sample_type & PERF_SAMPLE_WEIGHT)
+ perf_output_put(handle, data->weight);
+
if (sample_type & PERF_SAMPLE_READ)
perf_output_read(handle, event);

--
1.7.9.5

2012-11-28 15:24:14

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v3 04/19] perf, core: Add a concept of a weightened sample

On Wed, Nov 21, 2012 at 01:49:30PM +0100, Stephane Eranian wrote:
> From: Andi Kleen <[email protected]>

SNIP

> index 4f63c05..c52caab 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -132,8 +132,10 @@ enum perf_event_sample_format {
> PERF_SAMPLE_BRANCH_STACK = 1U << 11,
> PERF_SAMPLE_REGS_USER = 1U << 12,
> PERF_SAMPLE_STACK_USER = 1U << 13,
> + PERF_SAMPLE_WEIGHT = 1U << 14,
> +
> + PERF_SAMPLE_MAX = 1U << 15, /* non-ABI */
>
> - PERF_SAMPLE_MAX = 1U << 14, /* non-ABI */
> };
>
> /*
> @@ -198,8 +200,9 @@ enum perf_event_read_format {
> PERF_FORMAT_TOTAL_TIME_RUNNING = 1U << 1,
> PERF_FORMAT_ID = 1U << 2,
> PERF_FORMAT_GROUP = 1U << 3,
> + PERF_FORMAT_WEIGHT = 1U << 4,

what's PERF_FORMAT_WEIGHT for?

thanks,
jirka

2012-11-28 15:27:51

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH v3 04/19] perf, core: Add a concept of a weightened sample

> > @@ -198,8 +200,9 @@ enum perf_event_read_format {
> > PERF_FORMAT_TOTAL_TIME_RUNNING = 1U << 1,
> > PERF_FORMAT_ID = 1U << 2,
> > PERF_FORMAT_GROUP = 1U << 3,
> > + PERF_FORMAT_WEIGHT = 1U << 4,
>
> what's PERF_FORMAT_WEIGHT for?

It was used in a earlier iteration, but is now obsolete. I'll remove it.

-Andi

--
[email protected] -- Speaking for myself only