2013-08-10 17:49:03

by Sukadev Bhattiprolu

Subject: [PATCH 0/7]: Enable 'perf mem' command for Power

[PATCH 0/7]: Enable 'perf mem' command for Power

The 'perf mem' command enables analyzing the memory operations of an
application. It needs the kernel to export the memory hierarchy
level from which a load instruction was satisfied.

It also needs the Power kernel to make the 'mem-loads' and 'mem-stores'
generic events available in sysfs. While there, we also export the
other Power8 generic events in sysfs.
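
For reference, a rough sketch of the kind of sampling request the tool side
ends up making (illustrative only; the raw event code and sample period are
assumptions, and the real perf tool resolves 'mem-loads' from sysfs rather
than hard-coding it):

#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Sample on a 'mem-loads'-style raw event and ask for the data source of
 * each sample; 0x40030 is the Power7 mem-loads code used later in this
 * series.
 */
static int open_mem_loads_event(pid_t pid)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_RAW;
	attr.config = 0x40030;
	attr.sample_period = 1000;
	attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_ADDR |
			   PERF_SAMPLE_DATA_SRC;
	attr.disabled = 1;

	return syscall(__NR_perf_event_open, &attr, pid, -1, -1, 0);
}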

Thanks to Stephane Eranian and Michael Ellerman for their input.

P.S. The patchset builds on several configurations including pmac32_defconfig,
but I am unable to verify the build on a few other configs due to a problem
unrelated to this patchset; that is being discussed in a separate thread.
In the meantime, I would like feedback on this patchset.

Sukadev Bhattiprolu (7):
powerpc/perf: Rename Power8 macros to start with PME
powerpc/perf: Export Power8 generic events in sysfs
powerpc/perf: Create mem-loads/mem-stores events for Power8
powerpc/perf: Create mem-loads/mem-stores events for Power7
powerpc/perf: Define big-endian version of perf_mem_data_src
powerpc/perf: Export Power8 memory hierarchy info to user space.
powerpc/perf: Export Power7 memory hierarchy info to user space.

arch/powerpc/include/asm/perf_event_server.h | 2 +
arch/powerpc/perf/core-book3s.c | 11 +++
arch/powerpc/perf/power7-pmu.c | 81 ++++++++++++++++++++
arch/powerpc/perf/power8-pmu.c | 106 +++++++++++++++++++++++---
include/uapi/linux/perf_event.h | 55 +++++++++++++
5 files changed, 243 insertions(+), 12 deletions(-)


2013-08-10 17:49:59

by Sukadev Bhattiprolu

Subject: [PATCH 1/7] powerpc/perf: Rename Power8 macros to start with PME

[PATCH 1/7] powerpc/perf: Rename Power8 macros to start with PME

We use helpers like GENERIC_EVENT_ATTR() to list the generic events in
sysfs. To avoid name collisions, GENERIC_EVENT_ATTR() requires the perf
event macros to start with PME.
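
Roughly, the helper pastes the PME_ prefix onto the _id it is given when
looking up the event code, along the lines of the following sketch (the
exact definitions live in perf_event_server.h and may differ in detail):

/* Sketch: GENERIC_EVENT_ATTR(cpu-cycles, PM_CYC) needs a PME_PM_CYC
 * macro to exist, because the event code is reached via PME_##_id.
 */
#define EVENT_VAR(_id, _suffix)		event_attr_##_id##_suffix
#define EVENT_PTR(_id, _suffix)		(&EVENT_VAR(_id, _suffix).attr.attr)

#define GENERIC_EVENT_ATTR(_name, _id)				\
	PMU_EVENT_ATTR(_name, EVENT_VAR(_id, _g), PME_##_id,	\
		       power_events_sysfs_show)
#define GENERIC_EVENT_PTR(_id)		EVENT_PTR(_id, _g)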

Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Signed-off-by: Sukadev Bhattiprolu <[email protected]>
---
arch/powerpc/perf/power8-pmu.c | 24 ++++++++++++------------
1 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 96a64d6..30c6b12 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -18,12 +18,12 @@
/*
* Some power8 event codes.
*/
-#define PM_CYC 0x0001e
-#define PM_GCT_NOSLOT_CYC 0x100f8
-#define PM_CMPLU_STALL 0x4000a
-#define PM_INST_CMPL 0x00002
-#define PM_BRU_FIN 0x10068
-#define PM_BR_MPRED_CMPL 0x400f6
+#define PME_PM_CYC 0x0001e
+#define PME_PM_GCT_NOSLOT_CYC 0x100f8
+#define PME_PM_CMPLU_STALL 0x4000a
+#define PME_PM_INST_CMPL 0x00002
+#define PME_PM_BRU_FIN 0x10068
+#define PME_PM_BR_MPRED_CMPL 0x400f6


/*
@@ -550,12 +550,12 @@ static const struct attribute_group *power8_pmu_attr_groups[] = {
};

static int power8_generic_events[] = {
- [PERF_COUNT_HW_CPU_CYCLES] = PM_CYC,
- [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = PM_GCT_NOSLOT_CYC,
- [PERF_COUNT_HW_STALLED_CYCLES_BACKEND] = PM_CMPLU_STALL,
- [PERF_COUNT_HW_INSTRUCTIONS] = PM_INST_CMPL,
- [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = PM_BRU_FIN,
- [PERF_COUNT_HW_BRANCH_MISSES] = PM_BR_MPRED_CMPL,
+ [PERF_COUNT_HW_CPU_CYCLES] = PME_PM_CYC,
+ [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = PME_PM_GCT_NOSLOT_CYC,
+ [PERF_COUNT_HW_STALLED_CYCLES_BACKEND] = PME_PM_CMPLU_STALL,
+ [PERF_COUNT_HW_INSTRUCTIONS] = PME_PM_INST_CMPL,
+ [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = PME_PM_BRU_FIN,
+ [PERF_COUNT_HW_BRANCH_MISSES] = PME_PM_BR_MPRED_CMPL,
};

static u64 power8_bhrb_filter_map(u64 branch_sample_type)
--
1.7.1

2013-08-10 17:50:39

by Sukadev Bhattiprolu

Subject: [PATCH 2/7] powerpc/perf: Export Power8 generic events in sysfs

[PATCH 2/7] powerpc/perf: Export Power8 generic events in sysfs

Export existing Power8 generic events in sysfs.

Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Signed-off-by: Sukadev Bhattiprolu <[email protected]>
---
arch/powerpc/perf/power8-pmu.c | 23 +++++++++++++++++++++++
1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 30c6b12..ff98fb8 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -510,6 +510,28 @@ static void power8_disable_pmc(unsigned int pmc, unsigned long mmcr[])
mmcr[1] &= ~(0xffUL << MMCR1_PMCSEL_SHIFT(pmc + 1));
}

+GENERIC_EVENT_ATTR(cpu-cycles, PM_CYC);
+GENERIC_EVENT_ATTR(stalled-cycles-frontend, PM_GCT_NOSLOT_CYC);
+GENERIC_EVENT_ATTR(stalled-cycles-backend, PM_CMPLU_STALL);
+GENERIC_EVENT_ATTR(instructions, PM_INST_CMPL);
+GENERIC_EVENT_ATTR(branch-instructions, PM_BRU_FIN);
+GENERIC_EVENT_ATTR(branch-misses, PM_BR_MPRED_CMPL);
+
+static struct attribute *power8_events_attr[] = {
+ GENERIC_EVENT_PTR(PM_CYC),
+ GENERIC_EVENT_PTR(PM_GCT_NOSLOT_CYC),
+ GENERIC_EVENT_PTR(PM_CMPLU_STALL),
+ GENERIC_EVENT_PTR(PM_INST_CMPL),
+ GENERIC_EVENT_PTR(PM_BRU_FIN),
+ GENERIC_EVENT_PTR(PM_BR_MPRED_CMPL),
+ NULL
+};
+
+static struct attribute_group power8_pmu_events_group = {
+ .name = "events",
+ .attrs = power8_events_attr,
+};
+
PMU_FORMAT_ATTR(event, "config:0-49");
PMU_FORMAT_ATTR(pmcxsel, "config:0-7");
PMU_FORMAT_ATTR(mark, "config:8");
@@ -546,6 +568,7 @@ struct attribute_group power8_pmu_format_group = {

static const struct attribute_group *power8_pmu_attr_groups[] = {
&power8_pmu_format_group,
+ &power8_pmu_events_group,
NULL,
};

--
1.7.1

2013-08-10 17:51:14

by Sukadev Bhattiprolu

Subject: [PATCH 3/7] powerpc/perf: Create mem-loads/mem-stores events for Power8


[PATCH 3/7] powerpc/perf: Create mem-loads/mem-stores events for Power8

The 'perf mem' command depends on support for the generic hardware events
'mem-loads' and 'mem-stores'.

Create those events for Power8 and map them both to the event PM_MRK_GRP_CMPL.
While PM_MRK_GRP_CMPL is not strictly restricted to loads and stores, it
seems to be a close/reasonable match.

Cc: Stephane Eranian <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Signed-off-by: Sukadev Bhattiprolu <[email protected]>
---
arch/powerpc/perf/power8-pmu.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index ff98fb8..0a7b632 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -24,6 +24,8 @@
#define PME_PM_INST_CMPL 0x00002
#define PME_PM_BRU_FIN 0x10068
#define PME_PM_BR_MPRED_CMPL 0x400f6
+#define PME_PM_MEM_LOADS 0x40130
+#define PME_PM_MEM_STORES 0x40130


/*
@@ -516,6 +518,8 @@ GENERIC_EVENT_ATTR(stalled-cycles-backend, PM_CMPLU_STALL);
GENERIC_EVENT_ATTR(instructions, PM_INST_CMPL);
GENERIC_EVENT_ATTR(branch-instructions, PM_BRU_FIN);
GENERIC_EVENT_ATTR(branch-misses, PM_BR_MPRED_CMPL);
+GENERIC_EVENT_ATTR(mem-loads, PM_MEM_LOADS);
+GENERIC_EVENT_ATTR(mem-stores, PM_MEM_STORES);

static struct attribute *power8_events_attr[] = {
GENERIC_EVENT_PTR(PM_CYC),
@@ -524,6 +528,8 @@ static struct attribute *power8_events_attr[] = {
GENERIC_EVENT_PTR(PM_INST_CMPL),
GENERIC_EVENT_PTR(PM_BRU_FIN),
GENERIC_EVENT_PTR(PM_BR_MPRED_CMPL),
+ GENERIC_EVENT_PTR(PM_MEM_LOADS),
+ GENERIC_EVENT_PTR(PM_MEM_STORES),
NULL
};

--
1.7.1

2013-08-10 17:51:52

by Sukadev Bhattiprolu

Subject: [PATCH 4/7] powerpc/perf: Create mem-loads/mem-stores events for Power7

[PATCH 4/7] powerpc/perf: Create mem-loads/mem-stores events for Power7

The 'perf mem' command depends on support for the generic hardware events
'mem-loads' and 'mem-stores'.

Create those events for Power7 and map them both to the event PM_MRK_GRP_CMPL.
While PM_MRK_GRP_CMPL is not strictly restricted to loads and stores, it
seems to be a close/reasonable match.

Cc: Stephane Eranian <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Signed-off-by: Sukadev Bhattiprolu <[email protected]>
---
arch/powerpc/perf/power7-pmu.c | 16 ++++++++++++++++
1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c
index 56c67bc..161861d 100644
--- a/arch/powerpc/perf/power7-pmu.c
+++ b/arch/powerpc/perf/power7-pmu.c
@@ -58,6 +58,18 @@

enum {
#include "power7-events-list.h"
+ /*
+ * Normally, generic events like 'cycles' are aliases for a real
+ * event PM_CYC. So we want both the event and the alias listed in
+ * sysfs. But mem-loads and mem-stores are just aliases - they are
+ * not listed in power7_generic_events[] for instance. Adding them
+ * to power7-events-list.h will unnecessariliy create PM_MEM_LOADS
+ * and PM_MEM_STORES events in sysfs when that file is processed
+ * again below. To save a couple of sysfs entries, define these
+ * separately.
+ */
+ EVENT(PM_MEM_LOADS, 0x40030)
+ EVENT(PM_MEM_STORES, 0x40030)
};
#undef EVENT

@@ -382,6 +394,8 @@ GENERIC_EVENT_ATTR(cache-references, PM_LD_REF_L1);
GENERIC_EVENT_ATTR(cache-misses, PM_LD_MISS_L1);
GENERIC_EVENT_ATTR(branch-instructions, PM_BRU_FIN);
GENERIC_EVENT_ATTR(branch-misses, PM_BR_MPRED);
+GENERIC_EVENT_ATTR(mem-loads, PM_MEM_LOADS);
+GENERIC_EVENT_ATTR(mem-stores, PM_MEM_STORES);

#define EVENT(_name, _code) POWER_EVENT_ATTR(_name, _name);
#include "power7-events-list.h"
@@ -398,6 +412,8 @@ static struct attribute *power7_events_attr[] = {
GENERIC_EVENT_PTR(PM_LD_MISS_L1),
GENERIC_EVENT_PTR(PM_BRU_FIN),
GENERIC_EVENT_PTR(PM_BR_MPRED),
+ GENERIC_EVENT_PTR(PM_MEM_LOADS),
+ GENERIC_EVENT_PTR(PM_MEM_STORES),

#include "power7-events-list.h"
#undef EVENT
--
1.7.1

2013-08-10 17:52:23

by Sukadev Bhattiprolu

Subject: [PATCH 5/7] powerpc/perf: Define big-endian version of perf_mem_data_src


[PATCH 5/7] powerpc/perf: Define big-endian version of perf_mem_data_src

perf_mem_data_src is a union that is initialized via the ->val field
and accessed via its bit-fields. For this to work on big-endian
platforms, we also need a big-endian representation of perf_mem_data_src.
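
A minimal illustration of why the layout matters, assuming only the existing
PERF_MEM_* macros from perf_event.h:

/* The producer encodes through ->val with the shift/mask macros, while
 * consumers read back the named bit-fields.  On little-endian, mem_op:5
 * names the low five bits of val; a big-endian declaration must list the
 * fields in reverse order for the two views to agree.
 */
static int is_load_that_hit_l2(union perf_mem_data_src dsrc)
{
	dsrc.val |= PERF_MEM_S(OP, LOAD) | PERF_MEM_S(LVL, HIT) |
		    PERF_MEM_S(LVL, L2);

	return (dsrc.mem_op & PERF_MEM_OP_LOAD) &&
	       (dsrc.mem_lvl & PERF_MEM_LVL_L2);
}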

Cc: Stephane Eranian <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Signed-off-by: Sukadev Bhattiprolu <[email protected]>
---

Thanks to Stephane Eranian and Michael Ellerman for their input.

include/uapi/linux/perf_event.h | 55 +++++++++++++++++++++++++++++++++++++++
1 files changed, 55 insertions(+), 0 deletions(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 62c25a2..8497c51 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -19,6 +19,47 @@
#include <asm/byteorder.h>

/*
+ * Kernel and userspace check for endianness in incompatible ways.
+ * In user space, <endian.h> defines both __BIG_ENDIAN and __LITTLE_ENDIAN
+ * but sets __BYTE_ORDER to one or the other. So user space checks are:
+ *
+ * #if __BYTE_ORDER == __LITTLE_ENDIAN
+ *
+ * In the kernel, __BYTE_ORDER is undefined, so using the above check doesn't
+ * work. Further, kernel code assumes that exactly one of __BIG_ENDIAN and
+ * __LITTLE_ENDIAN is defined. So the kernel checks look like:
+ *
+ * #if defined(__LITTLE_ENDIAN)
+ *
+ * But we can't use that check in user space since __LITTLE_ENDIAN (and
+ * __BIG_ENDIAN) are always defined.
+ *
+ * Since some perf data structures depend on endianness _and_ are shared
+ * between kernel and user, perf needs its own notion of endian macros (at
+ * least until user and kernel endian checks converge).
+ */
+#define __PERF_LE 1234
+#define __PERF_BE 4321
+
+#if defined(__KERNEL__)
+
+#if defined(__LITTLE_ENDIAN)
+#define __PERF_BYTE_ORDER __PERF_LE
+#elif defined(__BIG_ENDIAN)
+#define __PERF_BYTE_ORDER __PERF_BE
+#endif
+
+#else /* __KERNEL__ */
+
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+#define __PERF_BYTE_ORDER __PERF_LE
+#elif __BYTE_ORDER == __BIG_ENDIAN
+#define __PERF_BYTE_ORDER __PERF_BE
+#endif
+
+#endif /* __KERNEL__ */
+
+/*
* User-space ABI bits:
*/

@@ -659,6 +700,7 @@ enum perf_callchain_context {
#define PERF_FLAG_FD_OUTPUT (1U << 1)
#define PERF_FLAG_PID_CGROUP (1U << 2) /* pid=cgroup id, per-cpu mode only */

+#if __PERF_BYTE_ORDER == __PERF_LE
union perf_mem_data_src {
__u64 val;
struct {
@@ -670,6 +712,19 @@ union perf_mem_data_src {
mem_rsvd:31;
};
};
+#elif __PERF_BYTE_ORDER == __PERF_BE
+union perf_mem_data_src {
+ __u64 val;
+ struct {
+ __u64 mem_rsvd:31,
+ mem_dtlb:7, /* tlb access */
+ mem_lock:2, /* lock instr */
+ mem_snoop:5, /* snoop mode */
+ mem_lvl:14, /* memory hierarchy level */
+ mem_op:5; /* type of opcode */
+ };
+};
+#endif

/* type of opcode (load/store/prefetch,code) */
#define PERF_MEM_OP_NA 0x01 /* not available */
--
1.7.1

2013-08-10 17:52:59

by Sukadev Bhattiprolu

Subject: [PATCH 6/7] powerpc/perf: Export Power8 memory hierarchy info to user space


[PATCH 6/7] powerpc/perf: Export Power8 memory hierarchy info to user space.

On Power8, the LDST field in SIER identifies the memory hierarchy level
(e.g. L1, L2, etc.) from which a data-cache miss for a marked instruction
was satisfied.

Use the 'perf_mem_data_src' object to export this hierarchy level to user
space. Fortunately, the memory hierarchy levels in Power8 map fairly easily
into the arch-neutral levels as described by the ldst_src_map[] table.

This hierarchy level information can be used with 'perf record --data'
or the 'perf mem' command to analyze application behavior.

Usage:

$ perf mem record <application>
$ perf mem report

OR

$ perf record --data <application>
$ perf report -D

Sample records contain a 'data_src' field which encodes the memory
hierarchy level: e.g., data_src 0x442 indicates MEM_OP_LOAD, MEM_LVL_HIT,
MEM_LVL_L2 (i.e. a load that hit in L2).
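
For reference, the 0x442 value breaks down as follows, using the encoding
macros from include/uapi/linux/perf_event.h:

/*
 *   PERF_MEM_OP_LOAD (0x02) << PERF_MEM_OP_SHIFT  (0)  = 0x002
 *   PERF_MEM_LVL_HIT (0x02) << PERF_MEM_LVL_SHIFT (5)  = 0x040
 *   PERF_MEM_LVL_L2  (0x20) << PERF_MEM_LVL_SHIFT (5)  = 0x400
 *                                                 total: 0x442
 *
 * i.e. PERF_MEM_S(OP, LOAD) | PERF_MEM_S(LVL, HIT) | PERF_MEM_S(LVL, L2)
 */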

Cc: Stephane Eranian <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Signed-off-by: Sukadev Bhattiprolu <[email protected]>
---
arch/powerpc/include/asm/perf_event_server.h | 2 +
arch/powerpc/perf/core-book3s.c | 11 +++++
arch/powerpc/perf/power8-pmu.c | 53 ++++++++++++++++++++++++++
3 files changed, 66 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index cc5f45b..2252798 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -37,6 +37,8 @@ struct power_pmu {
void (*config_bhrb)(u64 pmu_bhrb_filter);
void (*disable_pmc)(unsigned int pmc, unsigned long mmcr[]);
int (*limited_pmc_event)(u64 event_id);
+ void (*get_mem_data_src)(union perf_mem_data_src *dsrc,
+ struct pt_regs *regs);
u32 flags;
const struct attribute_group **attr_groups;
int n_generic;
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index a3985ae..e61fd05 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1693,6 +1693,13 @@ ssize_t power_events_sysfs_show(struct device *dev,
return sprintf(page, "event=0x%02llx\n", pmu_attr->id);
}

+static inline void power_get_mem_data_src(union perf_mem_data_src *dsrc,
+ struct pt_regs *regs)
+{
+ if (ppmu->get_mem_data_src)
+ ppmu->get_mem_data_src(dsrc, regs);
+}
+
struct pmu power_pmu = {
.pmu_enable = power_pmu_enable,
.pmu_disable = power_pmu_disable,
@@ -1774,6 +1781,10 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
data.br_stack = &cpuhw->bhrb_stack;
}

+ if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC &&
+ ppmu->get_mem_data_src)
+ ppmu->get_mem_data_src(&data.data_src, regs);
+
if (perf_event_overflow(event, &data, regs))
power_pmu_stop(event, 0);
}
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 0a7b632..2aaae63 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -538,6 +538,58 @@ static struct attribute_group power8_pmu_events_group = {
.attrs = power8_events_attr,
};

+#define POWER8_SIER_TYPE_SHIFT 15
+#define POWER8_SIER_TYPE_MASK (0x7LL << POWER8_SIER_TYPE_SHIFT)
+
+#define POWER8_SIER_LDST_SHIFT 1
+#define POWER8_SIER_LDST_MASK (0x7LL << POWER8_SIER_LDST_SHIFT)
+
+#define P(a, b) PERF_MEM_S(a, b)
+#define PLH(a, b) (P(OP, LOAD) | P(LVL, HIT) | P(a, b))
+#define PSM(a, b) (P(OP, STORE) | P(LVL, MISS) | P(a, b))
+
+/*
+ * Power8 interpretations:
+ * REM_CCE1: 1-hop indicates L2/L3 cache of a different core on same chip
+ * REM_CCE2: 2-hop indicates different chip or different node.
+ */
+static u64 ldst_src_map[] = {
+ /* 000 */ P(LVL, NA),
+
+ /* 001 */ PLH(LVL, L1),
+ /* 010 */ PLH(LVL, L2),
+ /* 011 */ PLH(LVL, L3),
+ /* 100 */ PLH(LVL, LOC_RAM),
+ /* 101 */ PLH(LVL, REM_CCE1),
+ /* 110 */ PLH(LVL, REM_CCE2),
+
+ /* 111 */ PSM(LVL, L1),
+};
+
+static inline bool is_load_store_inst(u64 sier)
+{
+ u64 val;
+ val = (sier & POWER8_SIER_TYPE_MASK) >> POWER8_SIER_TYPE_SHIFT;
+
+ /* 1 = load, 2 = store */
+ return val == 1 || val == 2;
+}
+
+static void power8_get_mem_data_src(union perf_mem_data_src *dsrc,
+ struct pt_regs *regs)
+{
+ u64 idx;
+ u64 sier;
+
+ sier = mfspr(SPRN_SIER);
+
+ if (is_load_store_inst(sier)) {
+ idx = (sier & POWER8_SIER_LDST_MASK) >> POWER8_SIER_LDST_SHIFT;
+
+ dsrc->val |= ldst_src_map[idx];
+ }
+}
+
PMU_FORMAT_ATTR(event, "config:0-49");
PMU_FORMAT_ATTR(pmcxsel, "config:0-7");
PMU_FORMAT_ATTR(mark, "config:8");
@@ -641,6 +693,7 @@ static struct power_pmu power8_pmu = {
.get_constraint = power8_get_constraint,
.get_alternatives = power8_get_alternatives,
.disable_pmc = power8_disable_pmc,
+ .get_mem_data_src = power8_get_mem_data_src,
.flags = PPMU_HAS_SSLOT | PPMU_HAS_SIER | PPMU_BHRB | PPMU_EBB,
.n_generic = ARRAY_SIZE(power8_generic_events),
.generic_events = power8_generic_events,
--
1.7.1

2013-08-10 17:53:28

by Sukadev Bhattiprolu

Subject: [PATCH 7/7] powerpc/perf: Export Power7 memory hierarchy info to user space


[PATCH 7/7] powerpc/perf: Export Power7 memory hierarchy info to user space.

On Power7, the DCACHE_SRC field in the MMCRA register identifies the memory
hierarchy level (e.g. L1, L2, etc.) from which a data-cache miss for a
marked instruction was satisfied.

Use the 'perf_mem_data_src' object to export this hierarchy level to user
space. Some memory hierarchy levels in Power7 don't map into the arch-neutral
levels. However, since the newer generation of the processor (i.e. Power8)
uses fewer levels than Power7, we don't really need to define new hierarchy
levels just for Power7.

Instead, we map as many levels as possible and approximate the rest. See the
comments near dcache_src_map[] in the patch.

This hierarchy level information can be used with 'perf record --data'
or the 'perf mem' command to analyze application behavior.

Usage:

perf mem record <application>
perf mem report

OR

perf record --data <application>
perf report -D

Sample records contain a 'data_src' field which encodes the memory
hierarchy level: e.g., data_src 0x442 indicates MEM_OP_LOAD, MEM_LVL_HIT,
MEM_LVL_L2 (i.e. a load that hit in L2).
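
A minimal sketch of how a consumer could turn the data_src value back into a
level name, using only the arch-neutral PERF_MEM_LVL_* flags (the helper name
is made up for illustration):

static const char *mem_lvl_str(__u64 data_src)
{
	union perf_mem_data_src d = { .val = data_src };

	if (!(d.mem_lvl & PERF_MEM_LVL_HIT))
		return "miss or N/A";
	if (d.mem_lvl & PERF_MEM_LVL_L1)
		return "L1";
	if (d.mem_lvl & PERF_MEM_LVL_L2)
		return "L2";
	if (d.mem_lvl & PERF_MEM_LVL_L3)
		return "L3";
	if (d.mem_lvl & PERF_MEM_LVL_LOC_RAM)
		return "Local RAM";
	if (d.mem_lvl & PERF_MEM_LVL_REM_CCE1)
		return "Remote cache (1 hop)";
	if (d.mem_lvl & PERF_MEM_LVL_REM_CCE2)
		return "Remote cache (2 hops)";
	if (d.mem_lvl & PERF_MEM_LVL_REM_RAM2)
		return "Remote RAM (2 hops)";
	return "Other";
}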

Cc: Stephane Eranian <[email protected]>
Cc: Michael Ellerman <[email protected]>
Signed-off-by: Sukadev Bhattiprolu <[email protected]>
---
Thanks to Stephane Eranian and Michael Ellerman for their input.

Changelog[v3]:
[Michael Ellerman] Since the new levels we defined in [v2] are not
needed for Power8, ignore them for Power7 also, and approximate them.
Move the TLB level mapping to a separate patchset.

Changelog[v2]:
[Stephane Eranian] Define new levels rather than ORing the L2 and L3
with REM_CCE1 and REM_CCE2.
[Stephane Eranian] Allocate a bit PERF_MEM_XLVL_NA for architectures
that don't use the ->mem_xlvl field.
Insert the TLB patch ahead so the new TLB bits are contiguous with the
existing TLB bits.

arch/powerpc/perf/power7-pmu.c | 65 ++++++++++++++++++++++++++++++++++++++++
1 files changed, 65 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c
index 161861d..f8143d6 100644
--- a/arch/powerpc/perf/power7-pmu.c
+++ b/arch/powerpc/perf/power7-pmu.c
@@ -329,6 +329,70 @@ static void power7_disable_pmc(unsigned int pmc, unsigned long mmcr[])
mmcr[1] &= ~(0xffUL << MMCR1_PMCSEL_SH(pmc));
}

+#define POWER7_MMCRA_DCACHE_MISS (0x1LL << 55)
+#define POWER7_MMCRA_DCACHE_SRC_SHIFT 51
+#define POWER7_MMCRA_DCACHE_SRC_MASK (0xFLL << POWER7_MMCRA_DCACHE_SRC_SHIFT)
+
+#define P(a, b) PERF_MEM_S(a, b)
+#define PLH(a, b) (P(OP, LOAD) | P(LVL, HIT) | P(a, b))
+/*
+ * Map the Power7 DCACHE_SRC field (bits 9..12) in MMCRA register to the
+ * architecture-neutral memory hierarchy levels. For the levels in Power7
+ * that don't map to the arch-neutral levels, approximate to nearest
+ * level.
+ *
+ * 1-hop: indicates another core on the same chip (2.1 and 3.1 levels).
+ * 2-hops: indicates a different chip on same or different node (remote
+ * and distant levels).
+ *
+ * For consistency with this interpretation of the hops, we don't use
+ * the REM_RAM1 level below.
+ *
+ * The *SHR and *MOD states of the cache are ignored/not exported to user.
+ *
+ * ### Levels marked with ### in comments below are approximated
+ */
+static u64 dcache_src_map[] = {
+ PLH(LVL, L2), /* 00: FROM_L2 */
+ PLH(LVL, L3), /* 01: FROM_L3 */
+
+ P(LVL, NA), /* 02: Reserved */
+ P(LVL, NA), /* 03: Reserved */
+
+ PLH(LVL, REM_CCE1), /* 04: FROM_L2.1_SHR ### */
+ PLH(LVL, REM_CCE1), /* 05: FROM_L2.1_MOD ### */
+
+ PLH(LVL, REM_CCE1), /* 06: FROM_L3.1_SHR ### */
+ PLH(LVL, REM_CCE1), /* 07: FROM_L3.1_MOD ### */
+
+ PLH(LVL, REM_CCE2), /* 08: FROM_RL2L3_SHR ### */
+ PLH(LVL, REM_CCE2), /* 09: FROM_RL2L3_MOD ### */
+
+ PLH(LVL, REM_CCE2), /* 10: FROM_DL2L3_SHR ### */
+ PLH(LVL, REM_CCE2), /* 11: FROM_DL2L3_MOD ### */
+
+ PLH(LVL, LOC_RAM), /* 12: FROM_LMEM */
+ PLH(LVL, REM_RAM2), /* 13: FROM_RMEM ### */
+ PLH(LVL, REM_RAM2), /* 14: FROM_DMEM */
+
+ P(LVL, NA), /* 15: Reserved */
+};
+
+static void power7_get_mem_data_src(union perf_mem_data_src *dsrc,
+ struct pt_regs *regs)
+{
+ u64 idx;
+ u64 mmcra = regs->dsisr;
+
+ if (mmcra & POWER7_MMCRA_DCACHE_MISS) {
+ idx = mmcra & POWER7_MMCRA_DCACHE_SRC_MASK;
+ idx >>= POWER7_MMCRA_DCACHE_SRC_SHIFT;
+
+ dsrc->val |= dcache_src_map[idx];
+ }
+}
+
+
static int power7_generic_events[] = {
[PERF_COUNT_HW_CPU_CYCLES] = PME_PM_CYC,
[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = PME_PM_GCT_NOSLOT_CYC,
@@ -453,6 +517,7 @@ static struct power_pmu power7_pmu = {
.get_constraint = power7_get_constraint,
.get_alternatives = power7_get_alternatives,
.disable_pmc = power7_disable_pmc,
+ .get_mem_data_src = power7_get_mem_data_src,
.flags = PPMU_ALT_SIPR,
.attr_groups = power7_pmu_attr_groups,
.n_generic = ARRAY_SIZE(power7_generic_events),
--
1.7.1

2013-08-11 02:33:56

by Vince Weaver

Subject: Re: [PATCH 5/7] powerpc/perf: Define big-endian version of perf_mem_data_src

On Sat, 10 Aug 2013, Sukadev Bhattiprolu wrote:

>
> include/uapi/linux/perf_event.h | 55 +++++++++++++++++++++++++++++++++++++++
> 1 files changed, 55 insertions(+), 0 deletions(-)

> +#define __PERF_LE 1234
> +#define __PERF_BE 4321
> +
> +#if defined(__KERNEL__)

I could be wrong, but I thought files under uapi weren't supposed to
contain __KERNEL__ code. Wasn't that the whole point of uapi?

Also, having the perf_event interface depend on endianness just seems like a
complicated mess. Can't we just declare the interface to be a certain
endianness and have the kernel byte-swap as necessary?

Vince

2013-08-11 17:16:09

by Sukadev Bhattiprolu

Subject: Re: [PATCH 5/7] powerpc/perf: Define big-endian version of perf_mem_data_src

Vince Weaver [[email protected]] wrote:
| On Sat, 10 Aug 2013, Sukadev Bhattiprolu wrote:
|
| >
| > include/uapi/linux/perf_event.h | 55 +++++++++++++++++++++++++++++++++++++++
| > 1 files changed, 55 insertions(+), 0 deletions(-)
|
| > +#define __PERF_LE 1234
| > +#define __PERF_BE 4321
| > +
| > +#if defined(__KERNEL__)
|
| I could be wrong, but I thought files under uapi weren't supposed to
| contain __KERNEL__ code. Wasn't that the whole point of uapi?
|
| Also, having the perf_event interface depend on endianness just seems like a
| complicated mess. Can't we just declare the interface to be a certain
| endianness and have the kernel byte-swap as necessary?

Except for the __KERNEL__ check, it looked like this approach would keep
the kernel and user code the same. Would it complicate user space?

I tried to avoid the __KERNEL__ check hack, but as I tried to explain
in the patch, user space and the kernel do the endian check differently.
And there are about ~300 sites in the kernel with __*ENDIAN checks.

Sukadev

2013-08-11 23:58:07

by Michael Ellerman

Subject: Re: [PATCH 5/7] powerpc/perf: Define big-endian version of perf_mem_data_src

On Sat, Aug 10, 2013 at 10:34:58PM -0400, Vince Weaver wrote:
> On Sat, 10 Aug 2013, Sukadev Bhattiprolu wrote:
>
> >
> > include/uapi/linux/perf_event.h | 55 +++++++++++++++++++++++++++++++++++++++
> > 1 files changed, 55 insertions(+), 0 deletions(-)
>
> > +#define __PERF_LE 1234
> > +#define __PERF_BE 4321
> > +
> > +#if defined(__KERNEL__)
>
> I could be wrong, but I thought files under uapi weren't supposed to
> contain __KERNEL__ code. Wasn't that the whole point of uapi?

Yes.

> Also, having the perf_event interface depend on endianness just seems like a
> complicated mess. Can't we just declare the interface to be a certain
> endianness and have the kernel byte-swap as necessary?

Yes I think so. The interface is already defined and it's little endian,
so on big endian we just need to swap.
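
For what it's worth, one way to avoid the endian-specific struct entirely
(a sketch only, not what this series does): build data_src with the existing
shift/mask macros so the value is a plain __u64, and let the usual handling
of sample fields deal with byte order:

static inline __u64 data_src_load_hit(__u64 lvl_flag)
{
	/* Compose the value with shifts instead of bit-fields, so the
	 * layout is the same regardless of host endianness.
	 */
	return PERF_MEM_S(OP, LOAD) | PERF_MEM_S(LVL, HIT) |
	       (lvl_flag << PERF_MEM_LVL_SHIFT);
}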

The only part I'm not clear on is how things are handled in perf
userspace; it seems to already do some byte swapping.

cheers

2013-08-12 03:18:52

by Vince Weaver

Subject: Re: [PATCH 5/7] powerpc/perf: Define big-endian version of perf_mem_data_src

On Mon, 12 Aug 2013, Michael Ellerman wrote:
>
> Yes I think so. The interface is already defined and it's little endian,
> so on big endian we just need to swap.
>
> The only part I'm not clear on is how things are handled in perf
> userspace, it seems to already do some byte swapping.

It would be nice to clarify this.

"struct perf_branch_entry" also has bitfields like this, though to make
things more confusing that structure isn't exported via the uapi header
so it's not clear how userspace code is supposed to interpret the values.

As you say it gets complicated with perf userspace, especially in cases
where you record the data on big-endian but then try to analyze the
results on a little-endian machine.

It would be nice to get confirmation that these bitfields will always be
little-endian. I guess they currently are by definition because only
x86/pebs sets data.data_src.val so far?

Vince