The Power8/Power9 Performance Monitoring Unit (PMU) supports
different sampling modes (SM) such as Random Instruction
Sampling (RIS), Random Load/Store Facility Sampling (RLS)
and Random Branch Sampling (RBS). Sample mode RLS updates the
Sampled Instruction Event Register (SIER) bits with memory
hierarchy information for a cache reload. This patchset exports
that hierarchy information from SIER to the user via the
perf_mem_data_src object.
This patchset is a rebase of the work posted previously, with minor
updates to it.
https://lkml.org/lkml/2015/6/11/92
Changelog v1:
- Fixed authorship for the first patch and added suka's "Signed-off-by:".
Madhavan Srinivasan (5):
powerpc/perf: Export memory hierarchy info to user space
powerpc/perf: Support to export MMCRA[TEC*] field to userspace
powerpc/perf: Support to export SIERs bit in Power8
powerpc/perf: Support to export SIERs bit in Power9
powerpc/perf: Add Power8 mem_access event to sysfs
Sukadev Bhattiprolu (1):
powerpc/perf: Define big-endian version of perf_mem_data_src
arch/powerpc/include/asm/perf_event_server.h | 3 +
arch/powerpc/perf/core-book3s.c | 8 +++
arch/powerpc/perf/isa207-common.c | 86 ++++++++++++++++++++++++++++
arch/powerpc/perf/isa207-common.h | 26 ++++++++-
arch/powerpc/perf/power8-events-list.h | 6 ++
arch/powerpc/perf/power8-pmu.c | 4 ++
arch/powerpc/perf/power9-pmu.c | 2 +
include/uapi/linux/perf_event.h | 16 ++++++
tools/include/uapi/linux/perf_event.h | 16 ++++++
9 files changed, 166 insertions(+), 1 deletion(-)
--
2.7.4
Export SIER bits to userspace via perf_mem_data_src and
struct perf_sample_data.
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Andrew Donnellan <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/power9-pmu.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c
index 7f6582708e06..018f8e90ac35 100644
--- a/arch/powerpc/perf/power9-pmu.c
+++ b/arch/powerpc/perf/power9-pmu.c
@@ -427,6 +427,8 @@ static struct power_pmu power9_pmu = {
.bhrb_filter_map = power9_bhrb_filter_map,
.get_constraint = isa207_get_constraint,
.get_alternatives = power9_get_alternatives,
+ .get_mem_data_src = isa207_get_mem_data_src,
+ .get_mem_weight = isa207_get_mem_weight,
.disable_pmc = isa207_disable_pmc,
.flags = PPMU_HAS_SIER | PPMU_ARCH_207S,
.n_generic = ARRAY_SIZE(power9_generic_events),
--
2.7.4
From: Sukadev Bhattiprolu <[email protected]>
perf_mem_data_src is a union that is initialized via the ->val field
and accessed via the bitmap fields. For this to work on big endian
platforms, we also need a big-endian representation of perf_mem_data_src.
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Sukadev Bhattiprolu <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
include/uapi/linux/perf_event.h | 16 ++++++++++++++++
tools/include/uapi/linux/perf_event.h | 16 ++++++++++++++++
2 files changed, 32 insertions(+)
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index c66a485a24ac..c4af1159a200 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -891,6 +891,7 @@ enum perf_callchain_context {
#define PERF_FLAG_PID_CGROUP (1UL << 2) /* pid=cgroup id, per-cpu mode only */
#define PERF_FLAG_FD_CLOEXEC (1UL << 3) /* O_CLOEXEC */
+#if defined(__LITTLE_ENDIAN_BITFIELD)
union perf_mem_data_src {
__u64 val;
struct {
@@ -902,6 +903,21 @@ union perf_mem_data_src {
mem_rsvd:31;
};
};
+#elif defined(__BIG_ENDIAN_BITFIELD)
+union perf_mem_data_src {
+ __u64 val;
+ struct {
+ __u64 mem_rsvd:31,
+ mem_dtlb:7, /* tlb access */
+ mem_lock:2, /* lock instr */
+ mem_snoop:5, /* snoop mode */
+ mem_lvl:14, /* memory hierarchy level */
+ mem_op:5; /* type of opcode */
+ };
+};
+#else
+#error "Unknown endianness"
+#endif
/* type of opcode (load/store/prefetch,code) */
#define PERF_MEM_OP_NA 0x01 /* not available */
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index c66a485a24ac..c4af1159a200 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -891,6 +891,7 @@ enum perf_callchain_context {
#define PERF_FLAG_PID_CGROUP (1UL << 2) /* pid=cgroup id, per-cpu mode only */
#define PERF_FLAG_FD_CLOEXEC (1UL << 3) /* O_CLOEXEC */
+#if defined(__LITTLE_ENDIAN_BITFIELD)
union perf_mem_data_src {
__u64 val;
struct {
@@ -902,6 +903,21 @@ union perf_mem_data_src {
mem_rsvd:31;
};
};
+#elif defined(__BIG_ENDIAN_BITFIELD)
+union perf_mem_data_src {
+ __u64 val;
+ struct {
+ __u64 mem_rsvd:31,
+ mem_dtlb:7, /* tlb access */
+ mem_lock:2, /* lock instr */
+ mem_snoop:5, /* snoop mode */
+ mem_lvl:14, /* memory hierarchy level */
+ mem_op:5; /* type of opcode */
+ };
+};
+#else
+#error "Unknown endianness"
+#endif
/* type of opcode (load/store/prefetch,code) */
#define PERF_MEM_OP_NA 0x01 /* not available */
--
2.7.4
The LDST and DATA_SRC fields in SIER identify the memory hierarchy level
(e.g. L1, L2, etc.) from which a data-cache miss for a marked instruction
was satisfied. Use the 'perf_mem_data_src' object to export this
hierarchy level to user space.
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Anna-Maria Gleixner <[email protected]>
Cc: Daniel Axtens <[email protected]>
Signed-off-by: Sukadev Bhattiprolu <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/include/asm/perf_event_server.h | 2 +
arch/powerpc/perf/core-book3s.c | 4 ++
arch/powerpc/perf/isa207-common.c | 78 ++++++++++++++++++++++++++++
arch/powerpc/perf/isa207-common.h | 16 +++++-
4 files changed, 99 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index ae0a23091a9b..446cdcd9b7f5 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -38,6 +38,8 @@ struct power_pmu {
unsigned long *valp);
int (*get_alternatives)(u64 event_id, unsigned int flags,
u64 alt[]);
+ void (*get_mem_data_src)(union perf_mem_data_src *dsrc,
+ u32 flags, struct pt_regs *regs);
u64 (*bhrb_filter_map)(u64 branch_sample_type);
void (*config_bhrb)(u64 pmu_bhrb_filter);
void (*disable_pmc)(unsigned int pmc, unsigned long mmcr[]);
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 595dd718ea87..d644c5ab4d2f 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2047,6 +2047,10 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
data.br_stack = &cpuhw->bhrb_stack;
}
+ if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC &&
+ ppmu->get_mem_data_src)
+ ppmu->get_mem_data_src(&data.data_src, ppmu->flags, regs);
+
if (perf_event_overflow(event, &data, regs))
power_pmu_stop(event, 0);
}
diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
index e79fb5fb817d..08bb62454a2e 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -119,6 +119,84 @@ static bool is_thresh_cmp_valid(u64 event)
return true;
}
+static inline u64 isa207_find_source(u64 idx, u32 sub_idx)
+{
+ u64 ret = 0;
+
+ switch(idx) {
+ case 0:
+ ret = P(LVL, NA);
+ break;
+ case 1:
+ ret = PLH(LVL, L1);
+ break;
+ case 2:
+ ret = PLH(LVL, L2);
+ break;
+ case 3:
+ ret = PLH(LVL, L3);
+ break;
+ case 4:
+ if (sub_idx <= 1)
+ ret = PLH(LVL, LOC_RAM);
+ else if (sub_idx > 1 && sub_idx <= 2)
+ ret = PLH(LVL, REM_RAM1);
+ else
+ ret = PLH(LVL, REM_RAM2);
+ ret |= P(SNOOP, HIT);
+ break;
+ case 5:
+ if ((sub_idx == 0) || (sub_idx == 2) || (sub_idx == 4))
+ ret = (PLH(LVL, REM_CCE1) | P(SNOOP, HIT));
+ else if ((sub_idx == 1) || (sub_idx == 3) || (sub_idx == 5))
+ ret = (PLH(LVL, REM_CCE1) | P(SNOOP, HITM));
+ break;
+ case 6:
+ if ((sub_idx == 0) || (sub_idx == 2))
+ ret = (PLH(LVL, REM_CCE2) | P(SNOOP, HIT));
+ else if ((sub_idx == 1) || (sub_idx == 3))
+ ret = (PLH(LVL, REM_CCE2) | P(SNOOP, HITM));
+ break;
+ case 7:
+ ret = PSM(LVL, L1);
+ break;
+ }
+
+ return ret;
+}
+
+static inline bool is_load_store_inst(u64 sier)
+{
+ u64 val;
+ val = (sier & ISA207_SIER_TYPE_MASK) >> ISA207_SIER_TYPE_SHIFT;
+
+ /* 1 = load, 2 = store */
+ return val == 1 || val == 2;
+}
+
+void isa207_get_mem_data_src(union perf_mem_data_src *dsrc, u32 flags,
+ struct pt_regs *regs)
+{
+ u64 idx;
+ u32 sub_idx;
+ u64 sier;
+
+ /* Skip if no SIER support */
+ if (!(flags & PPMU_HAS_SIER)) {
+ dsrc->val = 0;
+ return;
+ }
+
+ sier = mfspr(SPRN_SIER);
+ if (is_load_store_inst(sier)) {
+ idx = (sier & ISA207_SIER_LDST_MASK) >> ISA207_SIER_LDST_SHIFT;
+ sub_idx = (sier & ISA207_SIER_DATA_SRC_MASK) >> ISA207_SIER_DATA_SRC_SHIFT;
+
+ dsrc->val = isa207_find_source(idx, sub_idx);
+ }
+}
+
+
int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)
{
unsigned int unit, pmc, cache, ebb;
diff --git a/arch/powerpc/perf/isa207-common.h b/arch/powerpc/perf/isa207-common.h
index cf9bd8990159..982542cce991 100644
--- a/arch/powerpc/perf/isa207-common.h
+++ b/arch/powerpc/perf/isa207-common.h
@@ -259,6 +259,19 @@
#define MAX_ALT 2
#define MAX_PMU_COUNTERS 6
+#define ISA207_SIER_TYPE_SHIFT 15
+#define ISA207_SIER_TYPE_MASK (0x7ull << ISA207_SIER_TYPE_SHIFT)
+
+#define ISA207_SIER_LDST_SHIFT 1
+#define ISA207_SIER_LDST_MASK (0x7ull << ISA207_SIER_LDST_SHIFT)
+
+#define ISA207_SIER_DATA_SRC_SHIFT 53
+#define ISA207_SIER_DATA_SRC_MASK (0x7ull << ISA207_SIER_DATA_SRC_SHIFT)
+
+#define P(a, b) PERF_MEM_S(a, b)
+#define PLH(a, b) (P(OP, LOAD) | P(LVL, HIT) | P(a, b))
+#define PSM(a, b) (P(OP, STORE) | P(LVL, MISS) | P(a, b))
+
int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp);
int isa207_compute_mmcr(u64 event[], int n_ev,
unsigned int hwc[], unsigned long mmcr[],
@@ -266,6 +279,7 @@ int isa207_compute_mmcr(u64 event[], int n_ev,
void isa207_disable_pmc(unsigned int pmc, unsigned long mmcr[]);
int isa207_get_alternatives(u64 event, u64 alt[],
const unsigned int ev_alt[][MAX_ALT], int size);
-
+void isa207_get_mem_data_src(union perf_mem_data_src *dsrc, u32 flags,
+ struct pt_regs *regs);
#endif
--
2.7.4
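For readers following the SIER decoding in the patch above, here is a minimal standalone sketch of the field extraction. The shift values and the 3-bit width are copied from the isa207-common.h hunk; the `sier_fields` struct and the `sier_decode()` / `sier_is_load_store()` helpers are hypothetical names used only for illustration and are not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shifts taken from the isa207-common.h hunk; each field is 3 bits wide. */
#define SIER_TYPE_SHIFT      15
#define SIER_LDST_SHIFT      1
#define SIER_DATA_SRC_SHIFT  53
#define SIER_FIELD_MASK      0x7ull

/* Hypothetical container for the three fields the patch looks at. */
struct sier_fields {
	unsigned int type;      /* 1 = load, 2 = store (per the patch comment) */
	unsigned int ldst;      /* "idx" passed to isa207_find_source() */
	unsigned int data_src;  /* "sub_idx" passed to isa207_find_source() */
};

/* Pull the three fields out of a raw SIER value with shifts and masks. */
static inline struct sier_fields sier_decode(uint64_t sier)
{
	struct sier_fields f;

	f.type     = (unsigned int)((sier >> SIER_TYPE_SHIFT) & SIER_FIELD_MASK);
	f.ldst     = (unsigned int)((sier >> SIER_LDST_SHIFT) & SIER_FIELD_MASK);
	f.data_src = (unsigned int)((sier >> SIER_DATA_SRC_SHIFT) & SIER_FIELD_MASK);
	return f;
}

/* Mirrors the check in is_load_store_inst(): only loads and stores qualify. */
static inline bool sier_is_load_store(const struct sier_fields *f)
{
	return f->type == 1 || f->type == 2;
}
```

Feeding in a value with type 1, LDST 2 and DATA_SRC 3 set at those shifts should give back exactly those three numbers, which is what the patch then hands to isa207_find_source().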
When the threshold feature is used with MMCRA[Threshold Event Counter Event],
MMCRA[Threshold Start Event] and MMCRA[Threshold End Event], the hardware
updates MMCRA[Threshold Event Counter Exponent] and MMCRA[Threshold Event
Counter Multiplier] with the corresponding threshold event count values.
Export MMCRA[TECX/TECM] to userspace in the 'weight' field of
struct perf_sample_data.
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Anna-Maria Gleixner <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/include/asm/perf_event_server.h | 1 +
arch/powerpc/perf/core-book3s.c | 4 ++++
arch/powerpc/perf/isa207-common.c | 8 ++++++++
arch/powerpc/perf/isa207-common.h | 10 ++++++++++
4 files changed, 23 insertions(+)
diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index 446cdcd9b7f5..723bf48e7494 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -40,6 +40,7 @@ struct power_pmu {
u64 alt[]);
void (*get_mem_data_src)(union perf_mem_data_src *dsrc,
u32 flags, struct pt_regs *regs);
+ void (*get_mem_weight)(u64 *weight);
u64 (*bhrb_filter_map)(u64 branch_sample_type);
void (*config_bhrb)(u64 pmu_bhrb_filter);
void (*disable_pmc)(unsigned int pmc, unsigned long mmcr[]);
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index d644c5ab4d2f..a6b265e31663 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2051,6 +2051,10 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
ppmu->get_mem_data_src)
ppmu->get_mem_data_src(&data.data_src, ppmu->flags, regs);
+ if (event->attr.sample_type & PERF_SAMPLE_WEIGHT &&
+ ppmu->get_mem_weight)
+ ppmu->get_mem_weight(&data.weight);
+
if (perf_event_overflow(event, &data, regs))
power_pmu_stop(event, 0);
}
diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
index 08bb62454a2e..42e999da934e 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -196,6 +196,14 @@ void isa207_get_mem_data_src(union perf_mem_data_src *dsrc, u32 flags,
}
}
+void isa207_get_mem_weight(u64 *weight)
+{
+ u64 mmcra = mfspr(SPRN_MMCRA);
+ u64 exp = MMCRA_THR_CTR_EXP(mmcra);
+ u64 mantissa = MMCRA_THR_CTR_MANT(mmcra);
+
+ *weight = mantissa << (2 * exp);
+}
int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)
{
diff --git a/arch/powerpc/perf/isa207-common.h b/arch/powerpc/perf/isa207-common.h
index 982542cce991..b4d02ae3a6e0 100644
--- a/arch/powerpc/perf/isa207-common.h
+++ b/arch/powerpc/perf/isa207-common.h
@@ -247,6 +247,15 @@
#define MMCRA_SDAR_MODE_SHIFT 42
#define MMCRA_SDAR_MODE_TLB (1ull << MMCRA_SDAR_MODE_SHIFT)
#define MMCRA_IFM_SHIFT 30
+#define MMCRA_THR_CTR_MANT_SHIFT 19
+#define MMCRA_THR_CTR_MANT_MASK 0x7Ful
+#define MMCRA_THR_CTR_MANT(v) (((v) >> MMCRA_THR_CTR_MANT_SHIFT) &\
+ MMCRA_THR_CTR_MANT_MASK)
+
+#define MMCRA_THR_CTR_EXP_SHIFT 27
+#define MMCRA_THR_CTR_EXP_MASK 0x7ul
+#define MMCRA_THR_CTR_EXP(v) (((v) >> MMCRA_THR_CTR_EXP_SHIFT) &\
+ MMCRA_THR_CTR_EXP_MASK)
/* MMCR1 Threshold Compare bit constant for power9 */
#define p9_MMCRA_THR_CMP_SHIFT 45
@@ -281,5 +290,6 @@ int isa207_get_alternatives(u64 event, u64 alt[],
const unsigned int ev_alt[][MAX_ALT], int size);
void isa207_get_mem_data_src(union perf_mem_data_src *dsrc, u32 flags,
struct pt_regs *regs);
+void isa207_get_mem_weight(u64 *weight);
#endif
--
2.7.4
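The weight computation in isa207_get_mem_weight() above can be tried outside the kernel with a small sketch like the following. The shifts and masks are the ones added to isa207-common.h by this patch; `thresh_weight()` is a hypothetical name, and the MMCRA value is passed in as a plain integer instead of being read with mfspr().

```c
#include <stdint.h>

/* Field positions as added to isa207-common.h by the patch. */
#define THR_CTR_MANT_SHIFT  19
#define THR_CTR_MANT_MASK   0x7Full
#define THR_CTR_EXP_SHIFT   27
#define THR_CTR_EXP_MASK    0x7ull

/*
 * Hypothetical standalone version of the weight computation:
 * weight = mantissa << (2 * exponent), i.e. mantissa * 4^exponent.
 */
static inline uint64_t thresh_weight(uint64_t mmcra)
{
	uint64_t mant = (mmcra >> THR_CTR_MANT_SHIFT) & THR_CTR_MANT_MASK;
	uint64_t exp  = (mmcra >> THR_CTR_EXP_SHIFT) & THR_CTR_EXP_MASK;

	return mant << (2 * exp);
}
```

With a mantissa of 5 and an exponent of 2 the result is 5 << 4 = 80. The largest value the two fields can express is 127 << 14 = 2080768.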
On Mon, Mar 06, 2017 at 04:13:08PM +0530, Madhavan Srinivasan wrote:
> From: Sukadev Bhattiprolu <[email protected]>
>
> perf_mem_data_src is a union that is initialized via the ->val field
> and accessed via the bitmap fields. For this to work on big endian
> platforms, we also need a big-endian representation of perf_mem_data_src.
Doesn't this break interpreting the data on a different endian machine?
This patch adds the "mem_access" event to sysfs. It is not, as-is, a raw event
supported by the Power8 PMU. Instead, it is formed from the
raw event encoding specified in isa207-common.h.
The primary PMU event used here is PM_MRK_INST_CMPL.
This event tracks only the completed marked instructions.
Random sampling mode (MMCRA[SM]) with Random Instruction
Sampling (RIS) is enabled to mark the type of instructions.
With random sampling in RLS mode and the PM_MRK_INST_CMPL event,
the LDST/DATA_SRC fields in SIER identify the memory
hierarchy level (e.g. L1, L2, etc.) that satisfied a data-cache
miss for a marked instruction.
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Andrew Donnellan <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/power8-events-list.h | 6 ++++++
arch/powerpc/perf/power8-pmu.c | 2 ++
2 files changed, 8 insertions(+)
diff --git a/arch/powerpc/perf/power8-events-list.h b/arch/powerpc/perf/power8-events-list.h
index 3a2e6e8ebb92..0f1d184627cc 100644
--- a/arch/powerpc/perf/power8-events-list.h
+++ b/arch/powerpc/perf/power8-events-list.h
@@ -89,3 +89,9 @@ EVENT(PM_MRK_FILT_MATCH, 0x2013c)
EVENT(PM_MRK_FILT_MATCH_ALT, 0x3012e)
/* Alternate event code for PM_LD_MISS_L1 */
EVENT(PM_LD_MISS_L1_ALT, 0x400f0)
+/*
+ * Memory Access Event -- mem_access
+ * Primary PMU event used here is PM_MRK_INST_CMPL, along with
+ * Random Load/Store Facility Sampling (RIS) in Random sampling mode (MMCRA[SM]).
+ */
+EVENT(MEM_ACCESS, 0x10401e0)
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 932d7536f0eb..5463516e369b 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -90,6 +90,7 @@ GENERIC_EVENT_ATTR(branch-instructions, PM_BRU_FIN);
GENERIC_EVENT_ATTR(branch-misses, PM_BR_MPRED_CMPL);
GENERIC_EVENT_ATTR(cache-references, PM_LD_REF_L1);
GENERIC_EVENT_ATTR(cache-misses, PM_LD_MISS_L1);
+GENERIC_EVENT_ATTR(mem_access, MEM_ACCESS);
CACHE_EVENT_ATTR(L1-dcache-load-misses, PM_LD_MISS_L1);
CACHE_EVENT_ATTR(L1-dcache-loads, PM_LD_REF_L1);
@@ -120,6 +121,7 @@ static struct attribute *power8_events_attr[] = {
GENERIC_EVENT_PTR(PM_BR_MPRED_CMPL),
GENERIC_EVENT_PTR(PM_LD_REF_L1),
GENERIC_EVENT_PTR(PM_LD_MISS_L1),
+ GENERIC_EVENT_PTR(MEM_ACCESS),
CACHE_EVENT_PTR(PM_LD_MISS_L1),
CACHE_EVENT_PTR(PM_LD_REF_L1),
--
2.7.4
Export SIER bits to userspace via perf_mem_data_src and
struct perf_sample_data.
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Andrew Donnellan <[email protected]>
Signed-off-by: Sukadev Bhattiprolu <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/power8-pmu.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index ce15b19a7962..932d7536f0eb 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -325,6 +325,8 @@ static struct power_pmu power8_pmu = {
.bhrb_filter_map = power8_bhrb_filter_map,
.get_constraint = isa207_get_constraint,
.get_alternatives = power8_get_alternatives,
+ .get_mem_data_src = isa207_get_mem_data_src,
+ .get_mem_weight = isa207_get_mem_weight,
.disable_pmc = isa207_disable_pmc,
.flags = PPMU_HAS_SIER | PPMU_ARCH_207S,
.n_generic = ARRAY_SIZE(power8_generic_events),
--
2.7.4
From: Peter Zijlstra
> Sent: 06 March 2017 11:22
> To: Madhavan Srinivasan
> Cc: Wang Nan; Alexander Shishkin; [email protected]; Arnaldo Carvalho de Melo; Alexei
> Starovoitov; Ingo Molnar; Stephane Eranian; Sukadev Bhattiprolu; [email protected]
> Subject: Re: [PATCH v2 1/6] powerpc/perf: Define big-endian version of perf_mem_data_src
>
> On Mon, Mar 06, 2017 at 04:13:08PM +0530, Madhavan Srinivasan wrote:
> > From: Sukadev Bhattiprolu <[email protected]>
> >
> > perf_mem_data_src is a union that is initialized via the ->val field
> > and accessed via the bitmap fields. For this to work on big endian
> > platforms, we also need a big-endian representation of perf_mem_data_src.
>
> Doesn't this break interpreting the data on a different endian machine?
Best to avoid bitfields if you ever care about the bit order.
David
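As an illustration of the bitfield-free approach suggested above, the fields of a data_src value can be read from the 64-bit `val` with shifts and masks, which gives the same answer on any host endianness. This is only a sketch: the shift values are assumed to match the PERF_MEM_*_SHIFT constants in include/uapi/linux/perf_event.h, the widths are taken from the bitfield declaration in the patch (5, 14, 5, 2, 7), and the `mem_*()` helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed to mirror PERF_MEM_{OP,LVL,SNOOP,LOCK,TLB}_SHIFT in the uapi header. */
#define MEM_OP_SHIFT     0
#define MEM_LVL_SHIFT    5
#define MEM_SNOOP_SHIFT  19
#define MEM_LOCK_SHIFT   24
#define MEM_TLB_SHIFT    26

/* Extract an nbits-wide field starting at bit 'shift' (LSB = bit 0). */
static inline uint64_t mem_field(uint64_t val, unsigned int shift,
				 unsigned int nbits)
{
	return (val >> shift) & ((1ull << nbits) - 1);
}

/* Hypothetical accessors; widths follow mem_op:5, mem_lvl:14, ... in the union. */
static inline uint64_t mem_op(uint64_t val)    { return mem_field(val, MEM_OP_SHIFT, 5); }
static inline uint64_t mem_lvl(uint64_t val)   { return mem_field(val, MEM_LVL_SHIFT, 14); }
static inline uint64_t mem_snoop(uint64_t val) { return mem_field(val, MEM_SNOOP_SHIFT, 5); }
static inline uint64_t mem_lock(uint64_t val)  { return mem_field(val, MEM_LOCK_SHIFT, 2); }
static inline uint64_t mem_dtlb(uint64_t val)  { return mem_field(val, MEM_TLB_SHIFT, 7); }
```

Because these are plain integer operations on a u64, a tool that has already byte-swapped `val` to host order needs no further per-field conversion.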
On Mon, Mar 06, 2017 at 02:59:07PM +0000, David Laight wrote:
> From: Peter Zijlstra
> > Sent: 06 March 2017 11:22
> > To: Madhavan Srinivasan
> > Cc: Wang Nan; Alexander Shishkin; [email protected]; Arnaldo Carvalho de Melo; Alexei
> > Starovoitov; Ingo Molnar; Stephane Eranian; Sukadev Bhattiprolu; [email protected]
> > Subject: Re: [PATCH v2 1/6] powerpc/perf: Define big-endian version of perf_mem_data_src
> >
> > On Mon, Mar 06, 2017 at 04:13:08PM +0530, Madhavan Srinivasan wrote:
> > > From: Sukadev Bhattiprolu <[email protected]>
> > >
> > > perf_mem_data_src is a union that is initialized via the ->val field
> > > and accessed via the bitmap fields. For this to work on big endian
> > > platforms, we also need a big-endian representation of perf_mem_data_src.
> >
> > Doesn't this break interpreting the data on a different endian machine?
>
> Best to avoid bitfields if you ever care about the bit order.
Too late for that. But perf tool has quite a bit of code to muck fields
between different endians. With the full intent that generation on one
machine is readable by another.
On Tue, Mar 07, 2017 at 03:28:17PM +0530, Madhavan Srinivasan wrote:
>
>
> On Monday 06 March 2017 04:52 PM, Peter Zijlstra wrote:
> >On Mon, Mar 06, 2017 at 04:13:08PM +0530, Madhavan Srinivasan wrote:
> >>From: Sukadev Bhattiprolu <[email protected]>
> >>
> >>perf_mem_data_src is a union that is initialized via the ->val field
> >>and accessed via the bitmap fields. For this to work on big endian
> >>platforms, we also need a big-endian representation of perf_mem_data_src.
> >Doesn't this break interpreting the data on a different endian machine?
>
> IIUC, we will need this patch to not to break the interpreting data
> on a different endian machine. Data collected from power8 LE/BE
> guests with this patchset applied. Kindly correct me if I missed
> your question here.
So your patch adds compile time bitfield differences. My worry was that
there was no dynamic conversion routine in the tools (it has for a lot
of other places).
This yields two questions:
- are these two static layouts identical? (seeing that you illustrate
cross-endian things working this seems likely).
- should you not have fixed this in the tool only? This patch
effectively breaks ABI on big-endian architectures.
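On the first question, one way to check whether the two static layouts describe the same bits of `val` is a small host-side test like the sketch below. It assumes GCC or Clang (for the `__BYTE_ORDER__` predefined macros and for the usual bitfield allocation order on each endianness, which is implementation-defined in C); the union is a local copy named `mem_data_src`, not the uapi type, and `layout_matches_val()` is a hypothetical helper.

```c
#include <stdint.h>

/* Local copy of the two layouts from the patch, selected by host byte order. */
union mem_data_src {
	uint64_t val;
	struct {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
		uint64_t mem_op:5,      /* type of opcode */
			 mem_lvl:14,    /* memory hierarchy level */
			 mem_snoop:5,   /* snoop mode */
			 mem_lock:2,    /* lock instr */
			 mem_dtlb:7,    /* tlb access */
			 mem_rsvd:31;
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
		uint64_t mem_rsvd:31,
			 mem_dtlb:7,    /* tlb access */
			 mem_lock:2,    /* lock instr */
			 mem_snoop:5,   /* snoop mode */
			 mem_lvl:14,    /* memory hierarchy level */
			 mem_op:5;      /* type of opcode */
#else
#error "Unknown endianness"
#endif
	};
};

/*
 * Returns 1 if every bitfield read agrees with the shift-and-mask view of
 * val (op at bit 0, lvl at bit 5, snoop at 19, lock at 24, dtlb at 26).
 */
static inline int layout_matches_val(uint64_t val)
{
	union mem_data_src d = { .val = val };

	return d.mem_op    == ((val >> 0)  & 0x1f)   &&
	       d.mem_lvl   == ((val >> 5)  & 0x3fff) &&
	       d.mem_snoop == ((val >> 19) & 0x1f)   &&
	       d.mem_lock  == ((val >> 24) & 0x3)    &&
	       d.mem_dtlb  == ((val >> 26) & 0x7f);
}
```

If this returns 1 when built natively on both an LE and a BE host, the two declarations name the same bits of `val`, and a cross-endian reader only has to byte-swap the 64-bit word.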
On Monday 06 March 2017 04:52 PM, Peter Zijlstra wrote:
> On Mon, Mar 06, 2017 at 04:13:08PM +0530, Madhavan Srinivasan wrote:
>> From: Sukadev Bhattiprolu <[email protected]>
>>
>> perf_mem_data_src is a union that is initialized via the ->val field
>> and accessed via the bitmap fields. For this to work on big endian
>> platforms, we also need a big-endian representation of perf_mem_data_src.
> Doesn't this break interpreting the data on a different endian machine?
IIUC, we need this patch precisely so that interpreting the data
on a different endian machine does not break. Below is data collected
from power8 LE/BE guests with this patchset applied. Kindly correct me
if I misunderstood your question.
With this patchset applied, perf.data from a power8 BigEndian guest:
==============================================================
$ sudo ./perf record -d -e mem_access ls
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.007 MB perf.data (8 samples) ]
$ sudo ./perf report --mem-mode --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 8 of event 'mem_access'
# Total weight : 8
# Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
#
# Overhead  Local Weight  Memory access             Symbol                       Shared Object     Data Symbol                Data Object                         Snoop  TLB access  Locked
# ........  ............  ........................  ...........................  ................  .........................  ..................................  .....  ..........  ......
#
    25.00%             0  L2 hit                    [H] 0xc00000000000c910       [unknown]         [H] 0xc000000f170e5310     [unknown]                           N/A    N/A         No
    12.50%             0  L2 hit                    [k] .idle_cpu                [kernel.vmlinux]  [k] __per_cpu_offset+0x68  [kernel.vmlinux].data..read_mostly  N/A    N/A         No
    12.50%             0  L2 hit                    [H] 0xc00000000000ca58       [unknown]         [H] 0xc000000f170e5200     [unknown]                           N/A    N/A         No
    12.50%             0  L3 hit                    [k] .copypage_power7         [kernel.vmlinux]  [k] 0xc00000002f6fc600     [kernel.vmlinux].bss                N/A    N/A         No
    12.50%             0  L3 hit                    [k] .copypage_power7         [kernel.vmlinux]  [k] 0xc00000003f8b1980     [kernel.vmlinux].bss                N/A    N/A         No
    12.50%             0  Local RAM hit             [k] ._raw_spin_lock_irqsave  [kernel.vmlinux]  [k] 0xc000000033b5bdf4     [kernel.vmlinux].bss                Miss   N/A         No
    12.50%             0  Remote Cache (1 hop) hit  [k] .perf_iterate_ctx        [kernel.vmlinux]  [k] 0xc000000000e88648     [kernel.vmlinux]                    HitM   N/A         No
perf report from power8 LittleEndian guest (with this patch applied to
perf tool):
==================================================================================
$ ./perf report --mem-mode --stdio -i perf.data.p8be.withpatch
No kallsyms or vmlinux with build-id ca8a1a9d4b62b2a67ee01050afb1dfa03565a655 was found
/boot/vmlinux with build id ca8a1a9d4b62b2a67ee01050afb1dfa03565a655 not found, continuing without symbols
No kallsyms or vmlinux with build-id ca8a1a9d4b62b2a67ee01050afb1dfa03565a655 was found
/boot/vmlinux with build id ca8a1a9d4b62b2a67ee01050afb1dfa03565a655 not found, continuing without symbols
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 8 of event 'mem_access'
# Total weight : 8
# Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
#
# Overhead  Local Weight  Memory access             Symbol                  Shared Object     Data Symbol             Data Object       Snoop  TLB access  Locked
# ........  ............  ........................  ......................  ................  ......................  ................  .....  ..........  ......
#
    25.00%             0  L2 hit                    [H] 0xc00000000000c910  [unknown]         [H] 0xc000000f170e5310  [unknown]         N/A    N/A         No
    12.50%             0  L2 hit                    [k] 0xc0000000000f4d0c  [kernel.vmlinux]  [k] 0xc000000000f2dac8  [kernel.vmlinux]  N/A    N/A         No
    12.50%             0  L2 hit                    [H] 0xc00000000000ca58  [unknown]         [H] 0xc000000f170e5200  [unknown]         N/A    N/A         No
    12.50%             0  L3 hit                    [k] 0xc00000000006b560  [kernel.vmlinux]  [k] 0xc00000002f6fc600  [kernel.vmlinux]  N/A    N/A         No
    12.50%             0  L3 hit                    [k] 0xc00000000006b560  [kernel.vmlinux]  [k] 0xc00000003f8b1980  [kernel.vmlinux]  N/A    N/A         No
    12.50%             0  Local RAM hit             [k] 0xc00000000094ad14  [kernel.vmlinux]  [k] 0xc000000033b5bdf4  [kernel.vmlinux]  Miss   N/A         No
    12.50%             0  Remote Cache (1 hop) hit  [k] 0xc0000000001ce31c  [kernel.vmlinux]  [k] 0xc000000000e88648  [kernel.vmlinux]  HitM   N/A         No
With this patch, perf.data from a power8 LE guest:
===================================================
$ sudo ./perf record -d -e mem_access ls
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.010 MB perf.data (6 samples) ]
$ sudo ./perf report --mem-mode --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 6 of event 'mem_access'
# Total weight : 6
# Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
#
# Overhead  Local Weight  Memory access             Symbol                  Shared Object     Data Symbol             Data Object       Snoop  TLB access  Locked
# ........  ............  ........................  ......................  ................  ......................  ................  .....  ..........  ......
#
    16.67%             0  L2 hit                    [.] _init               ls                [.] 0x000001002ef4dd71  [heap]            N/A    N/A         No
    16.67%             0  L2 hit                    [k] irq_exit            [kernel.vmlinux]  [k] 0xc0000000ff6e2080  [kernel.vmlinux]  N/A    N/A         No
    16.67%             0  L2 hit                    [H] 0xc0000000000ba210  [unknown]         [H] 0xc00000075ce2fe38  [unknown]         N/A    N/A         No
    16.67%             0  L2 hit                    [H] 0xc00000000000bdb0  [unknown]         [H] 0xc00000075ce2fee0  [unknown]         N/A    N/A         No
    16.67%             0  L2 hit                    [H] 0xc0000000000bb444  [unknown]         [H] 0xc00000075ce30490  [unknown]         N/A    N/A         No
    16.67%             0  L3 hit                    [H] 0x0000000000066524  [unknown]         [H] 0xc0000000014e0000  [unknown]         N/A    N/A         No
perf report from power7 BE guest (with this patch applied to perf tool):
=========================================================================
$ ./perf report --mem-mode --stdio
No kallsyms or vmlinux with build-id 06240a7589956e7d388bdffdab9f7f138834fa81 was found
/lib/modules/4.10.0-rc5+/build/vmlinux with build id 06240a7589956e7d388bdffdab9f7f138834fa81 not found, continuing without symbols
No kallsyms or vmlinux with build-id 06240a7589956e7d388bdffdab9f7f138834fa81 was found
/lib/modules/4.10.0-rc5+/build/vmlinux with build id 06240a7589956e7d388bdffdab9f7f138834fa81 not found, continuing without symbols
/usr/bin/ls with build id cf69c39cf0d28e5be86c03de5c556e3ce8d6ce27 not found, continuing without symbols
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 6 of event 'mem_access'
# Total weight : 6
# Sort order   : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
#
# Overhead  Local Weight  Memory access             Symbol                  Shared Object     Data Symbol             Data Object       Snoop  TLB access  Locked
# ........  ............  ........................  ......................  ................  ......................  ................  .....  ..........  ......
#
    16.67%             0  L2 hit                    [k] 0xc0000000000d3cc8  [kernel.vmlinux]  [k] 0xc0000000ff6e2080  [kernel.vmlinux]  N/A    N/A         No
    16.67%             0  L2 hit                    [.] 0x0000000000012ba8  ls                [.] 0x000001002ef4dd71  [heap]            N/A    N/A         No
    16.67%             0  L2 hit                    [H] 0xc0000000000ba210  [unknown]         [H] 0xc00000075ce2fe38  [unknown]         N/A    N/A         No
    16.67%             0  L2 hit                    [H] 0xc00000000000bdb0  [unknown]         [H] 0xc00000075ce2fee0  [unknown]         N/A    N/A         No
    16.67%             0  L2 hit                    [H] 0xc0000000000bb444  [unknown]         [H] 0xc00000075ce30490  [unknown]         N/A    N/A         No
    16.67%             0  L3 hit                    [H] 0x0000000000066524  [unknown]         [H] 0xc0000000014e0000  [unknown]         N/A    N/A         No
Maddy
>
On Tuesday 07 March 2017 03:53 PM, Peter Zijlstra wrote:
> On Tue, Mar 07, 2017 at 03:28:17PM +0530, Madhavan Srinivasan wrote:
>>
>> On Monday 06 March 2017 04:52 PM, Peter Zijlstra wrote:
>>> On Mon, Mar 06, 2017 at 04:13:08PM +0530, Madhavan Srinivasan wrote:
>>>> From: Sukadev Bhattiprolu <[email protected]>
>>>>
>>>> perf_mem_data_src is a union that is initialized via the ->val field
>>>> and accessed via the bitmap fields. For this to work on big endian
>>>> platforms, we also need a big-endian representation of perf_mem_data_src.
>>> Doesn't this break interpreting the data on a different endian machine?
>> IIUC, we will need this patch to not to break the interpreting data
>> on a different endian machine. Data collected from power8 LE/BE
>> guests with this patchset applied. Kindly correct me if I missed
>> your question here.
> So your patch adds compile time bitfield differences. My worry was that
> there was no dynamic conversion routine in the tools (it has for a lot
> of other places).
>
> This yields two questions:
>
> - are these two static layouts identical? (seeing that you illustrate
> cross-endian things working this seems likely).
>
> - should you not have fixed this in the tool only? This patch
> effectively breaks ABI on big-endian architectures.
IIUC, we are the first BE user for this feature
(Kindly correct me if I am wrong), so technically we
are not breaking ABI here :) . But let me also look at
the dynamic conversion part.
Maddy
>
On Mon, Mar 13, 2017 at 04:45:51PM +0530, Madhavan Srinivasan wrote:
> > - should you not have fixed this in the tool only? This patch
> > effectively breaks ABI on big-endian architectures.
>
> IIUC, we are the first BE user for this feature
> (Kindly correct me if I am wrong), so technically we
> are not breaking ABI here :) . But let me also look at
> the dynamic conversion part.
Huh? PPC hasn't yet implemented this? Then why are you fixing it?
Madhavan Srinivasan [[email protected]] wrote:
> The LDST field and DATA_SRC in SIER identifies the memory hierarchy level
> (eg: L1, L2 etc), from which a data-cache miss for a marked instruction
> was satisfied. Use the 'perf_mem_data_src' object to export this
> hierarchy level to user space.
>
<snip>
> diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
> int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)
> {
> unsigned int unit, pmc, cache, ebb;
> diff --git a/arch/powerpc/perf/isa207-common.h b/arch/powerpc/perf/isa207-common.h
> index cf9bd8990159..982542cce991 100644
> --- a/arch/powerpc/perf/isa207-common.h
> +++ b/arch/powerpc/perf/isa207-common.h
> @@ -259,6 +259,19 @@
> #define MAX_ALT 2
> #define MAX_PMU_COUNTERS 6
>
> +#define ISA207_SIER_TYPE_SHIFT 15
> +#define ISA207_SIER_TYPE_MASK (0x7ull << ISA207_SIER_TYPE_SHIFT)
> +
> +#define ISA207_SIER_LDST_SHIFT 1
> +#define ISA207_SIER_LDST_MASK (0x7ull << ISA207_SIER_LDST_SHIFT)
> +
> +#define ISA207_SIER_DATA_SRC_SHIFT 53
> +#define ISA207_SIER_DATA_SRC_MASK (0x7ull << ISA207_SIER_DATA_SRC_SHIFT)
> +
> +#define P(a, b) PERF_MEM_S(a, b)
Madhavan, Peter,
Can we see if we can get the kernel to set 'perf_mem_data_src.val' in
endian-neutral format?
With something like (untested) in include/uapi/linux/perf_event.h
#define PERF_MEM_OP_NBITS PERF_MEM_LVL_SHIFT
#define PERF_MEM_LVL_NBITS PERF_MEM_SNOOP_SHIFT
#define PERF_MEM_SNOOP_NBITS PERF_MEM_LOCK_SHIFT
#define PERF_MEM_TLB_NBITS PERF_MEM_TLB_SHIFT
and here in arch/powerpc/perf/isa207-common.h
#define PERF_MEM_S_BE_SHIFT(a) \
(63 - PERF_MEM_##a##_NBITS - PERF_MEM_##a##_SHIFT)
#define PERF_MEM_S_BE(a, s) \
(((__u64)PERF_MEM_##a##_##s) << PERF_MEM_S_BE_SHIFT(a))
#define P(a, b) PERF_MEM_S_BE(a, b)
Basically, have PERF_MEM_OP_NA be the rightmost bit and PERF_MEM_TLB_OS
the leftmost bit in perf_mem_data_src.val, regardless of the endianness?
Sukadev
On Monday 13 March 2017 06:20 PM, Peter Zijlstra wrote:
> On Mon, Mar 13, 2017 at 04:45:51PM +0530, Madhavan Srinivasan wrote:
>>> - should you not have fixed this in the tool only? This patch
>>> effectively breaks ABI on big-endian architectures.
>> IIUC, we are the first BE user for this feature
>> (Kindly correct me if I am wrong), so technically we
>> are not breaking ABI here :) . But let me also look at
>> the dynamic conversion part.
> Huh? PPC hasn't yet implemented this? Then why are you fixing it?
Yes, PPC hasn't implemented this (until now).
And I did not understand "Then why are you fixing it?"
Maddy
>
On Tue, Mar 14, 2017 at 02:31:51PM +0530, Madhavan Srinivasan wrote:
> >Huh? PPC hasn't yet implemented this? Then why are you fixing it?
>
> yes, PPC hasn't implemented this (until now).
until now where?
> And did not understand "Then why are you fixing it?"
I see no implementation; so why are you poking at it.
Hi Peter,
Peter Zijlstra <[email protected]> writes:
> On Tue, Mar 14, 2017 at 02:31:51PM +0530, Madhavan Srinivasan wrote:
>
>> >Huh? PPC hasn't yet implemented this? Then why are you fixing it?
>>
>> yes, PPC hasn't implemented this (until now).
>
> until now where?
On powerpc there is currently no kernel support for filling the data_src
value with anything meaningful.
A user can still request PERF_SAMPLE_DATA_SRC (perf record -d), but they
just get the default value from perf_sample_data_init(), which is
PERF_MEM_NA.
Though even that is currently broken with a big endian perf tool.
>> And did not understand "Then why are you fixing it?"
>
> I see no implementation; so why are you poking at it.
Maddy has posted an implementation of the kernel part for powerpc in
patch 2 of this series, but maybe you're not on Cc?
Regardless of us wanting to do the kernel side on powerpc, the current
API is broken on big endian.
That's because in the kernel the PERF_MEM_NA value is constructed using
shifts:
/* TLB access */
#define PERF_MEM_TLB_NA 0x01 /* not available */
...
#define PERF_MEM_TLB_SHIFT 26
#define PERF_MEM_S(a, s) \
(((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)
#define PERF_MEM_NA (PERF_MEM_S(OP, NA) |\
PERF_MEM_S(LVL, NA) |\
PERF_MEM_S(SNOOP, NA) |\
PERF_MEM_S(LOCK, NA) |\
PERF_MEM_S(TLB, NA))
Which works out as:
((0x01 << 0) | (0x01 << 5) | (0x01 << 19) | (0x01 << 24) | (0x01 << 26))
Which means the PERF_MEM_NA value comes out of the kernel as 0x5080021
in CPU endian.
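As a quick sanity check of that arithmetic, here is a small sketch that rebuilds the value from the same shifts. The SK_ prefix and the function names are made up for this example so it does not clash with the real uapi header; the NA values and shifts are the ones shown in the expansion above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative copies of the "not available" values (all 0x01). */
#define SK_MEM_OP_NA		0x01
#define SK_MEM_LVL_NA		0x01
#define SK_MEM_SNOOP_NA		0x01
#define SK_MEM_LOCK_NA		0x01
#define SK_MEM_TLB_NA		0x01

/* Field shifts, as in the expansion above. */
#define SK_MEM_OP_SHIFT		0
#define SK_MEM_LVL_SHIFT	5
#define SK_MEM_SNOOP_SHIFT	19
#define SK_MEM_LOCK_SHIFT	24
#define SK_MEM_TLB_SHIFT	26

#define SK_MEM_S(a, s) \
	(((uint64_t)SK_MEM_##a##_##s) << SK_MEM_##a##_SHIFT)

#define SK_MEM_NA	(SK_MEM_S(OP, NA) |\
			 SK_MEM_S(LVL, NA) |\
			 SK_MEM_S(SNOOP, NA) |\
			 SK_MEM_S(LOCK, NA) |\
			 SK_MEM_S(TLB, NA))

/* Returns the shift-built value; identical on any CPU endianness. */
uint64_t sk_mem_na(void)
{
	return SK_MEM_NA;
}

/* Prints "PERF_MEM_NA = 0x5080021". */
void sk_print_mem_na(void)
{
	printf("PERF_MEM_NA = 0x%llx\n", (unsigned long long)sk_mem_na());
}
```

Because the value is built purely from shifts on a 64-bit integer, the number itself is the same on little and big endian; only its byte layout in memory differs.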
But then in the perf tool, the code uses the bitfields to inspect the
value, and currently the bitfields are defined using little endian
ordering.
So eg. in perf_mem__tlb_scnprintf() we see:
data_src->val = 0x5080021
op = 0x0
lvl = 0x0
snoop = 0x0
lock = 0x0
dtlb = 0x0
rsvd = 0x5080021
So this patch does what I think is the minimal fix, of changing the
definition of the bitfields to match the values that are already
exported by the kernel on big endian. And it makes no change on little
endian.
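To make the mismatch concrete, here is a rough sketch (not the perf tool's code) that decodes the same value in two ways: with explicit shifts, and through a bitfield declared in little endian order. The sk_ names are hypothetical. The field widths (5, 14, 5, 2, 7, 31) are my reading of the uapi header and should be treated as an assumption. Bitfield layout is implementation-defined, so the comments describe the usual GCC behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Bitfields declared in little endian order, as the tool currently has them. */
union sk_data_src {
	uint64_t val;
	struct {
		uint64_t mem_op:5,	/* type of opcode */
			 mem_lvl:14,	/* memory hierarchy level */
			 mem_snoop:5,	/* snoop mode */
			 mem_lock:2,	/* lock instruction */
			 mem_dtlb:7,	/* TLB access */
			 mem_rsvd:31;	/* reserved */
	} f;
};

/* Endian-independent decode: shift down, then mask to nbits (nbits < 64). */
uint64_t sk_field(uint64_t val, unsigned int shift, unsigned int nbits)
{
	return (val >> shift) & ((UINT64_C(1) << nbits) - 1);
}

/* dtlb via shifts: 1 for PERF_MEM_NA on any host. */
uint64_t sk_dtlb_by_shift(uint64_t val)
{
	return sk_field(val, 26, 7);
}

/*
 * dtlb via the bitfield: typically 1 on a little endian host, but 0 on a
 * big endian host, where the compiler allocates fields from the most
 * significant bit and the whole value lands in mem_rsvd instead.
 */
uint64_t sk_dtlb_by_bitfield(uint64_t val)
{
	union sk_data_src d = { .val = val };

	return d.f.mem_dtlb;
}
```

Calling both helpers with 0x5080021 on a big endian build should reproduce the dtlb = 0x0 line from the dump above, while the shift-based helper returns 1 everywhere.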
cheers
On Wed, Mar 15, 2017 at 05:20:15PM +1100, Michael Ellerman wrote:
> > I see no implementation; so why are you poking at it.
>
> Maddy has posted an implementation of the kernel part for powerpc in
> patch 2 of this series, but maybe you're not on Cc?
I am not indeed. That and a completely inadequate Changelog have led to
great confusion.
On Wednesday 15 March 2017 11:50 AM, Michael Ellerman wrote:
> Hi Peter,
>
> Peter Zijlstra <[email protected]> writes:
>> On Tue, Mar 14, 2017 at 02:31:51PM +0530, Madhavan Srinivasan wrote:
>>
>>>> Huh? PPC hasn't yet implemented this? Then why are you fixing it?
>>> yes, PPC hasn't implemented this (until now).
>> until now where?
> On powerpc there is currently no kernel support for filling the data_src
> value with anything meaningful.
>
> A user can still request PERF_SAMPLE_DATA_SRC (perf report -d), but they
> just get the default value from perf_sample_data_init(), which is
> PERF_MEM_NA.
>
> Though even that is currently broken with a big endian perf tool.
>
>>> And did not understand "Then why are you fixing it?"
>> I see no implementation; so why are you poking at it.
> Maddy has posted an implementation of the kernel part for powerpc in
> patch 2 of this series, but maybe you're not on Cc?
Sorry, was out yesterday.
Yes, my bad. I CCed lkml and ppcdev, and for each patch I added only the
addresses that the get_maintainer script reported for the files it touches.
I will send out a v3 with peterz and others CCed on all patches.
>
> Regardless of us wanting to do the kernel side on powerpc, the current
> API is broken on big endian.
>
> That's because in the kernel the PERF_MEM_NA value is constructed using
> shifts:
>
> /* TLB access */
> #define PERF_MEM_TLB_NA 0x01 /* not available */
> ...
> #define PERF_MEM_TLB_SHIFT 26
>
> #define PERF_MEM_S(a, s) \
> (((__u64)PERF_MEM_##a##_##s) << PERF_MEM_##a##_SHIFT)
>
> #define PERF_MEM_NA (PERF_MEM_S(OP, NA) |\
> PERF_MEM_S(LVL, NA) |\
> PERF_MEM_S(SNOOP, NA) |\
> PERF_MEM_S(LOCK, NA) |\
> PERF_MEM_S(TLB, NA))
>
> Which works out as:
>
> ((0x01 << 0) | (0x01 << 5) | (0x01 << 19) | (0x01 << 24) | (0x01 << 26))
>
>
> Which means the PERF_MEM_NA value comes out of the kernel as 0x5080021
> in CPU endian.
>
> But then in the perf tool, the code uses the bitfields to inspect the
> value, and currently the bitfields are defined using little endian
> ordering.
>
> So eg. in perf_mem__tlb_scnprintf() we see:
> data_src->val = 0x5080021
> op = 0x0
> lvl = 0x0
> snoop = 0x0
> lock = 0x0
> dtlb = 0x0
> rsvd = 0x5080021
>
>
> So this patch does what I think is the minimal fix, of changing the
> definition of the bitfields to match the values that are already
> exported by the kernel on big endian. And it makes no change on little
> endian.
Thanks for the detailed explanation. I will add this to the patch
commit message in the v3.
Maddy
>
> cheers
>
On Wednesday 15 March 2017 05:53 PM, Peter Zijlstra wrote:
> On Wed, Mar 15, 2017 at 05:20:15PM +1100, Michael Ellerman wrote:
>
>>> I see no implementation; so why are you poking at it.
>> Maddy has posted an implementation of the kernel part for powerpc in
>> patch 2 of this series, but maybe you're not on Cc?
> I am not indeed. That and a completely inadequate Changelog have led to
> great confusion.
Yes, my bad. I will send out a v3 today and will CC you. I will also add
Ellerman's explanation to the commit message.
Sorry for the confusion.
Maddy