2019-08-02 09:37:14

by Tan Xiaojun

Subject: [RFC PATCH 0/3] perf tools: Add support for "report" for some spe events

Since commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
Profiling Extensions (SPE) support") was merged, "perf record" and
"perf report --dump-raw-trace" have supported SPE. However, the
dumped raw data cannot be used without further parsing.
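
To illustrate what that parsing involves, here is a minimal sketch of
reading one packet payload. It follows the size encoding used by
payloadlen() in arm-spe-pkt-decoder.c (bits 5:4 of the header byte),
but the helper names spe_payload_len() and spe_read_payload() are
made up for this example and are not part of the series.

```c
#include <stddef.h>
#include <stdint.h>

/* Payload size is encoded in bits 5:4 of the header: 1, 2, 4 or 8 bytes. */
static size_t spe_payload_len(uint8_t header)
{
	return (size_t)1 << ((header & 0x30) >> 4);
}

/*
 * Hypothetical helper: read the little-endian payload byte by byte, so
 * the result does not depend on host endianness or on buf alignment.
 * Returns the packet length (header + payload), or -1 if buf is too short.
 */
static int spe_read_payload(const uint8_t *buf, size_t len, uint64_t *payload)
{
	size_t i, n;

	if (len < 1)
		return -1;

	n = spe_payload_len(buf[0]);
	if (len < 1 + n)
		return -1;

	*payload = 0;
	for (i = 0; i < n; i++)
		*payload |= (uint64_t)buf[1 + i] << (8 * i);

	return (int)(1 + n);
}
```

Under this sketch, the bytes 0x52 0xa2 0x02 would be read as a 3-byte
packet (a header with a 2-byte size field) carrying payload 0x2a2.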

This patchset improves "perf report" support for SPE by decoding
the trace data further. For now, it adds support for three events:
llc-miss, tlb-miss, and branch-miss.

See patch [2/3] for more details.

Tan Xiaojun (3):
perf tools: Move arm-spe-pkt-decoder.h/c to the new dir
perf tools: Add support for "report" for some spe events
perf report: add --spe options for arm-spe

tools/perf/Documentation/perf-report.txt | 9 +
tools/perf/builtin-report.c | 5 +
tools/perf/util/Build | 2 +-
tools/perf/util/arm-spe-decoder/Build | 1 +
tools/perf/util/arm-spe-decoder/arm-spe-decoder.c | 214 ++++++
tools/perf/util/arm-spe-decoder/arm-spe-decoder.h | 51 ++
.../util/arm-spe-decoder/arm-spe-pkt-decoder.c | 462 +++++++++++++
.../util/arm-spe-decoder/arm-spe-pkt-decoder.h | 45 ++
tools/perf/util/arm-spe-pkt-decoder.c | 462 -------------
tools/perf/util/arm-spe-pkt-decoder.h | 43 --
tools/perf/util/arm-spe.c | 717 ++++++++++++++++++++-
tools/perf/util/auxtrace.c | 45 ++
tools/perf/util/auxtrace.h | 27 +
tools/perf/util/session.h | 2 +
14 files changed, 1544 insertions(+), 541 deletions(-)
create mode 100644 tools/perf/util/arm-spe-decoder/Build
create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
delete mode 100644 tools/perf/util/arm-spe-pkt-decoder.c
delete mode 100644 tools/perf/util/arm-spe-pkt-decoder.h

--
2.7.4


2019-08-02 09:37:45

by Tan Xiaojun

Subject: [RFC PATCH 1/3] perf tools: Move arm-spe-pkt-decoder.h/c to the new dir

Create a new arm-spe-decoder directory for subsequent extensions and
move arm-spe-pkt-decoder.h/c to this directory. No code changes.

Signed-off-by: Tan Xiaojun <[email protected]>
---
tools/perf/util/Build | 2 +-
tools/perf/util/arm-spe-decoder/Build | 1 +
.../util/arm-spe-decoder/arm-spe-pkt-decoder.c | 462 +++++++++++++++++++++
.../util/arm-spe-decoder/arm-spe-pkt-decoder.h | 43 ++
tools/perf/util/arm-spe-pkt-decoder.c | 462 ---------------------
tools/perf/util/arm-spe-pkt-decoder.h | 43 --
tools/perf/util/arm-spe.c | 2 +-
7 files changed, 508 insertions(+), 507 deletions(-)
create mode 100644 tools/perf/util/arm-spe-decoder/Build
create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
delete mode 100644 tools/perf/util/arm-spe-pkt-decoder.c
delete mode 100644 tools/perf/util/arm-spe-pkt-decoder.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 14f812b..762625c 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -96,7 +96,7 @@ perf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
perf-$(CONFIG_AUXTRACE) += intel-pt.o
perf-$(CONFIG_AUXTRACE) += intel-bts.o
perf-$(CONFIG_AUXTRACE) += arm-spe.o
-perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
+perf-$(CONFIG_AUXTRACE) += arm-spe-decoder/
perf-$(CONFIG_AUXTRACE) += s390-cpumsf.o

ifdef CONFIG_LIBOPENCSD
diff --git a/tools/perf/util/arm-spe-decoder/Build b/tools/perf/util/arm-spe-decoder/Build
new file mode 100644
index 0000000..16efbc2
--- /dev/null
+++ b/tools/perf/util/arm-spe-decoder/Build
@@ -0,0 +1 @@
+perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
new file mode 100644
index 0000000..b94001b
--- /dev/null
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
@@ -0,0 +1,462 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Arm Statistical Profiling Extensions (SPE) support
+ * Copyright (c) 2017-2018, Arm Ltd.
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <endian.h>
+#include <byteswap.h>
+
+#include "arm-spe-pkt-decoder.h"
+
+#define BIT(n) (1ULL << (n))
+
+#define NS_FLAG BIT(63)
+#define EL_FLAG (BIT(62) | BIT(61))
+
+#define SPE_HEADER0_PAD 0x0
+#define SPE_HEADER0_END 0x1
+#define SPE_HEADER0_ADDRESS 0x30 /* address packet (short) */
+#define SPE_HEADER0_ADDRESS_MASK 0x38
+#define SPE_HEADER0_COUNTER 0x18 /* counter packet (short) */
+#define SPE_HEADER0_COUNTER_MASK 0x38
+#define SPE_HEADER0_TIMESTAMP 0x71
+#define SPE_HEADER0_TIMESTAMP 0x71
+#define SPE_HEADER0_EVENTS 0x2
+#define SPE_HEADER0_EVENTS_MASK 0xf
+#define SPE_HEADER0_SOURCE 0x3
+#define SPE_HEADER0_SOURCE_MASK 0xf
+#define SPE_HEADER0_CONTEXT 0x24
+#define SPE_HEADER0_CONTEXT_MASK 0x3c
+#define SPE_HEADER0_OP_TYPE 0x8
+#define SPE_HEADER0_OP_TYPE_MASK 0x3c
+#define SPE_HEADER1_ALIGNMENT 0x0
+#define SPE_HEADER1_ADDRESS 0xb0 /* address packet (extended) */
+#define SPE_HEADER1_ADDRESS_MASK 0xf8
+#define SPE_HEADER1_COUNTER 0x98 /* counter packet (extended) */
+#define SPE_HEADER1_COUNTER_MASK 0xf8
+
+#if __BYTE_ORDER == __BIG_ENDIAN
+#define le16_to_cpu bswap_16
+#define le32_to_cpu bswap_32
+#define le64_to_cpu bswap_64
+#define memcpy_le64(d, s, n) do { \
+ memcpy((d), (s), (n)); \
+ *(d) = le64_to_cpu(*(d)); \
+} while (0)
+#else
+#define le16_to_cpu
+#define le32_to_cpu
+#define le64_to_cpu
+#define memcpy_le64 memcpy
+#endif
+
+static const char * const arm_spe_packet_name[] = {
+ [ARM_SPE_PAD] = "PAD",
+ [ARM_SPE_END] = "END",
+ [ARM_SPE_TIMESTAMP] = "TS",
+ [ARM_SPE_ADDRESS] = "ADDR",
+ [ARM_SPE_COUNTER] = "LAT",
+ [ARM_SPE_CONTEXT] = "CONTEXT",
+ [ARM_SPE_OP_TYPE] = "OP-TYPE",
+ [ARM_SPE_EVENTS] = "EVENTS",
+ [ARM_SPE_DATA_SOURCE] = "DATA-SOURCE",
+};
+
+const char *arm_spe_pkt_name(enum arm_spe_pkt_type type)
+{
+ return arm_spe_packet_name[type];
+}
+
+/* return ARM SPE payload size from its encoding,
+ * which is in bits 5:4 of the byte.
+ * 00 : byte
+ * 01 : halfword (2)
+ * 10 : word (4)
+ * 11 : doubleword (8)
+ */
+static int payloadlen(unsigned char byte)
+{
+ return 1 << ((byte & 0x30) >> 4);
+}
+
+static int arm_spe_get_payload(const unsigned char *buf, size_t len,
+ struct arm_spe_pkt *packet)
+{
+ size_t payload_len = payloadlen(buf[0]);
+
+ if (len < 1 + payload_len)
+ return ARM_SPE_NEED_MORE_BYTES;
+
+ buf++;
+
+ switch (payload_len) {
+ case 1: packet->payload = *(uint8_t *)buf; break;
+ case 2: packet->payload = le16_to_cpu(*(uint16_t *)buf); break;
+ case 4: packet->payload = le32_to_cpu(*(uint32_t *)buf); break;
+ case 8: packet->payload = le64_to_cpu(*(uint64_t *)buf); break;
+ default: return ARM_SPE_BAD_PACKET;
+ }
+
+ return 1 + payload_len;
+}
+
+static int arm_spe_get_pad(struct arm_spe_pkt *packet)
+{
+ packet->type = ARM_SPE_PAD;
+ return 1;
+}
+
+static int arm_spe_get_alignment(const unsigned char *buf, size_t len,
+ struct arm_spe_pkt *packet)
+{
+ unsigned int alignment = 1 << ((buf[0] & 0xf) + 1);
+
+ if (len < alignment)
+ return ARM_SPE_NEED_MORE_BYTES;
+
+ packet->type = ARM_SPE_PAD;
+ return alignment - (((uintptr_t)buf) & (alignment - 1));
+}
+
+static int arm_spe_get_end(struct arm_spe_pkt *packet)
+{
+ packet->type = ARM_SPE_END;
+ return 1;
+}
+
+static int arm_spe_get_timestamp(const unsigned char *buf, size_t len,
+ struct arm_spe_pkt *packet)
+{
+ packet->type = ARM_SPE_TIMESTAMP;
+ return arm_spe_get_payload(buf, len, packet);
+}
+
+static int arm_spe_get_events(const unsigned char *buf, size_t len,
+ struct arm_spe_pkt *packet)
+{
+ int ret = arm_spe_get_payload(buf, len, packet);
+
+ packet->type = ARM_SPE_EVENTS;
+
+ /* we use index to identify Events with a less number of
+ * comparisons in arm_spe_pkt_desc(): E.g., the LLC-ACCESS,
+ * LLC-REFILL, and REMOTE-ACCESS events are identified iff
+ * index > 1.
+ */
+ packet->index = ret - 1;
+
+ return ret;
+}
+
+static int arm_spe_get_data_source(const unsigned char *buf, size_t len,
+ struct arm_spe_pkt *packet)
+{
+ packet->type = ARM_SPE_DATA_SOURCE;
+ return arm_spe_get_payload(buf, len, packet);
+}
+
+static int arm_spe_get_context(const unsigned char *buf, size_t len,
+ struct arm_spe_pkt *packet)
+{
+ packet->type = ARM_SPE_CONTEXT;
+ packet->index = buf[0] & 0x3;
+
+ return arm_spe_get_payload(buf, len, packet);
+}
+
+static int arm_spe_get_op_type(const unsigned char *buf, size_t len,
+ struct arm_spe_pkt *packet)
+{
+ packet->type = ARM_SPE_OP_TYPE;
+ packet->index = buf[0] & 0x3;
+ return arm_spe_get_payload(buf, len, packet);
+}
+
+static int arm_spe_get_counter(const unsigned char *buf, size_t len,
+ const unsigned char ext_hdr, struct arm_spe_pkt *packet)
+{
+ if (len < 2)
+ return ARM_SPE_NEED_MORE_BYTES;
+
+ packet->type = ARM_SPE_COUNTER;
+ if (ext_hdr)
+ packet->index = ((buf[0] & 0x3) << 3) | (buf[1] & 0x7);
+ else
+ packet->index = buf[0] & 0x7;
+
+ packet->payload = le16_to_cpu(*(uint16_t *)(buf + 1));
+
+ return 1 + ext_hdr + 2;
+}
+
+static int arm_spe_get_addr(const unsigned char *buf, size_t len,
+ const unsigned char ext_hdr, struct arm_spe_pkt *packet)
+{
+ if (len < 8)
+ return ARM_SPE_NEED_MORE_BYTES;
+
+ packet->type = ARM_SPE_ADDRESS;
+ if (ext_hdr)
+ packet->index = ((buf[0] & 0x3) << 3) | (buf[1] & 0x7);
+ else
+ packet->index = buf[0] & 0x7;
+
+ memcpy_le64(&packet->payload, buf + 1, 8);
+
+ return 1 + ext_hdr + 8;
+}
+
+static int arm_spe_do_get_packet(const unsigned char *buf, size_t len,
+ struct arm_spe_pkt *packet)
+{
+ unsigned int byte;
+
+ memset(packet, 0, sizeof(struct arm_spe_pkt));
+
+ if (!len)
+ return ARM_SPE_NEED_MORE_BYTES;
+
+ byte = buf[0];
+ if (byte == SPE_HEADER0_PAD)
+ return arm_spe_get_pad(packet);
+ else if (byte == SPE_HEADER0_END) /* no timestamp at end of record */
+ return arm_spe_get_end(packet);
+ else if (byte & 0xc0 /* 0y11xxxxxx */) {
+ if (byte & 0x80) {
+ if ((byte & SPE_HEADER0_ADDRESS_MASK) == SPE_HEADER0_ADDRESS)
+ return arm_spe_get_addr(buf, len, 0, packet);
+ if ((byte & SPE_HEADER0_COUNTER_MASK) == SPE_HEADER0_COUNTER)
+ return arm_spe_get_counter(buf, len, 0, packet);
+ } else
+ if (byte == SPE_HEADER0_TIMESTAMP)
+ return arm_spe_get_timestamp(buf, len, packet);
+ else if ((byte & SPE_HEADER0_EVENTS_MASK) == SPE_HEADER0_EVENTS)
+ return arm_spe_get_events(buf, len, packet);
+ else if ((byte & SPE_HEADER0_SOURCE_MASK) == SPE_HEADER0_SOURCE)
+ return arm_spe_get_data_source(buf, len, packet);
+ else if ((byte & SPE_HEADER0_CONTEXT_MASK) == SPE_HEADER0_CONTEXT)
+ return arm_spe_get_context(buf, len, packet);
+ else if ((byte & SPE_HEADER0_OP_TYPE_MASK) == SPE_HEADER0_OP_TYPE)
+ return arm_spe_get_op_type(buf, len, packet);
+ } else if ((byte & 0xe0) == 0x20 /* 0y001xxxxx */) {
+ /* 16-bit header */
+ byte = buf[1];
+ if (byte == SPE_HEADER1_ALIGNMENT)
+ return arm_spe_get_alignment(buf, len, packet);
+ else if ((byte & SPE_HEADER1_ADDRESS_MASK) == SPE_HEADER1_ADDRESS)
+ return arm_spe_get_addr(buf, len, 1, packet);
+ else if ((byte & SPE_HEADER1_COUNTER_MASK) == SPE_HEADER1_COUNTER)
+ return arm_spe_get_counter(buf, len, 1, packet);
+ }
+
+ return ARM_SPE_BAD_PACKET;
+}
+
+int arm_spe_get_packet(const unsigned char *buf, size_t len,
+ struct arm_spe_pkt *packet)
+{
+ int ret;
+
+ ret = arm_spe_do_get_packet(buf, len, packet);
+ /* put multiple consecutive PADs on the same line, up to
+ * the fixed-width output format of 16 bytes per line.
+ */
+ if (ret > 0 && packet->type == ARM_SPE_PAD) {
+ while (ret < 16 && len > (size_t)ret && !buf[ret])
+ ret += 1;
+ }
+ return ret;
+}
+
+int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
+ size_t buf_len)
+{
+ int ret, ns, el, idx = packet->index;
+ unsigned long long payload = packet->payload;
+ const char *name = arm_spe_pkt_name(packet->type);
+
+ switch (packet->type) {
+ case ARM_SPE_BAD:
+ case ARM_SPE_PAD:
+ case ARM_SPE_END:
+ return snprintf(buf, buf_len, "%s", name);
+ case ARM_SPE_EVENTS: {
+ size_t blen = buf_len;
+
+ ret = 0;
+ ret = snprintf(buf, buf_len, "EV");
+ buf += ret;
+ blen -= ret;
+ if (payload & 0x1) {
+ ret = snprintf(buf, buf_len, " EXCEPTION-GEN");
+ buf += ret;
+ blen -= ret;
+ }
+ if (payload & 0x2) {
+ ret = snprintf(buf, buf_len, " RETIRED");
+ buf += ret;
+ blen -= ret;
+ }
+ if (payload & 0x4) {
+ ret = snprintf(buf, buf_len, " L1D-ACCESS");
+ buf += ret;
+ blen -= ret;
+ }
+ if (payload & 0x8) {
+ ret = snprintf(buf, buf_len, " L1D-REFILL");
+ buf += ret;
+ blen -= ret;
+ }
+ if (payload & 0x10) {
+ ret = snprintf(buf, buf_len, " TLB-ACCESS");
+ buf += ret;
+ blen -= ret;
+ }
+ if (payload & 0x20) {
+ ret = snprintf(buf, buf_len, " TLB-REFILL");
+ buf += ret;
+ blen -= ret;
+ }
+ if (payload & 0x40) {
+ ret = snprintf(buf, buf_len, " NOT-TAKEN");
+ buf += ret;
+ blen -= ret;
+ }
+ if (payload & 0x80) {
+ ret = snprintf(buf, buf_len, " MISPRED");
+ buf += ret;
+ blen -= ret;
+ }
+ if (idx > 1) {
+ if (payload & 0x100) {
+ ret = snprintf(buf, buf_len, " LLC-ACCESS");
+ buf += ret;
+ blen -= ret;
+ }
+ if (payload & 0x200) {
+ ret = snprintf(buf, buf_len, " LLC-REFILL");
+ buf += ret;
+ blen -= ret;
+ }
+ if (payload & 0x400) {
+ ret = snprintf(buf, buf_len, " REMOTE-ACCESS");
+ buf += ret;
+ blen -= ret;
+ }
+ }
+ if (ret < 0)
+ return ret;
+ blen -= ret;
+ return buf_len - blen;
+ }
+ case ARM_SPE_OP_TYPE:
+ switch (idx) {
+ case 0: return snprintf(buf, buf_len, "%s", payload & 0x1 ?
+ "COND-SELECT" : "INSN-OTHER");
+ case 1: {
+ size_t blen = buf_len;
+
+ if (payload & 0x1)
+ ret = snprintf(buf, buf_len, "ST");
+ else
+ ret = snprintf(buf, buf_len, "LD");
+ buf += ret;
+ blen -= ret;
+ if (payload & 0x2) {
+ if (payload & 0x4) {
+ ret = snprintf(buf, buf_len, " AT");
+ buf += ret;
+ blen -= ret;
+ }
+ if (payload & 0x8) {
+ ret = snprintf(buf, buf_len, " EXCL");
+ buf += ret;
+ blen -= ret;
+ }
+ if (payload & 0x10) {
+ ret = snprintf(buf, buf_len, " AR");
+ buf += ret;
+ blen -= ret;
+ }
+ } else if (payload & 0x4) {
+ ret = snprintf(buf, buf_len, " SIMD-FP");
+ buf += ret;
+ blen -= ret;
+ }
+ if (ret < 0)
+ return ret;
+ blen -= ret;
+ return buf_len - blen;
+ }
+ case 2: {
+ size_t blen = buf_len;
+
+ ret = snprintf(buf, buf_len, "B");
+ buf += ret;
+ blen -= ret;
+ if (payload & 0x1) {
+ ret = snprintf(buf, buf_len, " COND");
+ buf += ret;
+ blen -= ret;
+ }
+ if (payload & 0x2) {
+ ret = snprintf(buf, buf_len, " IND");
+ buf += ret;
+ blen -= ret;
+ }
+ if (ret < 0)
+ return ret;
+ blen -= ret;
+ return buf_len - blen;
+ }
+ default: return 0;
+ }
+ case ARM_SPE_DATA_SOURCE:
+ case ARM_SPE_TIMESTAMP:
+ return snprintf(buf, buf_len, "%s %lld", name, payload);
+ case ARM_SPE_ADDRESS:
+ switch (idx) {
+ case 0:
+ case 1: ns = !!(packet->payload & NS_FLAG);
+ el = (packet->payload & EL_FLAG) >> 61;
+ payload &= ~(0xffULL << 56);
+ return snprintf(buf, buf_len, "%s 0x%llx el%d ns=%d",
+ (idx == 1) ? "TGT" : "PC", payload, el, ns);
+ case 2: return snprintf(buf, buf_len, "VA 0x%llx", payload);
+ case 3: ns = !!(packet->payload & NS_FLAG);
+ payload &= ~(0xffULL << 56);
+ return snprintf(buf, buf_len, "PA 0x%llx ns=%d",
+ payload, ns);
+ default: return 0;
+ }
+ case ARM_SPE_CONTEXT:
+ return snprintf(buf, buf_len, "%s 0x%lx el%d", name,
+ (unsigned long)payload, idx + 1);
+ case ARM_SPE_COUNTER: {
+ size_t blen = buf_len;
+
+ ret = snprintf(buf, buf_len, "%s %d ", name,
+ (unsigned short)payload);
+ buf += ret;
+ blen -= ret;
+ switch (idx) {
+ case 0: ret = snprintf(buf, buf_len, "TOT"); break;
+ case 1: ret = snprintf(buf, buf_len, "ISSUE"); break;
+ case 2: ret = snprintf(buf, buf_len, "XLAT"); break;
+ default: ret = 0;
+ }
+ if (ret < 0)
+ return ret;
+ blen -= ret;
+ return buf_len - blen;
+ }
+ default:
+ break;
+ }
+
+ return snprintf(buf, buf_len, "%s 0x%llx (%d)",
+ name, payload, packet->index);
+}
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
new file mode 100644
index 0000000..d786ef6
--- /dev/null
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Arm Statistical Profiling Extensions (SPE) support
+ * Copyright (c) 2017-2018, Arm Ltd.
+ */
+
+#ifndef INCLUDE__ARM_SPE_PKT_DECODER_H__
+#define INCLUDE__ARM_SPE_PKT_DECODER_H__
+
+#include <stddef.h>
+#include <stdint.h>
+
+#define ARM_SPE_PKT_DESC_MAX 256
+
+#define ARM_SPE_NEED_MORE_BYTES -1
+#define ARM_SPE_BAD_PACKET -2
+
+enum arm_spe_pkt_type {
+ ARM_SPE_BAD,
+ ARM_SPE_PAD,
+ ARM_SPE_END,
+ ARM_SPE_TIMESTAMP,
+ ARM_SPE_ADDRESS,
+ ARM_SPE_COUNTER,
+ ARM_SPE_CONTEXT,
+ ARM_SPE_OP_TYPE,
+ ARM_SPE_EVENTS,
+ ARM_SPE_DATA_SOURCE,
+};
+
+struct arm_spe_pkt {
+ enum arm_spe_pkt_type type;
+ unsigned char index;
+ uint64_t payload;
+};
+
+const char *arm_spe_pkt_name(enum arm_spe_pkt_type);
+
+int arm_spe_get_packet(const unsigned char *buf, size_t len,
+ struct arm_spe_pkt *packet);
+
+int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf, size_t len);
+#endif
diff --git a/tools/perf/util/arm-spe-pkt-decoder.c b/tools/perf/util/arm-spe-pkt-decoder.c
deleted file mode 100644
index b94001b..0000000
--- a/tools/perf/util/arm-spe-pkt-decoder.c
+++ /dev/null
@@ -1,462 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Arm Statistical Profiling Extensions (SPE) support
- * Copyright (c) 2017-2018, Arm Ltd.
- */
-
-#include <stdio.h>
-#include <string.h>
-#include <endian.h>
-#include <byteswap.h>
-
-#include "arm-spe-pkt-decoder.h"
-
-#define BIT(n) (1ULL << (n))
-
-#define NS_FLAG BIT(63)
-#define EL_FLAG (BIT(62) | BIT(61))
-
-#define SPE_HEADER0_PAD 0x0
-#define SPE_HEADER0_END 0x1
-#define SPE_HEADER0_ADDRESS 0x30 /* address packet (short) */
-#define SPE_HEADER0_ADDRESS_MASK 0x38
-#define SPE_HEADER0_COUNTER 0x18 /* counter packet (short) */
-#define SPE_HEADER0_COUNTER_MASK 0x38
-#define SPE_HEADER0_TIMESTAMP 0x71
-#define SPE_HEADER0_TIMESTAMP 0x71
-#define SPE_HEADER0_EVENTS 0x2
-#define SPE_HEADER0_EVENTS_MASK 0xf
-#define SPE_HEADER0_SOURCE 0x3
-#define SPE_HEADER0_SOURCE_MASK 0xf
-#define SPE_HEADER0_CONTEXT 0x24
-#define SPE_HEADER0_CONTEXT_MASK 0x3c
-#define SPE_HEADER0_OP_TYPE 0x8
-#define SPE_HEADER0_OP_TYPE_MASK 0x3c
-#define SPE_HEADER1_ALIGNMENT 0x0
-#define SPE_HEADER1_ADDRESS 0xb0 /* address packet (extended) */
-#define SPE_HEADER1_ADDRESS_MASK 0xf8
-#define SPE_HEADER1_COUNTER 0x98 /* counter packet (extended) */
-#define SPE_HEADER1_COUNTER_MASK 0xf8
-
-#if __BYTE_ORDER == __BIG_ENDIAN
-#define le16_to_cpu bswap_16
-#define le32_to_cpu bswap_32
-#define le64_to_cpu bswap_64
-#define memcpy_le64(d, s, n) do { \
- memcpy((d), (s), (n)); \
- *(d) = le64_to_cpu(*(d)); \
-} while (0)
-#else
-#define le16_to_cpu
-#define le32_to_cpu
-#define le64_to_cpu
-#define memcpy_le64 memcpy
-#endif
-
-static const char * const arm_spe_packet_name[] = {
- [ARM_SPE_PAD] = "PAD",
- [ARM_SPE_END] = "END",
- [ARM_SPE_TIMESTAMP] = "TS",
- [ARM_SPE_ADDRESS] = "ADDR",
- [ARM_SPE_COUNTER] = "LAT",
- [ARM_SPE_CONTEXT] = "CONTEXT",
- [ARM_SPE_OP_TYPE] = "OP-TYPE",
- [ARM_SPE_EVENTS] = "EVENTS",
- [ARM_SPE_DATA_SOURCE] = "DATA-SOURCE",
-};
-
-const char *arm_spe_pkt_name(enum arm_spe_pkt_type type)
-{
- return arm_spe_packet_name[type];
-}
-
-/* return ARM SPE payload size from its encoding,
- * which is in bits 5:4 of the byte.
- * 00 : byte
- * 01 : halfword (2)
- * 10 : word (4)
- * 11 : doubleword (8)
- */
-static int payloadlen(unsigned char byte)
-{
- return 1 << ((byte & 0x30) >> 4);
-}
-
-static int arm_spe_get_payload(const unsigned char *buf, size_t len,
- struct arm_spe_pkt *packet)
-{
- size_t payload_len = payloadlen(buf[0]);
-
- if (len < 1 + payload_len)
- return ARM_SPE_NEED_MORE_BYTES;
-
- buf++;
-
- switch (payload_len) {
- case 1: packet->payload = *(uint8_t *)buf; break;
- case 2: packet->payload = le16_to_cpu(*(uint16_t *)buf); break;
- case 4: packet->payload = le32_to_cpu(*(uint32_t *)buf); break;
- case 8: packet->payload = le64_to_cpu(*(uint64_t *)buf); break;
- default: return ARM_SPE_BAD_PACKET;
- }
-
- return 1 + payload_len;
-}
-
-static int arm_spe_get_pad(struct arm_spe_pkt *packet)
-{
- packet->type = ARM_SPE_PAD;
- return 1;
-}
-
-static int arm_spe_get_alignment(const unsigned char *buf, size_t len,
- struct arm_spe_pkt *packet)
-{
- unsigned int alignment = 1 << ((buf[0] & 0xf) + 1);
-
- if (len < alignment)
- return ARM_SPE_NEED_MORE_BYTES;
-
- packet->type = ARM_SPE_PAD;
- return alignment - (((uintptr_t)buf) & (alignment - 1));
-}
-
-static int arm_spe_get_end(struct arm_spe_pkt *packet)
-{
- packet->type = ARM_SPE_END;
- return 1;
-}
-
-static int arm_spe_get_timestamp(const unsigned char *buf, size_t len,
- struct arm_spe_pkt *packet)
-{
- packet->type = ARM_SPE_TIMESTAMP;
- return arm_spe_get_payload(buf, len, packet);
-}
-
-static int arm_spe_get_events(const unsigned char *buf, size_t len,
- struct arm_spe_pkt *packet)
-{
- int ret = arm_spe_get_payload(buf, len, packet);
-
- packet->type = ARM_SPE_EVENTS;
-
- /* we use index to identify Events with a less number of
- * comparisons in arm_spe_pkt_desc(): E.g., the LLC-ACCESS,
- * LLC-REFILL, and REMOTE-ACCESS events are identified iff
- * index > 1.
- */
- packet->index = ret - 1;
-
- return ret;
-}
-
-static int arm_spe_get_data_source(const unsigned char *buf, size_t len,
- struct arm_spe_pkt *packet)
-{
- packet->type = ARM_SPE_DATA_SOURCE;
- return arm_spe_get_payload(buf, len, packet);
-}
-
-static int arm_spe_get_context(const unsigned char *buf, size_t len,
- struct arm_spe_pkt *packet)
-{
- packet->type = ARM_SPE_CONTEXT;
- packet->index = buf[0] & 0x3;
-
- return arm_spe_get_payload(buf, len, packet);
-}
-
-static int arm_spe_get_op_type(const unsigned char *buf, size_t len,
- struct arm_spe_pkt *packet)
-{
- packet->type = ARM_SPE_OP_TYPE;
- packet->index = buf[0] & 0x3;
- return arm_spe_get_payload(buf, len, packet);
-}
-
-static int arm_spe_get_counter(const unsigned char *buf, size_t len,
- const unsigned char ext_hdr, struct arm_spe_pkt *packet)
-{
- if (len < 2)
- return ARM_SPE_NEED_MORE_BYTES;
-
- packet->type = ARM_SPE_COUNTER;
- if (ext_hdr)
- packet->index = ((buf[0] & 0x3) << 3) | (buf[1] & 0x7);
- else
- packet->index = buf[0] & 0x7;
-
- packet->payload = le16_to_cpu(*(uint16_t *)(buf + 1));
-
- return 1 + ext_hdr + 2;
-}
-
-static int arm_spe_get_addr(const unsigned char *buf, size_t len,
- const unsigned char ext_hdr, struct arm_spe_pkt *packet)
-{
- if (len < 8)
- return ARM_SPE_NEED_MORE_BYTES;
-
- packet->type = ARM_SPE_ADDRESS;
- if (ext_hdr)
- packet->index = ((buf[0] & 0x3) << 3) | (buf[1] & 0x7);
- else
- packet->index = buf[0] & 0x7;
-
- memcpy_le64(&packet->payload, buf + 1, 8);
-
- return 1 + ext_hdr + 8;
-}
-
-static int arm_spe_do_get_packet(const unsigned char *buf, size_t len,
- struct arm_spe_pkt *packet)
-{
- unsigned int byte;
-
- memset(packet, 0, sizeof(struct arm_spe_pkt));
-
- if (!len)
- return ARM_SPE_NEED_MORE_BYTES;
-
- byte = buf[0];
- if (byte == SPE_HEADER0_PAD)
- return arm_spe_get_pad(packet);
- else if (byte == SPE_HEADER0_END) /* no timestamp at end of record */
- return arm_spe_get_end(packet);
- else if (byte & 0xc0 /* 0y11xxxxxx */) {
- if (byte & 0x80) {
- if ((byte & SPE_HEADER0_ADDRESS_MASK) == SPE_HEADER0_ADDRESS)
- return arm_spe_get_addr(buf, len, 0, packet);
- if ((byte & SPE_HEADER0_COUNTER_MASK) == SPE_HEADER0_COUNTER)
- return arm_spe_get_counter(buf, len, 0, packet);
- } else
- if (byte == SPE_HEADER0_TIMESTAMP)
- return arm_spe_get_timestamp(buf, len, packet);
- else if ((byte & SPE_HEADER0_EVENTS_MASK) == SPE_HEADER0_EVENTS)
- return arm_spe_get_events(buf, len, packet);
- else if ((byte & SPE_HEADER0_SOURCE_MASK) == SPE_HEADER0_SOURCE)
- return arm_spe_get_data_source(buf, len, packet);
- else if ((byte & SPE_HEADER0_CONTEXT_MASK) == SPE_HEADER0_CONTEXT)
- return arm_spe_get_context(buf, len, packet);
- else if ((byte & SPE_HEADER0_OP_TYPE_MASK) == SPE_HEADER0_OP_TYPE)
- return arm_spe_get_op_type(buf, len, packet);
- } else if ((byte & 0xe0) == 0x20 /* 0y001xxxxx */) {
- /* 16-bit header */
- byte = buf[1];
- if (byte == SPE_HEADER1_ALIGNMENT)
- return arm_spe_get_alignment(buf, len, packet);
- else if ((byte & SPE_HEADER1_ADDRESS_MASK) == SPE_HEADER1_ADDRESS)
- return arm_spe_get_addr(buf, len, 1, packet);
- else if ((byte & SPE_HEADER1_COUNTER_MASK) == SPE_HEADER1_COUNTER)
- return arm_spe_get_counter(buf, len, 1, packet);
- }
-
- return ARM_SPE_BAD_PACKET;
-}
-
-int arm_spe_get_packet(const unsigned char *buf, size_t len,
- struct arm_spe_pkt *packet)
-{
- int ret;
-
- ret = arm_spe_do_get_packet(buf, len, packet);
- /* put multiple consecutive PADs on the same line, up to
- * the fixed-width output format of 16 bytes per line.
- */
- if (ret > 0 && packet->type == ARM_SPE_PAD) {
- while (ret < 16 && len > (size_t)ret && !buf[ret])
- ret += 1;
- }
- return ret;
-}
-
-int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
- size_t buf_len)
-{
- int ret, ns, el, idx = packet->index;
- unsigned long long payload = packet->payload;
- const char *name = arm_spe_pkt_name(packet->type);
-
- switch (packet->type) {
- case ARM_SPE_BAD:
- case ARM_SPE_PAD:
- case ARM_SPE_END:
- return snprintf(buf, buf_len, "%s", name);
- case ARM_SPE_EVENTS: {
- size_t blen = buf_len;
-
- ret = 0;
- ret = snprintf(buf, buf_len, "EV");
- buf += ret;
- blen -= ret;
- if (payload & 0x1) {
- ret = snprintf(buf, buf_len, " EXCEPTION-GEN");
- buf += ret;
- blen -= ret;
- }
- if (payload & 0x2) {
- ret = snprintf(buf, buf_len, " RETIRED");
- buf += ret;
- blen -= ret;
- }
- if (payload & 0x4) {
- ret = snprintf(buf, buf_len, " L1D-ACCESS");
- buf += ret;
- blen -= ret;
- }
- if (payload & 0x8) {
- ret = snprintf(buf, buf_len, " L1D-REFILL");
- buf += ret;
- blen -= ret;
- }
- if (payload & 0x10) {
- ret = snprintf(buf, buf_len, " TLB-ACCESS");
- buf += ret;
- blen -= ret;
- }
- if (payload & 0x20) {
- ret = snprintf(buf, buf_len, " TLB-REFILL");
- buf += ret;
- blen -= ret;
- }
- if (payload & 0x40) {
- ret = snprintf(buf, buf_len, " NOT-TAKEN");
- buf += ret;
- blen -= ret;
- }
- if (payload & 0x80) {
- ret = snprintf(buf, buf_len, " MISPRED");
- buf += ret;
- blen -= ret;
- }
- if (idx > 1) {
- if (payload & 0x100) {
- ret = snprintf(buf, buf_len, " LLC-ACCESS");
- buf += ret;
- blen -= ret;
- }
- if (payload & 0x200) {
- ret = snprintf(buf, buf_len, " LLC-REFILL");
- buf += ret;
- blen -= ret;
- }
- if (payload & 0x400) {
- ret = snprintf(buf, buf_len, " REMOTE-ACCESS");
- buf += ret;
- blen -= ret;
- }
- }
- if (ret < 0)
- return ret;
- blen -= ret;
- return buf_len - blen;
- }
- case ARM_SPE_OP_TYPE:
- switch (idx) {
- case 0: return snprintf(buf, buf_len, "%s", payload & 0x1 ?
- "COND-SELECT" : "INSN-OTHER");
- case 1: {
- size_t blen = buf_len;
-
- if (payload & 0x1)
- ret = snprintf(buf, buf_len, "ST");
- else
- ret = snprintf(buf, buf_len, "LD");
- buf += ret;
- blen -= ret;
- if (payload & 0x2) {
- if (payload & 0x4) {
- ret = snprintf(buf, buf_len, " AT");
- buf += ret;
- blen -= ret;
- }
- if (payload & 0x8) {
- ret = snprintf(buf, buf_len, " EXCL");
- buf += ret;
- blen -= ret;
- }
- if (payload & 0x10) {
- ret = snprintf(buf, buf_len, " AR");
- buf += ret;
- blen -= ret;
- }
- } else if (payload & 0x4) {
- ret = snprintf(buf, buf_len, " SIMD-FP");
- buf += ret;
- blen -= ret;
- }
- if (ret < 0)
- return ret;
- blen -= ret;
- return buf_len - blen;
- }
- case 2: {
- size_t blen = buf_len;
-
- ret = snprintf(buf, buf_len, "B");
- buf += ret;
- blen -= ret;
- if (payload & 0x1) {
- ret = snprintf(buf, buf_len, " COND");
- buf += ret;
- blen -= ret;
- }
- if (payload & 0x2) {
- ret = snprintf(buf, buf_len, " IND");
- buf += ret;
- blen -= ret;
- }
- if (ret < 0)
- return ret;
- blen -= ret;
- return buf_len - blen;
- }
- default: return 0;
- }
- case ARM_SPE_DATA_SOURCE:
- case ARM_SPE_TIMESTAMP:
- return snprintf(buf, buf_len, "%s %lld", name, payload);
- case ARM_SPE_ADDRESS:
- switch (idx) {
- case 0:
- case 1: ns = !!(packet->payload & NS_FLAG);
- el = (packet->payload & EL_FLAG) >> 61;
- payload &= ~(0xffULL << 56);
- return snprintf(buf, buf_len, "%s 0x%llx el%d ns=%d",
- (idx == 1) ? "TGT" : "PC", payload, el, ns);
- case 2: return snprintf(buf, buf_len, "VA 0x%llx", payload);
- case 3: ns = !!(packet->payload & NS_FLAG);
- payload &= ~(0xffULL << 56);
- return snprintf(buf, buf_len, "PA 0x%llx ns=%d",
- payload, ns);
- default: return 0;
- }
- case ARM_SPE_CONTEXT:
- return snprintf(buf, buf_len, "%s 0x%lx el%d", name,
- (unsigned long)payload, idx + 1);
- case ARM_SPE_COUNTER: {
- size_t blen = buf_len;
-
- ret = snprintf(buf, buf_len, "%s %d ", name,
- (unsigned short)payload);
- buf += ret;
- blen -= ret;
- switch (idx) {
- case 0: ret = snprintf(buf, buf_len, "TOT"); break;
- case 1: ret = snprintf(buf, buf_len, "ISSUE"); break;
- case 2: ret = snprintf(buf, buf_len, "XLAT"); break;
- default: ret = 0;
- }
- if (ret < 0)
- return ret;
- blen -= ret;
- return buf_len - blen;
- }
- default:
- break;
- }
-
- return snprintf(buf, buf_len, "%s 0x%llx (%d)",
- name, payload, packet->index);
-}
diff --git a/tools/perf/util/arm-spe-pkt-decoder.h b/tools/perf/util/arm-spe-pkt-decoder.h
deleted file mode 100644
index d786ef6..0000000
--- a/tools/perf/util/arm-spe-pkt-decoder.h
+++ /dev/null
@@ -1,43 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * Arm Statistical Profiling Extensions (SPE) support
- * Copyright (c) 2017-2018, Arm Ltd.
- */
-
-#ifndef INCLUDE__ARM_SPE_PKT_DECODER_H__
-#define INCLUDE__ARM_SPE_PKT_DECODER_H__
-
-#include <stddef.h>
-#include <stdint.h>
-
-#define ARM_SPE_PKT_DESC_MAX 256
-
-#define ARM_SPE_NEED_MORE_BYTES -1
-#define ARM_SPE_BAD_PACKET -2
-
-enum arm_spe_pkt_type {
- ARM_SPE_BAD,
- ARM_SPE_PAD,
- ARM_SPE_END,
- ARM_SPE_TIMESTAMP,
- ARM_SPE_ADDRESS,
- ARM_SPE_COUNTER,
- ARM_SPE_CONTEXT,
- ARM_SPE_OP_TYPE,
- ARM_SPE_EVENTS,
- ARM_SPE_DATA_SOURCE,
-};
-
-struct arm_spe_pkt {
- enum arm_spe_pkt_type type;
- unsigned char index;
- uint64_t payload;
-};
-
-const char *arm_spe_pkt_name(enum arm_spe_pkt_type);
-
-int arm_spe_get_packet(const unsigned char *buf, size_t len,
- struct arm_spe_pkt *packet);
-
-int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf, size_t len);
-#endif
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index a314e5b..c07837c 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -24,7 +24,7 @@
#include "debug.h"
#include "auxtrace.h"
#include "arm-spe.h"
-#include "arm-spe-pkt-decoder.h"
+#include "arm-spe-decoder/arm-spe-pkt-decoder.h"

struct arm_spe {
struct auxtrace auxtrace;
--
2.7.4

2019-08-02 09:38:32

by Tan Xiaojun

Subject: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events

Since commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
Profiling Extensions (SPE) support") was merged, "perf record" and
"perf report --dump-raw-trace" have supported SPE. However, the
dumped raw data cannot be used without further parsing.

This patch improves "perf report" support for SPE by decoding the
trace data further. For now, it adds support for three events:
llc-miss, tlb-miss, and branch-miss.
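
As a rough sketch of how the three events could be derived from an
EVENTS packet: the bit values below are the ones arm_spe_pkt_desc()
prints as TLB-REFILL (0x20), MISPRED (0x80) and LLC-REFILL (0x200).
Mapping those bits onto tlb-miss, branch-miss and llc-miss is an
assumption made for illustration, and the enum and function names
are hypothetical rather than taken from this patch.

```c
#include <stdint.h>

/* Bit values as printed by arm_spe_pkt_desc() for EVENTS packets. */
#define EV_TLB_REFILL	(1ULL << 5)	/* 0x20  */
#define EV_MISPRED	(1ULL << 7)	/* 0x80  */
#define EV_LLC_REFILL	(1ULL << 9)	/* 0x200 */

/* Hypothetical sample classes; the names are illustrative only. */
enum spe_sample_type {
	SPE_SAMPLE_LLC_MISS	= 1 << 0,
	SPE_SAMPLE_TLB_MISS	= 1 << 1,
	SPE_SAMPLE_BRANCH_MISS	= 1 << 2,
};

/*
 * Classify one EVENTS payload. payload_len is the payload size in
 * bytes; the LLC bits are only present when it is larger than one
 * byte, mirroring the "idx > 1" check in arm_spe_pkt_desc().
 */
static unsigned int spe_classify_events(uint64_t payload,
					unsigned int payload_len)
{
	unsigned int type = 0;

	if (payload & EV_TLB_REFILL)
		type |= SPE_SAMPLE_TLB_MISS;
	if (payload & EV_MISPRED)
		type |= SPE_SAMPLE_BRANCH_MISS;
	if (payload_len > 1 && (payload & EV_LLC_REFILL))
		type |= SPE_SAMPLE_LLC_MISS;

	return type;
}
```

A single record may set more than one of these bits, so the result is
a mask rather than a single value.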

Example usage:

--------------------------------------------------------------------
...
37.84% 37.84% dd [kernel.kallsyms] [k] perf_iterate_ctx.constprop.64
16.22% 16.22% dd [kernel.kallsyms] [k] copy_page
5.41% 5.41% dd [kernel.kallsyms] [k] find_vma
5.41% 5.41% dd [kernel.kallsyms] [k] perf_event_mmap
5.41% 5.41% dd [kernel.kallsyms] [k] zap_pte_range
5.41% 5.41% dd ld-2.28.so [.] _dl_lookup_symbol_x
5.41% 5.41% dd libc-2.28.so [.] _nl_intern_locale_data
2.70% 2.70% dd [kernel.kallsyms] [k] __remove_shared_vm_struct.isra.1
2.70% 2.70% dd [kernel.kallsyms] [k] kmem_cache_free
2.70% 2.70% dd [kernel.kallsyms] [k] ttwu_do_wakeup.isra.19
2.70% 2.70% dd dd [.] 0x000000000000d9d8
2.70% 2.70% dd ld-2.28.so [.] _dl_relocate_object
2.70% 2.70% dd libc-2.28.so [.] __unregister_atfork
2.70% 2.70% dd libc-2.28.so [.] _dl_addr

12.50% 12.50% dd [kernel.kallsyms] [k] __audit_syscall_entry
12.50% 12.50% dd [kernel.kallsyms] [k] kmem_cache_free
12.50% 12.50% dd [kernel.kallsyms] [k] perf_iterate_ctx.constprop.64
12.50% 12.50% dd [kernel.kallsyms] [k] ttwu_do_wakeup.isra.19
12.50% 12.50% dd dd [.] 0x000000000000d9d8
12.50% 12.50% dd libc-2.28.so [.] __unregister_atfork
12.50% 12.50% dd libc-2.28.so [.] _nl_intern_locale_data
12.50% 12.50% dd libc-2.28.so [.] vfprintf

16.67% 16.67% dd libc-2.28.so [.] read_alias_file
8.33% 8.33% dd [kernel.kallsyms] [k] __arch_copy_from_user
8.33% 8.33% dd [kernel.kallsyms] [k] __arch_copy_to_user
8.33% 8.33% dd [kernel.kallsyms] [k] lookup_fast
8.33% 8.33% dd [kernel.kallsyms] [k] strncpy_from_user
8.33% 8.33% dd ld-2.28.so [.] _dl_lookup_symbol_x
8.33% 8.33% dd ld-2.28.so [.] check_match
8.33% 8.33% dd libc-2.28.so [.] __GI___printf_fp_l
8.33% 8.33% dd libc-2.28.so [.] _dl_addr
8.33% 8.33% dd libc-2.28.so [.] _int_malloc
8.33% 8.33% dd libc-2.28.so [.] _nl_intern_locale_data

--------------------------------------------------------------------

Further analysis and processing of the raw SPE data will be added
later.

Signed-off-by: Tan Xiaojun <[email protected]>
---
tools/perf/builtin-report.c | 5 +
tools/perf/util/arm-spe-decoder/Build | 2 +-
tools/perf/util/arm-spe-decoder/arm-spe-decoder.c | 214 ++++++
tools/perf/util/arm-spe-decoder/arm-spe-decoder.h | 51 ++
.../util/arm-spe-decoder/arm-spe-pkt-decoder.h | 2 +
tools/perf/util/arm-spe.c | 715 ++++++++++++++++++++-
tools/perf/util/auxtrace.c | 45 ++
tools/perf/util/auxtrace.h | 27 +
tools/perf/util/session.h | 2 +
9 files changed, 1028 insertions(+), 35 deletions(-)
create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index abf0b9b..fadc8eb 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1007,6 +1007,7 @@ int cmd_report(int argc, const char **argv)
{
struct perf_session *session;
struct itrace_synth_opts itrace_synth_opts = { .set = 0, };
+ struct arm_spe_synth_opts arm_spe_synth_opts = { .set = 0, };
struct stat st;
bool has_br_stack = false;
int branch_mode = -1;
@@ -1165,6 +1166,9 @@ int cmd_report(int argc, const char **argv)
OPT_CALLBACK_OPTARG(0, "itrace", &itrace_synth_opts, NULL, "opts",
"Instruction Tracing options\n" ITRACE_HELP,
itrace_parse_synth_opts),
+ OPT_CALLBACK_OPTARG(0, "spe", &arm_spe_synth_opts, NULL, "spe opts",
+ "ARM SPE Tracing options",
+ arm_spe_parse_synth_opts),
OPT_BOOLEAN(0, "full-source-path", &srcline_full_filename,
"Show full source file name path for source lines"),
OPT_BOOLEAN(0, "show-ref-call-graph", &symbol_conf.show_ref_callgraph,
@@ -1266,6 +1270,7 @@ int cmd_report(int argc, const char **argv)
}

session->itrace_synth_opts = &itrace_synth_opts;
+ session->arm_spe_synth_opts = &arm_spe_synth_opts;

report.session = session;

diff --git a/tools/perf/util/arm-spe-decoder/Build b/tools/perf/util/arm-spe-decoder/Build
index 16efbc2..f8dae13 100644
--- a/tools/perf/util/arm-spe-decoder/Build
+++ b/tools/perf/util/arm-spe-decoder/Build
@@ -1 +1 @@
-perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
+perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o arm-spe-decoder.o
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
new file mode 100644
index 0000000..8008375
--- /dev/null
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
@@ -0,0 +1,214 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * arm_spe_decoder.c: ARM SPE support
+ */
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <linux/compiler.h>
+#include <linux/zalloc.h>
+
+#include "../util.h"
+#include "../auxtrace.h"
+
+#include "arm-spe-pkt-decoder.h"
+#include "arm-spe-decoder.h"
+
+struct arm_spe_decoder {
+ int (*get_trace)(struct arm_spe_buffer *buffer, void *data);
+ void *data;
+ struct arm_spe_state state;
+ const unsigned char *buf;
+ size_t len;
+ uint64_t pos;
+ struct arm_spe_pkt packet;
+ int pkt_step;
+ int pkt_len;
+ int last_packet_type;
+
+ uint64_t last_ip;
+ uint64_t ip;
+ uint64_t timestamp;
+ uint64_t sample_timestamp;
+ const unsigned char *next_buf;
+ size_t next_len;
+ unsigned char temp_buf[ARM_SPE_PKT_MAX_SZ];
+};
+
+static uint64_t arm_spe_calc_ip(uint64_t payload)
+{
+ uint64_t ip = (payload & ~(0xffULL << 56));
+
+ /* fill high 8 bits for kernel virtual address */
+ if (ip & 0x1000000000000ULL)
+ ip |= (uint64_t)0xff00000000000000ULL;
+
+ return ip;
+}
+
+struct arm_spe_decoder *arm_spe_decoder_new(struct arm_spe_params *params)
+{
+ struct arm_spe_decoder *decoder;
+
+ if (!params->get_trace)
+ return NULL;
+
+ decoder = zalloc(sizeof(struct arm_spe_decoder));
+ if (!decoder)
+ return NULL;
+
+ decoder->get_trace = params->get_trace;
+ decoder->data = params->data;
+
+ return decoder;
+}
+
+void arm_spe_decoder_free(struct arm_spe_decoder *decoder)
+{
+ free(decoder);
+}
+
+static int arm_spe_bad_packet(struct arm_spe_decoder *decoder)
+{
+ decoder->pkt_len = 1;
+ decoder->pkt_step = 1;
+ pr_debug("ERROR: Bad packet\n");
+
+ return -EBADMSG;
+}
+
+
+static int arm_spe_get_data(struct arm_spe_decoder *decoder)
+{
+ struct arm_spe_buffer buffer = { .buf = 0, };
+ int ret;
+
+ decoder->pkt_step = 0;
+
+ pr_debug("Getting more data\n");
+ ret = decoder->get_trace(&buffer, decoder->data);
+ if (ret)
+ return ret;
+
+ decoder->buf = buffer.buf;
+ decoder->len = buffer.len;
+ if (!decoder->len) {
+ pr_debug("No more data\n");
+ return -ENODATA;
+ }
+
+ return 0;
+}
+
+static int arm_spe_get_next_data(struct arm_spe_decoder *decoder)
+{
+ return arm_spe_get_data(decoder);
+}
+
+static int arm_spe_get_next_packet(struct arm_spe_decoder *decoder)
+{
+ int ret;
+
+ decoder->last_packet_type = decoder->packet.type;
+
+ do {
+ decoder->pos += decoder->pkt_step;
+ decoder->buf += decoder->pkt_step;
+ decoder->len -= decoder->pkt_step;
+
+
+ if (!decoder->len) {
+ ret = arm_spe_get_next_data(decoder);
+ if (ret)
+ return ret;
+ }
+
+ ret = arm_spe_get_packet(decoder->buf, decoder->len,
+ &decoder->packet);
+ if (ret <= 0)
+ return arm_spe_bad_packet(decoder);
+
+ decoder->pkt_len = ret;
+ decoder->pkt_step = ret;
+ } while (decoder->packet.type == ARM_SPE_PAD);
+
+ return 0;
+}
+
+static int arm_spe_walk_trace(struct arm_spe_decoder *decoder)
+{
+ int err;
+ int idx;
+ uint64_t payload;
+
+ while (1) {
+ err = arm_spe_get_next_packet(decoder);
+ if (err)
+ return err;
+
+ idx = decoder->packet.index;
+ payload = decoder->packet.payload;
+
+ switch (decoder->packet.type) {
+ case ARM_SPE_TIMESTAMP:
+ decoder->sample_timestamp = payload;
+ return 0;
+ case ARM_SPE_END:
+ decoder->sample_timestamp = 0;
+ return 0;
+ case ARM_SPE_ADDRESS:
+ decoder->ip = arm_spe_calc_ip(payload);
+ if (idx == 0)
+ decoder->state.from_ip = decoder->ip;
+ else if (idx == 1)
+ decoder->state.to_ip = decoder->ip;
+ break;
+ case ARM_SPE_COUNTER:
+ break;
+ case ARM_SPE_CONTEXT:
+ break;
+ case ARM_SPE_OP_TYPE:
+ break;
+ case ARM_SPE_EVENTS:
+ if (payload & 0x20)
+ decoder->state.type |= ARM_SPE_TLB_MISS;
+ if (payload & 0x80)
+ decoder->state.type |= ARM_SPE_BRANCH_MISS;
+ if (idx > 1 && (payload & 0x200))
+ decoder->state.type |= ARM_SPE_LLC_MISS;
+
+ break;
+ case ARM_SPE_DATA_SOURCE:
+ break;
+ case ARM_SPE_BAD:
+ break;
+ case ARM_SPE_PAD:
+ break;
+ default:
+ pr_err("Get Packet Error!\n");
+ return -ENOSYS;
+ }
+ }
+}
+
+const struct arm_spe_state *arm_spe_decode(struct arm_spe_decoder *decoder)
+{
+ int err;
+
+ decoder->state.type = 0;
+
+ err = arm_spe_walk_trace(decoder);
+ /* Always store the result so a stale error is cleared on success */
+ decoder->state.err = err;
+
+ decoder->state.timestamp = decoder->sample_timestamp;
+
+ return &decoder->state;
+}
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
new file mode 100644
index 0000000..e327378
--- /dev/null
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * arm_spe_decoder.h: ARM SPE support
+ */
+
+#ifndef INCLUDE__ARM_SPE_DECODER_H__
+#define INCLUDE__ARM_SPE_DECODER_H__
+
+#include <stdint.h>
+#include <stddef.h>
+#include <stdbool.h>
+
+enum arm_spe_sample_type {
+ ARM_SPE_LLC_MISS = 1 << 0,
+ ARM_SPE_TLB_MISS = 1 << 1,
+ ARM_SPE_BRANCH_MISS = 1 << 2,
+ ARM_SPE_EX_STOP = 1 << 6,
+};
+
+struct arm_spe_state {
+ enum arm_spe_sample_type type;
+ int err;
+ uint64_t from_ip;
+ uint64_t to_ip;
+ uint64_t timestamp;
+};
+
+struct arm_spe_insn;
+
+struct arm_spe_buffer {
+ const unsigned char *buf;
+ size_t len;
+ u64 offset;
+ bool consecutive;
+ uint64_t ref_timestamp;
+ uint64_t trace_nr;
+};
+
+struct arm_spe_params {
+ int (*get_trace)(struct arm_spe_buffer *buffer, void *data);
+ void *data;
+};
+
+struct arm_spe_decoder;
+
+struct arm_spe_decoder *arm_spe_decoder_new(struct arm_spe_params *params);
+void arm_spe_decoder_free(struct arm_spe_decoder *decoder);
+
+const struct arm_spe_state *arm_spe_decode(struct arm_spe_decoder *decoder);
+
+#endif
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
index d786ef6..865d1e3 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.h
@@ -15,6 +15,8 @@
#define ARM_SPE_NEED_MORE_BYTES -1
#define ARM_SPE_BAD_PACKET -2

+#define ARM_SPE_PKT_MAX_SZ 16
+
enum arm_spe_pkt_type {
ARM_SPE_BAD,
ARM_SPE_PAD,
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index c07837c..cf066a1 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -21,30 +21,57 @@
#include "machine.h"
#include "session.h"
#include "thread.h"
+#include "thread-stack.h"
+#include "symbol.h"
#include "debug.h"
#include "auxtrace.h"
#include "arm-spe.h"
+#include "arm-spe-decoder/arm-spe-decoder.h"
#include "arm-spe-decoder/arm-spe-pkt-decoder.h"

+#define MAX_TIMESTAMP (~0ULL)
+
struct arm_spe {
struct auxtrace auxtrace;
struct auxtrace_queues queues;
struct auxtrace_heap heap;
+ struct arm_spe_synth_opts synth_opts;
u32 auxtrace_type;
struct perf_session *session;
struct machine *machine;
u32 pmu_type;
+
+ u8 timeless_decoding;
+ u8 data_queued;
+
+ u8 sample_llc_miss;
+ u8 sample_tlb_miss;
+ u8 sample_branch_miss;
+ u64 llc_miss_id;
+ u64 tlb_miss_id;
+ u64 branch_miss_id;
+ u64 kernel_start;
+
+ unsigned long num_events;
};

struct arm_spe_queue {
- struct arm_spe *spe;
- unsigned int queue_nr;
- struct auxtrace_buffer *buffer;
- bool on_heap;
- bool done;
- pid_t pid;
- pid_t tid;
- int cpu;
+ struct arm_spe *spe;
+ unsigned int queue_nr;
+ struct auxtrace_buffer *buffer;
+ struct auxtrace_buffer *old_buffer;
+ union perf_event *event_buf;
+ bool on_heap;
+ bool done;
+ pid_t pid;
+ pid_t tid;
+ int cpu;
+ void *decoder;
+ const struct arm_spe_state *state;
+ u64 time;
+ u64 timestamp;
+ struct thread *thread;
+ bool have_sample;
};

static void arm_spe_dump(struct arm_spe *spe __maybe_unused,
@@ -93,44 +120,487 @@ static void arm_spe_dump_event(struct arm_spe *spe, unsigned char *buf,
arm_spe_dump(spe, buf, len);
}

-static int arm_spe_process_event(struct perf_session *session __maybe_unused,
- union perf_event *event __maybe_unused,
- struct perf_sample *sample __maybe_unused,
- struct perf_tool *tool __maybe_unused)
+static int arm_spe_get_trace(struct arm_spe_buffer *b, void *data)
{
+ struct arm_spe_queue *speq = data;
+ struct auxtrace_buffer *buffer = speq->buffer;
+ struct auxtrace_buffer *old_buffer = speq->old_buffer;
+ struct auxtrace_queue *queue;
+
+ queue = &speq->spe->queues.queue_array[speq->queue_nr];
+
+ buffer = auxtrace_buffer__next(queue, buffer);
+ /* If no more data, drop the previous auxtrace_buffer and return */
+ if (!buffer) {
+ if (old_buffer)
+ auxtrace_buffer__drop_data(old_buffer);
+ b->len = 0;
+ return 0;
+ }
+
+ speq->buffer = buffer;
+
+ /* If the aux_buffer doesn't have data associated, try to load it */
+ if (!buffer->data) {
+ /* get the file desc associated with the perf data file */
+ int fd = perf_data__fd(speq->spe->session->data);
+
+ buffer->data = auxtrace_buffer__get_data(buffer, fd);
+ if (!buffer->data)
+ return -ENOMEM;
+ }
+
+ if (buffer->use_data) {
+ b->len = buffer->use_size;
+ b->buf = buffer->use_data;
+ } else {
+ b->len = buffer->size;
+ b->buf = buffer->data;
+ }
+
+ b->ref_timestamp = buffer->reference;
+
+ if (b->len) {
+ if (old_buffer)
+ auxtrace_buffer__drop_data(old_buffer);
+ speq->old_buffer = buffer;
+ } else {
+ auxtrace_buffer__drop_data(buffer);
+ return arm_spe_get_trace(b, data);
+ }
+
return 0;
}

+static struct arm_spe_queue *arm_spe__alloc_queue(struct arm_spe *spe,
+ unsigned int queue_nr)
+{
+ struct arm_spe_params params = { .get_trace = 0, };
+ struct arm_spe_queue *speq;
+
+ speq = zalloc(sizeof(*speq));
+ if (!speq)
+ return NULL;
+
+ speq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
+ if (!speq->event_buf)
+ goto out_free;
+
+ speq->spe = spe;
+ speq->queue_nr = queue_nr;
+ speq->pid = -1;
+ speq->tid = -1;
+ speq->cpu = -1;
+
+ /* params set */
+ params.get_trace = arm_spe_get_trace;
+ params.data = speq;
+
+ /* create new decoder */
+ speq->decoder = arm_spe_decoder_new(&params);
+ if (!speq->decoder)
+ goto out_free;
+
+ return speq;
+
+out_free:
+ zfree(&speq->event_buf);
+ free(speq);
+
+ return NULL;
+}
+
+static inline u8 arm_spe_cpumode(struct arm_spe *spe, uint64_t ip)
+{
+ return ip >= spe->kernel_start ?
+ PERF_RECORD_MISC_KERNEL :
+ PERF_RECORD_MISC_USER;
+}
+
+static void arm_spe_prep_sample(struct arm_spe *spe,
+ struct arm_spe_queue *speq,
+ union perf_event *event,
+ struct perf_sample *sample)
+{
+ if (!spe->timeless_decoding)
+ sample->time = speq->timestamp;
+
+ sample->ip = speq->state->from_ip;
+ sample->cpumode = arm_spe_cpumode(spe, sample->ip);
+ sample->pid = speq->pid;
+ sample->tid = speq->tid;
+ sample->addr = speq->state->to_ip;
+ sample->period = 1;
+ sample->cpu = speq->cpu;
+
+ event->sample.header.type = PERF_RECORD_SAMPLE;
+ event->sample.header.misc = sample->cpumode;
+ event->sample.header.size = sizeof(struct perf_event_header);
+}
+
+static inline int
+arm_spe_deliver_synth_event(struct arm_spe *spe,
+ struct arm_spe_queue *speq __maybe_unused,
+ union perf_event *event,
+ struct perf_sample *sample)
+{
+ int ret;
+
+ ret = perf_session__deliver_synth_event(spe->session, event, sample);
+ if (ret)
+ pr_err("ARM SPE: failed to deliver event, error %d\n", ret);
+
+ return ret;
+}
+
+static int
+arm_spe_synth_spe_events_sample(struct arm_spe_queue *speq,
+ u64 spe_events_id)
+{
+ struct arm_spe *spe = speq->spe;
+ union perf_event *event = speq->event_buf;
+ struct perf_sample sample = { .ip = 0, };
+
+ arm_spe_prep_sample(spe, speq, event, &sample);
+
+ sample.id = spe_events_id;
+ sample.stream_id = spe_events_id;
+
+ return arm_spe_deliver_synth_event(spe, speq, event, &sample);
+}
+
+static int arm_spe_sample(struct arm_spe_queue *speq)
+{
+ const struct arm_spe_state *state = speq->state;
+ struct arm_spe *spe = speq->spe;
+ int err;
+
+ if (!speq->have_sample)
+ return 0;
+
+ speq->have_sample = false;
+
+ if (spe->sample_llc_miss && (state->type & ARM_SPE_LLC_MISS)) {
+ err = arm_spe_synth_spe_events_sample(speq, spe->llc_miss_id);
+ if (err)
+ return err;
+ }
+
+ if (spe->sample_tlb_miss && (state->type & ARM_SPE_TLB_MISS)) {
+ err = arm_spe_synth_spe_events_sample(speq, spe->tlb_miss_id);
+ if (err)
+ return err;
+ }
+
+ if (spe->sample_branch_miss && (state->type & ARM_SPE_BRANCH_MISS)) {
+ err = arm_spe_synth_spe_events_sample(speq,
+ spe->branch_miss_id);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
+static int arm_spe_run_decoder(struct arm_spe_queue *speq, u64 *timestamp)
+{
+ const struct arm_spe_state *state = speq->state;
+ struct arm_spe *spe = speq->spe;
+ int err;
+
+ if (!spe->kernel_start)
+ spe->kernel_start = machine__kernel_start(spe->machine);
+
+ while (1) {
+ err = arm_spe_sample(speq);
+ if (err)
+ return err;
+
+ state = arm_spe_decode(speq->decoder);
+ if (state->err) {
+ if (state->err == -ENODATA) {
+ pr_debug("No data or all data has been processed.\n");
+ return 1;
+ }
+ continue;
+ }
+
+ speq->state = state;
+ speq->have_sample = true;
+
+ if (!spe->timeless_decoding && speq->timestamp >= *timestamp) {
+ *timestamp = speq->timestamp;
+ return 0;
+ }
+ }
+
+ return 0;
+}
+
+static int arm_spe__setup_queue(struct arm_spe *spe,
+ struct auxtrace_queue *queue,
+ unsigned int queue_nr)
+{
+ struct arm_spe_queue *speq = queue->priv;
+
+ if (list_empty(&queue->head) || speq)
+ return 0;
+
+ speq = arm_spe__alloc_queue(spe, queue_nr);
+
+ if (!speq)
+ return -ENOMEM;
+
+ queue->priv = speq;
+
+ if (queue->cpu != -1)
+ speq->cpu = queue->cpu;
+
+ speq->tid = queue->tid;
+
+ if (!speq->on_heap) {
+ const struct arm_spe_state *state;
+ int ret;
+
+ if (spe->timeless_decoding)
+ return 0;
+
+retry:
+ state = arm_spe_decode(speq->decoder);
+ if (state->err) {
+ if (state->err == -ENODATA) {
+ pr_debug("queue %u has no timestamp\n",
+ queue_nr);
+ return 0;
+ }
+ goto retry;
+ }
+
+ speq->timestamp = state->timestamp;
+ speq->state = state;
+ speq->have_sample = true;
+ ret = auxtrace_heap__add(&spe->heap, queue_nr, speq->timestamp);
+ if (ret)
+ return ret;
+ speq->on_heap = true;
+ }
+
+ return 0;
+}
+
+static int arm_spe__setup_queues(struct arm_spe *spe)
+{
+ unsigned int i;
+ int ret;
+
+ for (i = 0; i < spe->queues.nr_queues; i++) {
+ ret = arm_spe__setup_queue(spe, &spe->queues.queue_array[i], i);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
+static int arm_spe__update_queues(struct arm_spe *spe)
+{
+ if (spe->queues.new_data) {
+ spe->queues.new_data = false;
+ return arm_spe__setup_queues(spe);
+ }
+
+ return 0;
+}
+
+static bool arm_spe__is_timeless_decoding(struct arm_spe *spe)
+{
+ struct perf_evsel *evsel;
+ struct perf_evlist *evlist = spe->session->evlist;
+ bool timeless_decoding = true;
+
+ /*
+ * Walk the list of events; decoding is not timeless if any of
+ * them has the time bit set.
+ */
+ evlist__for_each_entry(evlist, evsel) {
+ if ((evsel->attr.sample_type & PERF_SAMPLE_TIME))
+ timeless_decoding = false;
+ }
+
+ return timeless_decoding;
+}
+
+static void arm_spe_set_pid_tid_cpu(struct arm_spe *spe,
+ struct auxtrace_queue *queue)
+{
+ struct arm_spe_queue *speq = queue->priv;
+
+ if (queue->tid == -1) {
+ speq->tid = machine__get_current_tid(spe->machine, speq->cpu);
+ thread__zput(speq->thread);
+ }
+
+ if ((!speq->thread) && (speq->tid != -1)) {
+ speq->thread = machine__find_thread(spe->machine, -1,
+ speq->tid);
+ }
+
+ if (speq->thread) {
+ speq->pid = speq->thread->pid_;
+ if (queue->cpu == -1)
+ speq->cpu = speq->thread->cpu;
+ }
+}
+
+static int arm_spe_process_queues(struct arm_spe *spe, u64 timestamp)
+{
+ unsigned int queue_nr;
+ u64 ts;
+ int ret;
+
+ while (1) {
+ struct auxtrace_queue *queue;
+ struct arm_spe_queue *speq;
+
+ if (!spe->heap.heap_cnt)
+ return 0;
+
+ if (spe->heap.heap_array[0].ordinal >= timestamp)
+ return 0;
+
+ queue_nr = spe->heap.heap_array[0].queue_nr;
+ queue = &spe->queues.queue_array[queue_nr];
+ speq = queue->priv;
+
+ auxtrace_heap__pop(&spe->heap);
+
+ if (spe->heap.heap_cnt) {
+ ts = spe->heap.heap_array[0].ordinal + 1;
+ if (ts > timestamp)
+ ts = timestamp;
+ } else {
+ ts = timestamp;
+ }
+
+ arm_spe_set_pid_tid_cpu(spe, queue);
+
+ ret = arm_spe_run_decoder(speq, &ts);
+ if (ret < 0) {
+ auxtrace_heap__add(&spe->heap, queue_nr, ts);
+ return ret;
+ }
+
+ if (!ret) {
+ ret = auxtrace_heap__add(&spe->heap, queue_nr, ts);
+ if (ret < 0)
+ return ret;
+ } else {
+ speq->on_heap = false;
+ }
+ }
+
+ return 0;
+}
+
+static int arm_spe_process_timeless_queues(struct arm_spe *spe, pid_t tid,
+ u64 time_)
+{
+ struct auxtrace_queues *queues = &spe->queues;
+ unsigned int i;
+ u64 ts = 0;
+
+ for (i = 0; i < queues->nr_queues; i++) {
+ struct auxtrace_queue *queue = &spe->queues.queue_array[i];
+ struct arm_spe_queue *speq = queue->priv;
+
+ if (speq && (tid == -1 || speq->tid == tid)) {
+ speq->time = time_;
+ arm_spe_set_pid_tid_cpu(spe, queue);
+ arm_spe_run_decoder(speq, &ts);
+ }
+ }
+ return 0;
+}
+
+static int arm_spe_process_event(struct perf_session *session,
+ union perf_event *event,
+ struct perf_sample *sample,
+ struct perf_tool *tool)
+{
+ int err = 0;
+ u64 timestamp;
+ struct arm_spe *spe = container_of(session->auxtrace,
+ struct arm_spe, auxtrace);
+
+ if (dump_trace)
+ return 0;
+
+ if (!tool->ordered_events) {
+ pr_err("ARM SPE Trace requires ordered events\n");
+ return -EINVAL;
+ }
+
+ if (sample->time && (sample->time != (u64) -1))
+ timestamp = sample->time;
+ else
+ timestamp = 0;
+
+ if (timestamp || spe->timeless_decoding) {
+ err = arm_spe__update_queues(spe);
+ if (err)
+ return err;
+ }
+
+ if (spe->timeless_decoding) {
+ if (event->header.type == PERF_RECORD_EXIT) {
+ err = arm_spe_process_timeless_queues(spe,
+ event->fork.tid,
+ sample->time);
+ }
+ } else if (timestamp) {
+ if (event->header.type == PERF_RECORD_EXIT) {
+ err = arm_spe_process_queues(spe, timestamp);
+ if (err)
+ return err;
+ }
+ }
+
+ return err;
+}
+
static int arm_spe_process_auxtrace_event(struct perf_session *session,
union perf_event *event,
struct perf_tool *tool __maybe_unused)
{
struct arm_spe *spe = container_of(session->auxtrace, struct arm_spe,
auxtrace);
- struct auxtrace_buffer *buffer;
- off_t data_offset;
- int fd = perf_data__fd(session->data);
- int err;

- if (perf_data__is_pipe(session->data)) {
- data_offset = 0;
- } else {
- data_offset = lseek(fd, 0, SEEK_CUR);
- if (data_offset == -1)
- return -errno;
- }
+ if (!spe->data_queued) {
+ struct auxtrace_buffer *buffer;
+ off_t data_offset;
+ int fd = perf_data__fd(session->data);
+ int err;

- err = auxtrace_queues__add_event(&spe->queues, session, event,
- data_offset, &buffer);
- if (err)
- return err;
-
- /* Dump here now we have copied a piped trace out of the pipe */
- if (dump_trace) {
- if (auxtrace_buffer__get_data(buffer, fd)) {
- arm_spe_dump_event(spe, buffer->data,
- buffer->size);
- auxtrace_buffer__put_data(buffer);
+ if (perf_data__is_pipe(session->data)) {
+ data_offset = 0;
+ } else {
+ data_offset = lseek(fd, 0, SEEK_CUR);
+ if (data_offset == -1)
+ return -errno;
+ }
+
+ err = auxtrace_queues__add_event(&spe->queues, session, event,
+ data_offset, &buffer);
+ if (err)
+ return err;
+
+ /* Dump here now we have copied a piped trace out of the pipe */
+ if (dump_trace) {
+ if (auxtrace_buffer__get_data(buffer, fd)) {
+ arm_spe_dump_event(spe, buffer->data,
+ buffer->size);
+ auxtrace_buffer__put_data(buffer);
+ }
}
}

@@ -140,6 +610,25 @@ static int arm_spe_process_auxtrace_event(struct perf_session *session,
static int arm_spe_flush(struct perf_session *session __maybe_unused,
struct perf_tool *tool __maybe_unused)
{
+ struct arm_spe *spe = container_of(session->auxtrace, struct arm_spe,
+ auxtrace);
+ int ret;
+
+ if (dump_trace)
+ return 0;
+
+ if (!tool->ordered_events)
+ return -EINVAL;
+
+ ret = arm_spe__update_queues(spe);
+ if (ret < 0)
+ return ret;
+
+ if (spe->timeless_decoding)
+ return arm_spe_process_timeless_queues(spe, -1,
+ MAX_TIMESTAMP - 1);
+
+ return arm_spe_process_queues(spe, MAX_TIMESTAMP);
return 0;
}

@@ -149,6 +638,9 @@ static void arm_spe_free_queue(void *priv)

if (!speq)
return;
+ thread__zput(speq->thread);
+ arm_spe_decoder_free(speq->decoder);
+ zfree(&speq->event_buf);
free(speq);
}

@@ -189,6 +681,137 @@ static void arm_spe_print_info(u64 *arr)
fprintf(stdout, arm_spe_info_fmts[ARM_SPE_PMU_TYPE], arr[ARM_SPE_PMU_TYPE]);
}

+struct arm_spe_synth {
+ struct perf_tool dummy_tool;
+ struct perf_session *session;
+};
+
+static int arm_spe_event_synth(struct perf_tool *tool,
+ union perf_event *event,
+ struct perf_sample *sample __maybe_unused,
+ struct machine *machine __maybe_unused)
+{
+ struct arm_spe_synth *arm_spe_synth =
+ container_of(tool, struct arm_spe_synth, dummy_tool);
+
+ return perf_session__deliver_synth_event(arm_spe_synth->session,
+ event, NULL);
+}
+
+static int arm_spe_synth_event(struct perf_session *session,
+ struct perf_event_attr *attr, u64 id)
+{
+ struct arm_spe_synth arm_spe_synth;
+
+ memset(&arm_spe_synth, 0, sizeof(struct arm_spe_synth));
+ arm_spe_synth.session = session;
+
+ return perf_event__synthesize_attr(&arm_spe_synth.dummy_tool, attr, 1,
+ &id, arm_spe_event_synth);
+}
+
+static void arm_spe_set_event_name(struct perf_evlist *evlist, u64 id,
+ const char *name)
+{
+ struct perf_evsel *evsel;
+
+ evlist__for_each_entry(evlist, evsel) {
+ if (evsel->id && evsel->id[0] == id) {
+ if (evsel->name)
+ zfree(&evsel->name);
+ evsel->name = strdup(name);
+ break;
+ }
+ }
+}
+
+static int
+arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
+{
+ struct perf_evlist *evlist = session->evlist;
+ struct perf_evsel *evsel;
+ struct perf_event_attr attr;
+ bool found = false;
+ u64 id;
+ int err;
+
+ evlist__for_each_entry(evlist, evsel) {
+ if (evsel->attr.type == spe->pmu_type) {
+ found = true;
+ break;
+ }
+ }
+
+ if (!found) {
+ pr_debug("No selected events with ARM SPE Trace data\n");
+ return 0;
+ }
+
+ memset(&attr, 0, sizeof(struct perf_event_attr));
+ attr.size = sizeof(struct perf_event_attr);
+ attr.type = PERF_TYPE_HARDWARE;
+ attr.sample_type = evsel->attr.sample_type & PERF_SAMPLE_MASK;
+ attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
+ PERF_SAMPLE_PERIOD;
+ if (spe->timeless_decoding)
+ attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
+ else
+ attr.sample_type |= PERF_SAMPLE_TIME;
+
+ attr.exclude_user = evsel->attr.exclude_user;
+ attr.exclude_kernel = evsel->attr.exclude_kernel;
+ attr.exclude_hv = evsel->attr.exclude_hv;
+ attr.exclude_host = evsel->attr.exclude_host;
+ attr.exclude_guest = evsel->attr.exclude_guest;
+ attr.sample_id_all = evsel->attr.sample_id_all;
+ attr.read_format = evsel->attr.read_format;
+
+ /* create new id val to be a fixed offset from evsel id */
+ id = evsel->id[0] + 1000000000;
+
+ if (!id)
+ id = 1;
+
+ /* spe events set */
+ if (spe->synth_opts.llc_miss) {
+ spe->sample_llc_miss = true;
+
+ /* llc-miss */
+ err = arm_spe_synth_event(session, &attr, id);
+ if (err)
+ return err;
+ spe->llc_miss_id = id;
+ arm_spe_set_event_name(evlist, id, "llc-miss");
+ id += 1;
+ }
+
+ if (spe->synth_opts.tlb_miss) {
+ spe->sample_tlb_miss = true;
+
+ /* tlb-miss */
+ err = arm_spe_synth_event(session, &attr, id);
+ if (err)
+ return err;
+ spe->tlb_miss_id = id;
+ arm_spe_set_event_name(evlist, id, "tlb-miss");
+ id += 1;
+ }
+
+ if (spe->synth_opts.branch_miss) {
+ spe->sample_branch_miss = true;
+
+ /* branch-miss */
+ err = arm_spe_synth_event(session, &attr, id);
+ if (err)
+ return err;
+ spe->branch_miss_id = id;
+ arm_spe_set_event_name(evlist, id, "branch-miss");
+ id += 1;
+ }
+
+ return 0;
+}
+
int arm_spe_process_auxtrace_info(union perf_event *event,
struct perf_session *session)
{
@@ -197,6 +820,7 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
struct arm_spe *spe;
int err;

+
if (auxtrace_info->header.size < sizeof(struct auxtrace_info_event) +
min_sz)
return -EINVAL;
@@ -214,6 +838,7 @@ int arm_spe_process_auxtrace_info(union perf_event *event,
spe->auxtrace_type = auxtrace_info->type;
spe->pmu_type = auxtrace_info->priv[ARM_SPE_PMU_TYPE];

+ spe->timeless_decoding = arm_spe__is_timeless_decoding(spe);
spe->auxtrace.process_event = arm_spe_process_event;
spe->auxtrace.process_auxtrace_event = arm_spe_process_auxtrace_event;
spe->auxtrace.flush_events = arm_spe_flush;
@@ -223,8 +848,30 @@ int arm_spe_process_auxtrace_info(union perf_event *event,

arm_spe_print_info(&auxtrace_info->priv[0]);

+ if (dump_trace)
+ return 0;
+
+ if (session->arm_spe_synth_opts && session->arm_spe_synth_opts->set)
+ spe->synth_opts = *session->arm_spe_synth_opts;
+ else
+ arm_spe_synth_opts__set_default(&spe->synth_opts);
+
+ err = arm_spe_synth_events(spe, session);
+ if (err)
+ goto err_free_queues;
+
+ err = auxtrace_queues__process_index(&spe->queues, session);
+ if (err)
+ goto err_free_queues;
+
+ if (spe->queues.populated)
+ spe->data_queued = true;
+
return 0;

+err_free_queues:
+ auxtrace_queues__free(&spe->queues);
+ session->auxtrace = NULL;
err_free:
free(spe);
return err;
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index ec0af36..3884cc4 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -1137,6 +1137,51 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
return -EINVAL;
}

+void arm_spe_synth_opts__set_default(struct arm_spe_synth_opts *synth_opts)
+{
+ synth_opts->llc_miss = true;
+ synth_opts->tlb_miss = true;
+ synth_opts->branch_miss = true;
+}
+
+int arm_spe_parse_synth_opts(const struct option *opt, const char *str,
+ int unset __maybe_unused)
+{
+ struct arm_spe_synth_opts *synth_opts = opt->value;
+ const char *p;
+
+ synth_opts->set = true;
+
+ if (!str) {
+ arm_spe_synth_opts__set_default(synth_opts);
+ return 0;
+ }
+
+ for (p = str; *p;) {
+ switch (*p++) {
+ case 'l':
+ synth_opts->llc_miss = true;
+ break;
+ case 't':
+ synth_opts->tlb_miss = true;
+ break;
+ case 'b':
+ synth_opts->branch_miss = true;
+ break;
+ case ' ':
+ case ',':
+ break;
+ default:
+ goto out_err;
+ }
+ }
+
+ return 0;
+
+out_err:
+ pr_err("Bad ARM SPE Tracing options '%s'\n", str);
+ return -EINVAL;
+}
static const char * const auxtrace_error_type_name[] = {
[PERF_AUXTRACE_ERROR_ITRACE] = "instruction trace",
};
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index e9b4c5e..7697788 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -105,6 +105,20 @@ struct itrace_synth_opts {
};

/**
+ * struct arm_spe_synth_opts - ARM SPE tracing synthesis options.
+ * @set: indicates whether or not options have been set
+ * @llc_miss: whether to synthesize last level cache miss events
+ * @tlb_miss: whether to synthesize TLB miss events
+ * @branch_miss: whether to synthesize Branch miss events
+ */
+struct arm_spe_synth_opts {
+ bool set;
+ bool llc_miss;
+ bool tlb_miss;
+ bool branch_miss;
+};
+
+/**
* struct auxtrace_index_entry - indexes a AUX area tracing event within a
* perf.data file.
* @file_offset: offset within the perf.data file
@@ -531,6 +545,10 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts,
bool no_sample);

+int arm_spe_parse_synth_opts(const struct option *opt, const char *str,
+ int unset);
+void arm_spe_synth_opts__set_default(struct arm_spe_synth_opts *synth_opts);
+
size_t perf_event__fprintf_auxtrace_error(union perf_event *event, FILE *fp);
void perf_session__auxtrace_error_inc(struct perf_session *session,
union perf_event *event);
@@ -670,6 +688,15 @@ int itrace_parse_synth_opts(const struct option *opt __maybe_unused,
}

static inline
+int arm_spe_parse_synth_opts(const struct option *opt __maybe_unused,
+ const char *str __maybe_unused,
+ int unset __maybe_unused)
+{
+ pr_err("ARM SPE area tracing not supported\n");
+ return -EINVAL;
+}
+
+static inline
int auxtrace_parse_snapshot_options(struct auxtrace_record *itr __maybe_unused,
struct record_opts *opts __maybe_unused,
const char *str)
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 863dbad..ccaed68 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -19,6 +19,7 @@ struct thread;

struct auxtrace;
struct itrace_synth_opts;
+struct arm_spe_synth_opts;

struct perf_session {
struct perf_header header;
@@ -26,6 +27,7 @@ struct perf_session {
struct perf_evlist *evlist;
struct auxtrace *auxtrace;
struct itrace_synth_opts *itrace_synth_opts;
+ struct arm_spe_synth_opts *arm_spe_synth_opts;
struct list_head auxtrace_index;
struct trace_event tevent;
struct time_conv_event time_conv;
--
2.7.4
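
As a reading aid for the address handling in arm_spe_calc_ip() above, the
following is a small standalone sketch of the same idea, assuming (as the
patch does) that the top byte of an Address packet payload is not part of
the virtual address and that bit 48 being set indicates a kernel address.
The name spe_addr_to_ip is hypothetical.

```c
#include <stdint.h>

/*
 * Recover a virtual address from an Address packet payload:
 * drop the top byte, then refill it with 0xff when bit 48 is set,
 * which the patch treats as the sign of a kernel virtual address.
 */
static uint64_t spe_addr_to_ip(uint64_t payload)
{
	uint64_t ip = payload & ~(0xffULL << 56);

	if (ip & (1ULL << 48))
		ip |= 0xffULL << 56;

	return ip;
}
```

With this, a payload whose low 56 bits are 0xff000012345678 becomes
0xffff000012345678, while a user-space address keeps a zero top byte.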

2019-08-02 14:28:23

by Tan Xiaojun

Subject: [RFC PATCH 3/3] perf report: add --spe options for arm-spe

The previous patch added support in "perf report" for some arm-spe
events (llc-miss, tlb-miss, branch-miss). This patch documents the
corresponding --spe option.

Signed-off-by: Tan Xiaojun <[email protected]>
---
tools/perf/Documentation/perf-report.txt | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 987261d..d998d4b 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -445,6 +445,15 @@ include::itrace.txt[]

To disable decoding entirely, use --no-itrace.

+--spe::
+ Options for decoding arm-spe tracing data. The options are:
+
+ l synthesize llc miss events
+ t synthesize tlb miss events
+ b synthesize branch miss events
+
+ The default is all events i.e. the same as --spe=ltb
+
--full-source-path::
Show the full path for source files for srcline output.

--
2.7.4

2019-08-08 21:17:23

by Jeremy Linton

Subject: Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events

Hi,

First, thanks for posting this!

I ran this on our DAWN platform and it does what it says. It's a pretty
reasonable start, but I get -1's in the command row rather than "dd" (or
similar), and this also results in [unknown] for the shared object and
most userspace addresses. This is quite possibly something I'm not doing
right, but I didn't spend a lot of time testing/debugging it.

I took a quick glance at the code too, and had a couple of comments,
although I'm not a perf tool expert.


On 8/2/19 4:40 AM, Tan Xiaojun wrote:
> After the commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
> Profiling Extensions (SPE) support") is merged, "perf record" and
> "perf report --dump-raw-trace" have been supported. However, the
> raw data that is dumped cannot be used without parsing.
>
> This patch is to improve the "perf report" support for spe, and
> further process the data. Currently, support for the three events
> of llc-miss, tlb-miss, and branch-miss is added.
>
> Example usage:
>
> --------------------------------------------------------------------
> ...
> 37.84% 37.84% dd [kernel.kallsyms] [k] perf_iterate_ctx.constprop.64
> 16.22% 16.22% dd [kernel.kallsyms] [k] copy_page
> 5.41% 5.41% dd [kernel.kallsyms] [k] find_vma
> 5.41% 5.41% dd [kernel.kallsyms] [k] perf_event_mmap
> 5.41% 5.41% dd [kernel.kallsyms] [k] zap_pte_range
> 5.41% 5.41% dd ld-2.28.so [.] _dl_lookup_symbol_x
> 5.41% 5.41% dd libc-2.28.so [.] _nl_intern_locale_data
> 2.70% 2.70% dd [kernel.kallsyms] [k] __remove_shared_vm_struct.isra.1
> 2.70% 2.70% dd [kernel.kallsyms] [k] kmem_cache_free
> 2.70% 2.70% dd [kernel.kallsyms] [k] ttwu_do_wakeup.isra.19
> 2.70% 2.70% dd dd [.] 0x000000000000d9d8
> 2.70% 2.70% dd ld-2.28.so [.] _dl_relocate_object
> 2.70% 2.70% dd libc-2.28.so [.] __unregister_atfork
> 2.70% 2.70% dd libc-2.28.so [.] _dl_addr
>
> 12.50% 12.50% dd [kernel.kallsyms] [k] __audit_syscall_entry
> 12.50% 12.50% dd [kernel.kallsyms] [k] kmem_cache_free
> 12.50% 12.50% dd [kernel.kallsyms] [k] perf_iterate_ctx.constprop.64
> 12.50% 12.50% dd [kernel.kallsyms] [k] ttwu_do_wakeup.isra.19
> 12.50% 12.50% dd dd [.] 0x000000000000d9d8
> 12.50% 12.50% dd libc-2.28.so [.] __unregister_atfork
> 12.50% 12.50% dd libc-2.28.so [.] _nl_intern_locale_data
> 12.50% 12.50% dd libc-2.28.so [.] vfprintf
>
> 16.67% 16.67% dd libc-2.28.so [.] read_alias_file
> 8.33% 8.33% dd [kernel.kallsyms] [k] __arch_copy_from_user
> 8.33% 8.33% dd [kernel.kallsyms] [k] __arch_copy_to_user
> 8.33% 8.33% dd [kernel.kallsyms] [k] lookup_fast
> 8.33% 8.33% dd [kernel.kallsyms] [k] strncpy_from_user
> 8.33% 8.33% dd ld-2.28.so [.] _dl_lookup_symbol_x
> 8.33% 8.33% dd ld-2.28.so [.] check_match
> 8.33% 8.33% dd libc-2.28.so [.] __GI___printf_fp_l
> 8.33% 8.33% dd libc-2.28.so [.] _dl_addr
> 8.33% 8.33% dd libc-2.28.so [.] _int_malloc
> 8.33% 8.33% dd libc-2.28.so [.] _nl_intern_locale_data
>
> --------------------------------------------------------------------
>
> After that, more analysis and processing of the raw data of spe
> will be done.
>
> Signed-off-by: Tan Xiaojun <[email protected]>
> ---
> tools/perf/builtin-report.c | 5 +
> tools/perf/util/arm-spe-decoder/Build | 2 +-
> tools/perf/util/arm-spe-decoder/arm-spe-decoder.c | 214 ++++++
> tools/perf/util/arm-spe-decoder/arm-spe-decoder.h | 51 ++
> .../util/arm-spe-decoder/arm-spe-pkt-decoder.h | 2 +
> tools/perf/util/arm-spe.c | 715 ++++++++++++++++++++-
> tools/perf/util/auxtrace.c | 45 ++
> tools/perf/util/auxtrace.h | 27 +
> tools/perf/util/session.h | 2 +
> 9 files changed, 1028 insertions(+), 35 deletions(-)
> create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
> create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
>
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index abf0b9b..fadc8eb 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -1007,6 +1007,7 @@ int cmd_report(int argc, const char **argv)
> {
> struct perf_session *session;
> struct itrace_synth_opts itrace_synth_opts = { .set = 0, };
> + struct arm_spe_synth_opts arm_spe_synth_opts;
> struct stat st;
> bool has_br_stack = false;
> int branch_mode = -1;
> @@ -1165,6 +1166,9 @@ int cmd_report(int argc, const char **argv)
> OPT_CALLBACK_OPTARG(0, "itrace", &itrace_synth_opts, NULL, "opts",
> "Instruction Tracing options\n" ITRACE_HELP,
> itrace_parse_synth_opts),
> + OPT_CALLBACK_OPTARG(0, "spe", &arm_spe_synth_opts, NULL, "spe opts",
> + "ARM SPE Tracing options",
> + arm_spe_parse_synth_opts),
> OPT_BOOLEAN(0, "full-source-path", &srcline_full_filename,
> "Show full source file name path for source lines"),
> OPT_BOOLEAN(0, "show-ref-call-graph", &symbol_conf.show_ref_callgraph,
> @@ -1266,6 +1270,7 @@ int cmd_report(int argc, const char **argv)
> }
>
> session->itrace_synth_opts = &itrace_synth_opts;
> + session->arm_spe_synth_opts = &arm_spe_synth_opts;
>
> report.session = session;
>
> diff --git a/tools/perf/util/arm-spe-decoder/Build b/tools/perf/util/arm-spe-decoder/Build
> index 16efbc2..f8dae13 100644
> --- a/tools/perf/util/arm-spe-decoder/Build
> +++ b/tools/perf/util/arm-spe-decoder/Build
> @@ -1 +1 @@
> -perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
> +perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o arm-spe-decoder.o
> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
> new file mode 100644
> index 0000000..8008375
> --- /dev/null
> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
> @@ -0,0 +1,214 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * arm_spe_decoder.c: ARM SPE support
> + */
> +
> +#ifndef _GNU_SOURCE
> +#define _GNU_SOURCE
> +#endif
> +#include <stdlib.h>
> +#include <stdbool.h>
> +#include <string.h>
> +#include <errno.h>
> +#include <stdint.h>
> +#include <inttypes.h>
> +#include <linux/compiler.h>
> +#include <linux/zalloc.h>
> +
> +#include "../util.h"
> +#include "../auxtrace.h"
> +
> +#include "arm-spe-pkt-decoder.h"
> +#include "arm-spe-decoder.h"
> +
> +struct arm_spe_decoder {
> + int (*get_trace)(struct arm_spe_buffer *buffer, void *data);
> + void *data;
> + struct arm_spe_state state;
> + const unsigned char *buf;
> + size_t len;
> + uint64_t pos;
> + struct arm_spe_pkt packet;
> + int pkt_step;
> + int pkt_len;
> + int last_packet_type;
> +
> + uint64_t last_ip;
> + uint64_t ip;
> + uint64_t timestamp;
> + uint64_t sample_timestamp;
> + const unsigned char *next_buf;
> + size_t next_len;
> + unsigned char temp_buf[ARM_SPE_PKT_MAX_SZ];
> +};
> +
> +static uint64_t arm_spe_calc_ip(uint64_t payload)
> +{
> + uint64_t ip = (payload & ~(0xffULL << 56));
> +
> + /* fill high 8 bits for kernel virtual address */
> + if (ip & 0x1000000000000ULL)

It might be better to use VA_START here if possible.

> + ip |= (uint64_t)0xff00000000000000ULL;
> +
> + return ip;
> +}
> +
> +struct arm_spe_decoder *arm_spe_decoder_new(struct arm_spe_params *params)
> +{
> + struct arm_spe_decoder *decoder;
> +
> + if (!params->get_trace)
> + return NULL;
> +
> + decoder = zalloc(sizeof(struct arm_spe_decoder));
> + if (!decoder)
> + return NULL;
> +
> + decoder->get_trace = params->get_trace;
> + decoder->data = params->data;
> +
> + return decoder;
> +}
> +
> +void arm_spe_decoder_free(struct arm_spe_decoder *decoder)
> +{
> + free(decoder);
> +}
> +
> +static int arm_spe_bad_packet(struct arm_spe_decoder *decoder)
> +{
> + decoder->pkt_len = 1;
> + decoder->pkt_step = 1;
> + pr_debug("ERROR: Bad packet\n");
> +
> + return -EBADMSG;
> +}
> +
> +
> +static int arm_spe_get_data(struct arm_spe_decoder *decoder)
> +{
> + struct arm_spe_buffer buffer = { .buf = 0, };
> + int ret;
> +
> + decoder->pkt_step = 0;
> +
> + pr_debug("Getting more data\n");
> + ret = decoder->get_trace(&buffer, decoder->data);
> + if (ret)
> + return ret;
> +
> + decoder->buf = buffer.buf;
> + decoder->len = buffer.len;
> + if (!decoder->len) {
> + pr_debug("No more data\n");
> + return -ENODATA;
> + }
> +
> + return 0;
> +}
> +
> +static int arm_spe_get_next_data(struct arm_spe_decoder *decoder)
> +{
> + return arm_spe_get_data(decoder);
> +}
> +
> +static int arm_spe_get_next_packet(struct arm_spe_decoder *decoder)
> +{
> + int ret;
> +
> + decoder->last_packet_type = decoder->packet.type;
> +
> + do {
> + decoder->pos += decoder->pkt_step;
> + decoder->buf += decoder->pkt_step;
> + decoder->len -= decoder->pkt_step;
> +
> +
> + if (!decoder->len) {
> + ret = arm_spe_get_next_data(decoder);
> + if (ret)
> + return ret;
> + }
> +
> + ret = arm_spe_get_packet(decoder->buf, decoder->len,
> + &decoder->packet);
> + if (ret <= 0)
> + return arm_spe_bad_packet(decoder);
> +
> + decoder->pkt_len = ret;
> + decoder->pkt_step = ret;
> + } while (decoder->packet.type == ARM_SPE_PAD);
> +
> + return 0;
> +}
> +
> +static int arm_spe_walk_trace(struct arm_spe_decoder *decoder)
> +{
> + int err;
> + int idx;
> + uint64_t payload;
> +
> + while (1) {
> + err = arm_spe_get_next_packet(decoder);
> + if (err)
> + return err;
> +
> + idx = decoder->packet.index;
> + payload = decoder->packet.payload;
> +
> + switch (decoder->packet.type) {
> + case ARM_SPE_TIMESTAMP:
> + decoder->sample_timestamp = payload;
> + return 0;
> + case ARM_SPE_END:
> + decoder->sample_timestamp = 0;
> + return 0;
> + case ARM_SPE_ADDRESS:
> + decoder->ip = arm_spe_calc_ip(payload);
> + if (idx == 0)
> + decoder->state.from_ip = decoder->ip;
> + else if (idx == 1)
> + decoder->state.to_ip = decoder->ip;
> + break;
> + case ARM_SPE_COUNTER:
> + break;
> + case ARM_SPE_CONTEXT:
> + break;
> + case ARM_SPE_OP_TYPE:
> + break;
> + case ARM_SPE_EVENTS:
> + if (payload & 0x20)
> + decoder->state.type |= ARM_SPE_TLB_MISS;
> + if (payload & 0x80)
> + decoder->state.type |= ARM_SPE_BRANCH_MISS;
> + if (idx > 1 && (payload & 0x200))
> + decoder->state.type |= ARM_SPE_LLC_MISS;
> +
> + break;
> + case ARM_SPE_DATA_SOURCE:
> + break;
> + case ARM_SPE_BAD:
> + break;
> + case ARM_SPE_PAD:
> + break;
> + default:
> + pr_err("Get Packet Error!\n");
> + return -ENOSYS;
> + }
> + }
> +}

This code looks very similar to arm_spe_pkt_desc(); I can't help but
think they should be consolidated in some way. If nothing else, the magic
0x20, 0x80, etc. values for ARM_SPE_EVENTS should be defined somewhere
and shared.
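
A minimal sketch of what shared definitions for the event bits could look
like. The macro, enum and function names below are hypothetical and do not
exist in perf; only the bit values (0x20, 0x80, 0x200) and the "idx > 1"
check are taken from the quoted arm_spe_walk_trace() code.

```c
#include <stdint.h>

/* Hypothetical names for the bits tested in the ARM_SPE_EVENTS case. */
#define SPE_EV_TLB_MISS		(UINT64_C(1) << 5)	/* 0x20  */
#define SPE_EV_BRANCH_MISS	(UINT64_C(1) << 7)	/* 0x80  */
#define SPE_EV_LLC_MISS		(UINT64_C(1) << 9)	/* 0x200 */

/* Stand-ins for the patch's ARM_SPE_*_MISS sample type flags. */
enum spe_sample_type {
	SPE_SAMPLE_LLC_MISS	= 1 << 0,
	SPE_SAMPLE_TLB_MISS	= 1 << 1,
	SPE_SAMPLE_BRANCH_MISS	= 1 << 2,
};

/*
 * Translate an events packet payload into sample type flags.
 * As in the patch, the LLC bit is only honoured when idx > 1,
 * presumably because shorter payloads cannot carry bit 9.
 */
static unsigned int spe_events_to_type(uint64_t payload, int idx)
{
	unsigned int type = 0;

	if (payload & SPE_EV_TLB_MISS)
		type |= SPE_SAMPLE_TLB_MISS;
	if (payload & SPE_EV_BRANCH_MISS)
		type |= SPE_SAMPLE_BRANCH_MISS;
	if (idx > 1 && (payload & SPE_EV_LLC_MISS))
		type |= SPE_SAMPLE_LLC_MISS;

	return type;
}
```

With something like this in a common header, both the packet description
code and the decoder could test the same named bits instead of repeating
the raw constants.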


> +
> +const struct arm_spe_state *arm_spe_decode(struct arm_spe_decoder *decoder)
> +{
> + int err;
> +
> + decoder->state.type = 0;
> +
> + err = arm_spe_walk_trace(decoder);
> + if (err)
> + decoder->state.err = err;
> +
> + decoder->state.timestamp = decoder->sample_timestamp;
> +
> + return &decoder->state;

(trimming remainder)

2019-08-09 06:13:53

by Tan Xiaojun

Subject: Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events

On 2019/8/9 5:00, Jeremy Linton wrote:
> Hi,
>
> First, thanks for posting this!
>
> I ran this on our DAWN platform and it does what it says. It's a pretty reasonable start, but I get -1's in the command row rather than "dd" (or similar), and this also results in [unknown] for the shared object and most userspace addresses. This is quite possibly something I'm not doing right, but I didn't spend a lot of time testing/debugging it.
>
> I took a quick glance at the code too, and had a couple of comments, although I'm not a perf tool expert.
>

Hi,

Thank you for your reply.

I have only recently started working on this aspect of the perf tool, so your reply is very important to me.

I apologize: the example I posted was incomplete. I did not notice that I had posted only part of it until you pointed it out. The complete example is as follows:

Example usage:

# perf record -e arm_spe/ts_enable=1,pa_enable=1/ dd if=/dev/zero of=/dev/null count=10000
# perf report

--------------------------------------------------------------------
...
# Samples: 37 of event 'llc-miss'
# Event count (approx.): 37
#
# Children Self Command Shared Object Symbol
# ........ ........ ....... ................. ....................................
#
37.84% 37.84% dd [kernel.kallsyms] [k] perf_iterate_ctx.constprop.64
16.22% 16.22% dd [kernel.kallsyms] [k] copy_page
5.41% 5.41% dd [kernel.kallsyms] [k] find_vma
5.41% 5.41% dd [kernel.kallsyms] [k] perf_event_mmap
5.41% 5.41% dd [kernel.kallsyms] [k] zap_pte_range
5.41% 5.41% dd ld-2.28.so [.] _dl_lookup_symbol_x
5.41% 5.41% dd libc-2.28.so [.] _nl_intern_locale_data
2.70% 2.70% dd [kernel.kallsyms] [k] __remove_shared_vm_struct.isra.1
2.70% 2.70% dd [kernel.kallsyms] [k] kmem_cache_free
2.70% 2.70% dd [kernel.kallsyms] [k] ttwu_do_wakeup.isra.19
2.70% 2.70% dd dd [.] 0x000000000000d9d8
2.70% 2.70% dd ld-2.28.so [.] _dl_relocate_object
2.70% 2.70% dd libc-2.28.so [.] __unregister_atfork
2.70% 2.70% dd libc-2.28.so [.] _dl_addr


# Samples: 8 of event 'tlb-miss'
# Event count (approx.): 8
#
# Children Self Command Shared Object Symbol
# ........ ........ ....... ................. .................................
#
12.50% 12.50% dd [kernel.kallsyms] [k] __audit_syscall_entry
12.50% 12.50% dd [kernel.kallsyms] [k] kmem_cache_free
12.50% 12.50% dd [kernel.kallsyms] [k] perf_iterate_ctx.constprop.64
12.50% 12.50% dd [kernel.kallsyms] [k] ttwu_do_wakeup.isra.19
12.50% 12.50% dd dd [.] 0x000000000000d9d8
12.50% 12.50% dd libc-2.28.so [.] __unregister_atfork
12.50% 12.50% dd libc-2.28.so [.] _nl_intern_locale_data
12.50% 12.50% dd libc-2.28.so [.] vfprintf


# Samples: 12 of event 'branch-miss'
# Event count (approx.): 12
#
# Children Self Command Shared Object Symbol
# ........ ........ ....... ................. ..........................
#
16.67% 16.67% dd libc-2.28.so [.] read_alias_file
8.33% 8.33% dd [kernel.kallsyms] [k] __arch_copy_from_user
8.33% 8.33% dd [kernel.kallsyms] [k] __arch_copy_to_user
8.33% 8.33% dd [kernel.kallsyms] [k] lookup_fast
8.33% 8.33% dd [kernel.kallsyms] [k] strncpy_from_user
8.33% 8.33% dd ld-2.28.so [.] _dl_lookup_symbol_x
8.33% 8.33% dd ld-2.28.so [.] check_match
8.33% 8.33% dd libc-2.28.so [.] __GI___printf_fp_l
8.33% 8.33% dd libc-2.28.so [.] _dl_addr
8.33% 8.33% dd libc-2.28.so [.] _int_malloc
8.33% 8.33% dd libc-2.28.so [.] _nl_intern_locale_data



>
> On 8/2/19 4:40 AM, Tan Xiaojun wrote:
>> After the commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
>> Profiling Extensions (SPE) support") is merged, "perf record" and
>> "perf report --dump-raw-trace" have been supported. However, the
>> raw data that is dumped cannot be used without parsing.
>>
>> This patch is to improve the "perf report" support for spe, and
>> further process the data. Currently, support for the three events
>> of llc-miss, tlb-miss, and branch-miss is added.
>>
>> Example usage:
>>
>> --------------------------------------------------------------------
>> ...
>>      37.84%    37.84%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>>      16.22%    16.22%  dd       [kernel.kallsyms]  [k] copy_page
>>       5.41%     5.41%  dd       [kernel.kallsyms]  [k] find_vma
>>       5.41%     5.41%  dd       [kernel.kallsyms]  [k] perf_event_mmap
>>       5.41%     5.41%  dd       [kernel.kallsyms]  [k] zap_pte_range
>>       5.41%     5.41%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
>>       5.41%     5.41%  dd       libc-2.28.so       [.] _nl_intern_locale_data
>>       2.70%     2.70%  dd       [kernel.kallsyms]  [k] __remove_shared_vm_struct.isra.1
>>       2.70%     2.70%  dd       [kernel.kallsyms]  [k] kmem_cache_free
>>       2.70%     2.70%  dd       [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
>>       2.70%     2.70%  dd       dd                 [.] 0x000000000000d9d8
>>       2.70%     2.70%  dd       ld-2.28.so         [.] _dl_relocate_object
>>       2.70%     2.70%  dd       libc-2.28.so       [.] __unregister_atfork
>>       2.70%     2.70%  dd       libc-2.28.so       [.] _dl_addr
>>
>>      12.50%    12.50%  dd       [kernel.kallsyms]  [k] __audit_syscall_entry
>>      12.50%    12.50%  dd       [kernel.kallsyms]  [k] kmem_cache_free
>>      12.50%    12.50%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>>      12.50%    12.50%  dd       [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
>>      12.50%    12.50%  dd       dd                 [.] 0x000000000000d9d8
>>      12.50%    12.50%  dd       libc-2.28.so       [.] __unregister_atfork
>>      12.50%    12.50%  dd       libc-2.28.so       [.] _nl_intern_locale_data
>>      12.50%    12.50%  dd       libc-2.28.so       [.] vfprintf
>>
>>      16.67%    16.67%  dd       libc-2.28.so       [.] read_alias_file
>>       8.33%     8.33%  dd       [kernel.kallsyms]  [k] __arch_copy_from_user
>>       8.33%     8.33%  dd       [kernel.kallsyms]  [k] __arch_copy_to_user
>>       8.33%     8.33%  dd       [kernel.kallsyms]  [k] lookup_fast
>>       8.33%     8.33%  dd       [kernel.kallsyms]  [k] strncpy_from_user
>>       8.33%     8.33%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
>>       8.33%     8.33%  dd       ld-2.28.so         [.] check_match
>>       8.33%     8.33%  dd       libc-2.28.so       [.] __GI___printf_fp_l
>>       8.33%     8.33%  dd       libc-2.28.so       [.] _dl_addr
>>       8.33%     8.33%  dd       libc-2.28.so       [.] _int_malloc
>>       8.33%     8.33%  dd       libc-2.28.so       [.] _nl_intern_locale_data
>>
>> --------------------------------------------------------------------
>>
>> After that, more analysis and processing of the raw data of spe
>> will be done.
>>
>> Signed-off-by: Tan Xiaojun <[email protected]>
>> ---
>>   tools/perf/builtin-report.c                        |   5 +
>>   tools/perf/util/arm-spe-decoder/Build              |   2 +-
>>   tools/perf/util/arm-spe-decoder/arm-spe-decoder.c  | 214 ++++++
>>   tools/perf/util/arm-spe-decoder/arm-spe-decoder.h  |  51 ++
>>   .../util/arm-spe-decoder/arm-spe-pkt-decoder.h     |   2 +
>>   tools/perf/util/arm-spe.c                          | 715 ++++++++++++++++++++-
>>   tools/perf/util/auxtrace.c                         |  45 ++
>>   tools/perf/util/auxtrace.h                         |  27 +
>>   tools/perf/util/session.h                          |   2 +
>>   9 files changed, 1028 insertions(+), 35 deletions(-)
>>   create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
>>   create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
>>
>> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
>> index abf0b9b..fadc8eb 100644
>> --- a/tools/perf/builtin-report.c
>> +++ b/tools/perf/builtin-report.c
>> @@ -1007,6 +1007,7 @@ int cmd_report(int argc, const char **argv)
>>   {
>>       struct perf_session *session;
>>       struct itrace_synth_opts itrace_synth_opts = { .set = 0, };
>> +    struct arm_spe_synth_opts arm_spe_synth_opts;
>>       struct stat st;
>>       bool has_br_stack = false;
>>       int branch_mode = -1;
>> @@ -1165,6 +1166,9 @@ int cmd_report(int argc, const char **argv)
>>       OPT_CALLBACK_OPTARG(0, "itrace", &itrace_synth_opts, NULL, "opts",
>>                   "Instruction Tracing options\n" ITRACE_HELP,
>>                   itrace_parse_synth_opts),
>> +    OPT_CALLBACK_OPTARG(0, "spe", &arm_spe_synth_opts, NULL, "spe opts",
>> +                "ARM SPE Tracing options",
>> +                arm_spe_parse_synth_opts),
>>       OPT_BOOLEAN(0, "full-source-path", &srcline_full_filename,
>>               "Show full source file name path for source lines"),
>>       OPT_BOOLEAN(0, "show-ref-call-graph", &symbol_conf.show_ref_callgraph,
>> @@ -1266,6 +1270,7 @@ int cmd_report(int argc, const char **argv)
>>       }
>>         session->itrace_synth_opts = &itrace_synth_opts;
>> +    session->arm_spe_synth_opts = &arm_spe_synth_opts;
>>         report.session = session;
>>   diff --git a/tools/perf/util/arm-spe-decoder/Build b/tools/perf/util/arm-spe-decoder/Build
>> index 16efbc2..f8dae13 100644
>> --- a/tools/perf/util/arm-spe-decoder/Build
>> +++ b/tools/perf/util/arm-spe-decoder/Build
>> @@ -1 +1 @@
>> -perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
>> +perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o arm-spe-decoder.o
>> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
>> new file mode 100644
>> index 0000000..8008375
>> --- /dev/null
>> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
>> @@ -0,0 +1,214 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * arm_spe_decoder.c: ARM SPE support
>> + */
>> +
>> +#ifndef _GNU_SOURCE
>> +#define _GNU_SOURCE
>> +#endif
>> +#include <stdlib.h>
>> +#include <stdbool.h>
>> +#include <string.h>
>> +#include <errno.h>
>> +#include <stdint.h>
>> +#include <inttypes.h>
>> +#include <linux/compiler.h>
>> +#include <linux/zalloc.h>
>> +
>> +#include "../util.h"
>> +#include "../auxtrace.h"
>> +
>> +#include "arm-spe-pkt-decoder.h"
>> +#include "arm-spe-decoder.h"
>> +
>> +struct arm_spe_decoder {
>> +    int (*get_trace)(struct arm_spe_buffer *buffer, void *data);
>> +    void *data;
>> +    struct arm_spe_state state;
>> +    const unsigned char *buf;
>> +    size_t len;
>> +    uint64_t pos;
>> +    struct arm_spe_pkt packet;
>> +    int pkt_step;
>> +    int pkt_len;
>> +    int last_packet_type;
>> +
>> +    uint64_t last_ip;
>> +    uint64_t ip;
>> +    uint64_t timestamp;
>> +    uint64_t sample_timestamp;
>> +    const unsigned char *next_buf;
>> +    size_t next_len;
>> +    unsigned char temp_buf[ARM_SPE_PKT_MAX_SZ];
>> +};
>> +
>> +static uint64_t arm_spe_calc_ip(uint64_t payload)
>> +{
>> +    uint64_t ip = (payload & ~(0xffULL << 56));
>> +
>> +    /* fill high 8 bits for kernel virtual address */
>> +    if (ip & 0x1000000000000ULL)
>
> It might be better to use VA_START here if possible.
>

Yes, that would be better, but I don't know how to use VA_START in user-mode code, so I hardcoded the value.
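
One possible interim step, sketched below, is to give the hardcoded values local names so the intent is visible even without VA_START. The macro and function names here are hypothetical; the logic is the same as in the quoted arm_spe_calc_ip(): clear the top byte of the payload, then set it to 0xff again when bit 48 indicates a kernel virtual address.

```c
#include <stdint.h>

/* Hypothetical local names for the constants hardcoded in the patch. */
#define SPE_ADDR_TOP_BYTE	(UINT64_C(0xff) << 56)	/* bits 63:56 */
#define SPE_ADDR_KERNEL_BIT	(UINT64_C(1) << 48)	/* 0x1000000000000 */

static uint64_t spe_calc_ip(uint64_t payload)
{
	/* Drop whatever the packet carries in the top byte. */
	uint64_t ip = payload & ~SPE_ADDR_TOP_BYTE;

	/* Fill the high 8 bits for a kernel virtual address. */
	if (ip & SPE_ADDR_KERNEL_BIT)
		ip |= SPE_ADDR_TOP_BYTE;

	return ip;
}
```

This still assumes the address layout that the bit 48 test implies, so it only improves readability and does not remove that dependency.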

>> +        ip |= (uint64_t)0xff00000000000000ULL;
>> +
>> +    return ip;
>> +}
>> +
>> +struct arm_spe_decoder *arm_spe_decoder_new(struct arm_spe_params *params)
>> +{
>> +    struct arm_spe_decoder *decoder;
>> +
>> +    if (!params->get_trace)
>> +        return NULL;
>> +
>> +    decoder = zalloc(sizeof(struct arm_spe_decoder));
>> +    if (!decoder)
>> +        return NULL;
>> +
>> +    decoder->get_trace          = params->get_trace;
>> +    decoder->data               = params->data;
>> +
>> +    return decoder;
>> +}
>> +
>> +void arm_spe_decoder_free(struct arm_spe_decoder *decoder)
>> +{
>> +    free(decoder);
>> +}
>> +
>> +static int arm_spe_bad_packet(struct arm_spe_decoder *decoder)
>> +{
>> +    decoder->pkt_len = 1;
>> +    decoder->pkt_step = 1;
>> +    pr_debug("ERROR: Bad packet\n");
>> +
>> +    return -EBADMSG;
>> +}
>> +
>> +
>> +static int arm_spe_get_data(struct arm_spe_decoder *decoder)
>> +{
>> +    struct arm_spe_buffer buffer = { .buf = 0, };
>> +    int ret;
>> +
>> +    decoder->pkt_step = 0;
>> +
>> +    pr_debug("Getting more data\n");
>> +    ret = decoder->get_trace(&buffer, decoder->data);
>> +    if (ret)
>> +        return ret;
>> +
>> +    decoder->buf = buffer.buf;
>> +    decoder->len = buffer.len;
>> +    if (!decoder->len) {
>> +        pr_debug("No more data\n");
>> +        return -ENODATA;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int arm_spe_get_next_data(struct arm_spe_decoder *decoder)
>> +{
>> +    return arm_spe_get_data(decoder);
>> +}
>> +
>> +static int arm_spe_get_next_packet(struct arm_spe_decoder *decoder)
>> +{
>> +    int ret;
>> +
>> +    decoder->last_packet_type = decoder->packet.type;
>> +
>> +    do {
>> +        decoder->pos += decoder->pkt_step;
>> +        decoder->buf += decoder->pkt_step;
>> +        decoder->len -= decoder->pkt_step;
>> +
>> +
>> +        if (!decoder->len) {
>> +            ret = arm_spe_get_next_data(decoder);
>> +            if (ret)
>> +                return ret;
>> +        }
>> +
>> +        ret = arm_spe_get_packet(decoder->buf, decoder->len,
>> +                &decoder->packet);
>> +        if (ret <= 0)
>> +            return arm_spe_bad_packet(decoder);
>> +
>> +        decoder->pkt_len = ret;
>> +        decoder->pkt_step = ret;
>> +    } while (decoder->packet.type == ARM_SPE_PAD);
>> +
>> +    return 0;
>> +}
>> +
>> +static int arm_spe_walk_trace(struct arm_spe_decoder *decoder)
>> +{
>> +    int err;
>> +    int idx;
>> +    uint64_t payload;
>> +
>> +    while (1) {
>> +        err = arm_spe_get_next_packet(decoder);
>> +        if (err)
>> +            return err;
>> +
>> +        idx = decoder->packet.index;
>> +        payload = decoder->packet.payload;
>> +
>> +        switch (decoder->packet.type) {
>> +        case ARM_SPE_TIMESTAMP:
>> +            decoder->sample_timestamp = payload;
>> +            return 0;
>> +        case ARM_SPE_END:
>> +            decoder->sample_timestamp = 0;
>> +            return 0;
>> +        case ARM_SPE_ADDRESS:
>> +            decoder->ip = arm_spe_calc_ip(payload);
>> +            if (idx == 0)
>> +                decoder->state.from_ip = decoder->ip;
>> +            else if (idx == 1)
>> +                decoder->state.to_ip = decoder->ip;
>> +            break;
>> +        case ARM_SPE_COUNTER:
>> +            break;
>> +        case ARM_SPE_CONTEXT:
>> +            break;
>> +        case ARM_SPE_OP_TYPE:
>> +            break;
>> +        case ARM_SPE_EVENTS:
>> +            if (payload & 0x20)
>> +                decoder->state.type |= ARM_SPE_TLB_MISS;
>> +            if (payload & 0x80)
>> +                decoder->state.type |= ARM_SPE_BRANCH_MISS;
>> +            if (idx > 1 && (payload & 0x200))
>> +                decoder->state.type |= ARM_SPE_LLC_MISS;
>> +
>> +            break;
>> +        case ARM_SPE_DATA_SOURCE:
>> +            break;
>> +        case ARM_SPE_BAD:
>> +            break;
>> +        case ARM_SPE_PAD:
>> +            break;
>> +        default:
>> +            pr_err("Get Packet Error!\n");
>> +            return -ENOSYS;
>> +        }
>> +    }
>> +}
>
> This code looks very similar to  arm_spe_pkt_desc(), I can't help but think they should be consolidated in some way. If nothing else the magic 0x20, 0x80, etc ARM_SPE_EVENTS should be defined somewhere and shared.
>

Yes, I wrote it with reference to arm_spe_pkt_desc(). What you said makes sense; I will try to rework it later.

Xiaojun.

>
>> +
>> +const struct arm_spe_state *arm_spe_decode(struct arm_spe_decoder *decoder)
>> +{
>> +    int err;
>> +
>> +    decoder->state.type = 0;
>> +
>> +    err = arm_spe_walk_trace(decoder);
>> +    if (err)
>> +        decoder->state.err = err;
>> +
>> +    decoder->state.timestamp = decoder->sample_timestamp;
>> +
>> +    return &decoder->state;
>
> (trimming remainder)
>
>
> .
>


2019-08-21 14:56:19

by James Clark

Subject: Re: [RFC PATCH 3/3] perf report: add --spe options for arm-spe

Hi,

I also had a look at this and had a question about the --spe option.
It seems that whatever options I give it, the output is the same:

perf report
And
perf report --spe=t

Both give the same result:

# Samples: 4 of event 'llc-miss'
# Event count (approx.): 4
#
# Children Self Command Shared Object Symbol
# ........ ........ ....... ................. ..........................
#
...
# Samples: 0 of event 'tlb-miss'
# Event count (approx.): 0
#
# Children Self Command Shared Object Symbol
# ........ ........ ....... ............. ......
#

# Samples: 83 of event 'branch-miss'
# Event count (approx.): 83
#
# Children Self Command Shared Object Symbol
# ........ ........ ....... ................. .........................
#
...

I would have expected it to not include the branch and LLC sections for the second
command with --spe=t.
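
To make the expected behaviour concrete, here is a minimal sketch of the
option semantics described in the documentation patch: no argument selects
all three events, otherwise only the listed letters are enabled. The struct
and function names are hypothetical stand-ins and are not the code from
the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct arm_spe_synth_opts. */
struct spe_synth_opts {
	bool llc_miss;
	bool tlb_miss;
	bool branch_miss;
};

/* Parse the "ltb" letters; returns 0 on success, -1 on an unknown letter. */
static int spe_parse_opts(struct spe_synth_opts *opts, const char *str)
{
	opts->llc_miss = opts->tlb_miss = opts->branch_miss = false;

	/* No argument: default to all events, i.e. --spe=ltb. */
	if (!str || !*str) {
		opts->llc_miss = opts->tlb_miss = opts->branch_miss = true;
		return 0;
	}

	for (; *str; str++) {
		switch (*str) {
		case 'l':
			opts->llc_miss = true;
			break;
		case 't':
			opts->tlb_miss = true;
			break;
		case 'b':
			opts->branch_miss = true;
			break;
		default:
			return -1;
		}
	}

	return 0;
}
```

With these semantics, --spe=t would leave llc_miss and branch_miss false,
so only the tlb-miss section would be synthesized and shown.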

And that leads me to another point. Does it make sense to have this option as a
post-processing step? SPE already has support for filtering events at collection
time with the PMSFCR_EL1 register.

Should we try to make the interface more like PEBS, where you specify which events you
are interested in doing precise tracing on like this?

perf record -e branch-misses:pp

And then perf could use the modifier to configure SPE so that it only records branch
misses? The benefits of this would be keeping the user interface for precise tracing
similar between platforms.

Thanks
James

On 02/08/2019 10:40, Tan Xiaojun wrote:
> The previous patch added support in "perf report" for some arm-spe
> events(llc-miss, tlb-miss, branch-miss). This patch adds their help
> instructions.
>
> Signed-off-by: Tan Xiaojun <[email protected]>
> ---
> tools/perf/Documentation/perf-report.txt | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
> index 987261d..d998d4b 100644
> --- a/tools/perf/Documentation/perf-report.txt
> +++ b/tools/perf/Documentation/perf-report.txt
> @@ -445,6 +445,15 @@ include::itrace.txt[]
>
> To disable decoding entirely, use --no-itrace.
>
> +--spe::
> + Options for decoding arm-spe tracing data. The options are:
> +
> + l synthesize llc miss events
> + t synthesize tlb miss events
> + b synthesize branch miss events
> +
> + The default is all events i.e. the same as --spe=ltb
> +
> --full-source-path::
> Show the full path for source files for srcline output.
>
>

2019-08-22 02:49:46

by Tan Xiaojun

Subject: Re: [RFC PATCH 3/3] perf report: add --spe options for arm-spe

On 2019/8/21 20:38, James Clark wrote:
> Hi,
>
> I also had a look at this and had a question about the --spe option.
> It seems that whatever options I give it, the output is the same:
>
> perf report
> And
> perf report --spe=t
>
> Both give the same result:
>
> # Samples: 4 of event 'llc-miss'
> # Event count (approx.): 4
> #
> # Children Self Command Shared Object Symbol
> # ........ ........ ....... ................. ..........................
> #
> ...
> # Samples: 0 of event 'tlb-miss'
> # Event count (approx.): 0
> #
> # Children Self Command Shared Object Symbol
> # ........ ........ ....... ............. ......
> #
>
> # Samples: 83 of event 'branch-miss'
> # Event count (approx.): 83
> #
> # Children Self Command Shared Object Symbol
> # ........ ........ ....... ................. .........................
> #
> ...
>
> I would have expected it to not include the branch and LLC sections for the second
> command with --spe=t.
>

Hi,

Sorry, this is most likely a bug in my code.

> And that leads me to another point. Does it make sense to have this option as a post
> processing step? SPE already has support for filtering events at collection time with
> the PMSFCR_EL1 register.
>
> Should we try to make the interface more like PEBS, where you specify which events you
> are interested in doing precise tracing on like this?
>
> perf record -e branch-misses:pp
>
> And then perf could use the modifier to configure SPE so that it only records branch
> misses? The benefits of this would be keeping the user interface for precise tracing
> similar between platforms.
>

Good suggestion. I need to spend some time thinking about how to implement it.

Thank you for your reply.
Xiaojun.

> Thanks
> James
>
> On 02/08/2019 10:40, Tan Xiaojun wrote:
>> The previous patch added support in "perf report" for some arm-spe
>> events (llc-miss, tlb-miss, branch-miss). This patch adds the help
>> text for the corresponding option.
>>
>> Signed-off-by: Tan Xiaojun <[email protected]>
>> ---
>> tools/perf/Documentation/perf-report.txt | 9 +++++++++
>> 1 file changed, 9 insertions(+)
>>
>> diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
>> index 987261d..d998d4b 100644
>> --- a/tools/perf/Documentation/perf-report.txt
>> +++ b/tools/perf/Documentation/perf-report.txt
>> @@ -445,6 +445,15 @@ include::itrace.txt[]
>>
>> To disable decoding entirely, use --no-itrace.
>>
>> +--spe::
>> + Options for decoding arm-spe tracing data. The options are:
>> +
>> + l synthesize llc miss events
>> + t synthesize tlb miss events
>> + b synthesize branch miss events
>> +
>> + The default is all events i.e. the same as --spe=ltb
>> +
>> --full-source-path::
>> Show the full path for source files for srcline output.
>>
>>


2019-10-04 13:48:15

by James Clark

[permalink] [raw]
Subject: Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events

Hi Xiaojun,

I wanted to ask if you are still working on this?

I've noticed that it doesn't apply cleanly to perf/core anymore and I was working on re-basing it.
Would you be interested in me posting my progress?

I was also interested in decoding the "data source" of events and displaying that information. Does this
clash with any of your current work?


Thanks
James

On 09/08/2019 07:12, Tan Xiaojun wrote:
> On 2019/8/9 5:00, Jeremy Linton wrote:
>> Hi,
>>
>> First thanks for posting this!
>>
>> I ran this on our DAWN platform and it does what it says. It's a pretty reasonable start, but I get -1's in the command row rather than "dd" (or similar) and this also results in [unknown] for the shared object and most userspace addresses. This is quite possibly something I'm not doing right, but I didn't spend a lot of time testing/debugging it.
>>
>> I did a quick glance at the code too, and had a couple of comments, although I'm not a perf tool expert.
>>
>
> Hi,
>
> Thank you for your reply.
>
> I have only recently started working on this aspect of the perf tool, so your reply is very important to me.
>
> I apologize: my example was incomplete. Until you pointed it out, I had not noticed that I posted only part of it. The complete example is as follows:
>
> Example usage:
>
> # perf record -e arm_spe/ts_enable=1,pa_enable=1/ dd if=/dev/zero of=/dev/null count=10000
> # perf report
>
> --------------------------------------------------------------------
> ...
> # Samples: 37 of event 'llc-miss'
> # Event count (approx.): 37
> #
> # Children Self Command Shared Object Symbol
> # ........ ........ ....... ................. ....................................
> #
> 37.84% 37.84% dd [kernel.kallsyms] [k] perf_iterate_ctx.constprop.64
> 16.22% 16.22% dd [kernel.kallsyms] [k] copy_page
> 5.41% 5.41% dd [kernel.kallsyms] [k] find_vma
> 5.41% 5.41% dd [kernel.kallsyms] [k] perf_event_mmap
> 5.41% 5.41% dd [kernel.kallsyms] [k] zap_pte_range
> 5.41% 5.41% dd ld-2.28.so [.] _dl_lookup_symbol_x
> 5.41% 5.41% dd libc-2.28.so [.] _nl_intern_locale_data
> 2.70% 2.70% dd [kernel.kallsyms] [k] __remove_shared_vm_struct.isra.1
> 2.70% 2.70% dd [kernel.kallsyms] [k] kmem_cache_free
> 2.70% 2.70% dd [kernel.kallsyms] [k] ttwu_do_wakeup.isra.19
> 2.70% 2.70% dd dd [.] 0x000000000000d9d8
> 2.70% 2.70% dd ld-2.28.so [.] _dl_relocate_object
> 2.70% 2.70% dd libc-2.28.so [.] __unregister_atfork
> 2.70% 2.70% dd libc-2.28.so [.] _dl_addr
>
>
> # Samples: 8 of event 'tlb-miss'
> # Event count (approx.): 8
> #
> # Children Self Command Shared Object Symbol
> # ........ ........ ....... ................. .................................
> #
> 12.50% 12.50% dd [kernel.kallsyms] [k] __audit_syscall_entry
> 12.50% 12.50% dd [kernel.kallsyms] [k] kmem_cache_free
> 12.50% 12.50% dd [kernel.kallsyms] [k] perf_iterate_ctx.constprop.64
> 12.50% 12.50% dd [kernel.kallsyms] [k] ttwu_do_wakeup.isra.19
> 12.50% 12.50% dd dd [.] 0x000000000000d9d8
> 12.50% 12.50% dd libc-2.28.so [.] __unregister_atfork
> 12.50% 12.50% dd libc-2.28.so [.] _nl_intern_locale_data
> 12.50% 12.50% dd libc-2.28.so [.] vfprintf
>
>
> # Samples: 12 of event 'branch-miss'
> # Event count (approx.): 12
> #
> # Children Self Command Shared Object Symbol
> # ........ ........ ....... ................. ..........................
> #
> 16.67% 16.67% dd libc-2.28.so [.] read_alias_file
> 8.33% 8.33% dd [kernel.kallsyms] [k] __arch_copy_from_user
> 8.33% 8.33% dd [kernel.kallsyms] [k] __arch_copy_to_user
> 8.33% 8.33% dd [kernel.kallsyms] [k] lookup_fast
> 8.33% 8.33% dd [kernel.kallsyms] [k] strncpy_from_user
> 8.33% 8.33% dd ld-2.28.so [.] _dl_lookup_symbol_x
> 8.33% 8.33% dd ld-2.28.so [.] check_match
> 8.33% 8.33% dd libc-2.28.so [.] __GI___printf_fp_l
> 8.33% 8.33% dd libc-2.28.so [.] _dl_addr
> 8.33% 8.33% dd libc-2.28.so [.] _int_malloc
> 8.33% 8.33% dd libc-2.28.so [.] _nl_intern_locale_data
>
>
>
>>
>> On 8/2/19 4:40 AM, Tan Xiaojun wrote:
>>> After the commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
>>> Profiling Extensions (SPE) support") is merged, "perf record" and
>>> "perf report --dump-raw-trace" have been supported. However, the
>>> raw data that is dumped cannot be used without parsing.
>>>
>>> This patch is to improve the "perf report" support for spe, and
>>> further process the data. Currently, support for the three events
>>> of llc-miss, tlb-miss, and branch-miss is added.
>>>
>>> Example usage:
>>>
>>> --------------------------------------------------------------------
>>> ...
>>> 37.84% 37.84% dd [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>>> 16.22% 16.22% dd [kernel.kallsyms]  [k] copy_page
>>> 5.41% 5.41% dd [kernel.kallsyms]  [k] find_vma
>>> 5.41% 5.41% dd [kernel.kallsyms]  [k] perf_event_mmap
>>> 5.41% 5.41% dd [kernel.kallsyms]  [k] zap_pte_range
>>> 5.41% 5.41% dd ld-2.28.so [.] _dl_lookup_symbol_x
>>> 5.41% 5.41% dd libc-2.28.so [.] _nl_intern_locale_data
>>> 2.70% 2.70% dd [kernel.kallsyms]  [k] __remove_shared_vm_struct.isra.1
>>> 2.70% 2.70% dd [kernel.kallsyms]  [k] kmem_cache_free
>>> 2.70% 2.70% dd [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
>>> 2.70% 2.70% dd dd [.] 0x000000000000d9d8
>>> 2.70% 2.70% dd ld-2.28.so [.] _dl_relocate_object
>>> 2.70% 2.70% dd libc-2.28.so [.] __unregister_atfork
>>> 2.70% 2.70% dd libc-2.28.so [.] _dl_addr
>>>
>>> 12.50% 12.50% dd [kernel.kallsyms]  [k] __audit_syscall_entry
>>> 12.50% 12.50% dd [kernel.kallsyms]  [k] kmem_cache_free
>>> 12.50% 12.50% dd [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>>> 12.50% 12.50% dd [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
>>> 12.50% 12.50% dd dd [.] 0x000000000000d9d8
>>> 12.50% 12.50% dd libc-2.28.so [.] __unregister_atfork
>>> 12.50% 12.50% dd libc-2.28.so [.] _nl_intern_locale_data
>>> 12.50% 12.50% dd libc-2.28.so [.] vfprintf
>>>
>>> 16.67% 16.67% dd libc-2.28.so [.] read_alias_file
>>> 8.33% 8.33% dd [kernel.kallsyms]  [k] __arch_copy_from_user
>>> 8.33% 8.33% dd [kernel.kallsyms]  [k] __arch_copy_to_user
>>> 8.33% 8.33% dd [kernel.kallsyms]  [k] lookup_fast
>>> 8.33% 8.33% dd [kernel.kallsyms]  [k] strncpy_from_user
>>> 8.33% 8.33% dd ld-2.28.so [.] _dl_lookup_symbol_x
>>> 8.33% 8.33% dd ld-2.28.so [.] check_match
>>> 8.33% 8.33% dd libc-2.28.so [.] __GI___printf_fp_l
>>> 8.33% 8.33% dd libc-2.28.so [.] _dl_addr
>>> 8.33% 8.33% dd libc-2.28.so [.] _int_malloc
>>> 8.33% 8.33% dd libc-2.28.so [.] _nl_intern_locale_data
>>>
>>> --------------------------------------------------------------------
>>>
>>> After that, more analysis and processing of the raw data of spe
>>> will be done.
>>>
>>> Signed-off-by: Tan Xiaojun <[email protected]>
>>> ---
>>> tools/perf/builtin-report.c |   5 +
>>> tools/perf/util/arm-spe-decoder/Build |   2 +-
>>> tools/perf/util/arm-spe-decoder/arm-spe-decoder.c | 214 ++++++
>>> tools/perf/util/arm-spe-decoder/arm-spe-decoder.h |  51 ++
>>> .../util/arm-spe-decoder/arm-spe-pkt-decoder.h |   2 +
>>> tools/perf/util/arm-spe.c | 715 ++++++++++++++++++++-
>>> tools/perf/util/auxtrace.c |  45 ++
>>> tools/perf/util/auxtrace.h |  27 +
>>> tools/perf/util/session.h |   2 +
>>> 9 files changed, 1028 insertions(+), 35 deletions(-)
>>> create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
>>> create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
>>>
>>> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
>>> index abf0b9b..fadc8eb 100644
>>> --- a/tools/perf/builtin-report.c
>>> +++ b/tools/perf/builtin-report.c
>>> @@ -1007,6 +1007,7 @@ int cmd_report(int argc, const char **argv)
>>> {
>>> struct perf_session *session;
>>> struct itrace_synth_opts itrace_synth_opts = { .set = 0, };
>>> + struct arm_spe_synth_opts arm_spe_synth_opts;
>>> struct stat st;
>>> bool has_br_stack = false;
>>> int branch_mode = -1;
>>> @@ -1165,6 +1166,9 @@ int cmd_report(int argc, const char **argv)
>>> OPT_CALLBACK_OPTARG(0, "itrace", &itrace_synth_opts, NULL, "opts",
>>> "Instruction Tracing options\n" ITRACE_HELP,
>>> itrace_parse_synth_opts),
>>> + OPT_CALLBACK_OPTARG(0, "spe", &arm_spe_synth_opts, NULL, "spe opts",
>>> + "ARM SPE Tracing options",
>>> + arm_spe_parse_synth_opts),
>>> OPT_BOOLEAN(0, "full-source-path", &srcline_full_filename,
>>> "Show full source file name path for source lines"),
>>> OPT_BOOLEAN(0, "show-ref-call-graph", &symbol_conf.show_ref_callgraph,
>>> @@ -1266,6 +1270,7 @@ int cmd_report(int argc, const char **argv)
>>> }
>>> session->itrace_synth_opts = &itrace_synth_opts;
>>> + session->arm_spe_synth_opts = &arm_spe_synth_opts;
>>> report.session = session;
>>> diff --git a/tools/perf/util/arm-spe-decoder/Build b/tools/perf/util/arm-spe-decoder/Build
>>> index 16efbc2..f8dae13 100644
>>> --- a/tools/perf/util/arm-spe-decoder/Build
>>> +++ b/tools/perf/util/arm-spe-decoder/Build
>>> @@ -1 +1 @@
>>> -perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
>>> +perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o arm-spe-decoder.o
>>> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
>>> new file mode 100644
>>> index 0000000..8008375
>>> --- /dev/null
>>> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
>>> @@ -0,0 +1,214 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +/*
>>> + * arm_spe_decoder.c: ARM SPE support
>>> + */
>>> +
>>> +#ifndef _GNU_SOURCE
>>> +#define _GNU_SOURCE
>>> +#endif
>>> +#include <stdlib.h>
>>> +#include <stdbool.h>
>>> +#include <string.h>
>>> +#include <errno.h>
>>> +#include <stdint.h>
>>> +#include <inttypes.h>
>>> +#include <linux/compiler.h>
>>> +#include <linux/zalloc.h>
>>> +
>>> +#include "../util.h"
>>> +#include "../auxtrace.h"
>>> +
>>> +#include "arm-spe-pkt-decoder.h"
>>> +#include "arm-spe-decoder.h"
>>> +
>>> +struct arm_spe_decoder {
>>> + int (*get_trace)(struct arm_spe_buffer *buffer, void *data);
>>> + void *data;
>>> + struct arm_spe_state state;
>>> + const unsigned char *buf;
>>> + size_t len;
>>> + uint64_t pos;
>>> + struct arm_spe_pkt packet;
>>> + int pkt_step;
>>> + int pkt_len;
>>> + int last_packet_type;
>>> +
>>> + uint64_t last_ip;
>>> + uint64_t ip;
>>> + uint64_t timestamp;
>>> + uint64_t sample_timestamp;
>>> + const unsigned char *next_buf;
>>> + size_t next_len;
>>> + unsigned char temp_buf[ARM_SPE_PKT_MAX_SZ];
>>> +};
>>> +
>>> +static uint64_t arm_spe_calc_ip(uint64_t payload)
>>> +{
>>> + uint64_t ip = (payload & ~(0xffULL << 56));
>>> +
>>> + /* fill high 8 bits for kernel virtual address */
>>> + if (ip & 0x1000000000000ULL)
>>
>> It might be better to use VA_START here if possible.
>>
>
> Yes, that would be better, but I don't know how to use VA_START in user-mode code, so I hard-coded the value.
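
One way to avoid bare literals without pulling in kernel headers is to name the constants locally in the tool. The sketch below mirrors the logic of arm_spe_calc_ip() as posted; the macro and function names are hypothetical, and the bit-48 test is kept exactly as in the patch, which appears to assume a 48-bit kernel virtual address layout.

```c
#include <stdint.h>

/* Hypothetical names; the values are the literals used in the patch. */
#define SPE_ADDR_PAYLOAD_MASK	(~(0xffULL << 56))	/* keep bits [55:0] */
#define SPE_ADDR_KERNEL_BIT	(1ULL << 48)		/* 0x1000000000000 */
#define SPE_ADDR_KERNEL_FILL	(0xffULL << 56)		/* bits [63:56] */

/*
 * Rebuild a virtual address from an SPE address packet payload.
 * The top byte of the payload is dropped, then filled with ones
 * again when the address looks like a kernel address.
 */
static uint64_t spe_calc_ip(uint64_t payload)
{
	uint64_t ip = payload & SPE_ADDR_PAYLOAD_MASK;

	if (ip & SPE_ADDR_KERNEL_BIT)
		ip |= SPE_ADDR_KERNEL_FILL;

	return ip;
}
```

Keeping the constants in one place would also make it easier to switch the test to a different bit later if other address-space sizes need to be handled.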
>
>>> + ip |= (uint64_t)0xff00000000000000ULL;
>>> +
>>> + return ip;
>>> +}
>>> +
>>> +struct arm_spe_decoder *arm_spe_decoder_new(struct arm_spe_params *params)
>>> +{
>>> + struct arm_spe_decoder *decoder;
>>> +
>>> + if (!params->get_trace)
>>> + return NULL;
>>> +
>>> + decoder = zalloc(sizeof(struct arm_spe_decoder));
>>> + if (!decoder)
>>> + return NULL;
>>> +
>>> + decoder->get_trace = params->get_trace;
>>> + decoder->data = params->data;
>>> +
>>> + return decoder;
>>> +}
>>> +
>>> +void arm_spe_decoder_free(struct arm_spe_decoder *decoder)
>>> +{
>>> + free(decoder);
>>> +}
>>> +
>>> +static int arm_spe_bad_packet(struct arm_spe_decoder *decoder)
>>> +{
>>> + decoder->pkt_len = 1;
>>> + decoder->pkt_step = 1;
>>> + pr_debug("ERROR: Bad packet\n");
>>> +
>>> + return -EBADMSG;
>>> +}
>>> +
>>> +
>>> +static int arm_spe_get_data(struct arm_spe_decoder *decoder)
>>> +{
>>> + struct arm_spe_buffer buffer = { .buf = 0, };
>>> + int ret;
>>> +
>>> + decoder->pkt_step = 0;
>>> +
>>> + pr_debug("Getting more data\n");
>>> + ret = decoder->get_trace(&buffer, decoder->data);
>>> + if (ret)
>>> + return ret;
>>> +
>>> + decoder->buf = buffer.buf;
>>> + decoder->len = buffer.len;
>>> + if (!decoder->len) {
>>> + pr_debug("No more data\n");
>>> + return -ENODATA;
>>> + }
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static int arm_spe_get_next_data(struct arm_spe_decoder *decoder)
>>> +{
>>> + return arm_spe_get_data(decoder);
>>> +}
>>> +
>>> +static int arm_spe_get_next_packet(struct arm_spe_decoder *decoder)
>>> +{
>>> + int ret;
>>> +
>>> + decoder->last_packet_type = decoder->packet.type;
>>> +
>>> + do {
>>> + decoder->pos += decoder->pkt_step;
>>> + decoder->buf += decoder->pkt_step;
>>> + decoder->len -= decoder->pkt_step;
>>> +
>>> +
>>> + if (!decoder->len) {
>>> + ret = arm_spe_get_next_data(decoder);
>>> + if (ret)
>>> + return ret;
>>> + }
>>> +
>>> + ret = arm_spe_get_packet(decoder->buf, decoder->len,
>>> + &decoder->packet);
>>> + if (ret <= 0)
>>> + return arm_spe_bad_packet(decoder);
>>> +
>>> + decoder->pkt_len = ret;
>>> + decoder->pkt_step = ret;
>>> + } while (decoder->packet.type == ARM_SPE_PAD);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static int arm_spe_walk_trace(struct arm_spe_decoder *decoder)
>>> +{
>>> + int err;
>>> + int idx;
>>> + uint64_t payload;
>>> +
>>> + while (1) {
>>> + err = arm_spe_get_next_packet(decoder);
>>> + if (err)
>>> + return err;
>>> +
>>> + idx = decoder->packet.index;
>>> + payload = decoder->packet.payload;
>>> +
>>> + switch (decoder->packet.type) {
>>> + case ARM_SPE_TIMESTAMP:
>>> + decoder->sample_timestamp = payload;
>>> + return 0;
>>> + case ARM_SPE_END:
>>> + decoder->sample_timestamp = 0;
>>> + return 0;
>>> + case ARM_SPE_ADDRESS:
>>> + decoder->ip = arm_spe_calc_ip(payload);
>>> + if (idx == 0)
>>> + decoder->state.from_ip = decoder->ip;
>>> + else if (idx == 1)
>>> + decoder->state.to_ip = decoder->ip;
>>> + break;
>>> + case ARM_SPE_COUNTER:
>>> + break;
>>> + case ARM_SPE_CONTEXT:
>>> + break;
>>> + case ARM_SPE_OP_TYPE:
>>> + break;
>>> + case ARM_SPE_EVENTS:
>>> + if (payload & 0x20)
>>> + decoder->state.type |= ARM_SPE_TLB_MISS;
>>> + if (payload & 0x80)
>>> + decoder->state.type |= ARM_SPE_BRANCH_MISS;
>>> + if (idx > 1 && (payload & 0x200))
>>> + decoder->state.type |= ARM_SPE_LLC_MISS;
>>> +
>>> + break;
>>> + case ARM_SPE_DATA_SOURCE:
>>> + break;
>>> + case ARM_SPE_BAD:
>>> + break;
>>> + case ARM_SPE_PAD:
>>> + break;
>>> + default:
>>> + pr_err("Get Packet Error!\n");
>>> + return -ENOSYS;
>>> + }
>>> + }
>>> +}
>>
>> This code looks very similar to arm_spe_pkt_desc(); I can't help but think they should be consolidated in some way. If nothing else, the magic 0x20, 0x80, etc. values for ARM_SPE_EVENTS should be defined somewhere and shared.
>>
>
> Yes, I used arm_spe_pkt_desc() as a reference when writing it. What you said makes sense. I will try to modify it later.
>
> Xiaojun.
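
As a rough illustration of the suggestion above, the event bits could live in a shared header and be used by both the packet printer and the decoder. Everything named here is hypothetical: the SPE_EV_* macros, the SPE_TYPE_* flags (standing in for the patch's ARM_SPE_*_MISS values, whose definitions are not shown in this thread) and spe_events_to_type(). The bit values and the "idx > 1" condition for the LLC bit are taken directly from the posted arm_spe_walk_trace().

```c
#include <stdint.h>

/* Hypothetical shared names for the magic numbers in the patch. */
#define SPE_EV_TLB_MISS		(1ULL << 5)	/* 0x20  */
#define SPE_EV_BRANCH_MISS	(1ULL << 7)	/* 0x80  */
#define SPE_EV_LLC_MISS		(1ULL << 9)	/* 0x200 */

/* Hypothetical sample type flags reported by the decoder. */
enum spe_sample_type {
	SPE_TYPE_LLC_MISS	= 1 << 0,
	SPE_TYPE_TLB_MISS	= 1 << 1,
	SPE_TYPE_BRANCH_MISS	= 1 << 2,
};

/*
 * Translate an events packet payload into sample type flags.
 * As in the posted code, the LLC bit is only trusted when the
 * packet index is greater than 1.
 */
static int spe_events_to_type(uint64_t payload, int idx)
{
	int type = 0;

	if (payload & SPE_EV_TLB_MISS)
		type |= SPE_TYPE_TLB_MISS;
	if (payload & SPE_EV_BRANCH_MISS)
		type |= SPE_TYPE_BRANCH_MISS;
	if (idx > 1 && (payload & SPE_EV_LLC_MISS))
		type |= SPE_TYPE_LLC_MISS;

	return type;
}
```

A helper like this keeps the switch statement in the decoder short and gives arm_spe_pkt_desc() the same names to print from.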
>
>>
>>> +
>>> +const struct arm_spe_state *arm_spe_decode(struct arm_spe_decoder *decoder)
>>> +{
>>> + int err;
>>> +
>>> + decoder->state.type = 0;
>>> +
>>> + err = arm_spe_walk_trace(decoder);
>>> + if (err)
>>> + decoder->state.err = err;
>>> +
>>> + decoder->state.timestamp = decoder->sample_timestamp;
>>> +
>>> + return &decoder->state;
>>
>> (trimming remainder)
>>
>>
>> .
>>
>
>
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

2019-10-08 06:01:24

by Tan Xiaojun

[permalink] [raw]
Subject: Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events

On 2019/10/4 21:46, James Clark wrote:
> Hi Xiaojun,
>
> I wanted to ask if you are still working on this?
>
> I've noticed that it doesn't apply cleanly to perf/core anymore and I was working on re-basing it.
> Would you be interested in me posting my progress?
>
> I was also interested in decoding the "data source" of events and displaying that information. Does this
> clash with any of your current work?
>
>
> Thanks
> James
>

Hi, James,

Sorry, I did not respond in time because of the National Day holiday in China.

I am still working on this, but I was assigned other tasks a while ago, so there has been no visible progress on SPE.

By the way, you mentioned earlier that you would like SPE events to be specified in the "event:pp" form, as with PEBS. Do you mean that the whole framework should be made similar to PEBS, or only that the command format should change? The former may be a bit difficult. As for the latter, the record side is currently unmodified, so options such as "-c" and "-F" apply to instructions rather than to events, which users may misunderstand.

So I haven't figured out how to proceed. What do you think?

Thanks.
Xiaojun.

> On 09/08/2019 07:12, Tan Xiaojun wrote:
>> On 2019/8/9 5:00, Jeremy Linton wrote:
>>> Hi,
>>>
>>> First thanks for posting this!
>>>
>>> I ran this on our DAWN platform and it does what it says. It's a pretty reasonable start, but I get -1's in the command row rather than "dd" (or similar) and this also results in [unknown] for the shared object and most userspace addresses. This is quite possibly something I'm not doing right, but I didn't spend a lot of time testing/debugging it.
>>>
>>> I did a quick glance at the code too, and had a couple of comments, although I'm not a perf tool expert.
>>>
>>
>> Hi,
>>
>> Thank you for your reply.
>>
>> I have only recently started working on this aspect of the perf tool, so your reply is very important to me.
>>
>> I apologize: my example was incomplete. Until you pointed it out, I had not noticed that I posted only part of it. The complete example is as follows:
>>
>> Example usage:
>>
>> # perf record -e arm_spe/ts_enable=1,pa_enable=1/ dd if=/dev/zero of=/dev/null count=10000
>> # perf report
>>
>> --------------------------------------------------------------------
>> ...
>> # Samples: 37 of event 'llc-miss'
>> # Event count (approx.): 37
>> #
>> # Children Self Command Shared Object Symbol
>> # ........ ........ ....... ................. ....................................
>> #
>> 37.84% 37.84% dd [kernel.kallsyms] [k] perf_iterate_ctx.constprop.64
>> 16.22% 16.22% dd [kernel.kallsyms] [k] copy_page
>> 5.41% 5.41% dd [kernel.kallsyms] [k] find_vma
>> 5.41% 5.41% dd [kernel.kallsyms] [k] perf_event_mmap
>> 5.41% 5.41% dd [kernel.kallsyms] [k] zap_pte_range
>> 5.41% 5.41% dd ld-2.28.so [.] _dl_lookup_symbol_x
>> 5.41% 5.41% dd libc-2.28.so [.] _nl_intern_locale_data
>> 2.70% 2.70% dd [kernel.kallsyms] [k] __remove_shared_vm_struct.isra.1
>> 2.70% 2.70% dd [kernel.kallsyms] [k] kmem_cache_free
>> 2.70% 2.70% dd [kernel.kallsyms] [k] ttwu_do_wakeup.isra.19
>> 2.70% 2.70% dd dd [.] 0x000000000000d9d8
>> 2.70% 2.70% dd ld-2.28.so [.] _dl_relocate_object
>> 2.70% 2.70% dd libc-2.28.so [.] __unregister_atfork
>> 2.70% 2.70% dd libc-2.28.so [.] _dl_addr
>>
>>
>> # Samples: 8 of event 'tlb-miss'
>> # Event count (approx.): 8
>> #
>> # Children Self Command Shared Object Symbol
>> # ........ ........ ....... ................. .................................
>> #
>> 12.50% 12.50% dd [kernel.kallsyms] [k] __audit_syscall_entry
>> 12.50% 12.50% dd [kernel.kallsyms] [k] kmem_cache_free
>> 12.50% 12.50% dd [kernel.kallsyms] [k] perf_iterate_ctx.constprop.64
>> 12.50% 12.50% dd [kernel.kallsyms] [k] ttwu_do_wakeup.isra.19
>> 12.50% 12.50% dd dd [.] 0x000000000000d9d8
>> 12.50% 12.50% dd libc-2.28.so [.] __unregister_atfork
>> 12.50% 12.50% dd libc-2.28.so [.] _nl_intern_locale_data
>> 12.50% 12.50% dd libc-2.28.so [.] vfprintf
>>
>>
>> # Samples: 12 of event 'branch-miss'
>> # Event count (approx.): 12
>> #
>> # Children Self Command Shared Object Symbol
>> # ........ ........ ....... ................. ..........................
>> #
>> 16.67% 16.67% dd libc-2.28.so [.] read_alias_file
>> 8.33% 8.33% dd [kernel.kallsyms] [k] __arch_copy_from_user
>> 8.33% 8.33% dd [kernel.kallsyms] [k] __arch_copy_to_user
>> 8.33% 8.33% dd [kernel.kallsyms] [k] lookup_fast
>> 8.33% 8.33% dd [kernel.kallsyms] [k] strncpy_from_user
>> 8.33% 8.33% dd ld-2.28.so [.] _dl_lookup_symbol_x
>> 8.33% 8.33% dd ld-2.28.so [.] check_match
>> 8.33% 8.33% dd libc-2.28.so [.] __GI___printf_fp_l
>> 8.33% 8.33% dd libc-2.28.so [.] _dl_addr
>> 8.33% 8.33% dd libc-2.28.so [.] _int_malloc
>> 8.33% 8.33% dd libc-2.28.so [.] _nl_intern_locale_data
>>
>>
>>
>>>
>>> On 8/2/19 4:40 AM, Tan Xiaojun wrote:
>>>> After the commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
>>>> Profiling Extensions (SPE) support") is merged, "perf record" and
>>>> "perf report --dump-raw-trace" have been supported. However, the
>>>> raw data that is dumped cannot be used without parsing.
>>>>
>>>> This patch is to improve the "perf report" support for spe, and
>>>> further process the data. Currently, support for the three events
>>>> of llc-miss, tlb-miss, and branch-miss is added.
>>>>
>>>> Example usage:
>>>>
>>>> --------------------------------------------------------------------
>>>> ...
>>>> 37.84% 37.84% dd [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>>>> 16.22% 16.22% dd [kernel.kallsyms]  [k] copy_page
>>>> 5.41% 5.41% dd [kernel.kallsyms]  [k] find_vma
>>>> 5.41% 5.41% dd [kernel.kallsyms]  [k] perf_event_mmap
>>>> 5.41% 5.41% dd [kernel.kallsyms]  [k] zap_pte_range
>>>> 5.41% 5.41% dd ld-2.28.so [.] _dl_lookup_symbol_x
>>>> 5.41% 5.41% dd libc-2.28.so [.] _nl_intern_locale_data
>>>> 2.70% 2.70% dd [kernel.kallsyms]  [k] __remove_shared_vm_struct.isra.1
>>>> 2.70% 2.70% dd [kernel.kallsyms]  [k] kmem_cache_free
>>>> 2.70% 2.70% dd [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
>>>> 2.70% 2.70% dd dd [.] 0x000000000000d9d8
>>>> 2.70% 2.70% dd ld-2.28.so [.] _dl_relocate_object
>>>> 2.70% 2.70% dd libc-2.28.so [.] __unregister_atfork
>>>> 2.70% 2.70% dd libc-2.28.so [.] _dl_addr
>>>>
>>>> 12.50% 12.50% dd [kernel.kallsyms]  [k] __audit_syscall_entry
>>>> 12.50% 12.50% dd [kernel.kallsyms]  [k] kmem_cache_free
>>>> 12.50% 12.50% dd [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>>>> 12.50% 12.50% dd [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
>>>> 12.50% 12.50% dd dd [.] 0x000000000000d9d8
>>>> 12.50% 12.50% dd libc-2.28.so [.] __unregister_atfork
>>>> 12.50% 12.50% dd libc-2.28.so [.] _nl_intern_locale_data
>>>> 12.50% 12.50% dd libc-2.28.so [.] vfprintf
>>>>
>>>> 16.67% 16.67% dd libc-2.28.so [.] read_alias_file
>>>> 8.33% 8.33% dd [kernel.kallsyms]  [k] __arch_copy_from_user
>>>> 8.33% 8.33% dd [kernel.kallsyms]  [k] __arch_copy_to_user
>>>> 8.33% 8.33% dd [kernel.kallsyms]  [k] lookup_fast
>>>> 8.33% 8.33% dd [kernel.kallsyms]  [k] strncpy_from_user
>>>> 8.33% 8.33% dd ld-2.28.so [.] _dl_lookup_symbol_x
>>>> 8.33% 8.33% dd ld-2.28.so [.] check_match
>>>> 8.33% 8.33% dd libc-2.28.so [.] __GI___printf_fp_l
>>>> 8.33% 8.33% dd libc-2.28.so [.] _dl_addr
>>>> 8.33% 8.33% dd libc-2.28.so [.] _int_malloc
>>>> 8.33% 8.33% dd libc-2.28.so [.] _nl_intern_locale_data
>>>>
>>>> --------------------------------------------------------------------
>>>>
>>>> After that, more analysis and processing of the raw data of spe
>>>> will be done.
>>>>
>>>> Signed-off-by: Tan Xiaojun <[email protected]>
>>>> ---
>>>> tools/perf/builtin-report.c |   5 +
>>>> tools/perf/util/arm-spe-decoder/Build |   2 +-
>>>> tools/perf/util/arm-spe-decoder/arm-spe-decoder.c | 214 ++++++
>>>> tools/perf/util/arm-spe-decoder/arm-spe-decoder.h |  51 ++
>>>> .../util/arm-spe-decoder/arm-spe-pkt-decoder.h |   2 +
>>>> tools/perf/util/arm-spe.c | 715 ++++++++++++++++++++-
>>>> tools/perf/util/auxtrace.c |  45 ++
>>>> tools/perf/util/auxtrace.h |  27 +
>>>> tools/perf/util/session.h |   2 +
>>>> 9 files changed, 1028 insertions(+), 35 deletions(-)
>>>> create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
>>>> create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
>>>>
>>>> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
>>>> index abf0b9b..fadc8eb 100644
>>>> --- a/tools/perf/builtin-report.c
>>>> +++ b/tools/perf/builtin-report.c
>>>> @@ -1007,6 +1007,7 @@ int cmd_report(int argc, const char **argv)
>>>> {
>>>> struct perf_session *session;
>>>> struct itrace_synth_opts itrace_synth_opts = { .set = 0, };
>>>> + struct arm_spe_synth_opts arm_spe_synth_opts;
>>>> struct stat st;
>>>> bool has_br_stack = false;
>>>> int branch_mode = -1;
>>>> @@ -1165,6 +1166,9 @@ int cmd_report(int argc, const char **argv)
>>>> OPT_CALLBACK_OPTARG(0, "itrace", &itrace_synth_opts, NULL, "opts",
>>>> "Instruction Tracing options\n" ITRACE_HELP,
>>>> itrace_parse_synth_opts),
>>>> + OPT_CALLBACK_OPTARG(0, "spe", &arm_spe_synth_opts, NULL, "spe opts",
>>>> + "ARM SPE Tracing options",
>>>> + arm_spe_parse_synth_opts),
>>>> OPT_BOOLEAN(0, "full-source-path", &srcline_full_filename,
>>>> "Show full source file name path for source lines"),
>>>> OPT_BOOLEAN(0, "show-ref-call-graph", &symbol_conf.show_ref_callgraph,
>>>> @@ -1266,6 +1270,7 @@ int cmd_report(int argc, const char **argv)
>>>> }
>>>> session->itrace_synth_opts = &itrace_synth_opts;
>>>> + session->arm_spe_synth_opts = &arm_spe_synth_opts;
>>>> report.session = session;
>>>> diff --git a/tools/perf/util/arm-spe-decoder/Build b/tools/perf/util/arm-spe-decoder/Build
>>>> index 16efbc2..f8dae13 100644
>>>> --- a/tools/perf/util/arm-spe-decoder/Build
>>>> +++ b/tools/perf/util/arm-spe-decoder/Build
>>>> @@ -1 +1 @@
>>>> -perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
>>>> +perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o arm-spe-decoder.o
>>>> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
>>>> new file mode 100644
>>>> index 0000000..8008375
>>>> --- /dev/null
>>>> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
>>>> @@ -0,0 +1,214 @@
>>>> +// SPDX-License-Identifier: GPL-2.0
>>>> +/*
>>>> + * arm_spe_decoder.c: ARM SPE support
>>>> + */
>>>> +
>>>> +#ifndef _GNU_SOURCE
>>>> +#define _GNU_SOURCE
>>>> +#endif
>>>> +#include <stdlib.h>
>>>> +#include <stdbool.h>
>>>> +#include <string.h>
>>>> +#include <errno.h>
>>>> +#include <stdint.h>
>>>> +#include <inttypes.h>
>>>> +#include <linux/compiler.h>
>>>> +#include <linux/zalloc.h>
>>>> +
>>>> +#include "../util.h"
>>>> +#include "../auxtrace.h"
>>>> +
>>>> +#include "arm-spe-pkt-decoder.h"
>>>> +#include "arm-spe-decoder.h"
>>>> +
>>>> +struct arm_spe_decoder {
>>>> + int (*get_trace)(struct arm_spe_buffer *buffer, void *data);
>>>> + void *data;
>>>> + struct arm_spe_state state;
>>>> + const unsigned char *buf;
>>>> + size_t len;
>>>> + uint64_t pos;
>>>> + struct arm_spe_pkt packet;
>>>> + int pkt_step;
>>>> + int pkt_len;
>>>> + int last_packet_type;
>>>> +
>>>> + uint64_t last_ip;
>>>> + uint64_t ip;
>>>> + uint64_t timestamp;
>>>> + uint64_t sample_timestamp;
>>>> + const unsigned char *next_buf;
>>>> + size_t next_len;
>>>> + unsigned char temp_buf[ARM_SPE_PKT_MAX_SZ];
>>>> +};
>>>> +
>>>> +static uint64_t arm_spe_calc_ip(uint64_t payload)
>>>> +{
>>>> + uint64_t ip = (payload & ~(0xffULL << 56));
>>>> +
>>>> + /* fill high 8 bits for kernel virtual address */
>>>> + if (ip & 0x1000000000000ULL)
>>>
>>> It might be better to use VA_START here if possible.
>>>
>>
>> Yes, that would be better, but I don't know how to use VA_START in user-mode code, so I hard-coded the value.
>>
>>>> + ip |= (uint64_t)0xff00000000000000ULL;
>>>> +
>>>> + return ip;
>>>> +}
>>>> +
>>>> +struct arm_spe_decoder *arm_spe_decoder_new(struct arm_spe_params *params)
>>>> +{
>>>> + struct arm_spe_decoder *decoder;
>>>> +
>>>> + if (!params->get_trace)
>>>> + return NULL;
>>>> +
>>>> + decoder = zalloc(sizeof(struct arm_spe_decoder));
>>>> + if (!decoder)
>>>> + return NULL;
>>>> +
>>>> + decoder->get_trace = params->get_trace;
>>>> + decoder->data = params->data;
>>>> +
>>>> + return decoder;
>>>> +}
>>>> +
>>>> +void arm_spe_decoder_free(struct arm_spe_decoder *decoder)
>>>> +{
>>>> + free(decoder);
>>>> +}
>>>> +
>>>> +static int arm_spe_bad_packet(struct arm_spe_decoder *decoder)
>>>> +{
>>>> + decoder->pkt_len = 1;
>>>> + decoder->pkt_step = 1;
>>>> + pr_debug("ERROR: Bad packet\n");
>>>> +
>>>> + return -EBADMSG;
>>>> +}
>>>> +
>>>> +
>>>> +static int arm_spe_get_data(struct arm_spe_decoder *decoder)
>>>> +{
>>>> + struct arm_spe_buffer buffer = { .buf = 0, };
>>>> + int ret;
>>>> +
>>>> + decoder->pkt_step = 0;
>>>> +
>>>> + pr_debug("Getting more data\n");
>>>> + ret = decoder->get_trace(&buffer, decoder->data);
>>>> + if (ret)
>>>> + return ret;
>>>> +
>>>> + decoder->buf = buffer.buf;
>>>> + decoder->len = buffer.len;
>>>> + if (!decoder->len) {
>>>> + pr_debug("No more data\n");
>>>> + return -ENODATA;
>>>> + }
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static int arm_spe_get_next_data(struct arm_spe_decoder *decoder)
>>>> +{
>>>> + return arm_spe_get_data(decoder);
>>>> +}
>>>> +
>>>> +static int arm_spe_get_next_packet(struct arm_spe_decoder *decoder)
>>>> +{
>>>> + int ret;
>>>> +
>>>> + decoder->last_packet_type = decoder->packet.type;
>>>> +
>>>> + do {
>>>> + decoder->pos += decoder->pkt_step;
>>>> + decoder->buf += decoder->pkt_step;
>>>> + decoder->len -= decoder->pkt_step;
>>>> +
>>>> +
>>>> + if (!decoder->len) {
>>>> + ret = arm_spe_get_next_data(decoder);
>>>> + if (ret)
>>>> + return ret;
>>>> + }
>>>> +
>>>> + ret = arm_spe_get_packet(decoder->buf, decoder->len,
>>>> + &decoder->packet);
>>>> + if (ret <= 0)
>>>> + return arm_spe_bad_packet(decoder);
>>>> +
>>>> + decoder->pkt_len = ret;
>>>> + decoder->pkt_step = ret;
>>>> + } while (decoder->packet.type == ARM_SPE_PAD);
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static int arm_spe_walk_trace(struct arm_spe_decoder *decoder)
>>>> +{
>>>> + int err;
>>>> + int idx;
>>>> + uint64_t payload;
>>>> +
>>>> + while (1) {
>>>> + err = arm_spe_get_next_packet(decoder);
>>>> + if (err)
>>>> + return err;
>>>> +
>>>> + idx = decoder->packet.index;
>>>> + payload = decoder->packet.payload;
>>>> +
>>>> + switch (decoder->packet.type) {
>>>> + case ARM_SPE_TIMESTAMP:
>>>> + decoder->sample_timestamp = payload;
>>>> + return 0;
>>>> + case ARM_SPE_END:
>>>> + decoder->sample_timestamp = 0;
>>>> + return 0;
>>>> + case ARM_SPE_ADDRESS:
>>>> + decoder->ip = arm_spe_calc_ip(payload);
>>>> + if (idx == 0)
>>>> + decoder->state.from_ip = decoder->ip;
>>>> + else if (idx == 1)
>>>> + decoder->state.to_ip = decoder->ip;
>>>> + break;
>>>> + case ARM_SPE_COUNTER:
>>>> + break;
>>>> + case ARM_SPE_CONTEXT:
>>>> + break;
>>>> + case ARM_SPE_OP_TYPE:
>>>> + break;
>>>> + case ARM_SPE_EVENTS:
>>>> + if (payload & 0x20)
>>>> + decoder->state.type |= ARM_SPE_TLB_MISS;
>>>> + if (payload & 0x80)
>>>> + decoder->state.type |= ARM_SPE_BRANCH_MISS;
>>>> + if (idx > 1 && (payload & 0x200))
>>>> + decoder->state.type |= ARM_SPE_LLC_MISS;
>>>> +
>>>> + break;
>>>> + case ARM_SPE_DATA_SOURCE:
>>>> + break;
>>>> + case ARM_SPE_BAD:
>>>> + break;
>>>> + case ARM_SPE_PAD:
>>>> + break;
>>>> + default:
>>>> + pr_err("Get Packet Error!\n");
>>>> + return -ENOSYS;
>>>> + }
>>>> + }
>>>> +}
>>>
>>> This code looks very similar to arm_spe_pkt_desc(); I can't help but think they should be consolidated in some way. If nothing else, the magic 0x20, 0x80, etc. values for ARM_SPE_EVENTS should be defined somewhere and shared.
>>>
>>
>> Yes, I wrote it with reference to it. What you said makes sense. I will try to modify it later.
>>
>> Xiaojun.
>>
>>>
>>>> +
>>>> +const struct arm_spe_state *arm_spe_decode(struct arm_spe_decoder *decoder)
>>>> +{
>>>> + int err;
>>>> +
>>>> + decoder->state.type = 0;
>>>> +
>>>> + err = arm_spe_walk_trace(decoder);
>>>> + if (err)
>>>> + decoder->state.err = err;
>>>> +
>>>> + decoder->state.timestamp = decoder->sample_timestamp;
>>>> +
>>>> + return &decoder->state;
>>>
>>> (trimming remainder)
>>>
>>>
>>> .
>>>
>>
>>
> IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
>


2019-10-09 02:44:12

by Tan Xiaojun

[permalink] [raw]
Subject: Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events

On 2019/10/4 21:46, James Clark wrote:
> Hi Xiaojun,
>
> I wanted to ask if you are still working on this?
>
> I've noticed that it doesn't apply cleanly to perf/core anymore and I was working on re-basing it.
> Would you be interested in me posting my progress?
>
> I was also interested in decoding the "data source" of events and displaying that information. Does this
> clash with any of your current work?
>
>
> Thanks
> James
>

Hi, James,

Sorry, I did not respond in time because of the National Day holiday in China.

I am still working on this, but I was assigned to other tasks some time ago, so there has been no obvious progress on SPE.

By the way, you mentioned before that you want the SPE events to be in the form of "event:pp", like PEBS. Do you mean that the whole framework should be made similar to PEBS, or just that the command format should be changed? The former may be a bit difficult. As for the latter, the record part is currently unmodified, so options such as "-c" and "-F" apply only to instructions rather than to events, which users may misunderstand.

So I haven't figured out how to do it yet. What do you think?

Thanks.
Xiaojun.

> On 09/08/2019 07:12, Tan Xiaojun wrote:
>> On 2019/8/9 5:00, Jeremy Linton wrote:
>>> Hi,
>>>
>>> First thanks for posting this!
>>>
>>> I ran this on our DAWN platform and it does what it says. It's a pretty reasonable start, but I get -1's in the command row rather than "dd" (or similar) and this also results in [unknown] for the shared object and most userspace addresses. This is quite possibly something I'm not doing right, but I didn't spend a lot of time testing/debugging it.
>>>
>>> I also took a quick glance at the code, and had a couple of comments, although I'm not a perf tool expert.
>>>
>>
>> Hi,
>>
>> Thank you for your reply.
>>
>> I have only recently started working on this aspect of the perf tool, so your reply is very important to me.
>>
>> I apologize, my example here was not complete. Until you pointed it out, I had not noticed that I only posted part of the example. The complete example is as follows:
>>
>> Example usage:
>>
>> # perf record -e arm_spe/ts_enable=1,pa_enable=1/ dd if=/dev/zero of=/dev/null count=10000
>> # perf report
>>
>> --------------------------------------------------------------------
>> ...
>> # Samples: 37 of event 'llc-miss'
>> # Event count (approx.): 37
>> #
>> # Children Self Command Shared Object Symbol
>> # ........ ........ ....... ................. ....................................
>> #
>> 37.84% 37.84% dd [kernel.kallsyms] [k] perf_iterate_ctx.constprop.64
>> 16.22% 16.22% dd [kernel.kallsyms] [k] copy_page
>> 5.41% 5.41% dd [kernel.kallsyms] [k] find_vma
>> 5.41% 5.41% dd [kernel.kallsyms] [k] perf_event_mmap
>> 5.41% 5.41% dd [kernel.kallsyms] [k] zap_pte_range
>> 5.41% 5.41% dd ld-2.28.so [.] _dl_lookup_symbol_x
>> 5.41% 5.41% dd libc-2.28.so [.] _nl_intern_locale_data
>> 2.70% 2.70% dd [kernel.kallsyms] [k] __remove_shared_vm_struct.isra.1
>> 2.70% 2.70% dd [kernel.kallsyms] [k] kmem_cache_free
>> 2.70% 2.70% dd [kernel.kallsyms] [k] ttwu_do_wakeup.isra.19
>> 2.70% 2.70% dd dd [.] 0x000000000000d9d8
>> 2.70% 2.70% dd ld-2.28.so [.] _dl_relocate_object
>> 2.70% 2.70% dd libc-2.28.so [.] __unregister_atfork
>> 2.70% 2.70% dd libc-2.28.so [.] _dl_addr
>>
>>
>> # Samples: 8 of event 'tlb-miss'
>> # Event count (approx.): 8
>> #
>> # Children Self Command Shared Object Symbol
>> # ........ ........ ....... ................. .................................
>> #
>> 12.50% 12.50% dd [kernel.kallsyms] [k] __audit_syscall_entry
>> 12.50% 12.50% dd [kernel.kallsyms] [k] kmem_cache_free
>> 12.50% 12.50% dd [kernel.kallsyms] [k] perf_iterate_ctx.constprop.64
>> 12.50% 12.50% dd [kernel.kallsyms] [k] ttwu_do_wakeup.isra.19
>> 12.50% 12.50% dd dd [.] 0x000000000000d9d8
>> 12.50% 12.50% dd libc-2.28.so [.] __unregister_atfork
>> 12.50% 12.50% dd libc-2.28.so [.] _nl_intern_locale_data
>> 12.50% 12.50% dd libc-2.28.so [.] vfprintf
>>
>>
>> # Samples: 12 of event 'branch-miss'
>> # Event count (approx.): 12
>> #
>> # Children Self Command Shared Object Symbol
>> # ........ ........ ....... ................. ..........................
>> #
>> 16.67% 16.67% dd libc-2.28.so [.] read_alias_file
>> 8.33% 8.33% dd [kernel.kallsyms] [k] __arch_copy_from_user
>> 8.33% 8.33% dd [kernel.kallsyms] [k] __arch_copy_to_user
>> 8.33% 8.33% dd [kernel.kallsyms] [k] lookup_fast
>> 8.33% 8.33% dd [kernel.kallsyms] [k] strncpy_from_user
>> 8.33% 8.33% dd ld-2.28.so [.] _dl_lookup_symbol_x
>> 8.33% 8.33% dd ld-2.28.so [.] check_match
>> 8.33% 8.33% dd libc-2.28.so [.] __GI___printf_fp_l
>> 8.33% 8.33% dd libc-2.28.so [.] _dl_addr
>> 8.33% 8.33% dd libc-2.28.so [.] _int_malloc
>> 8.33% 8.33% dd libc-2.28.so [.] _nl_intern_locale_data
>>
>>
>>
>>>
>>> On 8/2/19 4:40 AM, Tan Xiaojun wrote:
>>>> After the commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
>>>> Profiling Extensions (SPE) support") is merged, "perf record" and
>>>> "perf report --dump-raw-trace" have been supported. However, the
>>>> raw data that is dumped cannot be used without parsing.
>>>>
>>>> This patch is to improve the "perf report" support for spe, and
>>>> further process the data. Currently, support for the three events
>>>> of llc-miss, tlb-miss, and branch-miss is added.
>>>>
>>>> Example usage:
>>>>
>>>> --------------------------------------------------------------------
>>>> ...
>>>> 37.84% 37.84% dd [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>>>> 16.22% 16.22% dd [kernel.kallsyms]  [k] copy_page
>>>> 5.41% 5.41% dd [kernel.kallsyms]  [k] find_vma
>>>> 5.41% 5.41% dd [kernel.kallsyms]  [k] perf_event_mmap
>>>> 5.41% 5.41% dd [kernel.kallsyms]  [k] zap_pte_range
>>>> 5.41% 5.41% dd ld-2.28.so [.] _dl_lookup_symbol_x
>>>> 5.41% 5.41% dd libc-2.28.so [.] _nl_intern_locale_data
>>>> 2.70% 2.70% dd [kernel.kallsyms]  [k] __remove_shared_vm_struct.isra.1
>>>> 2.70% 2.70% dd [kernel.kallsyms]  [k] kmem_cache_free
>>>> 2.70% 2.70% dd [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
>>>> 2.70% 2.70% dd dd [.] 0x000000000000d9d8
>>>> 2.70% 2.70% dd ld-2.28.so [.] _dl_relocate_object
>>>> 2.70% 2.70% dd libc-2.28.so [.] __unregister_atfork
>>>> 2.70% 2.70% dd libc-2.28.so [.] _dl_addr
>>>>
>>>> 12.50% 12.50% dd [kernel.kallsyms]  [k] __audit_syscall_entry
>>>> 12.50% 12.50% dd [kernel.kallsyms]  [k] kmem_cache_free
>>>> 12.50% 12.50% dd [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>>>> 12.50% 12.50% dd [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
>>>> 12.50% 12.50% dd dd [.] 0x000000000000d9d8
>>>> 12.50% 12.50% dd libc-2.28.so [.] __unregister_atfork
>>>> 12.50% 12.50% dd libc-2.28.so [.] _nl_intern_locale_data
>>>> 12.50% 12.50% dd libc-2.28.so [.] vfprintf
>>>>
>>>> 16.67% 16.67% dd libc-2.28.so [.] read_alias_file
>>>> 8.33% 8.33% dd [kernel.kallsyms]  [k] __arch_copy_from_user
>>>> 8.33% 8.33% dd [kernel.kallsyms]  [k] __arch_copy_to_user
>>>> 8.33% 8.33% dd [kernel.kallsyms]  [k] lookup_fast
>>>> 8.33% 8.33% dd [kernel.kallsyms]  [k] strncpy_from_user
>>>> 8.33% 8.33% dd ld-2.28.so [.] _dl_lookup_symbol_x
>>>> 8.33% 8.33% dd ld-2.28.so [.] check_match
>>>> 8.33% 8.33% dd libc-2.28.so [.] __GI___printf_fp_l
>>>> 8.33% 8.33% dd libc-2.28.so [.] _dl_addr
>>>> 8.33% 8.33% dd libc-2.28.so [.] _int_malloc
>>>> 8.33% 8.33% dd libc-2.28.so [.] _nl_intern_locale_data
>>>>
>>>> --------------------------------------------------------------------
>>>>
>>>> After that, more analysis and processing of the raw data of spe
>>>> will be done.
>>>>
>>>> Signed-off-by: Tan Xiaojun <[email protected]>
>>>> ---
>>>> tools/perf/builtin-report.c |   5 +
>>>> tools/perf/util/arm-spe-decoder/Build |   2 +-
>>>> tools/perf/util/arm-spe-decoder/arm-spe-decoder.c | 214 ++++++
>>>> tools/perf/util/arm-spe-decoder/arm-spe-decoder.h |  51 ++
>>>> .../util/arm-spe-decoder/arm-spe-pkt-decoder.h |   2 +
>>>> tools/perf/util/arm-spe.c | 715 ++++++++++++++++++++-
>>>> tools/perf/util/auxtrace.c |  45 ++
>>>> tools/perf/util/auxtrace.h |  27 +
>>>> tools/perf/util/session.h |   2 +
>>>> 9 files changed, 1028 insertions(+), 35 deletions(-)
>>>> create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
>>>> create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
>>>>
>>>> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
>>>> index abf0b9b..fadc8eb 100644
>>>> --- a/tools/perf/builtin-report.c
>>>> +++ b/tools/perf/builtin-report.c
>>>> @@ -1007,6 +1007,7 @@ int cmd_report(int argc, const char **argv)
>>>> {
>>>> struct perf_session *session;
>>>> struct itrace_synth_opts itrace_synth_opts = { .set = 0, };
>>>> + struct arm_spe_synth_opts arm_spe_synth_opts;
>>>> struct stat st;
>>>> bool has_br_stack = false;
>>>> int branch_mode = -1;
>>>> @@ -1165,6 +1166,9 @@ int cmd_report(int argc, const char **argv)
>>>> OPT_CALLBACK_OPTARG(0, "itrace", &itrace_synth_opts, NULL, "opts",
>>>> "Instruction Tracing options\n" ITRACE_HELP,
>>>> itrace_parse_synth_opts),
>>>> + OPT_CALLBACK_OPTARG(0, "spe", &arm_spe_synth_opts, NULL, "spe opts",
>>>> + "ARM SPE Tracing options",
>>>> + arm_spe_parse_synth_opts),
>>>> OPT_BOOLEAN(0, "full-source-path", &srcline_full_filename,
>>>> "Show full source file name path for source lines"),
>>>> OPT_BOOLEAN(0, "show-ref-call-graph", &symbol_conf.show_ref_callgraph,
>>>> @@ -1266,6 +1270,7 @@ int cmd_report(int argc, const char **argv)
>>>> }
>>>> session->itrace_synth_opts = &itrace_synth_opts;
>>>> + session->arm_spe_synth_opts = &arm_spe_synth_opts;
>>>> report.session = session;
>>>> diff --git a/tools/perf/util/arm-spe-decoder/Build b/tools/perf/util/arm-spe-decoder/Build
>>>> index 16efbc2..f8dae13 100644
>>>> --- a/tools/perf/util/arm-spe-decoder/Build
>>>> +++ b/tools/perf/util/arm-spe-decoder/Build
>>>> @@ -1 +1 @@
>>>> -perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
>>>> +perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o arm-spe-decoder.o
>>>> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
>>>> new file mode 100644
>>>> index 0000000..8008375
>>>> --- /dev/null
>>>> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
>>>> @@ -0,0 +1,214 @@
>>>> +// SPDX-License-Identifier: GPL-2.0
>>>> +/*
>>>> + * arm_spe_decoder.c: ARM SPE support
>>>> + */
>>>> +
>>>> +#ifndef _GNU_SOURCE
>>>> +#define _GNU_SOURCE
>>>> +#endif
>>>> +#include <stdlib.h>
>>>> +#include <stdbool.h>
>>>> +#include <string.h>
>>>> +#include <errno.h>
>>>> +#include <stdint.h>
>>>> +#include <inttypes.h>
>>>> +#include <linux/compiler.h>
>>>> +#include <linux/zalloc.h>
>>>> +
>>>> +#include "../util.h"
>>>> +#include "../auxtrace.h"
>>>> +
>>>> +#include "arm-spe-pkt-decoder.h"
>>>> +#include "arm-spe-decoder.h"
>>>> +
>>>> +struct arm_spe_decoder {
>>>> + int (*get_trace)(struct arm_spe_buffer *buffer, void *data);
>>>> + void *data;
>>>> + struct arm_spe_state state;
>>>> + const unsigned char *buf;
>>>> + size_t len;
>>>> + uint64_t pos;
>>>> + struct arm_spe_pkt packet;
>>>> + int pkt_step;
>>>> + int pkt_len;
>>>> + int last_packet_type;
>>>> +
>>>> + uint64_t last_ip;
>>>> + uint64_t ip;
>>>> + uint64_t timestamp;
>>>> + uint64_t sample_timestamp;
>>>> + const unsigned char *next_buf;
>>>> + size_t next_len;
>>>> + unsigned char temp_buf[ARM_SPE_PKT_MAX_SZ];
>>>> +};
>>>> +
>>>> +static uint64_t arm_spe_calc_ip(uint64_t payload)
>>>> +{
>>>> + uint64_t ip = (payload & ~(0xffULL << 56));
>>>> +
>>>> + /* fill high 8 bits for kernel virtual address */
>>>> + if (ip & 0x1000000000000ULL)
>>>
>>> It might be better to use VA_START here if possible.
>>>
>>
>> Yes, it's better, but I don't know how to use VA_START in user mode code. So I wrote it directly.
>>
>>>> + ip |= (uint64_t)0xff00000000000000ULL;
>>>> +
>>>> + return ip;
>>>> +}
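
One way to avoid the bare constants in arm_spe_calc_ip() without pulling the kernel's VA_START into user-mode code is to name the masks locally. The following is only a minimal standalone sketch of that idea. The names SPE_ADDR_TOP_BYTE_MASK, SPE_ADDR_KERNEL_BIT and spe_calc_ip() are hypothetical and come neither from the patch nor from any kernel header; the bit positions are simply the ones the patch itself tests (bit 48 and the top 8 bits).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, chosen for this sketch only. */
#define SPE_ADDR_TOP_BYTE_SHIFT	56
#define SPE_ADDR_TOP_BYTE_MASK	(0xffULL << SPE_ADDR_TOP_BYTE_SHIFT)
/* Same value as the literal 0x1000000000000ULL tested in the patch. */
#define SPE_ADDR_KERNEL_BIT	(1ULL << 48)

/* Same behaviour as arm_spe_calc_ip() in the patch, with named masks. */
static uint64_t spe_calc_ip(uint64_t payload)
{
	/* Drop the top 8 bits of the payload. */
	uint64_t ip = payload & ~SPE_ADDR_TOP_BYTE_MASK;

	/* Fill the high 8 bits again for a kernel virtual address. */
	if (ip & SPE_ADDR_KERNEL_BIT)
		ip |= SPE_ADDR_TOP_BYTE_MASK;

	return ip;
}

/* Small self-check with made-up payload values. */
static void spe_calc_ip_selftest(void)
{
	/* Bit 48 set: the top byte is replaced by 0xff. */
	assert(spe_calc_ip(0xb0ff000010081234ULL) == 0xffff000010081234ULL);
	/* Bit 48 clear: the top byte is simply cleared. */
	assert(spe_calc_ip(0xab00ffff00001234ULL) == 0x0000ffff00001234ULL);
}
```

Whether bit 48 is the right test for every kernel VA configuration is exactly the question the VA_START remark raises; the sketch does not answer it, it only gives the existing check a name that can be changed in one place.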
>>>> +
>>>> +struct arm_spe_decoder *arm_spe_decoder_new(struct arm_spe_params *params)
>>>> +{
>>>> + struct arm_spe_decoder *decoder;
>>>> +
>>>> + if (!params->get_trace)
>>>> + return NULL;
>>>> +
>>>> + decoder = zalloc(sizeof(struct arm_spe_decoder));
>>>> + if (!decoder)
>>>> + return NULL;
>>>> +
>>>> + decoder->get_trace = params->get_trace;
>>>> + decoder->data = params->data;
>>>> +
>>>> + return decoder;
>>>> +}
>>>> +
>>>> +void arm_spe_decoder_free(struct arm_spe_decoder *decoder)
>>>> +{
>>>> + free(decoder);
>>>> +}
>>>> +
>>>> +static int arm_spe_bad_packet(struct arm_spe_decoder *decoder)
>>>> +{
>>>> + decoder->pkt_len = 1;
>>>> + decoder->pkt_step = 1;
>>>> + pr_debug("ERROR: Bad packet\n");
>>>> +
>>>> + return -EBADMSG;
>>>> +}
>>>> +
>>>> +
>>>> +static int arm_spe_get_data(struct arm_spe_decoder *decoder)
>>>> +{
>>>> + struct arm_spe_buffer buffer = { .buf = 0, };
>>>> + int ret;
>>>> +
>>>> + decoder->pkt_step = 0;
>>>> +
>>>> + pr_debug("Getting more data\n");
>>>> + ret = decoder->get_trace(&buffer, decoder->data);
>>>> + if (ret)
>>>> + return ret;
>>>> +
>>>> + decoder->buf = buffer.buf;
>>>> + decoder->len = buffer.len;
>>>> + if (!decoder->len) {
>>>> + pr_debug("No more data\n");
>>>> + return -ENODATA;
>>>> + }
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static int arm_spe_get_next_data(struct arm_spe_decoder *decoder)
>>>> +{
>>>> + return arm_spe_get_data(decoder);
>>>> +}
>>>> +
>>>> +static int arm_spe_get_next_packet(struct arm_spe_decoder *decoder)
>>>> +{
>>>> + int ret;
>>>> +
>>>> + decoder->last_packet_type = decoder->packet.type;
>>>> +
>>>> + do {
>>>> + decoder->pos += decoder->pkt_step;
>>>> + decoder->buf += decoder->pkt_step;
>>>> + decoder->len -= decoder->pkt_step;
>>>> +
>>>> +
>>>> + if (!decoder->len) {
>>>> + ret = arm_spe_get_next_data(decoder);
>>>> + if (ret)
>>>> + return ret;
>>>> + }
>>>> +
>>>> + ret = arm_spe_get_packet(decoder->buf, decoder->len,
>>>> + &decoder->packet);
>>>> + if (ret <= 0)
>>>> + return arm_spe_bad_packet(decoder);
>>>> +
>>>> + decoder->pkt_len = ret;
>>>> + decoder->pkt_step = ret;
>>>> + } while (decoder->packet.type == ARM_SPE_PAD);
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static int arm_spe_walk_trace(struct arm_spe_decoder *decoder)
>>>> +{
>>>> + int err;
>>>> + int idx;
>>>> + uint64_t payload;
>>>> +
>>>> + while (1) {
>>>> + err = arm_spe_get_next_packet(decoder);
>>>> + if (err)
>>>> + return err;
>>>> +
>>>> + idx = decoder->packet.index;
>>>> + payload = decoder->packet.payload;
>>>> +
>>>> + switch (decoder->packet.type) {
>>>> + case ARM_SPE_TIMESTAMP:
>>>> + decoder->sample_timestamp = payload;
>>>> + return 0;
>>>> + case ARM_SPE_END:
>>>> + decoder->sample_timestamp = 0;
>>>> + return 0;
>>>> + case ARM_SPE_ADDRESS:
>>>> + decoder->ip = arm_spe_calc_ip(payload);
>>>> + if (idx == 0)
>>>> + decoder->state.from_ip = decoder->ip;
>>>> + else if (idx == 1)
>>>> + decoder->state.to_ip = decoder->ip;
>>>> + break;
>>>> + case ARM_SPE_COUNTER:
>>>> + break;
>>>> + case ARM_SPE_CONTEXT:
>>>> + break;
>>>> + case ARM_SPE_OP_TYPE:
>>>> + break;
>>>> + case ARM_SPE_EVENTS:
>>>> + if (payload & 0x20)
>>>> + decoder->state.type |= ARM_SPE_TLB_MISS;
>>>> + if (payload & 0x80)
>>>> + decoder->state.type |= ARM_SPE_BRANCH_MISS;
>>>> + if (idx > 1 && (payload & 0x200))
>>>> + decoder->state.type |= ARM_SPE_LLC_MISS;
>>>> +
>>>> + break;
>>>> + case ARM_SPE_DATA_SOURCE:
>>>> + break;
>>>> + case ARM_SPE_BAD:
>>>> + break;
>>>> + case ARM_SPE_PAD:
>>>> + break;
>>>> + default:
>>>> + pr_err("Get Packet Error!\n");
>>>> + return -ENOSYS;
>>>> + }
>>>> + }
>>>> +}
>>>
>>> This code looks very similar to arm_spe_pkt_desc(), I can't help but think they should be consolidated in some way. If nothing else the magic 0x20, 0x80, etc ARM_SPE_EVENTS should be defined somewhere and shared.
>>>
>>
>> Yes, I wrote it with reference to it. What you said makes sense. I will try to modify it later.
>>
>> Xiaojun.
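
As a rough illustration of the suggestion above, the event bits could be given names in a shared header and mapped to sample types by one helper. This is only a sketch under assumptions: the SPE_EV_* macros, the spe_sample_type enum and spe_events_to_type() are hypothetical names, and the enum values are stand-ins because the real ARM_SPE_TLB_MISS, ARM_SPE_BRANCH_MISS and ARM_SPE_LLC_MISS definitions in arm-spe-decoder.h are not shown in this thread. The bit values and the "idx > 1" condition are copied from the ARM_SPE_EVENTS case in arm_spe_walk_trace().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the literals used in arm_spe_walk_trace(). */
#define SPE_EV_TLB_MISS		(1ULL << 5)	/* 0x20  */
#define SPE_EV_BRANCH_MISS	(1ULL << 7)	/* 0x80  */
#define SPE_EV_LLC_MISS		(1ULL << 9)	/* 0x200 */

/* Stand-ins for the ARM_SPE_*_MISS flags; the values are assumptions. */
enum spe_sample_type {
	SPE_TYPE_TLB_MISS	= 1 << 0,
	SPE_TYPE_BRANCH_MISS	= 1 << 1,
	SPE_TYPE_LLC_MISS	= 1 << 2,
};

/* Translate an events packet (index and payload) into sample-type flags. */
static unsigned int spe_events_to_type(int idx, uint64_t payload)
{
	unsigned int type = 0;

	if (payload & SPE_EV_TLB_MISS)
		type |= SPE_TYPE_TLB_MISS;
	if (payload & SPE_EV_BRANCH_MISS)
		type |= SPE_TYPE_BRANCH_MISS;
	/* As in the patch, the LLC bit is only checked when idx > 1. */
	if (idx > 1 && (payload & SPE_EV_LLC_MISS))
		type |= SPE_TYPE_LLC_MISS;

	return type;
}
```

With the bits named once, both the decoder and arm_spe_pkt_desc() could test the same macros instead of repeating the literals, which is the sharing asked for in the review comment.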
>>
>>>
>>>> +
>>>> +const struct arm_spe_state *arm_spe_decode(struct arm_spe_decoder *decoder)
>>>> +{
>>>> + int err;
>>>> +
>>>> + decoder->state.type = 0;
>>>> +
>>>> + err = arm_spe_walk_trace(decoder);
>>>> + if (err)
>>>> + decoder->state.err = err;
>>>> +
>>>> + decoder->state.timestamp = decoder->sample_timestamp;
>>>> +
>>>> + return &decoder->state;
>>>
>>> (trimming remainder)
>>>
>>>
>>> .
>>>
>>
>>


2019-10-09 03:07:57

by Tan Xiaojun

[permalink] [raw]
Subject: Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events

On 2019/10/4 21:46, James Clark wrote:
> Hi Xiaojun,
>
> I wanted to ask if you are still working on this?
>
> I've noticed that it doesn't apply cleanly to perf/core anymore and I was working on re-basing it.
> Would you be interested in me posting my progress?
>
> I was also interested in decoding the "data source" of events and displaying that information. Does this
> clash with any of your current work?
>
>
> Thanks
> James
>

(Sorry, you may have received this email several times. I suddenly found that I am no longer on the mailing list, and I need to confirm it.)

Hi, James,

Sorry, I did not respond in time because of the National Day holiday in China.

I am still working on this, but I was assigned to other tasks some time ago, so there has been no obvious progress on SPE.

By the way, you mentioned before that you want the SPE events to be in the form of "event:pp", like PEBS. Do you mean that the whole framework should be made similar to PEBS, or just that the command format should be changed? The former may be a bit difficult. As for the latter, the record part is currently unmodified, so options such as "-c" and "-F" apply only to instructions rather than to events, which users may misunderstand.

So I haven't figured out how to do it yet. What do you think?

Thanks.
Xiaojun.

> On 09/08/2019 07:12, Tan Xiaojun wrote:
>> On 2019/8/9 5:00, Jeremy Linton wrote:
>>> Hi,
>>>
>>> First thanks for posting this!
>>>
>>> I ran this on our DAWN platform and it does what it says. Its a pretty reasonable start, but I get -1's in the command row rather than "dd" (or similar) and this also results in [unknown] for the shared object and most userspace addresses. This is quite possibly something I'm not doing right, but I didn't spend a lot of time testing/debugging it.
>>>
>>> I did a quick glance at the code to, and had a couple comments, although I'm not a perf tool expert.
>>>
>>
>> Hi,
>>
>> Thank you for your reply.
>>
>> I have only recently started working on this aspect of the perf tool, so your reply is very important to me.
>>
>> I need to be sorry, my example here is not complete, until you said that I found that I only posted a part of the example. The complete example is as follows:
>>
>> Example usage:
>>
>> # perf record -e arm_spe/ts_enable=1,pa_enable=1/ dd if=/dev/zero of=/dev/null count=10000
>> # perf report
>>
>> --------------------------------------------------------------------
>> ...
>> # Samples: 37 of event 'llc-miss'
>> # Event count (approx.): 37
>> #
>> # Children Self Command Shared Object Symbol
>> # ........ ........ ....... ................. ....................................
>> #
>> 37.84% 37.84% dd [kernel.kallsyms] [k] perf_iterate_ctx.constprop.64
>> 16.22% 16.22% dd [kernel.kallsyms] [k] copy_page
>> 5.41% 5.41% dd [kernel.kallsyms] [k] find_vma
>> 5.41% 5.41% dd [kernel.kallsyms] [k] perf_event_mmap
>> 5.41% 5.41% dd [kernel.kallsyms] [k] zap_pte_range
>> 5.41% 5.41% dd ld-2.28.so [.] _dl_lookup_symbol_x
>> 5.41% 5.41% dd libc-2.28.so [.] _nl_intern_locale_data
>> 2.70% 2.70% dd [kernel.kallsyms] [k] __remove_shared_vm_struct.isra.1
>> 2.70% 2.70% dd [kernel.kallsyms] [k] kmem_cache_free
>> 2.70% 2.70% dd [kernel.kallsyms] [k] ttwu_do_wakeup.isra.19
>> 2.70% 2.70% dd dd [.] 0x000000000000d9d8
>> 2.70% 2.70% dd ld-2.28.so [.] _dl_relocate_object
>> 2.70% 2.70% dd libc-2.28.so [.] __unregister_atfork
>> 2.70% 2.70% dd libc-2.28.so [.] _dl_addr
>>
>>
>> # Samples: 8 of event 'tlb-miss'
>> # Event count (approx.): 8
>> #
>> # Children Self Command Shared Object Symbol
>> # ........ ........ ....... ................. .................................
>> #
>> 12.50% 12.50% dd [kernel.kallsyms] [k] __audit_syscall_entry
>> 12.50% 12.50% dd [kernel.kallsyms] [k] kmem_cache_free
>> 12.50% 12.50% dd [kernel.kallsyms] [k] perf_iterate_ctx.constprop.64
>> 12.50% 12.50% dd [kernel.kallsyms] [k] ttwu_do_wakeup.isra.19
>> 12.50% 12.50% dd dd [.] 0x000000000000d9d8
>> 12.50% 12.50% dd libc-2.28.so [.] __unregister_atfork
>> 12.50% 12.50% dd libc-2.28.so [.] _nl_intern_locale_data
>> 12.50% 12.50% dd libc-2.28.so [.] vfprintf
>>
>>
>> # Samples: 12 of event 'branch-miss'
>> # Event count (approx.): 12
>> #
>> # Children Self Command Shared Object Symbol
>> # ........ ........ ....... ................. ..........................
>> #
>> 16.67% 16.67% dd libc-2.28.so [.] read_alias_file
>> 8.33% 8.33% dd [kernel.kallsyms] [k] __arch_copy_from_user
>> 8.33% 8.33% dd [kernel.kallsyms] [k] __arch_copy_to_user
>> 8.33% 8.33% dd [kernel.kallsyms] [k] lookup_fast
>> 8.33% 8.33% dd [kernel.kallsyms] [k] strncpy_from_user
>> 8.33% 8.33% dd ld-2.28.so [.] _dl_lookup_symbol_x
>> 8.33% 8.33% dd ld-2.28.so [.] check_match
>> 8.33% 8.33% dd libc-2.28.so [.] __GI___printf_fp_l
>> 8.33% 8.33% dd libc-2.28.so [.] _dl_addr
>> 8.33% 8.33% dd libc-2.28.so [.] _int_malloc
>> 8.33% 8.33% dd libc-2.28.so [.] _nl_intern_locale_data
>>
>>
>>
>>>
>>> On 8/2/19 4:40 AM, Tan Xiaojun wrote:
>>>> After the commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
>>>> Profiling Extensions (SPE) support") is merged, "perf record" and
>>>> "perf report --dump-raw-trace" have been supported. However, the
>>>> raw data that is dumped cannot be used without parsing.
>>>>
>>>> This patch is to improve the "perf report" support for spe, and
>>>> further process the data. Currently, support for the three events
>>>> of llc-miss, tlb-miss, and branch-miss is added.
>>>>
>>>> Example usage:
>>>>
>>>> --------------------------------------------------------------------
>>>> ...
>>>> 37.84% 37.84% dd [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>>>> 16.22% 16.22% dd [kernel.kallsyms]  [k] copy_page
>>>> 5.41% 5.41% dd [kernel.kallsyms]  [k] find_vma
>>>> 5.41% 5.41% dd [kernel.kallsyms]  [k] perf_event_mmap
>>>> 5.41% 5.41% dd [kernel.kallsyms]  [k] zap_pte_range
>>>> 5.41% 5.41% dd ld-2.28.so [.] _dl_lookup_symbol_x
>>>> 5.41% 5.41% dd libc-2.28.so [.] _nl_intern_locale_data
>>>> 2.70% 2.70% dd [kernel.kallsyms]  [k] __remove_shared_vm_struct.isra.1
>>>> 2.70% 2.70% dd [kernel.kallsyms]  [k] kmem_cache_free
>>>> 2.70% 2.70% dd [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
>>>> 2.70% 2.70% dd dd [.] 0x000000000000d9d8
>>>> 2.70% 2.70% dd ld-2.28.so [.] _dl_relocate_object
>>>> 2.70% 2.70% dd libc-2.28.so [.] __unregister_atfork
>>>> 2.70% 2.70% dd libc-2.28.so [.] _dl_addr
>>>>
>>>> 12.50% 12.50% dd [kernel.kallsyms]  [k] __audit_syscall_entry
>>>> 12.50% 12.50% dd [kernel.kallsyms]  [k] kmem_cache_free
>>>> 12.50% 12.50% dd [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>>>> 12.50% 12.50% dd [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
>>>> 12.50% 12.50% dd dd [.] 0x000000000000d9d8
>>>> 12.50% 12.50% dd libc-2.28.so [.] __unregister_atfork
>>>> 12.50% 12.50% dd libc-2.28.so [.] _nl_intern_locale_data
>>>> 12.50% 12.50% dd libc-2.28.so [.] vfprintf
>>>>
>>>> 16.67% 16.67% dd libc-2.28.so [.] read_alias_file
>>>> 8.33% 8.33% dd [kernel.kallsyms]  [k] __arch_copy_from_user
>>>> 8.33% 8.33% dd [kernel.kallsyms]  [k] __arch_copy_to_user
>>>> 8.33% 8.33% dd [kernel.kallsyms]  [k] lookup_fast
>>>> 8.33% 8.33% dd [kernel.kallsyms]  [k] strncpy_from_user
>>>> 8.33% 8.33% dd ld-2.28.so [.] _dl_lookup_symbol_x
>>>> 8.33% 8.33% dd ld-2.28.so [.] check_match
>>>> 8.33% 8.33% dd libc-2.28.so [.] __GI___printf_fp_l
>>>> 8.33% 8.33% dd libc-2.28.so [.] _dl_addr
>>>> 8.33% 8.33% dd libc-2.28.so [.] _int_malloc
>>>> 8.33% 8.33% dd libc-2.28.so [.] _nl_intern_locale_data
>>>>
>>>> --------------------------------------------------------------------
>>>>
>>>> After that, more analysis and processing of the raw data of spe
>>>> will be done.
>>>>
>>>> Signed-off-by: Tan Xiaojun <[email protected]>
>>>> ---
>>>> tools/perf/builtin-report.c |   5 +
>>>> tools/perf/util/arm-spe-decoder/Build |   2 +-
>>>> tools/perf/util/arm-spe-decoder/arm-spe-decoder.c | 214 ++++++
>>>> tools/perf/util/arm-spe-decoder/arm-spe-decoder.h |  51 ++
>>>> .../util/arm-spe-decoder/arm-spe-pkt-decoder.h |   2 +
>>>> tools/perf/util/arm-spe.c | 715 ++++++++++++++++++++-
>>>> tools/perf/util/auxtrace.c |  45 ++
>>>> tools/perf/util/auxtrace.h |  27 +
>>>> tools/perf/util/session.h |   2 +
>>>> 9 files changed, 1028 insertions(+), 35 deletions(-)
>>>> create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
>>>> create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
>>>>
>>>> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
>>>> index abf0b9b..fadc8eb 100644
>>>> --- a/tools/perf/builtin-report.c
>>>> +++ b/tools/perf/builtin-report.c
>>>> @@ -1007,6 +1007,7 @@ int cmd_report(int argc, const char **argv)
>>>> {
>>>> struct perf_session *session;
>>>> struct itrace_synth_opts itrace_synth_opts = { .set = 0, };
>>>> + struct arm_spe_synth_opts arm_spe_synth_opts;
>>>> struct stat st;
>>>> bool has_br_stack = false;
>>>> int branch_mode = -1;
>>>> @@ -1165,6 +1166,9 @@ int cmd_report(int argc, const char **argv)
>>>> OPT_CALLBACK_OPTARG(0, "itrace", &itrace_synth_opts, NULL, "opts",
>>>> "Instruction Tracing options\n" ITRACE_HELP,
>>>> itrace_parse_synth_opts),
>>>> + OPT_CALLBACK_OPTARG(0, "spe", &arm_spe_synth_opts, NULL, "spe opts",
>>>> + "ARM SPE Tracing options",
>>>> + arm_spe_parse_synth_opts),
>>>> OPT_BOOLEAN(0, "full-source-path", &srcline_full_filename,
>>>> "Show full source file name path for source lines"),
>>>> OPT_BOOLEAN(0, "show-ref-call-graph", &symbol_conf.show_ref_callgraph,
>>>> @@ -1266,6 +1270,7 @@ int cmd_report(int argc, const char **argv)
>>>> }
>>>> session->itrace_synth_opts = &itrace_synth_opts;
>>>> + session->arm_spe_synth_opts = &arm_spe_synth_opts;
>>>> report.session = session;
>>>> diff --git a/tools/perf/util/arm-spe-decoder/Build b/tools/perf/util/arm-spe-decoder/Build
>>>> index 16efbc2..f8dae13 100644
>>>> --- a/tools/perf/util/arm-spe-decoder/Build
>>>> +++ b/tools/perf/util/arm-spe-decoder/Build
>>>> @@ -1 +1 @@
>>>> -perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o
>>>> +perf-$(CONFIG_AUXTRACE) += arm-spe-pkt-decoder.o arm-spe-decoder.o
>>>> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
>>>> new file mode 100644
>>>> index 0000000..8008375
>>>> --- /dev/null
>>>> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
>>>> @@ -0,0 +1,214 @@
>>>> +// SPDX-License-Identifier: GPL-2.0
>>>> +/*
>>>> + * arm_spe_decoder.c: ARM SPE support
>>>> + */
>>>> +
>>>> +#ifndef _GNU_SOURCE
>>>> +#define _GNU_SOURCE
>>>> +#endif
>>>> +#include <stdlib.h>
>>>> +#include <stdbool.h>
>>>> +#include <string.h>
>>>> +#include <errno.h>
>>>> +#include <stdint.h>
>>>> +#include <inttypes.h>
>>>> +#include <linux/compiler.h>
>>>> +#include <linux/zalloc.h>
>>>> +
>>>> +#include "../util.h"
>>>> +#include "../auxtrace.h"
>>>> +
>>>> +#include "arm-spe-pkt-decoder.h"
>>>> +#include "arm-spe-decoder.h"
>>>> +
>>>> +struct arm_spe_decoder {
>>>> + int (*get_trace)(struct arm_spe_buffer *buffer, void *data);
>>>> + void *data;
>>>> + struct arm_spe_state state;
>>>> + const unsigned char *buf;
>>>> + size_t len;
>>>> + uint64_t pos;
>>>> + struct arm_spe_pkt packet;
>>>> + int pkt_step;
>>>> + int pkt_len;
>>>> + int last_packet_type;
>>>> +
>>>> + uint64_t last_ip;
>>>> + uint64_t ip;
>>>> + uint64_t timestamp;
>>>> + uint64_t sample_timestamp;
>>>> + const unsigned char *next_buf;
>>>> + size_t next_len;
>>>> + unsigned char temp_buf[ARM_SPE_PKT_MAX_SZ];
>>>> +};
>>>> +
>>>> +static uint64_t arm_spe_calc_ip(uint64_t payload)
>>>> +{
>>>> + uint64_t ip = (payload & ~(0xffULL << 56));
>>>> +
>>>> + /* fill high 8 bits for kernel virtual address */
>>>> + if (ip & 0x1000000000000ULL)
>>>
>>> It might be better to use VA_START here if possible.
>>>
>>
>> Yes, it's better, but I don't know how to use VA_START in user mode code. So I wrote it directly.
>>
>>>> + ip |= (uint64_t)0xff00000000000000ULL;
>>>> +
>>>> + return ip;
>>>> +}
>>>> +
>>>> +struct arm_spe_decoder *arm_spe_decoder_new(struct arm_spe_params *params)
>>>> +{
>>>> + struct arm_spe_decoder *decoder;
>>>> +
>>>> + if (!params->get_trace)
>>>> + return NULL;
>>>> +
>>>> + decoder = zalloc(sizeof(struct arm_spe_decoder));
>>>> + if (!decoder)
>>>> + return NULL;
>>>> +
>>>> + decoder->get_trace = params->get_trace;
>>>> + decoder->data = params->data;
>>>> +
>>>> + return decoder;
>>>> +}
>>>> +
>>>> +void arm_spe_decoder_free(struct arm_spe_decoder *decoder)
>>>> +{
>>>> + free(decoder);
>>>> +}
>>>> +
>>>> +static int arm_spe_bad_packet(struct arm_spe_decoder *decoder)
>>>> +{
>>>> + decoder->pkt_len = 1;
>>>> + decoder->pkt_step = 1;
>>>> + pr_debug("ERROR: Bad packet\n");
>>>> +
>>>> + return -EBADMSG;
>>>> +}
>>>> +
>>>> +
>>>> +static int arm_spe_get_data(struct arm_spe_decoder *decoder)
>>>> +{
>>>> + struct arm_spe_buffer buffer = { .buf = 0, };
>>>> + int ret;
>>>> +
>>>> + decoder->pkt_step = 0;
>>>> +
>>>> + pr_debug("Getting more data\n");
>>>> + ret = decoder->get_trace(&buffer, decoder->data);
>>>> + if (ret)
>>>> + return ret;
>>>> +
>>>> + decoder->buf = buffer.buf;
>>>> + decoder->len = buffer.len;
>>>> + if (!decoder->len) {
>>>> + pr_debug("No more data\n");
>>>> + return -ENODATA;
>>>> + }
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static int arm_spe_get_next_data(struct arm_spe_decoder *decoder)
>>>> +{
>>>> + return arm_spe_get_data(decoder);
>>>> +}
>>>> +
>>>> +static int arm_spe_get_next_packet(struct arm_spe_decoder *decoder)
>>>> +{
>>>> + int ret;
>>>> +
>>>> + decoder->last_packet_type = decoder->packet.type;
>>>> +
>>>> + do {
>>>> + decoder->pos += decoder->pkt_step;
>>>> + decoder->buf += decoder->pkt_step;
>>>> + decoder->len -= decoder->pkt_step;
>>>> +
>>>> +
>>>> + if (!decoder->len) {
>>>> + ret = arm_spe_get_next_data(decoder);
>>>> + if (ret)
>>>> + return ret;
>>>> + }
>>>> +
>>>> + ret = arm_spe_get_packet(decoder->buf, decoder->len,
>>>> + &decoder->packet);
>>>> + if (ret <= 0)
>>>> + return arm_spe_bad_packet(decoder);
>>>> +
>>>> + decoder->pkt_len = ret;
>>>> + decoder->pkt_step = ret;
>>>> + } while (decoder->packet.type == ARM_SPE_PAD);
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static int arm_spe_walk_trace(struct arm_spe_decoder *decoder)
>>>> +{
>>>> + int err;
>>>> + int idx;
>>>> + uint64_t payload;
>>>> +
>>>> + while (1) {
>>>> + err = arm_spe_get_next_packet(decoder);
>>>> + if (err)
>>>> + return err;
>>>> +
>>>> + idx = decoder->packet.index;
>>>> + payload = decoder->packet.payload;
>>>> +
>>>> + switch (decoder->packet.type) {
>>>> + case ARM_SPE_TIMESTAMP:
>>>> + decoder->sample_timestamp = payload;
>>>> + return 0;
>>>> + case ARM_SPE_END:
>>>> + decoder->sample_timestamp = 0;
>>>> + return 0;
>>>> + case ARM_SPE_ADDRESS:
>>>> + decoder->ip = arm_spe_calc_ip(payload);
>>>> + if (idx == 0)
>>>> + decoder->state.from_ip = decoder->ip;
>>>> + else if (idx == 1)
>>>> + decoder->state.to_ip = decoder->ip;
>>>> + break;
>>>> + case ARM_SPE_COUNTER:
>>>> + break;
>>>> + case ARM_SPE_CONTEXT:
>>>> + break;
>>>> + case ARM_SPE_OP_TYPE:
>>>> + break;
>>>> + case ARM_SPE_EVENTS:
>>>> + if (payload & 0x20)
>>>> + decoder->state.type |= ARM_SPE_TLB_MISS;
>>>> + if (payload & 0x80)
>>>> + decoder->state.type |= ARM_SPE_BRANCH_MISS;
>>>> + if (idx > 1 && (payload & 0x200))
>>>> + decoder->state.type |= ARM_SPE_LLC_MISS;
>>>> +
>>>> + break;
>>>> + case ARM_SPE_DATA_SOURCE:
>>>> + break;
>>>> + case ARM_SPE_BAD:
>>>> + break;
>>>> + case ARM_SPE_PAD:
>>>> + break;
>>>> + default:
>>>> + pr_err("Get Packet Error!\n");
>>>> + return -ENOSYS;
>>>> + }
>>>> + }
>>>> +}
>>>
>>> This code looks very similar to arm_spe_pkt_desc(), I can't help but think they should be consolidated in some way. If nothing else the magic 0x20, 0x80, etc ARM_SPE_EVENTS should be defined somewhere and shared.
>>>
>>
>> Yes, I wrote it with reference to it. What you said makes sense. I will try to modify it later.
>>
>> Xiaojun.
>>
>>>
>>>> +
>>>> +const struct arm_spe_state *arm_spe_decode(struct arm_spe_decoder *decoder)
>>>> +{
>>>> + int err;
>>>> +
>>>> + decoder->state.type = 0;
>>>> +
>>>> + err = arm_spe_walk_trace(decoder);
>>>> + if (err)
>>>> + decoder->state.err = err;
>>>> +
>>>> + decoder->state.timestamp = decoder->sample_timestamp;
>>>> +
>>>> + return &decoder->state;
>>>
>>> (trimming remainder)
>>>
>>>
>>> .
>>>
>>
>>
>
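
The review comment above asks for the magic 0x20, 0x80 and 0x200 values in the ARM_SPE_EVENTS case to be defined once and shared. A minimal sketch of what that could look like follows. All macro, enum and function names here are hypothetical and are not taken from the patch; only the three bit values and the "idx > 1" condition come from the quoted code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the Events packet payload bits tested in the patch. */
#define SPE_EV_BIT_TLB_MISS	(1ULL << 5)	/* 0x20  */
#define SPE_EV_BIT_BRANCH_MISS	(1ULL << 7)	/* 0x80  */
#define SPE_EV_BIT_LLC_MISS	(1ULL << 9)	/* 0x200 */

/* Hypothetical sample-type flags; the patch defines its own values elsewhere. */
enum spe_sample_type {
	SPE_TLB_MISS	= 1 << 0,
	SPE_BRANCH_MISS	= 1 << 1,
	SPE_LLC_MISS	= 1 << 2,
};

/* Translate an Events packet payload into sample-type flags. */
static unsigned int spe_events_to_type(uint64_t payload, int idx)
{
	unsigned int type = 0;

	assert(idx >= 0);

	if (payload & SPE_EV_BIT_TLB_MISS)
		type |= SPE_TLB_MISS;
	if (payload & SPE_EV_BIT_BRANCH_MISS)
		type |= SPE_BRANCH_MISS;
	/* As in the quoted patch, the LLC bit is only honoured when idx > 1. */
	if (idx > 1 && (payload & SPE_EV_BIT_LLC_MISS))
		type |= SPE_LLC_MISS;

	return type;
}
```

With the bits named in one header, both the packet description code and the decoder could test the same symbols instead of repeating literals.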


2019-10-09 09:51:31

by James Clark

[permalink] [raw]
Subject: Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events

Hi Xiaojun,

> By the way, you mentioned before that you want the spe event to be in the form of "event:pp" like pebs. Is that the whole framework should be made similar to pebs? Or is it just a modification to the command format?

We're still investigating whether it makes sense to modify the perf_event_open syscall to use SPE when the "precise_ip" attribute is set, and then synthesize samples from the SPE data when available. This would keep the syscall interface more consistent between architectures.

And if tools other than Perf want more precise data, they wouldn't have to be aware of SPE or any of its implementation-defined details. For example, the 'data source' encoding can differ from one microarchitecture to the next. The kernel is probably the best place to handle this.

At the moment, every tool that wants to use the Perf syscall to get precise data on ARM has to be aware of SPE and implement its own decoding.

> For the former, this may be a bit difficult. For the latter, there is currently no modification to the record part, so "-c -F, etc." is only for instructions rather than events, so it may be misunderstood by users.
>
> So I haven't figured out how to do. What do you think of this?

I think the patch at the moment is a good start to make SPE more accessible. And the changes I mentioned above wouldn't change the fact that the raw SPE data would still be available via the SPE PMU. So I think continuing with the patch as-is for now is the best idea.


James
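
For readers unfamiliar with the "precise_ip" attribute mentioned above, here is a small sketch of how a tool fills it in before calling perf_event_open. The helper name and the sample period are made up for illustration. The struct and constants come from <linux/perf_event.h>. Whether the kernel would route such a request to SPE on arm64 is exactly what is still under investigation in this thread, so this only shows the request side.

```c
#include <assert.h>
#include <string.h>
#include <linux/perf_event.h>

/*
 * Build an attribute for a hardware cycles event with the requested
 * precise level. The perf tool maps the number of 'p' modifiers to this
 * field, so "cycles:pp" corresponds to precise_ip = 2.
 */
static struct perf_event_attr make_precise_cycles_attr(unsigned int precise_level)
{
	struct perf_event_attr attr;

	assert(precise_level <= 3);	/* precise_ip is a 2-bit field */

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.sample_period = 100000;	/* arbitrary value for illustration */
	attr.precise_ip = precise_level;
	attr.exclude_kernel = 1;

	return attr;
}
```

The resulting struct would then be passed to the perf_event_open syscall. A tool written this way needs no SPE-specific knowledge, which is the consistency argument made above.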

2019-10-09 11:10:20

by Tan Xiaojun

[permalink] [raw]
Subject: Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events

On 2019/10/9 17:48, James Clark wrote:
> Hi Xiaojun,
>
>> By the way, you mentioned before that you want the spe event to be in the form of "event:pp" like pebs. Is that the whole framework should be made similar to pebs? Or is it just a modification to the command format?
>
> We're still investigating whether it makes sense to modify the perf_event_open syscall to use SPE when the "precise_ip" attribute is set, and then synthesize samples from the SPE data when available. This would keep the syscall interface more consistent between architectures.
>
> And if tools other than Perf want more precise data, they wouldn't have to be aware of SPE or any of its implementation-defined details. For example, the 'data source' encoding can differ from one microarchitecture to the next. The kernel is probably the best place to handle this.
>
> At the moment, every tool that wants to use the Perf syscall to get precise data on ARM has to be aware of SPE and implement its own decoding.
>

Hi James,

Do you mean that when the user specifies "event:pp" and SPE is available, the SPE data is configured and recorded directly via the perf_event_open syscall?
(perf.data itself is the same as using -e arm_spe_0//xxx?)

OK. If I have not misunderstood, I think I know how to do it.
Thank you.

>> For the former, this may be a bit difficult. For the latter, there is currently no modification to the record part, so "-c -F, etc." is only for instructions rather than events, so it may be misunderstood by users.
>>
>> So I haven't figured out how to do. What do you think of this?
>
> I think the patch at the moment is a good start to make SPE more accessible. And the changes I mentioned above wouldn't change the fact that the raw SPE data would still be available via the SPE PMU. So I think continuing with the patch as-is for now is the best idea.
>

Yes. I agree.

Xiaojun.

>
> James
>
>


2019-10-09 11:53:05

by Tan Xiaojun

[permalink] [raw]
Subject: Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events

On 2019/10/9 19:09, Tan Xiaojun wrote:
> On 2019/10/9 17:48, James Clark wrote:
>> Hi Xiaojun,
>>
>>> By the way, you mentioned before that you want the spe event to be in the form of "event:pp" like pebs. Is that the whole framework should be made similar to pebs? Or is it just a modification to the command format?
>>
>> We're still investigating whether it makes sense to modify the perf_event_open syscall to use SPE when the "precise_ip" attribute is set, and then synthesize samples from the SPE data when available. This would keep the syscall interface more consistent between architectures.
>>
>> And if tools other than Perf want more precise data, they wouldn't have to be aware of SPE or any of its implementation-defined details. For example, the 'data source' encoding can differ from one microarchitecture to the next. The kernel is probably the best place to handle this.
>>
>> At the moment, every tool that wants to use the Perf syscall to get precise data on ARM has to be aware of SPE and implement its own decoding.
>>
>
> Hi James,
>
> Do you mean that when the user specifies "event:pp" and SPE is available, the SPE data is configured and recorded directly via the perf_event_open syscall?
> (perf.data itself is the same as using -e arm_spe_0//xxx?)

I mean, for perf record, if the user does not add ":pp" to these events, the original path is taken, and if ":pp" is added, the SPE path is taken.

Xiaojun.

>
> OK. If I have not misunderstood, I think I know how to do it.
> Thank you.
>
>>> For the former, this may be a bit difficult. For the latter, there is currently no modification to the record part, so "-c -F, etc." is only for instructions rather than events, so it may be misunderstood by users.
>>>
>>> So I haven't figured out how to do. What do you think of this?
>>
>> I think the patch at the moment is a good start to make SPE more accessible. And the changes I mentioned above wouldn't change the fact that the raw SPE data would still be available via the SPE PMU. So I think continuing with the patch as-is for now is the best idea.
>>
>
> Yes. I agree.
>
> Xiaojun.
>
>>
>> James
>>
>>
>


2019-10-16 14:15:08

by James Clark

[permalink] [raw]
Subject: Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events

Hi Xiaojun,

>>
>> Do you mean that when the user specifies "event:pp" and SPE is available, the SPE data is configured and recorded directly via the perf_event_open syscall?
>> (perf.data itself is the same as using -e arm_spe_0//xxx?)
>
> I mean, for perf record, if the user does not add ":pp" to these events, the original path is taken, and if ":pp" is added, the SPE path is taken.
>

Yes, we think this is the best way to do it, considering that SPE has been implemented as a separate PMU and it would be very difficult to do it in the kernel when the precise_ip attribute is set.

I think doing everything in userspace is easiest. This will at least mean that users of Perf don't have to be aware of the details of SPE to get precise sample data.

So if the user specifies "event:p" when SPE is available, the SPE PMU is automatically configured and data is recorded. If the user also specifies -e arm_spe_0//xxx and wants to do some manual configuration, then that could override the automatic configuration.


James
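
The selection rule described above can be written down as a tiny decision function. This is only a sketch of the proposal as worded in this mail, not code from perf; the enum and function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of deciding how SPE gets configured for a record session. */
enum spe_setup {
	SPE_SETUP_NONE,		/* SPE is not used, the original path is taken */
	SPE_SETUP_AUTO,		/* SPE PMU configured automatically for ":p" events */
	SPE_SETUP_MANUAL,	/* user-supplied -e arm_spe_0/.../ settings win */
};

static enum spe_setup choose_spe_setup(unsigned int precise_level,
				       bool spe_available,
				       bool user_gave_spe_event)
{
	assert(precise_level <= 3);	/* number of 'p' modifiers on the event */

	if (!spe_available)
		return SPE_SETUP_NONE;
	/* An explicit arm_spe_0 event overrides the automatic configuration. */
	if (user_gave_spe_event)
		return SPE_SETUP_MANUAL;
	if (precise_level > 0)
		return SPE_SETUP_AUTO;

	return SPE_SETUP_NONE;
}
```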

2019-10-18 05:39:17

by Tan Xiaojun

[permalink] [raw]
Subject: Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events

On 2019/10/16 18:12, James Clark wrote:
> Hi Xiaojun,
>
>>>
>>> Do you mean that when the user specifies "event:pp" and SPE is available, the SPE data is configured and recorded directly via the perf_event_open syscall?
>>> (perf.data itself is the same as using -e arm_spe_0//xxx?)
>>
>> I mean, for perf record, if the user does not add ":pp" to these events, the original path is taken, and if ":pp" is added, the SPE path is taken.
>>
>
> Yes, we think this is the best way to do it, considering that SPE has been implemented as a separate PMU and it would be very difficult to do it in the kernel when the precise_ip attribute is set.
>
> I think doing everything in userspace is easiest. This will at least mean that users of Perf don't have to be aware of the details of SPE to get precise sample data.
>
> So if the user specifies "event:p" when SPE is available, the SPE PMU is automatically configured and data is recorded. If the user also specifies -e arm_spe_0//xxx and wants to do some manual configuration, then that could override the automatic configuration.
>
>
> James
>
>
>

OK. I got it.

I found a bug while testing. If I specify a cpu_list (using -a or -C) when recording SPE data, some events with "pid:0 tid:0" are logged. This is obviously wrong.

I want to solve this problem, but I haven't found out what went wrong.

--------------------------------------------------------------
[root@server121 perf]# perf record -e arm_spe_0/branch_filter=1,ts_enable=1,pa_enable=1,load_filter=1,jitter=0,store_filter=1,min_latency=0/ -a
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 7.925 MB perf.data ]
[root@server121 perf]# perf report -D > spe_dump.out
[root@server121 perf]# vim spe_dump.out

--------------------------------------------------------------
...
0xd0330 [0x30]: event: 12
.
. ... raw event: size 48 bytes
. 0000: 0c 00 00 00 00 00 30 00 00 00 00 00 00 00 00 00 ......0.........
. 0010: 00 00 00 00 00 00 00 00 f8 d9 fe bd f7 08 02 00 ................
. 0020: 00 00 00 00 00 00 00 00 4c bc 14 00 00 00 00 00 ........L.......

0 572810090961400 0xd0330 [0x30]: PERF_RECORD_ITRACE_START pid: 0 tid: 0

0xd0438 [0x30]: event: 12
.
. ... raw event: size 48 bytes
. 0000: 0c 00 00 00 00 00 30 00 00 00 00 00 00 00 00 00 ......0.........
. 0010: 00 00 00 00 00 00 00 00 d8 ef fe bd f7 08 02 00 ................
. 0020: 01 00 00 00 00 00 00 00 4d bc 14 00 00 00 00 00 ........M.......

1 572810090967000 0xd0438 [0x30]: PERF_RECORD_ITRACE_START pid: 0 tid: 0
...
--------------------------------------------------------------

Thanks.
Xiaojun.
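
To make the dump above easier to read, here is a sketch that decodes the first 48-byte record by hand. The header layout (u32 type, u16 misc, u16 size) and the pid/tid fields that follow it are the standard perf record layout for PERF_RECORD_ITRACE_START (type 12). The offsets used for the trailing time and cpu fields are an assumption, inferred from the fact that they reproduce the "0 572810090961400" prefix perf prints for this record. The struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read an n-byte little-endian value byte by byte, so host endianness does not matter. */
static uint64_t rd_le(const uint8_t *p, size_t n)
{
	uint64_t v = 0;

	for (size_t i = 0; i < n; i++)
		v |= (uint64_t)p[i] << (8 * i);
	return v;
}

/* Hypothetical flattened view of the fields of interest. */
struct itrace_start_view {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
	uint32_t pid;
	uint32_t tid;
	uint64_t time;
	uint32_t cpu;
};

/* The first raw event from the dump (offset 0xd0330). */
static const uint8_t first_record[48] = {
	0x0c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x30, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0xf8, 0xd9, 0xfe, 0xbd, 0xf7, 0x08, 0x02, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x4c, 0xbc, 0x14, 0x00, 0x00, 0x00, 0x00, 0x00,
};

static struct itrace_start_view parse_itrace_start(const uint8_t *buf, size_t len)
{
	struct itrace_start_view v;

	assert(len >= 0x28);

	v.type = (uint32_t)rd_le(buf + 0x00, 4);
	v.misc = (uint16_t)rd_le(buf + 0x04, 2);
	v.size = (uint16_t)rd_le(buf + 0x06, 2);
	v.pid  = (uint32_t)rd_le(buf + 0x08, 4);
	v.tid  = (uint32_t)rd_le(buf + 0x0c, 4);
	/* Assumed trailer offsets, inferred from the values in the dump. */
	v.time = rd_le(buf + 0x18, 8);
	v.cpu  = (uint32_t)rd_le(buf + 0x20, 4);

	return v;
}
```

Decoding first_record this way gives type 12, size 48, pid 0, tid 0, time 572810090961400 and cpu 0, matching the summary line in the dump. The second record differs only in the time, the cpu (1) and the last field.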

2019-10-18 09:43:15

by Tan Xiaojun

[permalink] [raw]
Subject: Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events

On 2019/10/17 9:51, Tan Xiaojun wrote:
> On 2019/10/16 18:12, James Clark wrote:
>> Hi Xiaojun,
>>
>>>>
>>>> Do you mean that when the user specifies "event:pp" and SPE is available, the SPE data is configured and recorded directly via the perf_event_open syscall?
>>>> (perf.data itself is the same as using -e arm_spe_0//xxx?)
>>>
>>> I mean, for perf record, if the user does not add ":pp" to these events, the original path is taken, and if ":pp" is added, the SPE path is taken.
>>>
>>
>> Yes, we think this is the best way to do it, considering that SPE has been implemented as a separate PMU and it would be very difficult to do it in the kernel when the precise_ip attribute is set.
>>
>> I think doing everything in userspace is easiest. This will at least mean that users of Perf don't have to be aware of the details of SPE to get precise sample data.
>>
>> So if the user specifies "event:p" when SPE is available, the SPE PMU is automatically configured and data is recorded. If the user also specifies -e arm_spe_0//xxx and wants to do some manual configuration, then that could override the automatic configuration.
>>
>>
>> James
>>
>>
>>
>
> OK. I got it.
>
> I found a bug while testing. If I specify a cpu_list (using -a or -C) when recording SPE data, some events with "pid:0 tid:0" are logged. This is obviously wrong.
>
> I want to solve this problem, but I haven't found out what went wrong.
>
> --------------------------------------------------------------
> [root@server121 perf]# perf record -e arm_spe_0/branch_filter=1,ts_enable=1,pa_enable=1,load_filter=1,jitter=0,store_filter=1,min_latency=0/ -a

Sorry, "--all-user" should be added here. Even then, there are still some "pid:0" events in spe_dump.out.
(If kernel events are included, then "pid:0" is not a problem.)

As a result, the PC addresses of some SPE samples cannot be translated, because the wrong pid/tid is obtained from these events.

Thanks.
Xiaojun.

> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 7.925 MB perf.data ]
> [root@server121 perf]# perf report -D > spe_dump.out
> [root@server121 perf]# vim spe_dump.out
>
> --------------------------------------------------------------
> ...
> 0xd0330 [0x30]: event: 12
> .
> . ... raw event: size 48 bytes
> . 0000: 0c 00 00 00 00 00 30 00 00 00 00 00 00 00 00 00 ......0.........
> . 0010: 00 00 00 00 00 00 00 00 f8 d9 fe bd f7 08 02 00 ................
> . 0020: 00 00 00 00 00 00 00 00 4c bc 14 00 00 00 00 00 ........L.......
>
> 0 572810090961400 0xd0330 [0x30]: PERF_RECORD_ITRACE_START pid: 0 tid: 0
>
> 0xd0438 [0x30]: event: 12
> .
> . ... raw event: size 48 bytes
> . 0000: 0c 00 00 00 00 00 30 00 00 00 00 00 00 00 00 00 ......0.........
> . 0010: 00 00 00 00 00 00 00 00 d8 ef fe bd f7 08 02 00 ................
> . 0020: 01 00 00 00 00 00 00 00 4d bc 14 00 00 00 00 00 ........M.......
>
> 1 572810090967000 0xd0438 [0x30]: PERF_RECORD_ITRACE_START pid: 0 tid: 0
> ...
> --------------------------------------------------------------
>
> Thanks.
> Xiaojun.
>
>
> .
>