2020-11-06 09:50:59

by Leo Yan

Subject: [PATCH v4 0/9] perf mem/c2c: Support AUX trace

This is v4 of the patch series that adds AUX trace support to perf mem/c2c.

Compared to v3, this patch set adds back patch 06/09, which introduces
the itrace option '-M'; this allows memory events to be synthesized
from the AUX trace data.

Since the perf mem/c2c tools focus on memory profiling, this patch set
makes the itrace memory event the default for perf mem/c2c: the tool
tells the AUX trace decoder that it is _ONLY_ interested in memory
events rather than other itrace events. Patches 07 and 08 update
'itrace_synth_opts' accordingly.

This patch set drops the memory type 'ldst' and keeps using the old
types 'load' and 'store'. If the user passes the type 'load,store', the
tool automatically uses the PERF_MEM_EVENTS__LOAD_STORE event when the
arch supports it; otherwise, it falls back to enabling both the LOAD and
STORE events. So the tools' usage doesn't need to change.

This patch set applies cleanly on the perf/core branch, on top of the
latest commit 7b3bcedf5ee5 ("perf scripting python: Avoid declaring
function pointers with a visibility attribute").

This patch set has been verified on both x86 and Arm64.

On x86, the following commands have been tested:

# perf c2c record -- ~/false_sharing.exe 2
# perf c2c record -e ldlat-loads -- ~/false_sharing.exe 2
# perf c2c record -e ldlat-stores -- ~/false_sharing.exe 2
# perf mem record -- ~/false_sharing.exe 2
# perf mem record -t load -- ~/false_sharing.exe 2
# perf mem record -t store -- ~/false_sharing.exe 2
# perf mem record -t load,store -- ~/false_sharing.exe 2
# perf mem record -e ldlat-loads -- ~/false_sharing.exe 2
# perf mem record -e ldlat-stores -- ~/false_sharing.exe 2

On Arm64, the following commands have been tested:

# perf c2c record -- ~/false_sharing.exe 2
# perf c2c record -e spe-load -- ~/false_sharing.exe 2
# perf c2c record -e spe-store -- ~/false_sharing.exe 2
# perf c2c record -e spe-ldst -- ~/false_sharing.exe 2
# perf mem record -- ~/false_sharing.exe 2
# perf mem record -t load -- ~/false_sharing.exe 2
# perf mem record -t store -- ~/false_sharing.exe 2
# perf mem record -t load,store -- ~/false_sharing.exe 2
# perf mem record -e spe-load -- ~/false_sharing.exe 2
# perf mem record -e spe-store -- ~/false_sharing.exe 2
# perf mem record -e spe-ldst -- ~/false_sharing.exe 2

Changes from v3:
* Added back patch 06/09, which introduces the itrace option '-M'
(Jiri);
* Added 'itrace_synth_opts' for the memory event (Jiri);
* Dropped type 'ldst' so the tools' usage doesn't change (Ian);
* Dropped the patch "perf mem: Return NULL for event 'ldst' on
PowerPC" because type 'ldst' is no longer added (Ian);
* Added patch 04/09 "perf c2c: Support memory event
PERF_MEM_EVENTS__LOAD_STORE", so that load/store requests can be
converted to the event PERF_MEM_EVENTS__LOAD_STORE (James Clark).


Leo Yan (9):
perf mem: Search event name with more flexible path
perf mem: Introduce weak function perf_mem_events__ptr()
perf mem: Support new memory event PERF_MEM_EVENTS__LOAD_STORE
perf c2c: Support memory event PERF_MEM_EVENTS__LOAD_STORE
perf mem: Only initialize memory event for recording
perf auxtrace: Add itrace option '-M' for memory events
perf mem: Support AUX trace
perf c2c: Support AUX trace
perf mem: Support Arm SPE events

tools/perf/Documentation/itrace.txt | 1 +
tools/perf/arch/arm64/util/Build | 2 +-
tools/perf/arch/arm64/util/mem-events.c | 37 ++++++++++++++++
tools/perf/builtin-c2c.c | 39 ++++++++++++++---
tools/perf/builtin-mem.c | 56 +++++++++++++++++++------
tools/perf/util/auxtrace.c | 4 ++
tools/perf/util/auxtrace.h | 2 +
tools/perf/util/mem-events.c | 45 +++++++++++++++-----
tools/perf/util/mem-events.h | 3 +-
9 files changed, 158 insertions(+), 31 deletions(-)
create mode 100644 tools/perf/arch/arm64/util/mem-events.c

--
2.17.1


2020-11-06 09:51:09

by Leo Yan

Subject: [PATCH v4 2/9] perf mem: Introduce weak function perf_mem_events__ptr()

Different architectures might use different events, or different event
parameters, for memory profiling. This patch introduces the weak
function perf_mem_events__ptr(), which allows an architecture to return
its own memory event.

Since the variable 'perf_mem_events' is now only accessed through the
function perf_mem_events__ptr(), mark it as 'static'; this allows each
architecture to define its own memory event array.
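A minimal, self-contained sketch of the pattern this patch introduces: a file-local table reached only through a weak accessor, which an architecture can override with a strong definition at link time. The names `mem_event`, `mem_events` and `mem_events_ptr()` are hypothetical simplifications of the perf types, and the sketch assumes a GCC/Clang toolchain for `__attribute__((weak))` (perf spells it `__weak`). Unlike the patch, it also rejects negative indices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the perf enum and struct. */
enum {
	MEM_EV_LOAD,
	MEM_EV_STORE,
	MEM_EV_MAX,
};

struct mem_event {
	const char *tag;
	const char *name;
};

/* File-local table: other files can no longer index it directly. */
static struct mem_event mem_events[MEM_EV_MAX] = {
	{ "ldlat-loads",  "cpu/mem-loads,ldlat=%u/P" },
	{ "ldlat-stores", "cpu/mem-stores/P" },
};

/*
 * Weak default accessor. If another object file provides a non-weak
 * definition with the same signature, the linker uses that one and this
 * default is discarded.
 */
__attribute__((weak)) struct mem_event *mem_events_ptr(int i)
{
	if (i < 0 || i >= MEM_EV_MAX)
		return NULL;

	return &mem_events[i];
}
```

Out-of-range indices yield NULL. An arch file linked in with its own non-weak accessor replaces the default entirely, which is how the Arm64 mem-events.c in patch 9/9 supplies the SPE table.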

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/builtin-c2c.c | 18 ++++++++++++------
tools/perf/builtin-mem.c | 21 ++++++++++++++-------
tools/perf/util/mem-events.c | 26 +++++++++++++++++++-------
tools/perf/util/mem-events.h | 2 +-
4 files changed, 46 insertions(+), 21 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index d5bea5d3cd51..4d1a08e38233 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2867,6 +2867,7 @@ static int perf_c2c__record(int argc, const char **argv)
int ret;
bool all_user = false, all_kernel = false;
bool event_set = false;
+ struct perf_mem_event *e;
struct option options[] = {
OPT_CALLBACK('e', "event", &event_set, "event",
"event selector. Use 'perf c2c record -e list' to list available events",
@@ -2894,11 +2895,15 @@ static int perf_c2c__record(int argc, const char **argv)
rec_argv[i++] = "record";

if (!event_set) {
- perf_mem_events[PERF_MEM_EVENTS__LOAD].record = true;
- perf_mem_events[PERF_MEM_EVENTS__STORE].record = true;
+ e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
+ e->record = true;
+
+ e = perf_mem_events__ptr(PERF_MEM_EVENTS__STORE);
+ e->record = true;
}

- if (perf_mem_events[PERF_MEM_EVENTS__LOAD].record)
+ e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
+ if (e->record)
rec_argv[i++] = "-W";

rec_argv[i++] = "-d";
@@ -2906,12 +2911,13 @@ static int perf_c2c__record(int argc, const char **argv)
rec_argv[i++] = "--sample-cpu";

for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
- if (!perf_mem_events[j].record)
+ e = perf_mem_events__ptr(j);
+ if (!e->record)
continue;

- if (!perf_mem_events[j].supported) {
+ if (!e->supported) {
pr_err("failed: event '%s' not supported\n",
- perf_mem_events[j].name);
+ perf_mem_events__name(j));
free(rec_argv);
return -1;
}
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index 3523279af6af..9a7df8d01296 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -64,6 +64,7 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
const char **rec_argv;
int ret;
bool all_user = false, all_kernel = false;
+ struct perf_mem_event *e;
struct option options[] = {
OPT_CALLBACK('e', "event", &mem, "event",
"event selector. use 'perf mem record -e list' to list available events",
@@ -86,13 +87,18 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)

rec_argv[i++] = "record";

- if (mem->operation & MEM_OPERATION_LOAD)
- perf_mem_events[PERF_MEM_EVENTS__LOAD].record = true;
+ if (mem->operation & MEM_OPERATION_LOAD) {
+ e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
+ e->record = true;
+ }

- if (mem->operation & MEM_OPERATION_STORE)
- perf_mem_events[PERF_MEM_EVENTS__STORE].record = true;
+ if (mem->operation & MEM_OPERATION_STORE) {
+ e = perf_mem_events__ptr(PERF_MEM_EVENTS__STORE);
+ e->record = true;
+ }

- if (perf_mem_events[PERF_MEM_EVENTS__LOAD].record)
+ e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
+ if (e->record)
rec_argv[i++] = "-W";

rec_argv[i++] = "-d";
@@ -101,10 +107,11 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
rec_argv[i++] = "--phys-data";

for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
- if (!perf_mem_events[j].record)
+ e = perf_mem_events__ptr(j);
+ if (!e->record)
continue;

- if (!perf_mem_events[j].supported) {
+ if (!e->supported) {
pr_err("failed: event '%s' not supported\n",
perf_mem_events__name(j));
free(rec_argv);
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 35c8d175a9d2..7a5a0d699e27 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -17,7 +17,7 @@ unsigned int perf_mem_events__loads_ldlat = 30;

#define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }

-struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
+static struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
E("ldlat-loads", "cpu/mem-loads,ldlat=%u/P", "cpu/events/mem-loads"),
E("ldlat-stores", "cpu/mem-stores/P", "cpu/events/mem-stores"),
};
@@ -28,19 +28,31 @@ struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
static char mem_loads_name[100];
static bool mem_loads_name__init;

+struct perf_mem_event * __weak perf_mem_events__ptr(int i)
+{
+ if (i >= PERF_MEM_EVENTS__MAX)
+ return NULL;
+
+ return &perf_mem_events[i];
+}
+
char * __weak perf_mem_events__name(int i)
{
+ struct perf_mem_event *e = perf_mem_events__ptr(i);
+
+ if (!e)
+ return NULL;
+
if (i == PERF_MEM_EVENTS__LOAD) {
if (!mem_loads_name__init) {
mem_loads_name__init = true;
scnprintf(mem_loads_name, sizeof(mem_loads_name),
- perf_mem_events[i].name,
- perf_mem_events__loads_ldlat);
+ e->name, perf_mem_events__loads_ldlat);
}
return mem_loads_name;
}

- return (char *)perf_mem_events[i].name;
+ return (char *)e->name;
}

int perf_mem_events__parse(const char *str)
@@ -61,7 +73,7 @@ int perf_mem_events__parse(const char *str)

while (tok) {
for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
- struct perf_mem_event *e = &perf_mem_events[j];
+ struct perf_mem_event *e = perf_mem_events__ptr(j);

if (strstr(e->tag, tok))
e->record = found = true;
@@ -90,7 +102,7 @@ int perf_mem_events__init(void)

for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
char path[PATH_MAX];
- struct perf_mem_event *e = &perf_mem_events[j];
+ struct perf_mem_event *e = perf_mem_events__ptr(j);
struct stat st;

scnprintf(path, PATH_MAX, "%s/devices/%s",
@@ -108,7 +120,7 @@ void perf_mem_events__list(void)
int j;

for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
- struct perf_mem_event *e = &perf_mem_events[j];
+ struct perf_mem_event *e = perf_mem_events__ptr(j);

fprintf(stderr, "%-13s%-*s%s\n",
e->tag,
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index 904dad34f7f7..726a9c8103e4 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -31,13 +31,13 @@ enum {
PERF_MEM_EVENTS__MAX,
};

-extern struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX];
extern unsigned int perf_mem_events__loads_ldlat;

int perf_mem_events__parse(const char *str);
int perf_mem_events__init(void);

char *perf_mem_events__name(int i);
+struct perf_mem_event *perf_mem_events__ptr(int i);

void perf_mem_events__list(void);

--
2.17.1

2020-11-06 09:51:14

by Leo Yan

Subject: [PATCH v4 3/9] perf mem: Support new memory event PERF_MEM_EVENTS__LOAD_STORE

On architectures with perf memory profiling, two types of hardware
events are supported: load and store. To profile memory for both load
and store operations, the tool uses these two events at the same time:

# perf mem record -t load,store -- uname

But this cannot be applied to an AUX tracing event: the same PMU event
can be used to trace only memory loads, only memory stores, or both
memory loads and stores.

This patch introduces a new event PERF_MEM_EVENTS__LOAD_STORE, which
represents a single event that can record both memory load and store
operations.

When the user specifies the memory operation type as 'load,store', or
doesn't set a type (so 'load,store' is used by default), and the arch
supports the event PERF_MEM_EVENTS__LOAD_STORE, the tool converts the
required operations to this single event. Otherwise, if the arch
doesn't support PERF_MEM_EVENTS__LOAD_STORE, the tool falls back to
enabling both PERF_MEM_EVENTS__LOAD and PERF_MEM_EVENTS__STORE, which
keeps the same behaviour as before.
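The selection rule described above can be sketched as follows. This is an illustration rather than the perf code: the enum, the `OP_*` flags and `select_events()` are hypothetical. Like the patch, it decides by checking whether the combined entry has a tag; whether an event is actually supported is only checked later, when the record arguments are built.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins mirroring the fields the patch uses. */
enum { EV_LOAD, EV_STORE, EV_LOAD_STORE, EV_MAX };

#define OP_LOAD  (1u << 0)
#define OP_STORE (1u << 1)

struct mem_event {
	const char *tag;	/* NULL marks an entry the arch doesn't define */
	bool record;
};

/*
 * Mark the events to record for the requested operations: prefer the
 * single combined event when both loads and stores are wanted and the
 * arch defines it; otherwise fall back to the separate events.
 */
static void select_events(struct mem_event ev[EV_MAX], unsigned int ops)
{
	if (ev[EV_LOAD_STORE].tag && (ops & OP_LOAD) && (ops & OP_STORE)) {
		ev[EV_LOAD_STORE].record = true;
		return;
	}

	if (ops & OP_LOAD)
		ev[EV_LOAD].record = true;
	if (ops & OP_STORE)
		ev[EV_STORE].record = true;
}
```

With an x86-style table whose combined entry has a NULL tag, requesting `OP_LOAD | OP_STORE` marks the separate load and store events; with an SPE-style table it marks only the combined event.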

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/builtin-mem.c | 24 ++++++++++++++++++------
tools/perf/util/mem-events.c | 13 ++++++++++++-
tools/perf/util/mem-events.h | 1 +
3 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index 9a7df8d01296..21ebe0f47e64 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -87,14 +87,26 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)

rec_argv[i++] = "record";

- if (mem->operation & MEM_OPERATION_LOAD) {
- e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
- e->record = true;
- }
+ e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD_STORE);

- if (mem->operation & MEM_OPERATION_STORE) {
- e = perf_mem_events__ptr(PERF_MEM_EVENTS__STORE);
+ /*
+ * The load and store operations are required, use the event
+ * PERF_MEM_EVENTS__LOAD_STORE if it is supported.
+ */
+ if (e->tag &&
+ (mem->operation & MEM_OPERATION_LOAD) &&
+ (mem->operation & MEM_OPERATION_STORE)) {
e->record = true;
+ } else {
+ if (mem->operation & MEM_OPERATION_LOAD) {
+ e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
+ e->record = true;
+ }
+
+ if (mem->operation & MEM_OPERATION_STORE) {
+ e = perf_mem_events__ptr(PERF_MEM_EVENTS__STORE);
+ e->record = true;
+ }
}

e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 7a5a0d699e27..19007e463b8a 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -20,6 +20,7 @@ unsigned int perf_mem_events__loads_ldlat = 30;
static struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
E("ldlat-loads", "cpu/mem-loads,ldlat=%u/P", "cpu/events/mem-loads"),
E("ldlat-stores", "cpu/mem-stores/P", "cpu/events/mem-stores"),
+ E(NULL, NULL, NULL),
};
#undef E

@@ -75,6 +76,9 @@ int perf_mem_events__parse(const char *str)
for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
struct perf_mem_event *e = perf_mem_events__ptr(j);

+ if (!e->tag)
+ continue;
+
if (strstr(e->tag, tok))
e->record = found = true;
}
@@ -105,6 +109,13 @@ int perf_mem_events__init(void)
struct perf_mem_event *e = perf_mem_events__ptr(j);
struct stat st;

+ /*
+ * If the event entry isn't valid, skip initialization
+ * and "e->supported" will keep false.
+ */
+ if (!e->tag)
+ continue;
+
scnprintf(path, PATH_MAX, "%s/devices/%s",
mnt, e->sysfs_name);

@@ -123,7 +134,7 @@ void perf_mem_events__list(void)
struct perf_mem_event *e = perf_mem_events__ptr(j);

fprintf(stderr, "%-13s%-*s%s\n",
- e->tag,
+ e->tag ?: "",
verbose > 0 ? 25 : 0,
verbose > 0 ? perf_mem_events__name(j) : "",
e->supported ? ": available" : "");
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index 726a9c8103e4..5ef178278909 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -28,6 +28,7 @@ struct mem_info {
enum {
PERF_MEM_EVENTS__LOAD,
PERF_MEM_EVENTS__STORE,
+ PERF_MEM_EVENTS__LOAD_STORE,
PERF_MEM_EVENTS__MAX,
};

--
2.17.1

2020-11-06 09:51:21

by Leo Yan

Subject: [PATCH v4 4/9] perf c2c: Support memory event PERF_MEM_EVENTS__LOAD_STORE

When the user doesn't specify an event name, the perf c2c tool enables
both the load and store events; for AUX trace this fails, because the
same PMU device would be opened twice.

Now that the memory event PERF_MEM_EVENTS__LOAD_STORE has been
introduced, when the user doesn't specify an event name, this patch
converts the required operations to PERF_MEM_EVENTS__LOAD_STORE if the
arch supports it. Otherwise, the tool still falls back to enabling the
events PERF_MEM_EVENTS__LOAD and PERF_MEM_EVENTS__STORE.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/builtin-c2c.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 4d1a08e38233..98ae33eac6cc 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2895,11 +2895,20 @@ static int perf_c2c__record(int argc, const char **argv)
rec_argv[i++] = "record";

if (!event_set) {
- e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
- e->record = true;
+ e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD_STORE);
+ /*
+ * The load and store operations are required, use the event
+ * PERF_MEM_EVENTS__LOAD_STORE if it is supported.
+ */
+ if (e->tag) {
+ e->record = true;
+ } else {
+ e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
+ e->record = true;

- e = perf_mem_events__ptr(PERF_MEM_EVENTS__STORE);
- e->record = true;
+ e = perf_mem_events__ptr(PERF_MEM_EVENTS__STORE);
+ e->record = true;
+ }
}

e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
--
2.17.1

2020-11-06 09:52:23

by Leo Yan

Subject: [PATCH v4 5/9] perf mem: Only initialize memory event for recording

There is no need to initialize memory events for reporting, so this
patch moves memory event initialization into the record path only.
Furthermore, this allows perf data to be parsed across platforms, e.g.
the perf tool can report results properly even on a machine that
doesn't support the memory events.

Signed-off-by: Leo Yan <[email protected]>
Acked-by: Ian Rogers <[email protected]>
---
tools/perf/builtin-mem.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index 21ebe0f47e64..72ce4b8fbb0f 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -77,6 +77,11 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
OPT_END()
};

+ if (perf_mem_events__init()) {
+ pr_err("failed: memory events not supported\n");
+ return -1;
+ }
+
argc = parse_options(argc, argv, options, record_mem_usage,
PARSE_OPT_KEEP_UNKNOWN);

@@ -441,11 +446,6 @@ int cmd_mem(int argc, const char **argv)
NULL
};

- if (perf_mem_events__init()) {
- pr_err("failed: memory events not supported\n");
- return -1;
- }
-
argc = parse_options_subcommand(argc, argv, mem_options, mem_subcommands,
mem_usage, PARSE_OPT_KEEP_UNKNOWN);

--
2.17.1

2020-11-06 09:52:54

by Leo Yan

Subject: [PATCH v4 1/9] perf mem: Search event name with more flexible path

The perf tool searches for memory event names under the folder
'/sys/devices/cpu/events/', which limits memory profiling to events
located in this folder. It's therefore impossible to use any other
event for memory profiling; e.g. the Arm SPE hardware event is not
located in '/sys/devices/cpu/events/', so it cannot be enabled for
memory profiling.

This patch changes the search folder from '/sys/devices/cpu/events/' to
'/sys/devices', which gives the flexibility to find other events that
can be used for memory profiling.
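A sketch of the lookup after this change, assuming sysfs is mounted at /sys: each entry's `sysfs_name` now carries the path relative to `/sys/devices`, so both `cpu/events/mem-loads` and a PMU directory such as `arm_spe_0` are probed the same way. The helper names are hypothetical, and unlike the patch the sketch also reports truncation of the path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Local buffer size for the sketch; perf itself uses PATH_MAX here. */
#define MEM_EV_PATH_LEN 4096

/*
 * Build "<mnt>/devices/<sysfs_name>". Returns false if snprintf()
 * failed or the result did not fit in buf.
 */
static bool mem_event_path(char *buf, size_t len, const char *mnt,
			   const char *sysfs_name)
{
	int n = snprintf(buf, len, "%s/devices/%s", mnt, sysfs_name);

	return n >= 0 && (size_t)n < len;
}

/* Treat an event as present if its sysfs node exists. */
static bool mem_event_present(const char *mnt, const char *sysfs_name)
{
	char path[MEM_EV_PATH_LEN];
	struct stat st;

	if (!mem_event_path(path, sizeof(path), mnt, sysfs_name))
		return false;

	return stat(path, &st) == 0;
}
```

For example, `mem_event_present("/sys", "arm_spe_0")` checks `/sys/devices/arm_spe_0`, a path the old `devices/cpu/events/` prefix could never reach.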

Signed-off-by: Leo Yan <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
---
tools/perf/util/mem-events.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index ea0af0bc4314..35c8d175a9d2 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -18,8 +18,8 @@ unsigned int perf_mem_events__loads_ldlat = 30;
#define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }

struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
- E("ldlat-loads", "cpu/mem-loads,ldlat=%u/P", "mem-loads"),
- E("ldlat-stores", "cpu/mem-stores/P", "mem-stores"),
+ E("ldlat-loads", "cpu/mem-loads,ldlat=%u/P", "cpu/events/mem-loads"),
+ E("ldlat-stores", "cpu/mem-stores/P", "cpu/events/mem-stores"),
};
#undef E

@@ -93,7 +93,7 @@ int perf_mem_events__init(void)
struct perf_mem_event *e = &perf_mem_events[j];
struct stat st;

- scnprintf(path, PATH_MAX, "%s/devices/cpu/events/%s",
+ scnprintf(path, PATH_MAX, "%s/devices/%s",
mnt, e->sysfs_name);

if (!stat(path, &st))
--
2.17.1

2020-11-06 09:52:59

by Leo Yan

Subject: [PATCH v4 8/9] perf c2c: Support AUX trace

This patch adds the AUX callbacks to the session structure to support
AUX trace for the "perf c2c" tool, and makes the itrace memory event
the default for "perf c2c"; this tells the AUX trace decoder to
synthesize memory samples, which can be used for statistics.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/builtin-c2c.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 98ae33eac6cc..c5babeaa3b38 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -369,6 +369,10 @@ static struct perf_c2c c2c = {
.exit = perf_event__process_exit,
.fork = perf_event__process_fork,
.lost = perf_event__process_lost,
+ .attr = perf_event__process_attr,
+ .auxtrace_info = perf_event__process_auxtrace_info,
+ .auxtrace = perf_event__process_auxtrace,
+ .auxtrace_error = perf_event__process_auxtrace_error,
.ordered_events = true,
.ordering_requires_timestamps = true,
},
@@ -2678,6 +2682,12 @@ static int setup_coalesce(const char *coalesce, bool no_source)

static int perf_c2c__report(int argc, const char **argv)
{
+ struct itrace_synth_opts itrace_synth_opts = {
+ .set = true,
+ .mem = true, /* Only enable memory event */
+ .default_no_sample = true,
+ };
+
struct perf_session *session;
struct ui_progress prog;
struct perf_data data = {
@@ -2757,6 +2767,8 @@ static int perf_c2c__report(int argc, const char **argv)
goto out;
}

+ session->itrace_synth_opts = &itrace_synth_opts;
+
err = setup_nodes(session);
if (err) {
pr_err("Failed setup nodes\n");
--
2.17.1

2020-11-06 09:53:42

by Leo Yan

Subject: [PATCH v4 6/9] perf auxtrace: Add itrace option '-M' for memory events

This patch adds the itrace option '-M' to synthesize memory events.
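To illustrate how the new letter sits alongside the existing ones, here is a toy parser covering only the cache, TLB, remote-access and memory letters listed in itrace.txt. It is a hypothetical stand-in, not perf's `itrace_parse_synth_opts()`, which handles many more options; note that the letters are case-sensitive, so 'm' (last level cache) and 'M' (memory) are distinct.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of struct itrace_synth_opts, for illustration. */
struct synth_opts {
	bool flc;
	bool llc;
	bool tlb;
	bool remote_access;
	bool mem;
};

/*
 * Set one flag per letter in str, e.g. "fM". Returns 0 on success or
 * -1 on the first letter this toy parser doesn't know.
 */
static int parse_letters(struct synth_opts *o, const char *str)
{
	for (; *str; str++) {
		switch (*str) {
		case 'f':
			o->flc = true;
			break;
		case 'm':
			o->llc = true;
			break;
		case 't':
			o->tlb = true;
			break;
		case 'a':
			o->remote_access = true;
			break;
		case 'M':
			o->mem = true;
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

Parsing "M" sets only `mem`; parsing "mM" sets both `llc` and `mem`.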

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/Documentation/itrace.txt | 1 +
tools/perf/util/auxtrace.c | 4 ++++
tools/perf/util/auxtrace.h | 2 ++
3 files changed, 7 insertions(+)

diff --git a/tools/perf/Documentation/itrace.txt b/tools/perf/Documentation/itrace.txt
index d3740c8f399b..079cdfabb352 100644
--- a/tools/perf/Documentation/itrace.txt
+++ b/tools/perf/Documentation/itrace.txt
@@ -11,6 +11,7 @@
d create a debug log
f synthesize first level cache events
m synthesize last level cache events
+ M synthesize memory events
t synthesize TLB events
a synthesize remote access events
g synthesize a call chain (use with i or x)
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 42a85c86421d..62e7f6c5f8b5 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -1333,6 +1333,7 @@ void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts,
synth_opts->flc = true;
synth_opts->llc = true;
synth_opts->tlb = true;
+ synth_opts->mem = true;
synth_opts->remote_access = true;

if (no_sample) {
@@ -1554,6 +1555,9 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
case 'a':
synth_opts->remote_access = true;
break;
+ case 'M':
+ synth_opts->mem = true;
+ break;
case 'q':
synth_opts->quick += 1;
break;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 951d2d14cf24..7e5c9e1552bd 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -88,6 +88,7 @@ enum itrace_period_type {
* @llc: whether to synthesize last level cache events
* @tlb: whether to synthesize TLB events
* @remote_access: whether to synthesize remote access events
+ * @mem: whether to synthesize memory events
* @callchain_sz: maximum callchain size
* @last_branch_sz: branch context size
* @period: 'instructions' events period
@@ -126,6 +127,7 @@ struct itrace_synth_opts {
bool llc;
bool tlb;
bool remote_access;
+ bool mem;
unsigned int callchain_sz;
unsigned int last_branch_sz;
unsigned long long period;
--
2.17.1

2020-11-06 09:53:53

by Leo Yan

Subject: [PATCH v4 7/9] perf mem: Support AUX trace

The perf memory tool doesn't support AUX trace data, so it cannot
receive hardware tracing data. Although Arm64 doesn't provide PMU
events for memory loads and stores, Arm SPE is a good candidate for
memory profiling: the hardware tracer can record memory access
operations with associated information (e.g. the physical and virtual
addresses accessed, cache levels, TLB walks, latency, etc).

To allow the "perf mem" tool to support AUX trace, this patch adds the
AUX callbacks to the session structure and makes the itrace memory
event the default for "perf mem"; this tells the AUX trace decoder to
synthesize memory samples.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/builtin-mem.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index 72ce4b8fbb0f..fdfbff7592f4 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -7,6 +7,7 @@
#include "perf.h"

#include <subcmd/parse-options.h>
+#include "util/auxtrace.h"
#include "util/trace-event.h"
#include "util/tool.h"
#include "util/session.h"
@@ -255,6 +256,12 @@ static int process_sample_event(struct perf_tool *tool,

static int report_raw_events(struct perf_mem *mem)
{
+ struct itrace_synth_opts itrace_synth_opts = {
+ .set = true,
+ .mem = true, /* Only enable memory event */
+ .default_no_sample = true,
+ };
+
struct perf_data data = {
.path = input_name,
.mode = PERF_DATA_MODE_READ,
@@ -267,6 +274,8 @@ static int report_raw_events(struct perf_mem *mem)
if (IS_ERR(session))
return PTR_ERR(session);

+ session->itrace_synth_opts = &itrace_synth_opts;
+
if (mem->cpu_list) {
ret = perf_session__cpu_bitmap(session, mem->cpu_list,
mem->cpu_bitmap);
@@ -410,8 +419,12 @@ int cmd_mem(int argc, const char **argv)
.comm = perf_event__process_comm,
.lost = perf_event__process_lost,
.fork = perf_event__process_fork,
+ .attr = perf_event__process_attr,
.build_id = perf_event__process_build_id,
.namespaces = perf_event__process_namespaces,
+ .auxtrace_info = perf_event__process_auxtrace_info,
+ .auxtrace = perf_event__process_auxtrace,
+ .auxtrace_error = perf_event__process_auxtrace_error,
.ordered_events = true,
},
.input_name = "perf.data",
--
2.17.1

2020-11-06 09:54:22

by Leo Yan

Subject: [PATCH v4 9/9] perf mem: Support Arm SPE events

This patch adds Arm SPE events for perf memory profiling:

'spe-load': event for only recording memory load ops;
'spe-store': event for only recording memory store ops;
'spe-ldst': event for recording memory load and store ops.
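The way these event strings are expanded can be sketched as follows: the load and load+store strings take the minimum latency (perf's `perf_mem_events__loads_ldlat`, 30 by default) through a `%u` placeholder, while the store string has none. The names are hypothetical; the sketch writes into a caller-supplied buffer instead of a static one, and passes the store string through "%s" so that it is never used as a format string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { EV_LOAD, EV_STORE, EV_LOAD_STORE, EV_MAX };

/* Event strings copied from the Arm64 table in this patch. */
static const char *const spe_fmt[EV_MAX] = {
	"arm_spe_0/ts_enable=1,load_filter=1,store_filter=0,min_latency=%u/",
	"arm_spe_0/ts_enable=1,load_filter=0,store_filter=1/",
	"arm_spe_0/ts_enable=1,load_filter=1,store_filter=1,min_latency=%u/",
};

/*
 * Expand event i into buf. Returns buf, or NULL for an invalid index.
 * Only the load and load+store strings contain a %u placeholder.
 */
static const char *spe_event_name(char *buf, size_t len, int i,
				  unsigned int ldlat)
{
	if (i < 0 || i >= EV_MAX)
		return NULL;

	if (i == EV_LOAD || i == EV_LOAD_STORE)
		snprintf(buf, len, spe_fmt[i], ldlat);
	else
		snprintf(buf, len, "%s", spe_fmt[i]);

	return buf;
}
```

With the default latency of 30, the 'spe-load' entry expands to `arm_spe_0/ts_enable=1,load_filter=1,store_filter=0,min_latency=30/`.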

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/arch/arm64/util/Build | 2 +-
tools/perf/arch/arm64/util/mem-events.c | 37 +++++++++++++++++++++++++
2 files changed, 38 insertions(+), 1 deletion(-)
create mode 100644 tools/perf/arch/arm64/util/mem-events.c

diff --git a/tools/perf/arch/arm64/util/Build b/tools/perf/arch/arm64/util/Build
index 8d2b9bcfffca..ead2f2275eee 100644
--- a/tools/perf/arch/arm64/util/Build
+++ b/tools/perf/arch/arm64/util/Build
@@ -10,4 +10,4 @@ perf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
perf-$(CONFIG_AUXTRACE) += ../../arm/util/pmu.o \
../../arm/util/auxtrace.o \
../../arm/util/cs-etm.o \
- arm-spe.o
+ arm-spe.o mem-events.o
diff --git a/tools/perf/arch/arm64/util/mem-events.c b/tools/perf/arch/arm64/util/mem-events.c
new file mode 100644
index 000000000000..2a2497372671
--- /dev/null
+++ b/tools/perf/arch/arm64/util/mem-events.c
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "map_symbol.h"
+#include "mem-events.h"
+
+#define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }
+
+static struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
+ E("spe-load", "arm_spe_0/ts_enable=1,load_filter=1,store_filter=0,min_latency=%u/", "arm_spe_0"),
+ E("spe-store", "arm_spe_0/ts_enable=1,load_filter=0,store_filter=1/", "arm_spe_0"),
+ E("spe-ldst", "arm_spe_0/ts_enable=1,load_filter=1,store_filter=1,min_latency=%u/", "arm_spe_0"),
+};
+
+static char mem_ev_name[100];
+
+struct perf_mem_event *perf_mem_events__ptr(int i)
+{
+ if (i >= PERF_MEM_EVENTS__MAX)
+ return NULL;
+
+ return &perf_mem_events[i];
+}
+
+char *perf_mem_events__name(int i)
+{
+ struct perf_mem_event *e = perf_mem_events__ptr(i);
+
+ if (i >= PERF_MEM_EVENTS__MAX)
+ return NULL;
+
+ if (i == PERF_MEM_EVENTS__LOAD || i == PERF_MEM_EVENTS__LOAD_STORE)
+ scnprintf(mem_ev_name, sizeof(mem_ev_name),
+ e->name, perf_mem_events__loads_ldlat);
+ else /* PERF_MEM_EVENTS__STORE */
+ scnprintf(mem_ev_name, sizeof(mem_ev_name), e->name);
+
+ return mem_ev_name;
+}
--
2.17.1

2020-11-11 12:40:04

by Jiri Olsa

Subject: Re: [PATCH v4 0/9] perf mem/c2c: Support AUX trace

On Fri, Nov 06, 2020 at 05:48:44PM +0800, Leo Yan wrote:

SNIP

> Changes from v3:
> * Added back the patch 06/09 for introducing the itrace option '-M'
> (Jiri);
> * Added 'itrace_synth_opts' for memory event (Jiri);
> * Dropped type 'ldst' so don't change any usages for tools (Ian);
> * Dropped the patch "perf mem: Return NULL for event 'ldst' on
> PowerPC" due type 'ldst' is not added anymore (Ian);
> * Added patch 04/09 "perf c2c: Support memory event
> PERF_MEM_EVENTS__LOAD_STORE", so can convert the load/store requests
> to event PERF_MEM_EVENTS__LOAD_STORE (James Clark).
>
>
> Leo Yan (9):
> perf mem: Search event name with more flexible path
> perf mem: Introduce weak function perf_mem_events__ptr()
> perf mem: Support new memory event PERF_MEM_EVENTS__LOAD_STORE
> perf c2c: Support memory event PERF_MEM_EVENTS__LOAD_STORE
> perf mem: Only initialize memory event for recording
> perf auxtrace: Add itrace option '-M' for memory events
> perf mem: Support AUX trace
> perf c2c: Support AUX trace
> perf mem: Support Arm SPE events

Acked-by: Jiri Olsa <[email protected]>

thanks,
jirka