2023-12-07 19:24:04

by Liang, Kan

Subject: [PATCH V2 0/5] Clean up perf mem

From: Kan Liang <[email protected]>

Changes since V1:
- Fix strcmp of PMU name checking (Ravi)
- Fix "/," typo (Ian)
- Rename several functions with perf_pmu__mem_events prefix. (Ian)
- Fold the header removal patch into the patch where the cleanups are made.
(Arnaldo)
- Add reviewed-by and tested-by from Ian and Ravi

As discussed in the thread below, this patch set cleans up perf mem.
https://lore.kernel.org/lkml/[email protected]/

Introduce generic functions perf_mem_events__ptr(),
perf_mem_events__name(), and is_mem_loads_aux_event() to replace the
ARCH-specific ones.
Simplify perf_mem_event__supported().
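
For reference, here is the generic interface as it stands at the end of
the series (a summary collected from the mem-events.h hunks in patches
2-5, not a new hunk):

    int perf_pmu__mem_events_parse(struct perf_pmu *pmu, const char *str);
    int perf_pmu__mem_events_init(struct perf_pmu *pmu);
    struct perf_mem_event *perf_pmu__mem_events_ptr(struct perf_pmu *pmu, int i);
    struct perf_pmu *perf_mem_events_find_pmu(void);
    void perf_pmu__mem_events_list(struct perf_pmu *pmu);
    bool is_mem_loads_aux_event(struct evsel *leader);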

Only the ARCH-specific perf_mem_events array is kept in the
corresponding mem-events.c for each ARCH.

There is no functional change.

The patch set touches almost all the ARCHs: Intel, AMD, ARM, Power,
etc. But I can only test it on two Intel platforms.
Please give it a try if you have machines with other ARCHs.

Here are the test results:
Intel hybrid machine:

$perf mem record -e list
ldlat-loads : available
ldlat-stores : available

$perf mem record -e ldlat-loads -v --ldlat 50
calling: record -e cpu_atom/mem-loads,ldlat=50/P -e cpu_core/mem-loads,ldlat=50/P

$perf mem record -v
calling: record -e cpu_atom/mem-loads,ldlat=30/P -e cpu_atom/mem-stores/P -e cpu_core/mem-loads,ldlat=30/P -e cpu_core/mem-stores/P

$perf mem record -t store -v
calling: record -e cpu_atom/mem-stores/P -e cpu_core/mem-stores/P


Intel SPR:
$perf mem record -e list
ldlat-loads : available
ldlat-stores : available

$perf mem record -e ldlat-loads -v --ldlat 50
calling: record -e {cpu/mem-loads-aux/,cpu/mem-loads,ldlat=50/}:P

$perf mem record -v
calling: record -e {cpu/mem-loads-aux/,cpu/mem-loads,ldlat=30/}:P -e cpu/mem-stores/P

$perf mem record -t store -v
calling: record -e cpu/mem-stores/P

Kan Liang (5):
perf mem: Add mem_events into the supported perf_pmu
perf mem: Clean up perf_mem_events__ptr()
perf mem: Clean up perf_mem_events__name()
perf mem: Clean up perf_mem_event__supported()
perf mem: Clean up is_mem_loads_aux_event()

tools/perf/arch/arm64/util/mem-events.c | 36 +----
tools/perf/arch/arm64/util/pmu.c | 6 +
tools/perf/arch/powerpc/util/mem-events.c | 13 +-
tools/perf/arch/powerpc/util/mem-events.h | 7 +
tools/perf/arch/powerpc/util/pmu.c | 11 ++
tools/perf/arch/s390/util/pmu.c | 3 +
tools/perf/arch/x86/util/mem-events.c | 99 ++----------
tools/perf/arch/x86/util/pmu.c | 11 ++
tools/perf/builtin-c2c.c | 28 +++-
tools/perf/builtin-mem.c | 28 +++-
tools/perf/util/mem-events.c | 181 +++++++++++++---------
tools/perf/util/mem-events.h | 15 +-
tools/perf/util/pmu.c | 4 +-
tools/perf/util/pmu.h | 7 +
14 files changed, 233 insertions(+), 216 deletions(-)
create mode 100644 tools/perf/arch/powerpc/util/mem-events.h
create mode 100644 tools/perf/arch/powerpc/util/pmu.c

--
2.35.1


2023-12-07 19:24:06

by Liang, Kan

Subject: [PATCH V2 1/5] perf mem: Add mem_events into the supported perf_pmu

From: Kan Liang <[email protected]>

With the mem_events, perf doesn't need to read sysfs for each PMU to
find the mem-events-supported PMU. The patch also makes it possible to
clean up the related __weak functions later.

This patch only adds the mem_events into the perf_pmu for all ARCHs.
It will be used in the later cleanup patches.
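
As an illustration (not part of the patch): once perf_pmu__arch_init()
has populated the pointer, mem-events support becomes a pointer check
instead of a sysfs read. The helper name below is made up:

    static bool pmu_has_mem_events(const struct perf_pmu *pmu)
    {
            return pmu && pmu->mem_events;
    }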

Reviewed-by: Ian Rogers <[email protected]>
Tested-by: Ravi Bangoria <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/arch/arm64/util/mem-events.c | 4 ++--
tools/perf/arch/arm64/util/mem-events.h | 7 +++++++
tools/perf/arch/arm64/util/pmu.c | 6 ++++++
tools/perf/arch/s390/util/pmu.c | 3 +++
tools/perf/arch/x86/util/mem-events.c | 4 ++--
tools/perf/arch/x86/util/mem-events.h | 9 +++++++++
tools/perf/arch/x86/util/pmu.c | 7 +++++++
tools/perf/util/mem-events.c | 2 +-
tools/perf/util/mem-events.h | 1 +
tools/perf/util/pmu.c | 4 +++-
tools/perf/util/pmu.h | 7 +++++++
11 files changed, 48 insertions(+), 6 deletions(-)
create mode 100644 tools/perf/arch/arm64/util/mem-events.h
create mode 100644 tools/perf/arch/x86/util/mem-events.h

diff --git a/tools/perf/arch/arm64/util/mem-events.c b/tools/perf/arch/arm64/util/mem-events.c
index 3bcc5c7035c2..aaa4804922b4 100644
--- a/tools/perf/arch/arm64/util/mem-events.c
+++ b/tools/perf/arch/arm64/util/mem-events.c
@@ -4,7 +4,7 @@

#define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }

-static struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
+struct perf_mem_event perf_mem_events_arm[PERF_MEM_EVENTS__MAX] = {
E("spe-load", "arm_spe_0/ts_enable=1,pa_enable=1,load_filter=1,store_filter=0,min_latency=%u/", "arm_spe_0"),
E("spe-store", "arm_spe_0/ts_enable=1,pa_enable=1,load_filter=0,store_filter=1/", "arm_spe_0"),
E("spe-ldst", "arm_spe_0/ts_enable=1,pa_enable=1,load_filter=1,store_filter=1,min_latency=%u/", "arm_spe_0"),
@@ -17,7 +17,7 @@ struct perf_mem_event *perf_mem_events__ptr(int i)
if (i >= PERF_MEM_EVENTS__MAX)
return NULL;

- return &perf_mem_events[i];
+ return &perf_mem_events_arm[i];
}

const char *perf_mem_events__name(int i, const char *pmu_name __maybe_unused)
diff --git a/tools/perf/arch/arm64/util/mem-events.h b/tools/perf/arch/arm64/util/mem-events.h
new file mode 100644
index 000000000000..5fc50be4be38
--- /dev/null
+++ b/tools/perf/arch/arm64/util/mem-events.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ARM64_MEM_EVENTS_H
+#define _ARM64_MEM_EVENTS_H
+
+extern struct perf_mem_event perf_mem_events_arm[PERF_MEM_EVENTS__MAX];
+
+#endif /* _ARM64_MEM_EVENTS_H */
diff --git a/tools/perf/arch/arm64/util/pmu.c b/tools/perf/arch/arm64/util/pmu.c
index 2a4eab2d160e..06ec9b838807 100644
--- a/tools/perf/arch/arm64/util/pmu.c
+++ b/tools/perf/arch/arm64/util/pmu.c
@@ -8,6 +8,12 @@
#include <api/fs/fs.h>
#include <math.h>

+void perf_pmu__arch_init(struct perf_pmu *pmu)
+{
+ if (!strcmp(pmu->name, "arm_spe_0"))
+ pmu->mem_events = perf_mem_events_arm;
+}
+
const struct pmu_metrics_table *pmu_metrics_table__find(void)
{
struct perf_pmu *pmu;
diff --git a/tools/perf/arch/s390/util/pmu.c b/tools/perf/arch/s390/util/pmu.c
index 886c30e001fa..225d7dc2379c 100644
--- a/tools/perf/arch/s390/util/pmu.c
+++ b/tools/perf/arch/s390/util/pmu.c
@@ -19,4 +19,7 @@ void perf_pmu__arch_init(struct perf_pmu *pmu)
!strcmp(pmu->name, S390_PMUPAI_EXT) ||
!strcmp(pmu->name, S390_PMUCPUM_CF))
pmu->selectable = true;
+
+ if (pmu->is_core)
+ pmu->mem_events = perf_mem_events;
}
diff --git a/tools/perf/arch/x86/util/mem-events.c b/tools/perf/arch/x86/util/mem-events.c
index 191b372f9a2d..2b81d229982c 100644
--- a/tools/perf/arch/x86/util/mem-events.c
+++ b/tools/perf/arch/x86/util/mem-events.c
@@ -16,13 +16,13 @@ static char mem_stores_name[100];

#define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }

-static struct perf_mem_event perf_mem_events_intel[PERF_MEM_EVENTS__MAX] = {
+struct perf_mem_event perf_mem_events_intel[PERF_MEM_EVENTS__MAX] = {
E("ldlat-loads", "%s/mem-loads,ldlat=%u/P", "%s/events/mem-loads"),
E("ldlat-stores", "%s/mem-stores/P", "%s/events/mem-stores"),
E(NULL, NULL, NULL),
};

-static struct perf_mem_event perf_mem_events_amd[PERF_MEM_EVENTS__MAX] = {
+struct perf_mem_event perf_mem_events_amd[PERF_MEM_EVENTS__MAX] = {
E(NULL, NULL, NULL),
E(NULL, NULL, NULL),
E("mem-ldst", "ibs_op//", "ibs_op"),
diff --git a/tools/perf/arch/x86/util/mem-events.h b/tools/perf/arch/x86/util/mem-events.h
new file mode 100644
index 000000000000..3959e427f482
--- /dev/null
+++ b/tools/perf/arch/x86/util/mem-events.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _X86_MEM_EVENTS_H
+#define _X86_MEM_EVENTS_H
+
+extern struct perf_mem_event perf_mem_events_intel[PERF_MEM_EVENTS__MAX];
+
+extern struct perf_mem_event perf_mem_events_amd[PERF_MEM_EVENTS__MAX];
+
+#endif /* _X86_MEM_EVENTS_H */
diff --git a/tools/perf/arch/x86/util/pmu.c b/tools/perf/arch/x86/util/pmu.c
index 469555ae9b3c..cd22e80e5657 100644
--- a/tools/perf/arch/x86/util/pmu.c
+++ b/tools/perf/arch/x86/util/pmu.c
@@ -15,6 +15,7 @@
#include "../../../util/pmu.h"
#include "../../../util/fncache.h"
#include "../../../util/pmus.h"
+#include "mem-events.h"
#include "env.h"

void perf_pmu__arch_init(struct perf_pmu *pmu __maybe_unused)
@@ -30,6 +31,12 @@ void perf_pmu__arch_init(struct perf_pmu *pmu __maybe_unused)
pmu->selectable = true;
}
#endif
+
+ if (x86__is_amd_cpu()) {
+ if (!strcmp(pmu->name, "ibs_op"))
+ pmu->mem_events = perf_mem_events_amd;
+ } else if (pmu->is_core)
+ pmu->mem_events = perf_mem_events_intel;
}

int perf_pmus__num_mem_pmus(void)
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 3a2e3687878c..0a8f415f5efe 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -19,7 +19,7 @@ unsigned int perf_mem_events__loads_ldlat = 30;

#define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }

-static struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
+struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
E("ldlat-loads", "cpu/mem-loads,ldlat=%u/P", "cpu/events/mem-loads"),
E("ldlat-stores", "cpu/mem-stores/P", "cpu/events/mem-stores"),
E(NULL, NULL, NULL),
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index b40ad6ea93fc..8c5694b2d0b0 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -34,6 +34,7 @@ enum {
};

extern unsigned int perf_mem_events__loads_ldlat;
+extern struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX];

int perf_mem_events__parse(const char *str);
int perf_mem_events__init(void);
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 3c9609944a2f..3d4373b8ab63 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -986,8 +986,10 @@ static int pmu_max_precise(int dirfd, struct perf_pmu *pmu)
}

void __weak
-perf_pmu__arch_init(struct perf_pmu *pmu __maybe_unused)
+perf_pmu__arch_init(struct perf_pmu *pmu)
{
+ if (pmu->is_core)
+ pmu->mem_events = perf_mem_events;
}

struct perf_pmu *perf_pmu__lookup(struct list_head *pmus, int dirfd, const char *name)
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 424c3fee0949..e35d985206db 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -10,6 +10,8 @@
#include <stdio.h>
#include "parse-events.h"
#include "pmu-events/pmu-events.h"
+#include "map_symbol.h"
+#include "mem-events.h"

struct evsel_config_term;
struct perf_cpu_map;
@@ -162,6 +164,11 @@ struct perf_pmu {
*/
bool exclude_guest;
} missing_features;
+
+ /**
+ * @mem_events: List of the supported mem events
+ */
+ struct perf_mem_event *mem_events;
};

/** @perf_pmu__fake: A special global PMU used for testing. */
--
2.35.1

2023-12-07 19:24:14

by Liang, Kan

Subject: [PATCH V2 4/5] perf mem: Clean up perf_mem_event__supported()

From: Kan Liang <[email protected]>

On some ARCHs, e.g., ARM and AMD, perf determines the availability of
the mem-events by checking the existence of a specific PMU. On the
other ARCHs, e.g., Intel and Power, perf has to check the existence of
some specific events.

Perf now only iterates over the mem-events-supported PMUs, so checking
the existence of a specific PMU is no longer required.

Rename sysfs_name to event_name, which stores the name of the specific
mem-event. Perf only needs to check the existence of those events to
determine the availability of the mem-events.

Rename perf_mem_event__supported() to perf_pmu__mem_events_supported().
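
To spell out the resulting check (from the mem-events.c hunk below; the
PMU name "cpu" is just an example): with sysfs mounted at /sys, the
availability of the Intel load event reduces to a single

    stat("/sys/devices/cpu/events/mem-loads", &st)

while a NULL event_name (ARM SPE, AMD IBS) means the existence of the
PMU itself already implies support, so the check returns true.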

Reviewed-by: Ian Rogers <[email protected]>
Tested-by: Ravi Bangoria <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/arch/arm64/util/mem-events.c | 8 ++++----
tools/perf/arch/powerpc/util/mem-events.c | 8 ++++----
tools/perf/arch/x86/util/mem-events.c | 20 ++++++++++----------
tools/perf/util/mem-events.c | 22 ++++++++++++----------
tools/perf/util/mem-events.h | 2 +-
5 files changed, 31 insertions(+), 29 deletions(-)

diff --git a/tools/perf/arch/arm64/util/mem-events.c b/tools/perf/arch/arm64/util/mem-events.c
index eb2ef84f0fc8..590dddd6b0ab 100644
--- a/tools/perf/arch/arm64/util/mem-events.c
+++ b/tools/perf/arch/arm64/util/mem-events.c
@@ -2,10 +2,10 @@
#include "map_symbol.h"
#include "mem-events.h"

-#define E(t, n, s, l, a) { .tag = t, .name = n, .sysfs_name = s, .ldlat = l, .aux_event = a }
+#define E(t, n, s, l, a) { .tag = t, .name = n, .event_name = s, .ldlat = l, .aux_event = a }

struct perf_mem_event perf_mem_events_arm[PERF_MEM_EVENTS__MAX] = {
- E("spe-load", "%s/ts_enable=1,pa_enable=1,load_filter=1,store_filter=0,min_latency=%u/", "arm_spe_0", true, 0),
- E("spe-store", "%s/ts_enable=1,pa_enable=1,load_filter=0,store_filter=1/", "arm_spe_0", false, 0),
- E("spe-ldst", "%s/ts_enable=1,pa_enable=1,load_filter=1,store_filter=1,min_latency=%u/", "arm_spe_0", true, 0),
+ E("spe-load", "%s/ts_enable=1,pa_enable=1,load_filter=1,store_filter=0,min_latency=%u/", NULL, true, 0),
+ E("spe-store", "%s/ts_enable=1,pa_enable=1,load_filter=0,store_filter=1/", NULL, false, 0),
+ E("spe-ldst", "%s/ts_enable=1,pa_enable=1,load_filter=1,store_filter=1,min_latency=%u/", NULL, true, 0),
};
diff --git a/tools/perf/arch/powerpc/util/mem-events.c b/tools/perf/arch/powerpc/util/mem-events.c
index b7883e38950f..72a6ac2b52f5 100644
--- a/tools/perf/arch/powerpc/util/mem-events.c
+++ b/tools/perf/arch/powerpc/util/mem-events.c
@@ -2,10 +2,10 @@
#include "map_symbol.h"
#include "mem-events.h"

-#define E(t, n, s, l, a) { .tag = t, .name = n, .sysfs_name = s, .ldlat = l, .aux_event = a }
+#define E(t, n, s, l, a) { .tag = t, .name = n, .event_name = s, .ldlat = l, .aux_event = a }

struct perf_mem_event perf_mem_events_power[PERF_MEM_EVENTS__MAX] = {
- E("ldlat-loads", "%s/mem-loads/", "cpu/events/mem-loads", false, 0),
- E("ldlat-stores", "%s/mem-stores/", "cpu/events/mem-stores", false, 0),
- E(NULL, NULL, NULL, false, 0),
+ E("ldlat-loads", "%s/mem-loads/", "mem-loads", false, 0),
+ E("ldlat-stores", "%s/mem-stores/", "mem-stores", false, 0),
+ E(NULL, NULL, NULL, false, 0),
};
diff --git a/tools/perf/arch/x86/util/mem-events.c b/tools/perf/arch/x86/util/mem-events.c
index f0e66a0151a0..b776d849fc64 100644
--- a/tools/perf/arch/x86/util/mem-events.c
+++ b/tools/perf/arch/x86/util/mem-events.c
@@ -9,24 +9,24 @@

#define MEM_LOADS_AUX 0x8203

-#define E(t, n, s, l, a) { .tag = t, .name = n, .sysfs_name = s, .ldlat = l, .aux_event = a }
+#define E(t, n, s, l, a) { .tag = t, .name = n, .event_name = s, .ldlat = l, .aux_event = a }

struct perf_mem_event perf_mem_events_intel[PERF_MEM_EVENTS__MAX] = {
- E("ldlat-loads", "%s/mem-loads,ldlat=%u/P", "%s/events/mem-loads", true, 0),
- E("ldlat-stores", "%s/mem-stores/P", "%s/events/mem-stores", false, 0),
- E(NULL, NULL, NULL, false, 0),
+ E("ldlat-loads", "%s/mem-loads,ldlat=%u/P", "mem-loads", true, 0),
+ E("ldlat-stores", "%s/mem-stores/P", "mem-stores", false, 0),
+ E(NULL, NULL, NULL, false, 0),
};

struct perf_mem_event perf_mem_events_intel_aux[PERF_MEM_EVENTS__MAX] = {
- E("ldlat-loads", "{%s/mem-loads-aux/,%s/mem-loads,ldlat=%u/}:P", "%s/events/mem-loads", true, MEM_LOADS_AUX),
- E("ldlat-stores", "%s/mem-stores/P", "%s/events/mem-stores", false, 0),
- E(NULL, NULL, NULL, false, 0),
+ E("ldlat-loads", "{%s/mem-loads-aux/,%s/mem-loads,ldlat=%u/}:P", "mem-loads", true, MEM_LOADS_AUX),
+ E("ldlat-stores", "%s/mem-stores/P", "mem-stores", false, 0),
+ E(NULL, NULL, NULL, false, 0),
};

struct perf_mem_event perf_mem_events_amd[PERF_MEM_EVENTS__MAX] = {
- E(NULL, NULL, NULL, false, 0),
- E(NULL, NULL, NULL, false, 0),
- E("mem-ldst", "%s//", "ibs_op", false, 0),
+ E(NULL, NULL, NULL, false, 0),
+ E(NULL, NULL, NULL, false, 0),
+ E("mem-ldst", "%s//", NULL, false, 0),
};

bool is_mem_loads_aux_event(struct evsel *leader)
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index c9a40b64e538..0d174f161034 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -17,12 +17,12 @@

unsigned int perf_mem_events__loads_ldlat = 30;

-#define E(t, n, s, l, a) { .tag = t, .name = n, .sysfs_name = s, .ldlat = l, .aux_event = a }
+#define E(t, n, s, l, a) { .tag = t, .name = n, .event_name = s, .ldlat = l, .aux_event = a }

struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
- E("ldlat-loads", "%s/mem-loads,ldlat=%u/P", "cpu/events/mem-loads", true, 0),
- E("ldlat-stores", "%s/mem-stores/P", "cpu/events/mem-stores", false, 0),
- E(NULL, NULL, NULL, false, 0),
+ E("ldlat-loads", "%s/mem-loads,ldlat=%u/P", "mem-loads", true, 0),
+ E("ldlat-stores", "%s/mem-stores/P", "mem-stores", false, 0),
+ E(NULL, NULL, NULL, false, 0),
};
#undef E

@@ -147,15 +147,17 @@ int perf_pmu__mem_events_parse(struct perf_pmu *pmu, const char *str)
return -1;
}

-static bool perf_mem_event__supported(const char *mnt, struct perf_pmu *pmu,
+static bool perf_pmu__mem_events_supported(const char *mnt, struct perf_pmu *pmu,
struct perf_mem_event *e)
{
- char sysfs_name[100];
char path[PATH_MAX];
struct stat st;

- scnprintf(sysfs_name, sizeof(sysfs_name), e->sysfs_name, pmu->name);
- scnprintf(path, PATH_MAX, "%s/devices/%s", mnt, sysfs_name);
+ if (!e->event_name)
+ return true;
+
+ scnprintf(path, PATH_MAX, "%s/devices/%s/events/%s", mnt, pmu->name, e->event_name);
+
return !stat(path, &st);
}

@@ -178,7 +180,7 @@ int perf_pmu__mem_events_init(struct perf_pmu *pmu)
if (!e->tag)
continue;

- e->supported |= perf_mem_event__supported(mnt, pmu, e);
+ e->supported |= perf_pmu__mem_events_supported(mnt, pmu, e);
if (e->supported)
found = true;
}
@@ -230,7 +232,7 @@ int perf_mem_events__record_args(const char **rec_argv, int *argv_nr,
} else {
const char *s = perf_pmu__mem_events_name(j, pmu);

- if (!perf_mem_event__supported(mnt, pmu, e))
+ if (!perf_pmu__mem_events_supported(mnt, pmu, e))
continue;

rec_argv[i++] = "-e";
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index 79d342768d12..f817a507b106 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -18,7 +18,7 @@ struct perf_mem_event {
u32 aux_event;
const char *tag;
const char *name;
- const char *sysfs_name;
+ const char *event_name;
};

struct mem_info {
--
2.35.1

2023-12-07 19:24:25

by Liang, Kan

Subject: [PATCH V2 5/5] perf mem: Clean up is_mem_loads_aux_event()

From: Kan Liang <[email protected]>

The aux_event can be retrieved from the perf_pmu now. Implement
generic support.
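
As an illustration, the core of the now data-driven check (mirroring
the mem-events.c hunk below; pmu is leader->pmu):

    /*
     * Only Intel stores a non-zero aux_event (MEM_LOADS_AUX, 0x8203)
     * in its load event. The other ARCHs store 0, so the generic
     * helper returns false for them without any arch-specific code.
     */
    struct perf_mem_event *e = &pmu->mem_events[PERF_MEM_EVENTS__LOAD];

    bool is_aux = e->aux_event && leader->core.attr.config == e->aux_event;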

Reviewed-by: Ian Rogers <[email protected]>
Tested-by: Ravi Bangoria <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/arch/x86/util/mem-events.c | 23 ++++-------------------
tools/perf/util/mem-events.c | 14 ++++++++++++--
2 files changed, 16 insertions(+), 21 deletions(-)

diff --git a/tools/perf/arch/x86/util/mem-events.c b/tools/perf/arch/x86/util/mem-events.c
index b776d849fc64..62df03e91c7e 100644
--- a/tools/perf/arch/x86/util/mem-events.c
+++ b/tools/perf/arch/x86/util/mem-events.c
@@ -1,11 +1,9 @@
// SPDX-License-Identifier: GPL-2.0
-#include "util/pmu.h"
-#include "util/pmus.h"
-#include "util/env.h"
-#include "map_symbol.h"
-#include "mem-events.h"
#include "linux/string.h"
-#include "env.h"
+#include "util/map_symbol.h"
+#include "util/mem-events.h"
+#include "mem-events.h"
+

#define MEM_LOADS_AUX 0x8203

@@ -28,16 +26,3 @@ struct perf_mem_event perf_mem_events_amd[PERF_MEM_EVENTS__MAX] = {
E(NULL, NULL, NULL, false, 0),
E("mem-ldst", "%s//", NULL, false, 0),
};
-
-bool is_mem_loads_aux_event(struct evsel *leader)
-{
- struct perf_pmu *pmu = perf_pmus__find("cpu");
-
- if (!pmu)
- pmu = perf_pmus__find("cpu_core");
-
- if (pmu && !perf_pmu__have_event(pmu, "mem-loads-aux"))
- return false;
-
- return leader->core.attr.config == MEM_LOADS_AUX;
-}
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 0d174f161034..d418320e52e3 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -103,9 +103,19 @@ static const char *perf_pmu__mem_events_name(int i, struct perf_pmu *pmu)
return NULL;
}

-__weak bool is_mem_loads_aux_event(struct evsel *leader __maybe_unused)
+bool is_mem_loads_aux_event(struct evsel *leader)
{
- return false;
+ struct perf_pmu *pmu = leader->pmu;
+ struct perf_mem_event *e;
+
+ if (!pmu || !pmu->mem_events)
+ return false;
+
+ e = &pmu->mem_events[PERF_MEM_EVENTS__LOAD];
+ if (!e->aux_event)
+ return false;
+
+ return leader->core.attr.config == e->aux_event;
}

int perf_pmu__mem_events_parse(struct perf_pmu *pmu, const char *str)
--
2.35.1

2023-12-07 19:24:35

by Liang, Kan

Subject: [PATCH V2 2/5] perf mem: Clean up perf_mem_events__ptr()

From: Kan Liang <[email protected]>

The mem_events can be retrieved from struct perf_pmu now. The
ARCH-specific perf_mem_events__ptr() implementations are not required
anymore. Remove all of them.

Intel hybrid has multiple mem-events-supported PMUs, but they share
the same mem_events. The other ARCHs only have one mem-events-supported
PMU, so for the configuration it's good enough to configure the
mem_events of only one PMU. Add perf_mem_events_find_pmu(), which
returns the first mem-events-supported PMU.

In perf_mem_events__init(), perf_pmus__scan() is not required anymore,
which avoids checking sysfs for every PMU on the system.

Make the perf_mem_events__record_args() more generic. Remove the
perf_mem_events__print_unsupport_hybrid().

Since pmu is added as a new parameter, rename perf_mem_events__ptr()
to perf_pmu__mem_events_ptr(). Several other functions are renamed
similarly.
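
For reference, a minimal sketch of the resulting call sequence in a
tool (mirroring the builtin-mem.c hunk below):

    struct perf_pmu *pmu = perf_mem_events_find_pmu();
    struct perf_mem_event *e;

    if (!pmu)
            return -1;      /* no PMU supports the memory events */
    if (perf_pmu__mem_events_init(pmu))
            return -1;      /* memory events not supported */

    e = perf_pmu__mem_events_ptr(pmu, PERF_MEM_EVENTS__LOAD);
    e->record = true;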

Reviewed-by: Ian Rogers <[email protected]>
Tested-by: Ravi Bangoria <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/arch/arm64/util/mem-events.c | 10 +--
tools/perf/arch/x86/util/mem-events.c | 18 ++---
tools/perf/builtin-c2c.c | 28 +++++--
tools/perf/builtin-mem.c | 28 +++++--
tools/perf/util/mem-events.c | 103 ++++++++++++------------
tools/perf/util/mem-events.h | 9 ++-
6 files changed, 104 insertions(+), 92 deletions(-)

diff --git a/tools/perf/arch/arm64/util/mem-events.c b/tools/perf/arch/arm64/util/mem-events.c
index aaa4804922b4..2602e8688727 100644
--- a/tools/perf/arch/arm64/util/mem-events.c
+++ b/tools/perf/arch/arm64/util/mem-events.c
@@ -12,17 +12,9 @@ struct perf_mem_event perf_mem_events_arm[PERF_MEM_EVENTS__MAX] = {

static char mem_ev_name[100];

-struct perf_mem_event *perf_mem_events__ptr(int i)
-{
- if (i >= PERF_MEM_EVENTS__MAX)
- return NULL;
-
- return &perf_mem_events_arm[i];
-}
-
const char *perf_mem_events__name(int i, const char *pmu_name __maybe_unused)
{
- struct perf_mem_event *e = perf_mem_events__ptr(i);
+ struct perf_mem_event *e = &perf_mem_events_arm[i];

if (i >= PERF_MEM_EVENTS__MAX)
return NULL;
diff --git a/tools/perf/arch/x86/util/mem-events.c b/tools/perf/arch/x86/util/mem-events.c
index 2b81d229982c..5fb41d50118d 100644
--- a/tools/perf/arch/x86/util/mem-events.c
+++ b/tools/perf/arch/x86/util/mem-events.c
@@ -28,17 +28,6 @@ struct perf_mem_event perf_mem_events_amd[PERF_MEM_EVENTS__MAX] = {
E("mem-ldst", "ibs_op//", "ibs_op"),
};

-struct perf_mem_event *perf_mem_events__ptr(int i)
-{
- if (i >= PERF_MEM_EVENTS__MAX)
- return NULL;
-
- if (x86__is_amd_cpu())
- return &perf_mem_events_amd[i];
-
- return &perf_mem_events_intel[i];
-}
-
bool is_mem_loads_aux_event(struct evsel *leader)
{
struct perf_pmu *pmu = perf_pmus__find("cpu");
@@ -54,7 +43,12 @@ bool is_mem_loads_aux_event(struct evsel *leader)

const char *perf_mem_events__name(int i, const char *pmu_name)
{
- struct perf_mem_event *e = perf_mem_events__ptr(i);
+ struct perf_mem_event *e;
+
+ if (x86__is_amd_cpu())
+ e = &perf_mem_events_amd[i];
+ else
+ e = &perf_mem_events_intel[i];

if (!e)
return NULL;
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index a4cf9de7a7b5..e5b7dc7a80e3 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -3215,12 +3215,19 @@ static int parse_record_events(const struct option *opt,
const char *str, int unset __maybe_unused)
{
bool *event_set = (bool *) opt->value;
+ struct perf_pmu *pmu;
+
+ pmu = perf_mem_events_find_pmu();
+ if (!pmu) {
+ pr_err("failed: there is no PMU that supports perf c2c\n");
+ exit(-1);
+ }

if (!strcmp(str, "list")) {
- perf_mem_events__list();
+ perf_pmu__mem_events_list(pmu);
exit(0);
}
- if (perf_mem_events__parse(str))
+ if (perf_pmu__mem_events_parse(pmu, str))
exit(-1);

*event_set = true;
@@ -3245,6 +3252,7 @@ static int perf_c2c__record(int argc, const char **argv)
bool all_user = false, all_kernel = false;
bool event_set = false;
struct perf_mem_event *e;
+ struct perf_pmu *pmu;
struct option options[] = {
OPT_CALLBACK('e', "event", &event_set, "event",
"event selector. Use 'perf c2c record -e list' to list available events",
@@ -3256,7 +3264,13 @@ static int perf_c2c__record(int argc, const char **argv)
OPT_END()
};

- if (perf_mem_events__init()) {
+ pmu = perf_mem_events_find_pmu();
+ if (!pmu) {
+ pr_err("failed: no PMU supports the memory events\n");
+ return -1;
+ }
+
+ if (perf_pmu__mem_events_init(pmu)) {
pr_err("failed: memory events not supported\n");
return -1;
}
@@ -3280,7 +3294,7 @@ static int perf_c2c__record(int argc, const char **argv)
rec_argv[i++] = "record";

if (!event_set) {
- e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD_STORE);
+ e = perf_pmu__mem_events_ptr(pmu, PERF_MEM_EVENTS__LOAD_STORE);
/*
* The load and store operations are required, use the event
* PERF_MEM_EVENTS__LOAD_STORE if it is supported.
@@ -3289,15 +3303,15 @@ static int perf_c2c__record(int argc, const char **argv)
e->record = true;
rec_argv[i++] = "-W";
} else {
- e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
+ e = perf_pmu__mem_events_ptr(pmu, PERF_MEM_EVENTS__LOAD);
e->record = true;

- e = perf_mem_events__ptr(PERF_MEM_EVENTS__STORE);
+ e = perf_pmu__mem_events_ptr(pmu, PERF_MEM_EVENTS__STORE);
e->record = true;
}
}

- e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
+ e = perf_pmu__mem_events_ptr(pmu, PERF_MEM_EVENTS__LOAD);
if (e->record)
rec_argv[i++] = "-W";

diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index 51499c20da01..ef64bae77ca7 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -43,12 +43,19 @@ static int parse_record_events(const struct option *opt,
const char *str, int unset __maybe_unused)
{
struct perf_mem *mem = *(struct perf_mem **)opt->value;
+ struct perf_pmu *pmu;
+
+ pmu = perf_mem_events_find_pmu();
+ if (!pmu) {
+ pr_err("failed: there is no PMU that supports perf mem\n");
+ exit(-1);
+ }

if (!strcmp(str, "list")) {
- perf_mem_events__list();
+ perf_pmu__mem_events_list(pmu);
exit(0);
}
- if (perf_mem_events__parse(str))
+ if (perf_pmu__mem_events_parse(pmu, str))
exit(-1);

mem->operation = 0;
@@ -72,6 +79,7 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
int ret;
bool all_user = false, all_kernel = false;
struct perf_mem_event *e;
+ struct perf_pmu *pmu;
struct option options[] = {
OPT_CALLBACK('e', "event", &mem, "event",
"event selector. use 'perf mem record -e list' to list available events",
@@ -84,7 +92,13 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
OPT_END()
};

- if (perf_mem_events__init()) {
+ pmu = perf_mem_events_find_pmu();
+ if (!pmu) {
+ pr_err("failed: no PMU supports the memory events\n");
+ return -1;
+ }
+
+ if (perf_pmu__mem_events_init(pmu)) {
pr_err("failed: memory events not supported\n");
return -1;
}
@@ -113,7 +127,7 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)

rec_argv[i++] = "record";

- e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD_STORE);
+ e = perf_pmu__mem_events_ptr(pmu, PERF_MEM_EVENTS__LOAD_STORE);

/*
* The load and store operations are required, use the event
@@ -126,17 +140,17 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
rec_argv[i++] = "-W";
} else {
if (mem->operation & MEM_OPERATION_LOAD) {
- e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
+ e = perf_pmu__mem_events_ptr(pmu, PERF_MEM_EVENTS__LOAD);
e->record = true;
}

if (mem->operation & MEM_OPERATION_STORE) {
- e = perf_mem_events__ptr(PERF_MEM_EVENTS__STORE);
+ e = perf_pmu__mem_events_ptr(pmu, PERF_MEM_EVENTS__STORE);
e->record = true;
}
}

- e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
+ e = perf_pmu__mem_events_ptr(pmu, PERF_MEM_EVENTS__LOAD);
if (e->record)
rec_argv[i++] = "-W";

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 0a8f415f5efe..27a33dc44964 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -29,17 +29,42 @@ struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
static char mem_loads_name[100];
static bool mem_loads_name__init;

-struct perf_mem_event * __weak perf_mem_events__ptr(int i)
+struct perf_mem_event *perf_pmu__mem_events_ptr(struct perf_pmu *pmu, int i)
{
- if (i >= PERF_MEM_EVENTS__MAX)
+ if (i >= PERF_MEM_EVENTS__MAX || !pmu)
return NULL;

- return &perf_mem_events[i];
+ return &pmu->mem_events[i];
+}
+
+static struct perf_pmu *perf_pmus__scan_mem(struct perf_pmu *pmu)
+{
+ while ((pmu = perf_pmus__scan(pmu)) != NULL) {
+ if (pmu->mem_events)
+ return pmu;
+ }
+ return NULL;
+}
+
+struct perf_pmu *perf_mem_events_find_pmu(void)
+{
+ /*
+ * The current perf mem doesn't support per-PMU configuration.
+ * The exact same configuration is applied to all the
+ * mem_events supported PMUs.
+ * Return the first mem_events supported PMU.
+ *
+ * Note: The only case with multiple mem_events-supported PMUs
+ * is Intel hybrid, and those PMUs share the exact same
+ * mem_events. So configuring only the first PMU is good
+ * enough as well.
+ */
+ return perf_pmus__scan_mem(NULL);
}

const char * __weak perf_mem_events__name(int i, const char *pmu_name __maybe_unused)
{
- struct perf_mem_event *e = perf_mem_events__ptr(i);
+ struct perf_mem_event *e = &perf_mem_events[i];

if (!e)
return NULL;
@@ -61,7 +86,7 @@ __weak bool is_mem_loads_aux_event(struct evsel *leader __maybe_unused)
return false;
}

-int perf_mem_events__parse(const char *str)
+int perf_pmu__mem_events_parse(struct perf_pmu *pmu, const char *str)
{
char *tok, *saveptr = NULL;
bool found = false;
@@ -79,7 +104,7 @@ int perf_mem_events__parse(const char *str)

while (tok) {
for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
- struct perf_mem_event *e = perf_mem_events__ptr(j);
+ struct perf_mem_event *e = perf_pmu__mem_events_ptr(pmu, j);

if (!e->tag)
continue;
@@ -112,7 +137,7 @@ static bool perf_mem_event__supported(const char *mnt, struct perf_pmu *pmu,
return !stat(path, &st);
}

-int perf_mem_events__init(void)
+int perf_pmu__mem_events_init(struct perf_pmu *pmu)
{
const char *mnt = sysfs__mount();
bool found = false;
@@ -122,8 +147,7 @@ int perf_mem_events__init(void)
return -ENOENT;

for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
- struct perf_mem_event *e = perf_mem_events__ptr(j);
- struct perf_pmu *pmu = NULL;
+ struct perf_mem_event *e = perf_pmu__mem_events_ptr(pmu, j);

/*
* If the event entry isn't valid, skip initialization
@@ -132,29 +156,20 @@ int perf_mem_events__init(void)
if (!e->tag)
continue;

- /*
- * Scan all PMUs not just core ones, since perf mem/c2c on
- * platforms like AMD uses IBS OP PMU which is independent
- * of core PMU.
- */
- while ((pmu = perf_pmus__scan(pmu)) != NULL) {
- e->supported |= perf_mem_event__supported(mnt, pmu, e);
- if (e->supported) {
- found = true;
- break;
- }
- }
+ e->supported |= perf_mem_event__supported(mnt, pmu, e);
+ if (e->supported)
+ found = true;
}

return found ? 0 : -ENOENT;
}

-void perf_mem_events__list(void)
+void perf_pmu__mem_events_list(struct perf_pmu *pmu)
{
int j;

for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
- struct perf_mem_event *e = perf_mem_events__ptr(j);
+ struct perf_mem_event *e = perf_pmu__mem_events_ptr(pmu, j);

fprintf(stderr, "%-*s%-*s%s",
e->tag ? 13 : 0,
@@ -165,50 +180,32 @@ void perf_mem_events__list(void)
}
}

-static void perf_mem_events__print_unsupport_hybrid(struct perf_mem_event *e,
- int idx)
-{
- const char *mnt = sysfs__mount();
- struct perf_pmu *pmu = NULL;
-
- while ((pmu = perf_pmus__scan(pmu)) != NULL) {
- if (!perf_mem_event__supported(mnt, pmu, e)) {
- pr_err("failed: event '%s' not supported\n",
- perf_mem_events__name(idx, pmu->name));
- }
- }
-}
-
int perf_mem_events__record_args(const char **rec_argv, int *argv_nr,
char **rec_tmp, int *tmp_nr)
{
const char *mnt = sysfs__mount();
+ struct perf_pmu *pmu = NULL;
int i = *argv_nr, k = 0;
struct perf_mem_event *e;

- for (int j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
- e = perf_mem_events__ptr(j);
- if (!e->record)
- continue;

- if (perf_pmus__num_mem_pmus() == 1) {
- if (!e->supported) {
- pr_err("failed: event '%s' not supported\n",
- perf_mem_events__name(j, NULL));
- return -1;
- }
+ while ((pmu = perf_pmus__scan_mem(pmu)) != NULL) {
+ for (int j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
+ e = perf_pmu__mem_events_ptr(pmu, j);

- rec_argv[i++] = "-e";
- rec_argv[i++] = perf_mem_events__name(j, NULL);
- } else {
- struct perf_pmu *pmu = NULL;
+ if (!e->record)
+ continue;

if (!e->supported) {
- perf_mem_events__print_unsupport_hybrid(e, j);
+ pr_err("failed: event '%s' not supported\n",
+ perf_mem_events__name(j, pmu->name));
return -1;
}

- while ((pmu = perf_pmus__scan(pmu)) != NULL) {
+ if (perf_pmus__num_mem_pmus() == 1) {
+ rec_argv[i++] = "-e";
+ rec_argv[i++] = perf_mem_events__name(j, NULL);
+ } else {
const char *s = perf_mem_events__name(j, pmu->name);

if (!perf_mem_event__supported(mnt, pmu, e))
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index 8c5694b2d0b0..0ad301a2e424 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -36,14 +36,15 @@ enum {
extern unsigned int perf_mem_events__loads_ldlat;
extern struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX];

-int perf_mem_events__parse(const char *str);
-int perf_mem_events__init(void);
+int perf_pmu__mem_events_parse(struct perf_pmu *pmu, const char *str);
+int perf_pmu__mem_events_init(struct perf_pmu *pmu);

const char *perf_mem_events__name(int i, const char *pmu_name);
-struct perf_mem_event *perf_mem_events__ptr(int i);
+struct perf_mem_event *perf_pmu__mem_events_ptr(struct perf_pmu *pmu, int i);
+struct perf_pmu *perf_mem_events_find_pmu(void);
bool is_mem_loads_aux_event(struct evsel *leader);

-void perf_mem_events__list(void);
+void perf_pmu__mem_events_list(struct perf_pmu *pmu);
int perf_mem_events__record_args(const char **rec_argv, int *argv_nr,
char **rec_tmp, int *tmp_nr);

--
2.35.1

2023-12-07 19:24:42

by Liang, Kan

Subject: [PATCH V2 3/5] perf mem: Clean up perf_mem_events__name()

From: Kan Liang <[email protected]>

Introduce a generic perf_mem_events__name(). Remove the ARCH-specific
one.

The mem_load events may have different formats. Add ldlat and
aux_event fields to struct perf_mem_event to indicate the event format
and the extra aux event.

Add perf_mem_events_intel_aux[] to support the extra mem-loads-aux event.

Rename perf_mem_events__name to perf_pmu__mem_events_name.
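
As a worked example (matching the SPR output in the cover letter): the
Intel aux variant of the load event has ldlat == true and a non-zero
aux_event, so its name template

    {%s/mem-loads-aux/,%s/mem-loads,ldlat=%u/}:P

expands, for PMU "cpu" and the default ldlat of 30, to

    {cpu/mem-loads-aux/,cpu/mem-loads,ldlat=30/}:P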

Tested-by: Ravi Bangoria <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
tools/perf/arch/arm64/util/mem-events.c | 26 ++-------
tools/perf/arch/powerpc/util/mem-events.c | 13 ++---
tools/perf/arch/powerpc/util/mem-events.h | 7 +++
tools/perf/arch/powerpc/util/pmu.c | 11 ++++
tools/perf/arch/x86/util/mem-events.c | 70 +++++------------------
tools/perf/arch/x86/util/mem-events.h | 1 +
tools/perf/arch/x86/util/pmu.c | 8 ++-
tools/perf/util/mem-events.c | 56 ++++++++++++------
tools/perf/util/mem-events.h | 3 +-
9 files changed, 89 insertions(+), 106 deletions(-)
create mode 100644 tools/perf/arch/powerpc/util/mem-events.h
create mode 100644 tools/perf/arch/powerpc/util/pmu.c

diff --git a/tools/perf/arch/arm64/util/mem-events.c b/tools/perf/arch/arm64/util/mem-events.c
index 2602e8688727..eb2ef84f0fc8 100644
--- a/tools/perf/arch/arm64/util/mem-events.c
+++ b/tools/perf/arch/arm64/util/mem-events.c
@@ -2,28 +2,10 @@
#include "map_symbol.h"
#include "mem-events.h"

-#define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }
+#define E(t, n, s, l, a) { .tag = t, .name = n, .sysfs_name = s, .ldlat = l, .aux_event = a }

struct perf_mem_event perf_mem_events_arm[PERF_MEM_EVENTS__MAX] = {
- E("spe-load", "arm_spe_0/ts_enable=1,pa_enable=1,load_filter=1,store_filter=0,min_latency=%u/", "arm_spe_0"),
- E("spe-store", "arm_spe_0/ts_enable=1,pa_enable=1,load_filter=0,store_filter=1/", "arm_spe_0"),
- E("spe-ldst", "arm_spe_0/ts_enable=1,pa_enable=1,load_filter=1,store_filter=1,min_latency=%u/", "arm_spe_0"),
+ E("spe-load", "%s/ts_enable=1,pa_enable=1,load_filter=1,store_filter=0,min_latency=%u/", "arm_spe_0", true, 0),
+ E("spe-store", "%s/ts_enable=1,pa_enable=1,load_filter=0,store_filter=1/", "arm_spe_0", false, 0),
+ E("spe-ldst", "%s/ts_enable=1,pa_enable=1,load_filter=1,store_filter=1,min_latency=%u/", "arm_spe_0", true, 0),
};
-
-static char mem_ev_name[100];
-
-const char *perf_mem_events__name(int i, const char *pmu_name __maybe_unused)
-{
- struct perf_mem_event *e = &perf_mem_events_arm[i];
-
- if (i >= PERF_MEM_EVENTS__MAX)
- return NULL;
-
- if (i == PERF_MEM_EVENTS__LOAD || i == PERF_MEM_EVENTS__LOAD_STORE)
- scnprintf(mem_ev_name, sizeof(mem_ev_name),
- e->name, perf_mem_events__loads_ldlat);
- else /* PERF_MEM_EVENTS__STORE */
- scnprintf(mem_ev_name, sizeof(mem_ev_name), e->name);
-
- return mem_ev_name;
-}
diff --git a/tools/perf/arch/powerpc/util/mem-events.c b/tools/perf/arch/powerpc/util/mem-events.c
index 78b986e5268d..b7883e38950f 100644
--- a/tools/perf/arch/powerpc/util/mem-events.c
+++ b/tools/perf/arch/powerpc/util/mem-events.c
@@ -2,11 +2,10 @@
#include "map_symbol.h"
#include "mem-events.h"

-/* PowerPC does not support 'ldlat' parameter. */
-const char *perf_mem_events__name(int i, const char *pmu_name __maybe_unused)
-{
- if (i == PERF_MEM_EVENTS__LOAD)
- return "cpu/mem-loads/";
+#define E(t, n, s, l, a) { .tag = t, .name = n, .sysfs_name = s, .ldlat = l, .aux_event = a }

- return "cpu/mem-stores/";
-}
+struct perf_mem_event perf_mem_events_power[PERF_MEM_EVENTS__MAX] = {
+ E("ldlat-loads", "%s/mem-loads/", "cpu/events/mem-loads", false, 0),
+ E("ldlat-stores", "%s/mem-stores/", "cpu/events/mem-stores", false, 0),
+ E(NULL, NULL, NULL, false, 0),
+};
diff --git a/tools/perf/arch/powerpc/util/mem-events.h b/tools/perf/arch/powerpc/util/mem-events.h
new file mode 100644
index 000000000000..6acc3d1b6873
--- /dev/null
+++ b/tools/perf/arch/powerpc/util/mem-events.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _POWER_MEM_EVENTS_H
+#define _POWER_MEM_EVENTS_H
+
+extern struct perf_mem_event perf_mem_events_power[PERF_MEM_EVENTS__MAX];
+
+#endif /* _POWER_MEM_EVENTS_H */
diff --git a/tools/perf/arch/powerpc/util/pmu.c b/tools/perf/arch/powerpc/util/pmu.c
new file mode 100644
index 000000000000..168173f88ddb
--- /dev/null
+++ b/tools/perf/arch/powerpc/util/pmu.c
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <string.h>
+
+#include "../../../util/pmu.h"
+
+void perf_pmu__arch_init(struct perf_pmu *pmu)
+{
+ if (pmu->is_core)
+ pmu->mem_events = perf_mem_events_power;
+}
diff --git a/tools/perf/arch/x86/util/mem-events.c b/tools/perf/arch/x86/util/mem-events.c
index 5fb41d50118d..f0e66a0151a0 100644
--- a/tools/perf/arch/x86/util/mem-events.c
+++ b/tools/perf/arch/x86/util/mem-events.c
@@ -7,25 +7,26 @@
#include "linux/string.h"
#include "env.h"

-static char mem_loads_name[100];
-static bool mem_loads_name__init;
-static char mem_stores_name[100];
-
#define MEM_LOADS_AUX 0x8203
-#define MEM_LOADS_AUX_NAME "{%s/mem-loads-aux/,%s/mem-loads,ldlat=%u/}:P"

-#define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }
+#define E(t, n, s, l, a) { .tag = t, .name = n, .sysfs_name = s, .ldlat = l, .aux_event = a }

struct perf_mem_event perf_mem_events_intel[PERF_MEM_EVENTS__MAX] = {
- E("ldlat-loads", "%s/mem-loads,ldlat=%u/P", "%s/events/mem-loads"),
- E("ldlat-stores", "%s/mem-stores/P", "%s/events/mem-stores"),
- E(NULL, NULL, NULL),
+ E("ldlat-loads", "%s/mem-loads,ldlat=%u/P", "%s/events/mem-loads", true, 0),
+ E("ldlat-stores", "%s/mem-stores/P", "%s/events/mem-stores", false, 0),
+ E(NULL, NULL, NULL, false, 0),
+};
+
+struct perf_mem_event perf_mem_events_intel_aux[PERF_MEM_EVENTS__MAX] = {
+ E("ldlat-loads", "{%s/mem-loads-aux/,%s/mem-loads,ldlat=%u/}:P", "%s/events/mem-loads", true, MEM_LOADS_AUX),
+ E("ldlat-stores", "%s/mem-stores/P", "%s/events/mem-stores", false, 0),
+ E(NULL, NULL, NULL, false, 0),
};

struct perf_mem_event perf_mem_events_amd[PERF_MEM_EVENTS__MAX] = {
- E(NULL, NULL, NULL),
- E(NULL, NULL, NULL),
- E("mem-ldst", "ibs_op//", "ibs_op"),
+ E(NULL, NULL, NULL, false, 0),
+ E(NULL, NULL, NULL, false, 0),
+ E("mem-ldst", "%s//", "ibs_op", false, 0),
};

bool is_mem_loads_aux_event(struct evsel *leader)
@@ -40,48 +41,3 @@ bool is_mem_loads_aux_event(struct evsel *leader)

return leader->core.attr.config == MEM_LOADS_AUX;
}
-
-const char *perf_mem_events__name(int i, const char *pmu_name)
-{
- struct perf_mem_event *e;
-
- if (x86__is_amd_cpu())
- e = &perf_mem_events_amd[i];
- else
- e = &perf_mem_events_intel[i];
-
- if (!e)
- return NULL;
-
- if (i == PERF_MEM_EVENTS__LOAD) {
- if (mem_loads_name__init && !pmu_name)
- return mem_loads_name;
-
- if (!pmu_name) {
- mem_loads_name__init = true;
- pmu_name = "cpu";
- }
-
- if (perf_pmus__have_event(pmu_name, "mem-loads-aux")) {
- scnprintf(mem_loads_name, sizeof(mem_loads_name),
- MEM_LOADS_AUX_NAME, pmu_name, pmu_name,
- perf_mem_events__loads_ldlat);
- } else {
- scnprintf(mem_loads_name, sizeof(mem_loads_name),
- e->name, pmu_name,
- perf_mem_events__loads_ldlat);
- }
- return mem_loads_name;
- }
-
- if (i == PERF_MEM_EVENTS__STORE) {
- if (!pmu_name)
- pmu_name = "cpu";
-
- scnprintf(mem_stores_name, sizeof(mem_stores_name),
- e->name, pmu_name);
- return mem_stores_name;
- }
-
- return e->name;
-}
diff --git a/tools/perf/arch/x86/util/mem-events.h b/tools/perf/arch/x86/util/mem-events.h
index 3959e427f482..f55c8d3b7d59 100644
--- a/tools/perf/arch/x86/util/mem-events.h
+++ b/tools/perf/arch/x86/util/mem-events.h
@@ -3,6 +3,7 @@
#define _X86_MEM_EVENTS_H

extern struct perf_mem_event perf_mem_events_intel[PERF_MEM_EVENTS__MAX];
+extern struct perf_mem_event perf_mem_events_intel_aux[PERF_MEM_EVENTS__MAX];

extern struct perf_mem_event perf_mem_events_amd[PERF_MEM_EVENTS__MAX];

diff --git a/tools/perf/arch/x86/util/pmu.c b/tools/perf/arch/x86/util/pmu.c
index cd22e80e5657..0f49ff13cfe2 100644
--- a/tools/perf/arch/x86/util/pmu.c
+++ b/tools/perf/arch/x86/util/pmu.c
@@ -35,8 +35,12 @@ void perf_pmu__arch_init(struct perf_pmu *pmu __maybe_unused)
if (x86__is_amd_cpu()) {
if (!strcmp(pmu->name, "ibs_op"))
pmu->mem_events = perf_mem_events_amd;
- } else if (pmu->is_core)
- pmu->mem_events = perf_mem_events_intel;
+ } else if (pmu->is_core) {
+ if (perf_pmu__have_event(pmu, "mem-loads-aux"))
+ pmu->mem_events = perf_mem_events_intel_aux;
+ else
+ pmu->mem_events = perf_mem_events_intel;
+ }
}

int perf_pmus__num_mem_pmus(void)
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 27a33dc44964..c9a40b64e538 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -17,17 +17,17 @@

unsigned int perf_mem_events__loads_ldlat = 30;

-#define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }
+#define E(t, n, s, l, a) { .tag = t, .name = n, .sysfs_name = s, .ldlat = l, .aux_event = a }

struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
- E("ldlat-loads", "cpu/mem-loads,ldlat=%u/P", "cpu/events/mem-loads"),
- E("ldlat-stores", "cpu/mem-stores/P", "cpu/events/mem-stores"),
- E(NULL, NULL, NULL),
+ E("ldlat-loads", "%s/mem-loads,ldlat=%u/P", "cpu/events/mem-loads", true, 0),
+ E("ldlat-stores", "%s/mem-stores/P", "cpu/events/mem-stores", false, 0),
+ E(NULL, NULL, NULL, false, 0),
};
#undef E

static char mem_loads_name[100];
-static bool mem_loads_name__init;
+static char mem_stores_name[100];

struct perf_mem_event *perf_pmu__mem_events_ptr(struct perf_pmu *pmu, int i)
{
@@ -62,23 +62,45 @@ struct perf_pmu *perf_mem_events_find_pmu(void)
return perf_pmus__scan_mem(NULL);
}

-const char * __weak perf_mem_events__name(int i, const char *pmu_name __maybe_unused)
+static const char *perf_pmu__mem_events_name(int i, struct perf_pmu *pmu)
{
- struct perf_mem_event *e = &perf_mem_events[i];
+ struct perf_mem_event *e = &pmu->mem_events[i];

if (!e)
return NULL;

- if (i == PERF_MEM_EVENTS__LOAD) {
- if (!mem_loads_name__init) {
- mem_loads_name__init = true;
- scnprintf(mem_loads_name, sizeof(mem_loads_name),
- e->name, perf_mem_events__loads_ldlat);
+ if (i == PERF_MEM_EVENTS__LOAD || i == PERF_MEM_EVENTS__LOAD_STORE) {
+ if (e->ldlat) {
+ if (!e->aux_event) {
+ /* ARM and Most of Intel */
+ scnprintf(mem_loads_name, sizeof(mem_loads_name),
+ e->name, pmu->name,
+ perf_mem_events__loads_ldlat);
+ } else {
+ /* Intel with mem-loads-aux event */
+ scnprintf(mem_loads_name, sizeof(mem_loads_name),
+ e->name, pmu->name, pmu->name,
+ perf_mem_events__loads_ldlat);
+ }
+ } else {
+ if (!e->aux_event) {
+ /* AMD and POWER */
+ scnprintf(mem_loads_name, sizeof(mem_loads_name),
+ e->name, pmu->name);
+ } else
+ return NULL;
}
+
return mem_loads_name;
}

- return e->name;
+ if (i == PERF_MEM_EVENTS__STORE) {
+ scnprintf(mem_stores_name, sizeof(mem_stores_name),
+ e->name, pmu->name);
+ return mem_stores_name;
+ }
+
+ return NULL;
}

__weak bool is_mem_loads_aux_event(struct evsel *leader __maybe_unused)
@@ -175,7 +197,7 @@ void perf_pmu__mem_events_list(struct perf_pmu *pmu)
e->tag ? 13 : 0,
e->tag ? : "",
e->tag && verbose > 0 ? 25 : 0,
- e->tag && verbose > 0 ? perf_mem_events__name(j, NULL) : "",
+ e->tag && verbose > 0 ? perf_pmu__mem_events_name(j, pmu) : "",
e->supported ? ": available\n" : "");
}
}
@@ -198,15 +220,15 @@ int perf_mem_events__record_args(const char **rec_argv, int *argv_nr,

if (!e->supported) {
pr_err("failed: event '%s' not supported\n",
- perf_mem_events__name(j, pmu->name));
+ perf_pmu__mem_events_name(j, pmu));
return -1;
}

if (perf_pmus__num_mem_pmus() == 1) {
rec_argv[i++] = "-e";
- rec_argv[i++] = perf_mem_events__name(j, NULL);
+ rec_argv[i++] = perf_pmu__mem_events_name(j, pmu);
} else {
- const char *s = perf_mem_events__name(j, pmu->name);
+ const char *s = perf_pmu__mem_events_name(j, pmu);

if (!perf_mem_event__supported(mnt, pmu, e))
continue;
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index 0ad301a2e424..79d342768d12 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -14,6 +14,8 @@
struct perf_mem_event {
bool record;
bool supported;
+ bool ldlat;
+ u32 aux_event;
const char *tag;
const char *name;
const char *sysfs_name;
@@ -39,7 +41,6 @@ extern struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX];
int perf_pmu__mem_events_parse(struct perf_pmu *pmu, const char *str);
int perf_pmu__mem_events_init(struct perf_pmu *pmu);

-const char *perf_mem_events__name(int i, const char *pmu_name);
struct perf_mem_event *perf_pmu__mem_events_ptr(struct perf_pmu *pmu, int i);
struct perf_pmu *perf_mem_events_find_pmu(void);
bool is_mem_loads_aux_event(struct evsel *leader);
--
2.35.1

2023-12-07 20:32:16

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH V2 0/5] Clean up perf mem

On Thu, Dec 07, 2023 at 11:23:33AM -0800, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> Changes since V1:
> - Fix strcmp of PMU name checking (Ravi)
> - Fix "/," typo (Ian)
> - Rename several functions with perf_pmu__mem_events prefix. (Ian)
> - Fold the header removal patch into the patch where the cleanups made.
> (Arnaldo)
> - Add reviewed-by and tested-by from Ian and Ravi

It would be good to have a Tested-by from people working on all the
affected architectures, like we got from Ravi for AMD. Can we get those?

I'm applying it locally for test building, will push to
perf-tools-next/tmp.perf-tools-next for a while, so there is some time
to test.

ARM64 (Leo?) and ppc, for PPC... humm Ravi did it, who could test it now?

- Arnaldo


--

- Arnaldo

2023-12-08 00:01:40

by Ian Rogers

Subject: Re: [PATCH V2 3/5] perf mem: Clean up perf_mem_events__name()

On Thu, Dec 7, 2023 at 11:24 AM <[email protected]> wrote:
>
> From: Kan Liang <[email protected]>
>
> Introduce a generic perf_mem_events__name(). Remove the ARCH-specific
> one.
>
> The mem_load events may have a different format. Add ldlat and aux_event
> in the struct perf_mem_event to indicate the format and the extra aux
> event.
>
> Add perf_mem_events_intel_aux[] to support the extra mem_load_aux event.
>
> Rename perf_mem_events__name to perf_pmu__mem_events_name.
>
> Tested-by: Ravi Bangoria <[email protected]>
> Signed-off-by: Kan Liang <[email protected]>

Reviewed-by: Ian Rogers <[email protected]>

Thanks,
Ian

> - E(NULL, NULL, NULL),
> + E("ldlat-loads", "%s/mem-loads,ldlat=%u/P", "cpu/events/mem-loads", true, 0),
> + E("ldlat-stores", "%s/mem-stores/P", "cpu/events/mem-stores", false, 0),
> + E(NULL, NULL, NULL, false, 0),
> };
> #undef E
>
> static char mem_loads_name[100];
> -static bool mem_loads_name__init;
> +static char mem_stores_name[100];
>
> struct perf_mem_event *perf_pmu__mem_events_ptr(struct perf_pmu *pmu, int i)
> {
> @@ -62,23 +62,45 @@ struct perf_pmu *perf_mem_events_find_pmu(void)
> return perf_pmus__scan_mem(NULL);
> }
>
> -const char * __weak perf_mem_events__name(int i, const char *pmu_name __maybe_unused)
> +static const char *perf_pmu__mem_events_name(int i, struct perf_pmu *pmu)
> {
> - struct perf_mem_event *e = &perf_mem_events[i];
> + struct perf_mem_event *e = &pmu->mem_events[i];
>
> if (!e)
> return NULL;
>
> - if (i == PERF_MEM_EVENTS__LOAD) {
> - if (!mem_loads_name__init) {
> - mem_loads_name__init = true;
> - scnprintf(mem_loads_name, sizeof(mem_loads_name),
> - e->name, perf_mem_events__loads_ldlat);
> + if (i == PERF_MEM_EVENTS__LOAD || i == PERF_MEM_EVENTS__LOAD_STORE) {
> + if (e->ldlat) {
> + if (!e->aux_event) {
> + /* ARM and Most of Intel */
> + scnprintf(mem_loads_name, sizeof(mem_loads_name),
> + e->name, pmu->name,
> + perf_mem_events__loads_ldlat);
> + } else {
> + /* Intel with mem-loads-aux event */
> + scnprintf(mem_loads_name, sizeof(mem_loads_name),
> + e->name, pmu->name, pmu->name,
> + perf_mem_events__loads_ldlat);
> + }
> + } else {
> + if (!e->aux_event) {
> + /* AMD and POWER */
> + scnprintf(mem_loads_name, sizeof(mem_loads_name),
> + e->name, pmu->name);
> + } else
> + return NULL;
> }
> +
> return mem_loads_name;
> }
>
> - return e->name;
> + if (i == PERF_MEM_EVENTS__STORE) {
> + scnprintf(mem_stores_name, sizeof(mem_stores_name),
> + e->name, pmu->name);
> + return mem_stores_name;
> + }
> +
> + return NULL;
> }
>
> __weak bool is_mem_loads_aux_event(struct evsel *leader __maybe_unused)
> @@ -175,7 +197,7 @@ void perf_pmu__mem_events_list(struct perf_pmu *pmu)
> e->tag ? 13 : 0,
> e->tag ? : "",
> e->tag && verbose > 0 ? 25 : 0,
> - e->tag && verbose > 0 ? perf_mem_events__name(j, NULL) : "",
> + e->tag && verbose > 0 ? perf_pmu__mem_events_name(j, pmu) : "",
> e->supported ? ": available\n" : "");
> }
> }
> @@ -198,15 +220,15 @@ int perf_mem_events__record_args(const char **rec_argv, int *argv_nr,
>
> if (!e->supported) {
> pr_err("failed: event '%s' not supported\n",
> - perf_mem_events__name(j, pmu->name));
> + perf_pmu__mem_events_name(j, pmu));
> return -1;
> }
>
> if (perf_pmus__num_mem_pmus() == 1) {
> rec_argv[i++] = "-e";
> - rec_argv[i++] = perf_mem_events__name(j, NULL);
> + rec_argv[i++] = perf_pmu__mem_events_name(j, pmu);
> } else {
> - const char *s = perf_mem_events__name(j, pmu->name);
> + const char *s = perf_pmu__mem_events_name(j, pmu);
>
> if (!perf_mem_event__supported(mnt, pmu, e))
> continue;
> diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
> index 0ad301a2e424..79d342768d12 100644
> --- a/tools/perf/util/mem-events.h
> +++ b/tools/perf/util/mem-events.h
> @@ -14,6 +14,8 @@
> struct perf_mem_event {
> bool record;
> bool supported;
> + bool ldlat;
> + u32 aux_event;
> const char *tag;
> const char *name;
> const char *sysfs_name;
> @@ -39,7 +41,6 @@ extern struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX];
> int perf_pmu__mem_events_parse(struct perf_pmu *pmu, const char *str);
> int perf_pmu__mem_events_init(struct perf_pmu *pmu);
>
> -const char *perf_mem_events__name(int i, const char *pmu_name);
> struct perf_mem_event *perf_pmu__mem_events_ptr(struct perf_pmu *pmu, int i);
> struct perf_pmu *perf_mem_events_find_pmu(void);
> bool is_mem_loads_aux_event(struct evsel *leader);
> --
> 2.35.1
>

2023-12-08 10:31:33

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH V2 1/5] perf mem: Add mem_events into the supported perf_pmu

On Thu, Dec 07, 2023 at 11:23:34AM -0800, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> With the mem_events, perf doesn't need to read sysfs for each PMU to
> find the mem-events-supported PMU. The patch also makes it possible to
> clean up the related __weak functions later.
>
> The patch is only to add the mem_events into the perf_pmu for all ARCHs.
> It will be used in the later cleanup patches.
>
> Reviewed-by: Ian Rogers <[email protected]>
> Tested-by: Ravi Bangoria <[email protected]>
> Signed-off-by: Kan Liang <[email protected]>
> ---
> tools/perf/arch/arm64/util/mem-events.c | 4 ++--
> tools/perf/arch/arm64/util/mem-events.h | 7 +++++++
> tools/perf/arch/arm64/util/pmu.c | 6 ++++++
> tools/perf/arch/s390/util/pmu.c | 3 +++
> tools/perf/arch/x86/util/mem-events.c | 4 ++--
> tools/perf/arch/x86/util/mem-events.h | 9 +++++++++
> tools/perf/arch/x86/util/pmu.c | 7 +++++++
> tools/perf/util/mem-events.c | 2 +-
> tools/perf/util/mem-events.h | 1 +
> tools/perf/util/pmu.c | 4 +++-
> tools/perf/util/pmu.h | 7 +++++++
> 11 files changed, 48 insertions(+), 6 deletions(-)
> create mode 100644 tools/perf/arch/arm64/util/mem-events.h
> create mode 100644 tools/perf/arch/x86/util/mem-events.h
>
> diff --git a/tools/perf/arch/arm64/util/mem-events.c b/tools/perf/arch/arm64/util/mem-events.c
> index 3bcc5c7035c2..aaa4804922b4 100644
> --- a/tools/perf/arch/arm64/util/mem-events.c
> +++ b/tools/perf/arch/arm64/util/mem-events.c
> @@ -4,7 +4,7 @@
>
> #define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }
>
> -static struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
> +struct perf_mem_event perf_mem_events_arm[PERF_MEM_EVENTS__MAX] = {
> E("spe-load", "arm_spe_0/ts_enable=1,pa_enable=1,load_filter=1,store_filter=0,min_latency=%u/", "arm_spe_0"),
> E("spe-store", "arm_spe_0/ts_enable=1,pa_enable=1,load_filter=0,store_filter=1/", "arm_spe_0"),
> E("spe-ldst", "arm_spe_0/ts_enable=1,pa_enable=1,load_filter=1,store_filter=1,min_latency=%u/", "arm_spe_0"),
> @@ -17,7 +17,7 @@ struct perf_mem_event *perf_mem_events__ptr(int i)
> if (i >= PERF_MEM_EVENTS__MAX)
> return NULL;
>
> - return &perf_mem_events[i];
> + return &perf_mem_events_arm[i];

I recognize that "arm_spe_0" is hard-coded, which might break if the
system registers different Arm SPE groups. But this is not an issue
introduced by this patch; we might need to consider fixing it later.

> }
>
> const char *perf_mem_events__name(int i, const char *pmu_name __maybe_unused)
> diff --git a/tools/perf/arch/arm64/util/mem-events.h b/tools/perf/arch/arm64/util/mem-events.h
> new file mode 100644
> index 000000000000..5fc50be4be38
> --- /dev/null
> +++ b/tools/perf/arch/arm64/util/mem-events.h
> @@ -0,0 +1,7 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ARM64_MEM_EVENTS_H
> +#define _ARM64_MEM_EVENTS_H
> +
> +extern struct perf_mem_event perf_mem_events_arm[PERF_MEM_EVENTS__MAX];
> +
> +#endif /* _ARM64_MEM_EVENTS_H */
> diff --git a/tools/perf/arch/arm64/util/pmu.c b/tools/perf/arch/arm64/util/pmu.c
> index 2a4eab2d160e..06ec9b838807 100644
> --- a/tools/perf/arch/arm64/util/pmu.c
> +++ b/tools/perf/arch/arm64/util/pmu.c
> @@ -8,6 +8,12 @@
> #include <api/fs/fs.h>
> #include <math.h>
>
> +void perf_pmu__arch_init(struct perf_pmu *pmu)
> +{
> + if (!strcmp(pmu->name, "arm_spe_0"))
> + pmu->mem_events = perf_mem_events_arm;

This is not right and it should cause a build failure on aarch64.

aarch64 reuses aarch32's file arch/arm/util/pmu.c, and that file
already defines perf_pmu__arch_init(); you should add the above change
in arch/arm/util/pmu.c instead.

Right now I cannot access a machine for testing Arm SPE, but I will
play with this patch set a bit to ensure it can pass compilation. After
that, I will seek the Arm maintainers'/reviewers' help for the test.

Thanks,
Leo

2023-12-08 18:15:49

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH V2 1/5] perf mem: Add mem_events into the supported perf_pmu



On 2023-12-08 5:29 a.m., Leo Yan wrote:
> On Thu, Dec 07, 2023 at 11:23:34AM -0800, [email protected] wrote:
>> From: Kan Liang <[email protected]>
>>
>> With the mem_events, perf doesn't need to read sysfs for each PMU to
>> find the mem-events-supported PMU. The patch also makes it possible to
>> clean up the related __weak functions later.
>>
>> The patch is only to add the mem_events into the perf_pmu for all ARCHs.
>> It will be used in the later cleanup patches.
>>
>> Reviewed-by: Ian Rogers <[email protected]>
>> Tested-by: Ravi Bangoria <[email protected]>
>> Signed-off-by: Kan Liang <[email protected]>
>> ---
>> tools/perf/arch/arm64/util/mem-events.c | 4 ++--
>> tools/perf/arch/arm64/util/mem-events.h | 7 +++++++
>> tools/perf/arch/arm64/util/pmu.c | 6 ++++++
>> tools/perf/arch/s390/util/pmu.c | 3 +++
>> tools/perf/arch/x86/util/mem-events.c | 4 ++--
>> tools/perf/arch/x86/util/mem-events.h | 9 +++++++++
>> tools/perf/arch/x86/util/pmu.c | 7 +++++++
>> tools/perf/util/mem-events.c | 2 +-
>> tools/perf/util/mem-events.h | 1 +
>> tools/perf/util/pmu.c | 4 +++-
>> tools/perf/util/pmu.h | 7 +++++++
>> 11 files changed, 48 insertions(+), 6 deletions(-)
>> create mode 100644 tools/perf/arch/arm64/util/mem-events.h
>> create mode 100644 tools/perf/arch/x86/util/mem-events.h
>>
>> diff --git a/tools/perf/arch/arm64/util/mem-events.c b/tools/perf/arch/arm64/util/mem-events.c
>> index 3bcc5c7035c2..aaa4804922b4 100644
>> --- a/tools/perf/arch/arm64/util/mem-events.c
>> +++ b/tools/perf/arch/arm64/util/mem-events.c
>> @@ -4,7 +4,7 @@
>>
>> #define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }
>>
>> -static struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
>> +struct perf_mem_event perf_mem_events_arm[PERF_MEM_EVENTS__MAX] = {
>> E("spe-load", "arm_spe_0/ts_enable=1,pa_enable=1,load_filter=1,store_filter=0,min_latency=%u/", "arm_spe_0"),
>> E("spe-store", "arm_spe_0/ts_enable=1,pa_enable=1,load_filter=0,store_filter=1/", "arm_spe_0"),
>> E("spe-ldst", "arm_spe_0/ts_enable=1,pa_enable=1,load_filter=1,store_filter=1,min_latency=%u/", "arm_spe_0"),
>> @@ -17,7 +17,7 @@ struct perf_mem_event *perf_mem_events__ptr(int i)
>> if (i >= PERF_MEM_EVENTS__MAX)
>> return NULL;
>>
>> - return &perf_mem_events[i];
>> + return &perf_mem_events_arm[i];
>
> I recognize that "arm_spe_0" is hard-coded, which might break if the
> system registers different Arm SPE groups. But this is not an issue
> introduced by this patch; we might need to consider fixing it later.
>
>> }
>>
>> const char *perf_mem_events__name(int i, const char *pmu_name __maybe_unused)
>> diff --git a/tools/perf/arch/arm64/util/mem-events.h b/tools/perf/arch/arm64/util/mem-events.h
>> new file mode 100644
>> index 000000000000..5fc50be4be38
>> --- /dev/null
>> +++ b/tools/perf/arch/arm64/util/mem-events.h
>> @@ -0,0 +1,7 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef _ARM64_MEM_EVENTS_H
>> +#define _ARM64_MEM_EVENTS_H
>> +
>> +extern struct perf_mem_event perf_mem_events_arm[PERF_MEM_EVENTS__MAX];
>> +
>> +#endif /* _ARM64_MEM_EVENTS_H */
>> diff --git a/tools/perf/arch/arm64/util/pmu.c b/tools/perf/arch/arm64/util/pmu.c
>> index 2a4eab2d160e..06ec9b838807 100644
>> --- a/tools/perf/arch/arm64/util/pmu.c
>> +++ b/tools/perf/arch/arm64/util/pmu.c
>> @@ -8,6 +8,12 @@
>> #include <api/fs/fs.h>
>> #include <math.h>
>>
>> +void perf_pmu__arch_init(struct perf_pmu *pmu)
>> +{
>> + if (!strcmp(pmu->name, "arm_spe_0"))
>> + pmu->mem_events = perf_mem_events_arm;
>
> This is not right and it should cause a build failure on aarch64.
>
> aarch64 reuses aarch32's file arch/arm/util/pmu.c, and that file
> already defines perf_pmu__arch_init(); you should add the above change
> in arch/arm/util/pmu.c instead.
>

Sure.

> Right now I cannot access a machine for testing Arm SPE, but I will
> play with this patch set a bit to ensure it can pass compilation. After
> that, I will seek the Arm maintainers'/reviewers' help for the test.
>

Thanks. I guess I will hold the v3 until the test is done in case there
are other issues found in ARM.

Thanks,
Kan

2023-12-09 04:32:43

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH V2 2/5] perf mem: Clean up perf_mem_events__ptr()

On Thu, Dec 07, 2023 at 11:23:35AM -0800, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> The mem_events can be retrieved from the struct perf_pmu now. An ARCH
> specific perf_mem_events__ptr() is not required anymore. Remove all of
> them.
>
> The Intel hybrid has multiple mem-events-supported PMUs. But they share
> the same mem_events. Other ARCHs only support one mem-events-supported
> PMU. In the configuration, it's good enough to only configure the
> mem_events for one PMU. Add perf_mem_events_find_pmu() which returns the
> first mem-events-supported PMU.
>
> In the perf_mem_events__init(), the perf_pmus__scan() is not required
> anymore. It avoids checking the sysfs for every PMU on the system.
>
> Make the perf_mem_events__record_args() more generic. Remove the
> perf_mem_events__print_unsupport_hybrid().
>
> Since pmu is added as a new parameter, rename perf_mem_events__ptr() to
> perf_pmu__mem_events_ptr(). Several other functions also do a similar
> rename.
>
> Reviewed-by: Ian Rogers <[email protected]>
> Tested-by: Ravi Bangoria <[email protected]>
> Signed-off-by: Kan Liang <[email protected]>
> ---
> tools/perf/arch/arm64/util/mem-events.c | 10 +--
> tools/perf/arch/x86/util/mem-events.c | 18 ++---
> tools/perf/builtin-c2c.c | 28 +++++--
> tools/perf/builtin-mem.c | 28 +++++--
> tools/perf/util/mem-events.c | 103 ++++++++++++------------
> tools/perf/util/mem-events.h | 9 ++-
> 6 files changed, 104 insertions(+), 92 deletions(-)
>
> diff --git a/tools/perf/arch/arm64/util/mem-events.c b/tools/perf/arch/arm64/util/mem-events.c
> index aaa4804922b4..2602e8688727 100644
> --- a/tools/perf/arch/arm64/util/mem-events.c
> +++ b/tools/perf/arch/arm64/util/mem-events.c
> @@ -12,17 +12,9 @@ struct perf_mem_event perf_mem_events_arm[PERF_MEM_EVENTS__MAX] = {
>
> static char mem_ev_name[100];
>
> -struct perf_mem_event *perf_mem_events__ptr(int i)
> -{
> - if (i >= PERF_MEM_EVENTS__MAX)
> - return NULL;
> -
> - return &perf_mem_events_arm[i];
> -}
> -
> const char *perf_mem_events__name(int i, const char *pmu_name __maybe_unused)
> {
> - struct perf_mem_event *e = perf_mem_events__ptr(i);
> + struct perf_mem_event *e = &perf_mem_events_arm[i];
>
> if (i >= PERF_MEM_EVENTS__MAX)
> return NULL;

Nitpick: it's good to check if 'i' is a valid value and then access the
array with a valid index.

	if (i >= PERF_MEM_EVENTS__MAX)
		return NULL;

	e = &perf_mem_events_arm[i];

> diff --git a/tools/perf/arch/x86/util/mem-events.c b/tools/perf/arch/x86/util/mem-events.c
> index 2b81d229982c..5fb41d50118d 100644
> --- a/tools/perf/arch/x86/util/mem-events.c
> +++ b/tools/perf/arch/x86/util/mem-events.c
> @@ -28,17 +28,6 @@ struct perf_mem_event perf_mem_events_amd[PERF_MEM_EVENTS__MAX] = {
> E("mem-ldst", "ibs_op//", "ibs_op"),
> };
>
> -struct perf_mem_event *perf_mem_events__ptr(int i)
> -{
> - if (i >= PERF_MEM_EVENTS__MAX)
> - return NULL;
> -
> - if (x86__is_amd_cpu())
> - return &perf_mem_events_amd[i];
> -
> - return &perf_mem_events_intel[i];
> -}
> -
> bool is_mem_loads_aux_event(struct evsel *leader)
> {
> struct perf_pmu *pmu = perf_pmus__find("cpu");
> @@ -54,7 +43,12 @@ bool is_mem_loads_aux_event(struct evsel *leader)
>
> const char *perf_mem_events__name(int i, const char *pmu_name)
> {
> - struct perf_mem_event *e = perf_mem_events__ptr(i);
> + struct perf_mem_event *e;

A nitpick as well:

Given perf's mem/c2c, callers will almost always pass a valid index via
the argument 'i', but I still think this is the best place to return a
NULL pointer for an invalid index rather than returning a wild pointer.

Thus I suggest applying the check for x86 and the other archs:

	if (i >= PERF_MEM_EVENTS__MAX)
		return NULL;

> +
> + if (x86__is_amd_cpu())
> + e = &perf_mem_events_amd[i];
> + else
> + e = &perf_mem_events_intel[i];
>
> if (!e)
> return NULL;

[...]

> int perf_mem_events__record_args(const char **rec_argv, int *argv_nr,
> char **rec_tmp, int *tmp_nr)
> {
> const char *mnt = sysfs__mount();
> + struct perf_pmu *pmu = NULL;
> int i = *argv_nr, k = 0;
> struct perf_mem_event *e;
>
> - for (int j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
> - e = perf_mem_events__ptr(j);
> - if (!e->record)
> - continue;
>
> - if (perf_pmus__num_mem_pmus() == 1) {
> - if (!e->supported) {
> - pr_err("failed: event '%s' not supported\n",
> - perf_mem_events__name(j, NULL));
> - return -1;
> - }
> + while ((pmu = perf_pmus__scan_mem(pmu)) != NULL) {
> + for (int j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
> + e = perf_pmu__mem_events_ptr(pmu, j);
>
> - rec_argv[i++] = "-e";
> - rec_argv[i++] = perf_mem_events__name(j, NULL);
> - } else {
> - struct perf_pmu *pmu = NULL;
> + if (!e->record)
> + continue;
>
> if (!e->supported) {
> - perf_mem_events__print_unsupport_hybrid(e, j);
> + pr_err("failed: event '%s' not supported\n",
> + perf_mem_events__name(j, pmu->name));
> return -1;
> }
>
> - while ((pmu = perf_pmus__scan(pmu)) != NULL) {
> + if (perf_pmus__num_mem_pmus() == 1) {
> + rec_argv[i++] = "-e";
> + rec_argv[i++] = perf_mem_events__name(j, NULL);
> + } else {
> const char *s = perf_mem_events__name(j, pmu->name);
>
> if (!perf_mem_event__supported(mnt, pmu, e))

I think we can improve this part a bit.

The current implementation uses perf_pmus__num_mem_pmus() to decide
whether the system has only one memory PMU or multiple PMUs; with
multiple PMUs, the tool iterates all memory PMUs to synthesize the event
options.

In this patch, it has changed to iterate all memory PMUs, no matter
whether the system has one memory PMU or multiple. Thus, I don't see the
point of the condition check for "perf_pmus__num_mem_pmus() == 1".
We can consolidate into unified code like:

	char *copy;
	const char *s = perf_pmu__mem_events_name(j, pmu);

	if (!s)
		continue;

	if (!perf_pmu__mem_events_supported(mnt, pmu, e))
		continue;

	copy = strdup(s);
	if (!copy)
		return -1;

	rec_argv[i++] = "-e";
	rec_argv[i++] = copy;
	rec_tmp[k++] = copy;

Thanks,
Leo

2023-12-09 05:48:49

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH V2 3/5] perf mem: Clean up perf_mem_events__name()

On Thu, Dec 07, 2023 at 11:23:36AM -0800, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> Introduce a generic perf_mem_events__name(). Remove the ARCH-specific
> one.

Now memory event naming is arch dependent, because every arch has
different memory PMU names and appends specific configurations
(e.g. aarch64 appends 'ts_enable=1,...,min_latency=%u', and x86_64
appends 'ldlat=%u', etc). This is not friendly for extension, e.g. it's
impossible for users to specify any extra attributes for memory events.

This patch tries to consolidate the generation of memory event names in
a central place; its approach is to add more flags to meet special usage
cases, which means the code gets more complex (and more difficult to
maintain later).

I think we need to split the event naming into two levels: we can
consider util/mem-events.c a common layer, and maintain the common
options in it, e.g. 'latency', 'all-user', 'all-kernel',
'timestamp', 'physical_address', etc. Every arch's mem-events.c
file then converts the common options to arch-specific configurations.

In the end, this can avoid adding more and more flags into the
structure perf_mem_event.

Anyway, I also have a question about this patch itself; please see the
inline comment below.

[...]

> diff --git a/tools/perf/arch/arm64/util/mem-events.c b/tools/perf/arch/arm64/util/mem-events.c
> index 2602e8688727..eb2ef84f0fc8 100644
> --- a/tools/perf/arch/arm64/util/mem-events.c
> +++ b/tools/perf/arch/arm64/util/mem-events.c
> @@ -2,28 +2,10 @@
> #include "map_symbol.h"
> #include "mem-events.h"
>
> -#define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }
> +#define E(t, n, s, l, a) { .tag = t, .name = n, .sysfs_name = s, .ldlat = l, .aux_event = a }
>
> struct perf_mem_event perf_mem_events_arm[PERF_MEM_EVENTS__MAX] = {
> - E("spe-load", "arm_spe_0/ts_enable=1,pa_enable=1,load_filter=1,store_filter=0,min_latency=%u/", "arm_spe_0"),
> - E("spe-store", "arm_spe_0/ts_enable=1,pa_enable=1,load_filter=0,store_filter=1/", "arm_spe_0"),
> - E("spe-ldst", "arm_spe_0/ts_enable=1,pa_enable=1,load_filter=1,store_filter=1,min_latency=%u/", "arm_spe_0"),
> + E("spe-load", "%s/ts_enable=1,pa_enable=1,load_filter=1,store_filter=0,min_latency=%u/", "arm_spe_0", true, 0),
> + E("spe-store", "%s/ts_enable=1,pa_enable=1,load_filter=0,store_filter=1/", "arm_spe_0", false, 0),
> + E("spe-ldst", "%s/ts_enable=1,pa_enable=1,load_filter=1,store_filter=1,min_latency=%u/", "arm_spe_0", true, 0),

Arm SPE is an AUX event, so it should set '1' in the field '.aux_event'.

I am a bit suspicious about whether we really need the field '.aux_event';
the '.aux_event' field is only used for generating the event string.

So in the below function perf_pmu__mem_events_name(), I would prefer to
call an arch-specific function, e.g. arch_memory_event_name(event_id, cfg),
where the parameter 'event_id' passes the memory event ID for load, store,
or load-store, and the parameter 'cfg' is a pointer to the common
memory options (as mentioned above: latency, all-user or all-kernel
mode, timestamp tracing, etc).

In the end, the common layer just passes the common options down to the
arch layer and gets an event string for recording.

Thanks,
Leo

2023-12-09 06:17:51

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH V2 4/5] perf mem: Clean up perf_mem_event__supported()

On Thu, Dec 07, 2023 at 11:23:37AM -0800, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> For some ARCHs, e.g., ARM and AMD, to get the availability of the
> mem-events, perf checks the existence of a specific PMU. For the other
> ARCHs, e.g., Intel and Power, perf has to check the existence of some
> specific events.
>
> The current perf only iterates the mem-events-supported PMUs. It's not
> required to check the existence of a specific PMU anymore.

With this change, both the Arm and AMD archs have no chance to detect
whether the hardware (or the device driver) is supported, and the tool
will always assume the memory events exist on the system, right?

Thanks,
Leo

2023-12-09 06:27:37

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH V2 5/5] perf mem: Clean up is_mem_loads_aux_event()

On Thu, Dec 07, 2023 at 11:23:38AM -0800, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> The aux_event can be retrieved from the perf_pmu now. Implement a
> generic support.
>
> Reviewed-by: Ian Rogers <[email protected]>
> Tested-by: Ravi Bangoria <[email protected]>
> Signed-off-by: Kan Liang <[email protected]>
> ---
> tools/perf/arch/x86/util/mem-events.c | 23 ++++-------------------
> tools/perf/util/mem-events.c | 14 ++++++++++++--
> 2 files changed, 16 insertions(+), 21 deletions(-)
>
> diff --git a/tools/perf/arch/x86/util/mem-events.c b/tools/perf/arch/x86/util/mem-events.c
> index b776d849fc64..62df03e91c7e 100644
> --- a/tools/perf/arch/x86/util/mem-events.c
> +++ b/tools/perf/arch/x86/util/mem-events.c
> @@ -1,11 +1,9 @@
> // SPDX-License-Identifier: GPL-2.0
> -#include "util/pmu.h"
> -#include "util/pmus.h"
> -#include "util/env.h"
> -#include "map_symbol.h"
> -#include "mem-events.h"
> #include "linux/string.h"
> -#include "env.h"
> +#include "util/map_symbol.h"
> +#include "util/mem-events.h"
> +#include "mem-events.h"
> +
>
> #define MEM_LOADS_AUX 0x8203
>
> @@ -28,16 +26,3 @@ struct perf_mem_event perf_mem_events_amd[PERF_MEM_EVENTS__MAX] = {
> E(NULL, NULL, NULL, false, 0),
> E("mem-ldst", "%s//", NULL, false, 0),
> };
> -
> -bool is_mem_loads_aux_event(struct evsel *leader)
> -{
> - struct perf_pmu *pmu = perf_pmus__find("cpu");
> -
> - if (!pmu)
> - pmu = perf_pmus__find("cpu_core");
> -
> - if (pmu && !perf_pmu__have_event(pmu, "mem-loads-aux"))
> - return false;
> -
> - return leader->core.attr.config == MEM_LOADS_AUX;
> -}
> diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
> index 0d174f161034..d418320e52e3 100644
> --- a/tools/perf/util/mem-events.c
> +++ b/tools/perf/util/mem-events.c
> @@ -103,9 +103,19 @@ static const char *perf_pmu__mem_events_name(int i, struct perf_pmu *pmu)
> return NULL;
> }
>
> -__weak bool is_mem_loads_aux_event(struct evsel *leader __maybe_unused)
> +bool is_mem_loads_aux_event(struct evsel *leader)
> {
> - return false;
> + struct perf_pmu *pmu = leader->pmu;
> + struct perf_mem_event *e;
> +
> + if (!pmu || !pmu->mem_events)
> + return false;
> +
> + e = &pmu->mem_events[PERF_MEM_EVENTS__LOAD];
> + if (!e->aux_event)
> + return false;
> +
> + return leader->core.attr.config == e->aux_event;
> }

I am wondering if we need to set the field 'aux_event' for Arm SPE.

So, a question that is maybe not actually relevant to this patch: we can
see is_mem_loads_aux_event() is invoked in the file util/record.c:

static struct evsel *evsel__read_sampler(struct evsel *evsel, struct evlist *evlist)
{
	struct evsel *leader = evsel__leader(evsel);

	if (evsel__is_aux_event(leader) || arch_topdown_sample_read(leader) ||
	    is_mem_loads_aux_event(leader)) {
		...
	}

	return leader;
}

Has evsel__is_aux_event() covered the memory load aux event? If it has,
then is_mem_loads_aux_event() is not needed anymore.

Thanks,
Leo

2023-12-09 06:42:26

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH V2 1/5] perf mem: Add mem_events into the supported perf_pmu

Hi Kan,

On Fri, Dec 08, 2023 at 01:14:28PM -0500, Liang, Kan wrote:

[...]

> > Right now I cannot access a machine for testing Arm SPE, but I will
> > play with this patch set a bit to ensure it can pass compilation. After
> > that, I will seek the Arm maintainers'/reviewers' help for the test.
>
> Thanks. I guess I will hold the v3 until the test is done in case there
> are other issues found in ARM.

I will hold off on the test for a bit until this patch set addresses
the concern about the breakage issues on Arm64. Please check my reviews
in the other replies.

Thanks,
Leo

2023-12-11 18:10:00

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH V2 2/5] perf mem: Clean up perf_mem_events__ptr()



On 2023-12-08 11:31 p.m., Leo Yan wrote:
> On Thu, Dec 07, 2023 at 11:23:35AM -0800, [email protected] wrote:
>> From: Kan Liang <[email protected]>
>>
>> The mem_events can be retrieved from the struct perf_pmu now. An ARCH
>> specific perf_mem_events__ptr() is not required anymore. Remove all of
>> them.
>>
>> The Intel hybrid has multiple mem-events-supported PMUs. But they share
>> the same mem_events. Other ARCHs only support one mem-events-supported
>> PMU. In the configuration, it's good enough to only configure the
>> mem_events for one PMU. Add perf_mem_events_find_pmu() which returns the
>> first mem-events-supported PMU.
>>
>> In the perf_mem_events__init(), the perf_pmus__scan() is not required
>> anymore. It avoids checking the sysfs for every PMU on the system.
>>
>> Make the perf_mem_events__record_args() more generic. Remove the
>> perf_mem_events__print_unsupport_hybrid().
>>
>> Since pmu is added as a new parameter, rename perf_mem_events__ptr() to
>> perf_pmu__mem_events_ptr(). Several other functions also do a similar
>> rename.
>>
>> Reviewed-by: Ian Rogers <[email protected]>
>> Tested-by: Ravi Bangoria <[email protected]>
>> Signed-off-by: Kan Liang <[email protected]>
>> ---
>> tools/perf/arch/arm64/util/mem-events.c | 10 +--
>> tools/perf/arch/x86/util/mem-events.c | 18 ++---
>> tools/perf/builtin-c2c.c | 28 +++++--
>> tools/perf/builtin-mem.c | 28 +++++--
>> tools/perf/util/mem-events.c | 103 ++++++++++++------------
>> tools/perf/util/mem-events.h | 9 ++-
>> 6 files changed, 104 insertions(+), 92 deletions(-)
>>
>> diff --git a/tools/perf/arch/arm64/util/mem-events.c b/tools/perf/arch/arm64/util/mem-events.c
>> index aaa4804922b4..2602e8688727 100644
>> --- a/tools/perf/arch/arm64/util/mem-events.c
>> +++ b/tools/perf/arch/arm64/util/mem-events.c
>> @@ -12,17 +12,9 @@ struct perf_mem_event perf_mem_events_arm[PERF_MEM_EVENTS__MAX] = {
>>
>> static char mem_ev_name[100];
>>
>> -struct perf_mem_event *perf_mem_events__ptr(int i)
>> -{
>> - if (i >= PERF_MEM_EVENTS__MAX)
>> - return NULL;
>> -
>> - return &perf_mem_events_arm[i];
>> -}
>> -
>> const char *perf_mem_events__name(int i, const char *pmu_name __maybe_unused)
>> {
>> - struct perf_mem_event *e = perf_mem_events__ptr(i);
>> + struct perf_mem_event *e = &perf_mem_events_arm[i];
>>
>> if (i >= PERF_MEM_EVENTS__MAX)
>> return NULL;
>
> Nitpick: it's good to check if 'i' is a valid value and then access the
> array with a valid index.
>
> 	if (i >= PERF_MEM_EVENTS__MAX)
> 		return NULL;
>
> 	e = &perf_mem_events_arm[i];
>
>> diff --git a/tools/perf/arch/x86/util/mem-events.c b/tools/perf/arch/x86/util/mem-events.c
>> index 2b81d229982c..5fb41d50118d 100644
>> --- a/tools/perf/arch/x86/util/mem-events.c
>> +++ b/tools/perf/arch/x86/util/mem-events.c
>> @@ -28,17 +28,6 @@ struct perf_mem_event perf_mem_events_amd[PERF_MEM_EVENTS__MAX] = {
>> E("mem-ldst", "ibs_op//", "ibs_op"),
>> };
>>
>> -struct perf_mem_event *perf_mem_events__ptr(int i)
>> -{
>> - if (i >= PERF_MEM_EVENTS__MAX)
>> - return NULL;
>> -
>> - if (x86__is_amd_cpu())
>> - return &perf_mem_events_amd[i];
>> -
>> - return &perf_mem_events_intel[i];
>> -}
>> -
>> bool is_mem_loads_aux_event(struct evsel *leader)
>> {
>> struct perf_pmu *pmu = perf_pmus__find("cpu");
>> @@ -54,7 +43,12 @@ bool is_mem_loads_aux_event(struct evsel *leader)
>>
>> const char *perf_mem_events__name(int i, const char *pmu_name)
>> {
>> - struct perf_mem_event *e = perf_mem_events__ptr(i);
>> + struct perf_mem_event *e;
>
> A nitpick as well:
>
> Given perf's mem/c2c, callers will almost always pass a valid index via
> the argument 'i', but I still think this is the best place to return a
> NULL pointer for an invalid index rather than returning a wild pointer.
>
> Thus I suggest applying the check for x86 and the other archs:
>
> 	if (i >= PERF_MEM_EVENTS__MAX)
> 		return NULL;
>
>> +
>> + if (x86__is_amd_cpu())
>> + e = &perf_mem_events_amd[i];
>> + else
>> + e = &perf_mem_events_intel[i];
>>
>> if (!e)
>> return NULL;
>
> [...]
>
>> int perf_mem_events__record_args(const char **rec_argv, int *argv_nr,
>> char **rec_tmp, int *tmp_nr)
>> {
>> const char *mnt = sysfs__mount();
>> + struct perf_pmu *pmu = NULL;
>> int i = *argv_nr, k = 0;
>> struct perf_mem_event *e;
>>
>> - for (int j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
>> - e = perf_mem_events__ptr(j);
>> - if (!e->record)
>> - continue;
>>
>> - if (perf_pmus__num_mem_pmus() == 1) {
>> - if (!e->supported) {
>> - pr_err("failed: event '%s' not supported\n",
>> - perf_mem_events__name(j, NULL));
>> - return -1;
>> - }
>> + while ((pmu = perf_pmus__scan_mem(pmu)) != NULL) {
>> + for (int j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
>> + e = perf_pmu__mem_events_ptr(pmu, j);
>>
>> - rec_argv[i++] = "-e";
>> - rec_argv[i++] = perf_mem_events__name(j, NULL);
>> - } else {
>> - struct perf_pmu *pmu = NULL;
>> + if (!e->record)
>> + continue;
>>
>> if (!e->supported) {
>> - perf_mem_events__print_unsupport_hybrid(e, j);
>> + pr_err("failed: event '%s' not supported\n",
>> + perf_mem_events__name(j, pmu->name));
>> return -1;
>> }
>>
>> - while ((pmu = perf_pmus__scan(pmu)) != NULL) {
>> + if (perf_pmus__num_mem_pmus() == 1) {
>> + rec_argv[i++] = "-e";
>> + rec_argv[i++] = perf_mem_events__name(j, NULL);
>> + } else {
>> const char *s = perf_mem_events__name(j, pmu->name);
>>
>> if (!perf_mem_event__supported(mnt, pmu, e))
>
> I think we can improve this part a bit.
>
> The current implementation uses perf_pmus__num_mem_pmus() to decide
> whether the system has only one memory PMU or multiple PMUs; with
> multiple PMUs, the tool iterates all memory PMUs to synthesize the event
> options.
>
> In this patch, it has changed to iterate all memory PMUs, no matter
> whether the system has one memory PMU or multiple. Thus, I don't see the
> point of the condition check for "perf_pmus__num_mem_pmus() == 1".
> We can consolidate into unified code like:

Yep, I think it's doable. Also, it seems we can further clean up the
perf_pmus__num_mem_pmus(), which is a __weak function now.

It seems we just need to change the perf_mem_events_find_pmu() a little
bit and let it give both the first mem_events_pmu and the number of
mem_pmus.
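
Something like this, just a sketch of the idea (untested):

	/* Return the first mem-events-supported PMU and count them all */
	struct perf_pmu *perf_mem_events_find_pmu(int *num_mem_pmus)
	{
		struct perf_pmu *pmu = NULL, *first = NULL;
		int num = 0;

		while ((pmu = perf_pmus__scan_mem(pmu)) != NULL) {
			if (!first)
				first = pmu;
			num++;
		}

		*num_mem_pmus = num;
		return first;
	}

Then the __weak perf_pmus__num_mem_pmus() could probably go away too.
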
>
> 	char *copy;
> 	const char *s = perf_pmu__mem_events_name(j, pmu);
>
> 	if (!s)
> 		continue;
>
> 	if (!perf_pmu__mem_events_supported(mnt, pmu, e))
> 		continue;
>
> 	copy = strdup(s);
> 	if (!copy)
> 		return -1;
>
> 	rec_argv[i++] = "-e";
> 	rec_argv[i++] = copy;
> 	rec_tmp[k++] = copy;

Not sure what rec_tmp is for. It seems no one uses it.
I will check whether it can be removed.

Thanks,
Kan

>
> Thanks,
> Leo

2023-12-11 18:39:56

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH V2 3/5] perf mem: Clean up perf_mem_events__name()



On 2023-12-09 12:48 a.m., Leo Yan wrote:
> On Thu, Dec 07, 2023 at 11:23:36AM -0800, [email protected] wrote:
>> From: Kan Liang <[email protected]>
>>
>> Introduce a generic perf_mem_events__name(). Remove the ARCH-specific
>> one.
>
> Now memory event naming is arch dependent, because every arch has
> different memory PMU names and appends specific configurations
> (e.g. aarch64 appends 'ts_enable=1,...,min_latency=%u', and x86_64
> appends 'ldlat=%u', etc). This is not friendly for extension, e.g. it's
> impossible for users to specify any extra attributes for memory events.
>
> This patch tries to consolidate the generation of memory event names in
> a central place; its approach is to add more flags to meet special usage
> cases, which means the code gets more complex (and more difficult to
> maintain later).
>
> I think we need to split the event naming into two levels: we can
> consider util/mem-events.c a common layer, and maintain the common
> options in it, e.g. 'latency', 'all-user', 'all-kernel',
> 'timestamp', 'physical_address', etc. Every arch's mem-events.c
> file then converts the common options to arch-specific configurations.
>
> In the end, this can avoid adding more and more flags into the
> structure perf_mem_event.

The purpose of this cleanup patch set is to remove the unnecessary
__weak functions, and try to make the code more generic.
The two flags have already covered all the current usage.
For now, it's good enough.

I agree that if there were more flags, this would not be a perfect
solution. But we don't have a third flag right now. Could we clean
it up later, e.g., when introducing the third flag?

I just don't want to make the patch bigger and bigger. Also, I don't
think we usually implement something just for future possibilities.


>
> Anyway, I also have a question about this patch itself; please see the
> inline comment below.
>
> [...]
>
>> diff --git a/tools/perf/arch/arm64/util/mem-events.c b/tools/perf/arch/arm64/util/mem-events.c
>> index 2602e8688727..eb2ef84f0fc8 100644
>> --- a/tools/perf/arch/arm64/util/mem-events.c
>> +++ b/tools/perf/arch/arm64/util/mem-events.c
>> @@ -2,28 +2,10 @@
>> #include "map_symbol.h"
>> #include "mem-events.h"
>>
>> -#define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }
>> +#define E(t, n, s, l, a) { .tag = t, .name = n, .sysfs_name = s, .ldlat = l, .aux_event = a }
>>
>> struct perf_mem_event perf_mem_events_arm[PERF_MEM_EVENTS__MAX] = {
>> - E("spe-load", "arm_spe_0/ts_enable=1,pa_enable=1,load_filter=1,store_filter=0,min_latency=%u/", "arm_spe_0"),
>> - E("spe-store", "arm_spe_0/ts_enable=1,pa_enable=1,load_filter=0,store_filter=1/", "arm_spe_0"),
>> - E("spe-ldst", "arm_spe_0/ts_enable=1,pa_enable=1,load_filter=1,store_filter=1,min_latency=%u/", "arm_spe_0"),
>> + E("spe-load", "%s/ts_enable=1,pa_enable=1,load_filter=1,store_filter=0,min_latency=%u/", "arm_spe_0", true, 0),
>> + E("spe-store", "%s/ts_enable=1,pa_enable=1,load_filter=0,store_filter=1/", "arm_spe_0", false, 0),
>> + E("spe-ldst", "%s/ts_enable=1,pa_enable=1,load_filter=1,store_filter=1,min_latency=%u/", "arm_spe_0", true, 0),
>
> Arm SPE is an AUX event, so it should set '1' in the field '.aux_event'.

It actually means whether an extra event is required with a mem_loads
event. I guess for ARM the answer is no.

>
> I am a bit suspicious about whether we really need the field '.aux_event';
> the '.aux_event' field is only used for generating the event string.

No, it stores the event encoding for the extra event.
ARM doesn't need it, so it's 0.
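
For example, in the perf_mem_events_intel_aux[] table in this series,
the load entry stores the encoding of the extra event:

	#define MEM_LOADS_AUX 0x8203
	...
	E("ldlat-loads", "{%s/mem-loads-aux/,%s/mem-loads,ldlat=%u/}:P", "%s/events/mem-loads", true, MEM_LOADS_AUX),

and the generic is_mem_loads_aux_event() in patch 5 just compares
leader->core.attr.config against that stored value.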

>
> So in the below function perf_pmu__mem_events_name(), I would prefer to
> call an arch-specific function, e.g. arch_memory_event_name(event_id, cfg),
> where the parameter 'event_id' passes the memory event ID for load, store,
> or load-store, and the parameter 'cfg' is a pointer to the common
> memory options (as mentioned above: latency, all-user or all-kernel
> mode, timestamp tracing, etc).

If I understand correctly, we try to avoid __weak functions in the perf
tool. If there are more parameters later, a mask may be used for them.
But, again, I think that should be an improvement for later.

Thanks,
Kan
>
> In the end, the common layer just passes the common options down to the
> arch layer and gets an event string for recording.
>
> Thanks,
> Leo

2023-12-11 18:45:30

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH V2 4/5] perf mem: Clean up perf_mem_event__supported()



On 2023-12-09 1:17 a.m., Leo Yan wrote:
> On Thu, Dec 07, 2023 at 11:23:37AM -0800, [email protected] wrote:
>> From: Kan Liang <[email protected]>
>>
>> For some ARCHs, e.g., ARM and AMD, to get the availability of the
>> mem-events, perf checks the existence of a specific PMU. For the other
>> ARCHs, e.g., Intel and Power, perf has to check the existence of some
>> specific events.
>>
>> The current perf only iterates the mem-events-supported PMUs. It's not
>> required to check the existence of a specific PMU anymore.
>
> With this change, both the Arm and AMD archs have no chance to detect
> whether the hardware (or the device driver) is supported, and the tool
> will always assume the memory events exist on the system, right?

Currently, Arm and AMD only check for the specific PMU. If the PMU is
detected, the memory events are supported. The patch set doesn't change
that. It just moves the check to perf_pmu__arch_init(). When the specific
PMU is initialized, the mem_events is assigned. You don't need to do a
runtime sysfs check. It should be an improvement for ARM and AMD.

Thanks,
Kan

>
> Thanks,
> Leo
>

2023-12-11 18:45:41

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH V2 5/5] perf mem: Clean up is_mem_loads_aux_event()



On 2023-12-09 1:27 a.m., Leo Yan wrote:
> On Thu, Dec 07, 2023 at 11:23:38AM -0800, [email protected] wrote:
>> From: Kan Liang <[email protected]>
>>
>> The aux_event can be retrieved from the perf_pmu now. Implement a
>> generic support.
>>
>> Reviewed-by: Ian Rogers <[email protected]>
>> Tested-by: Ravi Bangoria <[email protected]>
>> Signed-off-by: Kan Liang <[email protected]>
>> ---
>> tools/perf/arch/x86/util/mem-events.c | 23 ++++-------------------
>> tools/perf/util/mem-events.c | 14 ++++++++++++--
>> 2 files changed, 16 insertions(+), 21 deletions(-)
>>
>> diff --git a/tools/perf/arch/x86/util/mem-events.c b/tools/perf/arch/x86/util/mem-events.c
>> index b776d849fc64..62df03e91c7e 100644
>> --- a/tools/perf/arch/x86/util/mem-events.c
>> +++ b/tools/perf/arch/x86/util/mem-events.c
>> @@ -1,11 +1,9 @@
>> // SPDX-License-Identifier: GPL-2.0
>> -#include "util/pmu.h"
>> -#include "util/pmus.h"
>> -#include "util/env.h"
>> -#include "map_symbol.h"
>> -#include "mem-events.h"
>> #include "linux/string.h"
>> -#include "env.h"
>> +#include "util/map_symbol.h"
>> +#include "util/mem-events.h"
>> +#include "mem-events.h"
>> +
>>
>> #define MEM_LOADS_AUX 0x8203
>>
>> @@ -28,16 +26,3 @@ struct perf_mem_event perf_mem_events_amd[PERF_MEM_EVENTS__MAX] = {
>> E(NULL, NULL, NULL, false, 0),
>> E("mem-ldst", "%s//", NULL, false, 0),
>> };
>> -
>> -bool is_mem_loads_aux_event(struct evsel *leader)
>> -{
>> - struct perf_pmu *pmu = perf_pmus__find("cpu");
>> -
>> - if (!pmu)
>> - pmu = perf_pmus__find("cpu_core");
>> -
>> - if (pmu && !perf_pmu__have_event(pmu, "mem-loads-aux"))
>> - return false;
>> -
>> - return leader->core.attr.config == MEM_LOADS_AUX;
>> -}
>> diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
>> index 0d174f161034..d418320e52e3 100644
>> --- a/tools/perf/util/mem-events.c
>> +++ b/tools/perf/util/mem-events.c
>> @@ -103,9 +103,19 @@ static const char *perf_pmu__mem_events_name(int i, struct perf_pmu *pmu)
>> return NULL;
>> }
>>
>> -__weak bool is_mem_loads_aux_event(struct evsel *leader __maybe_unused)
>> +bool is_mem_loads_aux_event(struct evsel *leader)
>> {
>> - return false;
>> + struct perf_pmu *pmu = leader->pmu;
>> + struct perf_mem_event *e;
>> +
>> + if (!pmu || !pmu->mem_events)
>> + return false;
>> +
>> + e = &pmu->mem_events[PERF_MEM_EVENTS__LOAD];
>> + if (!e->aux_event)
>> + return false;
>> +
>> + return leader->core.attr.config == e->aux_event;
>> }
>
> I am wondering if we need to set the field 'aux_event' for Arm SPE.
>
> So, a question that is maybe not actually relevant to this patch: we can
> see is_mem_loads_aux_event() is invoked in the file util/record.c:
>
> static struct evsel *evsel__read_sampler(struct evsel *evsel, struct evlist *evlist)
> {
> 	struct evsel *leader = evsel__leader(evsel);
>
> 	if (evsel__is_aux_event(leader) || arch_topdown_sample_read(leader) ||
> 	    is_mem_loads_aux_event(leader)) {
> 		...
> 	}
>
> 	return leader;
> }
>
> Has evsel__is_aux_event() covered the memory load aux event? If it has,
> then is_mem_loads_aux_event() is not needed anymore.

They are two different things. evsel__is_aux_event() should mean an
event that requires the AUX area, like intel_pt.

The aux event for the mem_loads event, on the other hand, is an extra
event which has to be scheduled together with the mem_loads event when
sampling. It's only available on some Intel platforms, e.g., SPR.
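
As a concrete example, with the perf_mem_events_intel_aux[] entry from
this series and the default ldlat of 30, the generated group on SPR
looks like:

	{cpu/mem-loads-aux/,cpu/mem-loads,ldlat=30/}:P

The cpu/mem-loads-aux/ event must lead the group, but the cpu/mem-loads
member is the one we actually want to sample, which is why the sampling
event has to be switched.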

Thanks,
Kan
>
> Thanks,
> Leo

2023-12-11 19:02:15

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH V2 1/5] perf mem: Add mem_events into the supported perf_pmu



On 2023-12-09 1:34 a.m., Leo Yan wrote:
> Hi Kan,
>
> On Fri, Dec 08, 2023 at 01:14:28PM -0500, Liang, Kan wrote:
>
> [...]
>
>>> Right now I cannot access a machine for testing Arm SPE, but I will
>>> play with this patch set a bit to ensure it can pass compilation. After
>>> that, I will seek the Arm maintainers'/reviewers' help for the test.
>>
>> Thanks. I guess I will hold the v3 until the test is done in case there
>> are other issues found in ARM.
>
> I will hold off on the test for a bit until this patch set addresses
> the concern about the breakage issues on Arm64. Please check my reviews
> in the other replies.

The reviews in the other replies don't look like they break any current
usage on Arm64. I think the breakage issue is the one you described in
this patch, right?

If we move the check of "arm_spe_0" to arch/arm/util/pmu.c, it seems we
have to move the perf_mem_events_arm[] into arch/arm/util/mem-events.c
as well. Is it OK?

I'm not familiar with ARM and have no idea how those events are
organized under arm64 and arm. Could you please send a fix for the
build failure on aarch64? I will fold it into the V3.

Thanks,
Kan
>
> Thanks,
> Leo
>

2023-12-13 09:52:53

by Athira Rajeev

[permalink] [raw]
Subject: Re: [PATCH V2 0/5] Clean up perf mem



> On 08-Dec-2023, at 2:01 AM, Arnaldo Carvalho de Melo <[email protected]> wrote:
>
> Em Thu, Dec 07, 2023 at 11:23:33AM -0800, [email protected] escreveu:
>> From: Kan Liang <[email protected]>
>>
>> Changes since V1:
>> - Fix strcmp of PMU name checking (Ravi)
>> - Fix "/," typo (Ian)
>> - Rename several functions with perf_pmu__mem_events prefix. (Ian)
>> - Fold the header removal patch into the patch where the cleanups made.
>> (Arnaldo)
>> - Add reviewed-by and tested-by from Ian and Ravi
>
> It would be good to have a Tested-by from people working in all the
> architectures affected, like we got from Ravi for AMD, can we get those?
>
> I'm applying it locally for test building, will push to
> perf-tools-next/tmp.perf-tools-next for a while, so there is some time
> to test.
>
> ARM64 (Leo?) and ppc, for PPC... humm Ravi did it, who could test it now?
Hi Arnaldo, Ravi

Looking into this for testing on powerpc. Will update back.

Thanks
Athira
>
> - Arnaldo
>
>> As discussed in the below thread, the patch set is to clean up perf mem.
>> https://lore.kernel.org/lkml/[email protected]/
>>
>> Introduce generic functions perf_mem_events__ptr(),
>> perf_mem_events__name() ,and is_mem_loads_aux_event() to replace the
>> ARCH specific ones.
>> Simplify the perf_mem_event__supported().
>>
>> Only keeps the ARCH-specific perf_mem_events array in the corresponding
>> mem-events.c for each ARCH.
>>
>> There is no functional change.
>>
>> The patch set touches almost all the ARCHs, Intel, AMD, ARM, Power and
>> etc. But I can only test it on two Intel platforms.
>> Please give it try, if you have machines with other ARCHs.
>>
>> Here are the test results:
>> Intel hybrid machine:
>>
>> $perf mem record -e list
>> ldlat-loads : available
>> ldlat-stores : available
>>
>> $perf mem record -e ldlat-loads -v --ldlat 50
>> calling: record -e cpu_atom/mem-loads,ldlat=50/P -e cpu_core/mem-loads,ldlat=50/P
>>
>> $perf mem record -v
>> calling: record -e cpu_atom/mem-loads,ldlat=30/P -e cpu_atom/mem-stores/P -e cpu_core/mem-loads,ldlat=30/P -e cpu_core/mem-stores/P
>>
>> $perf mem record -t store -v
>> calling: record -e cpu_atom/mem-stores/P -e cpu_core/mem-stores/P
>>
>>
>> Intel SPR:
>> $perf mem record -e list
>> ldlat-loads : available
>> ldlat-stores : available
>>
>> $perf mem record -e ldlat-loads -v --ldlat 50
>> calling: record -e {cpu/mem-loads-aux/,cpu/mem-loads,ldlat=50/}:P
>>
>> $perf mem record -v
>> calling: record -e {cpu/mem-loads-aux/,cpu/mem-loads,ldlat=30/}:P -e cpu/mem-stores/P
>>
>> $perf mem record -t store -v
>> calling: record -e cpu/mem-stores/P
>>
>> Kan Liang (5):
>> perf mem: Add mem_events into the supported perf_pmu
>> perf mem: Clean up perf_mem_events__ptr()
>> perf mem: Clean up perf_mem_events__name()
>> perf mem: Clean up perf_mem_event__supported()
>> perf mem: Clean up is_mem_loads_aux_event()
>>
>> tools/perf/arch/arm64/util/mem-events.c | 36 +----
>> tools/perf/arch/arm64/util/pmu.c | 6 +
>> tools/perf/arch/powerpc/util/mem-events.c | 13 +-
>> tools/perf/arch/powerpc/util/mem-events.h | 7 +
>> tools/perf/arch/powerpc/util/pmu.c | 11 ++
>> tools/perf/arch/s390/util/pmu.c | 3 +
>> tools/perf/arch/x86/util/mem-events.c | 99 ++----------
>> tools/perf/arch/x86/util/pmu.c | 11 ++
>> tools/perf/builtin-c2c.c | 28 +++-
>> tools/perf/builtin-mem.c | 28 +++-
>> tools/perf/util/mem-events.c | 181 +++++++++++++---------
>> tools/perf/util/mem-events.h | 15 +-
>> tools/perf/util/pmu.c | 4 +-
>> tools/perf/util/pmu.h | 7 +
>> 14 files changed, 233 insertions(+), 216 deletions(-)
>> create mode 100644 tools/perf/arch/powerpc/util/mem-events.h
>> create mode 100644 tools/perf/arch/powerpc/util/pmu.c
>>
>> --
>> 2.35.1
>>
>
> --
>
> - Arnaldo


2023-12-13 13:34:10

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH V2 3/5] perf mem: Clean up perf_mem_events__name()

On Mon, Dec 11, 2023 at 01:39:36PM -0500, Liang, Kan wrote:
> On 2023-12-09 12:48 a.m., Leo Yan wrote:
> > On Thu, Dec 07, 2023 at 11:23:36AM -0800, [email protected] wrote:
> >> From: Kan Liang <[email protected]>
> >>
> >> Introduce a generic perf_mem_events__name(). Remove the ARCH-specific
> >> one.
> >
> > Now memory event naming is arch dependent, because every arch has
> > different memory PMU names and appends specific configurations
> > (e.g. aarch64 appends 'ts_enable=1,...,min_latency=%u', and x86_64
> > appends 'ldlat=%u', etc). This is not friendly for extension, e.g. it's
> > impossible for users to specify any extra attributes for memory events.
> >
> > This patch tries to consolidate the generation of memory event names in
> > a central place; its approach is to add more flags to meet special usage
> > cases, which means the code gets more complex (and more difficult to
> > maintain later).
> >
> > I think we need to split the event naming into two levels: we can
> > consider util/mem-events.c a common layer, and maintain the common
> > options in it, e.g. 'latency', 'all-user', 'all-kernel',
> > 'timestamp', 'physical_address', etc. Every arch's mem-events.c
> > file then converts the common options to arch-specific configurations.
> >
> > In the end, this can avoid adding more and more flags into the
> > structure perf_mem_event.
>
> The purpose of this cleanup patch set is to remove the unnecessary
> __weak functions, and try to make the code more generic.

I understand weak functions are not very good practice. The point is
that weak functions are used when some archs have implemented a feature
but other archs have not.

I can think of a potential case for using a central place to maintain
the code for all archs - when we want to support cross analysis. But
since this patch series is not for supporting cross analysis, to be
honest, I still don't see the benefit of this series, though I know you
might be trying to support hybrid modes.

> The two flags have already covered all the current usage.
> For now, it's good enough.
>
> I agree that if there were more flags, this would not be a perfect
> solution. But we don't have a third flag right now. Could we clean
> it up later, e.g., when introducing the third flag?
>
> I just don't want to make the patch bigger and bigger. Also, I don't
> think we usually implement something just for future possibilities.

Fine for me, but please address two main concerns in the next spin:

- Fix the build failure in patch 01;
- Fix the regression concern on Arm64/AMD archs in patch 04. I will
give a bit more explanation in another reply.


[...]

> >> -#define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }
> >> +#define E(t, n, s, l, a) { .tag = t, .name = n, .sysfs_name = s, .ldlat = l, .aux_event = a }
> >>
> >> struct perf_mem_event perf_mem_events_arm[PERF_MEM_EVENTS__MAX] = {
> >> - E("spe-load", "arm_spe_0/ts_enable=1,pa_enable=1,load_filter=1,store_filter=0,min_latency=%u/", "arm_spe_0"),
> >> - E("spe-store", "arm_spe_0/ts_enable=1,pa_enable=1,load_filter=0,store_filter=1/", "arm_spe_0"),
> >> - E("spe-ldst", "arm_spe_0/ts_enable=1,pa_enable=1,load_filter=1,store_filter=1,min_latency=%u/", "arm_spe_0"),
> >> + E("spe-load", "%s/ts_enable=1,pa_enable=1,load_filter=1,store_filter=0,min_latency=%u/", "arm_spe_0", true, 0),
> >> + E("spe-store", "%s/ts_enable=1,pa_enable=1,load_filter=0,store_filter=1/", "arm_spe_0", false, 0),
> >> + E("spe-ldst", "%s/ts_enable=1,pa_enable=1,load_filter=1,store_filter=1,min_latency=%u/", "arm_spe_0", true, 0),
> >
> > Arm SPE is an AUX event, so it should set '1' in the field '.aux_event'.
>
> It actually means whether an extra event is required with a mem_loads
> event. I guess for ARM the answer is no.

Thanks for confirmation.

> > I am a bit suspicious about whether we really need the field '.aux_event';
> > the '.aux_event' field is only used for generating the event string.
>
> No, it stores the event encoding for the extra event.
> ARM doesn't need it, so it's 0.

I searched a bit and confirmed '.aux_event' is only used in
util/mem-events.c and for 'perf record'.

I failed to connect the code with "it stores the event encoding for the
extra event". Could you elaborate a bit on this?

Thanks,
Leo

2023-12-13 13:54:52

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH V2 4/5] perf mem: Clean up perf_mem_event__supported()

On Mon, Dec 11, 2023 at 01:44:54PM -0500, Liang, Kan wrote:
>
>
> On 2023-12-09 1:17 a.m., Leo Yan wrote:
> > On Thu, Dec 07, 2023 at 11:23:37AM -0800, [email protected] wrote:
> >> From: Kan Liang <[email protected]>
> >>
> >> For some ARCHs, e.g., ARM and AMD, to get the availability of the
> >> mem-events, perf checks the existence of a specific PMU. For the other
> >> ARCHs, e.g., Intel and Power, perf has to check the existence of some
> >> specific events.
> >>
> >> The current perf only iterates the mem-events-supported PMUs. It's not
> >> required to check the existence of a specific PMU anymore.
> >
> > With this change, both the Arm and AMD archs have no chance to detect
> > whether the hardware (or the device driver) is supported, and the tool
> > will always assume the memory events exist on the system, right?
>
> Currently, Arm and AMD only check for the specific PMU. If the PMU is
> detected, the memory events are supported. The patch set doesn't change
> that. It just moves the check to perf_pmu__arch_init(). When the specific
> PMU is initialized, the mem_events is assigned. You don't need to do a
> runtime sysfs check. It should be an improvement for ARM and AMD.

Okay, I understand now. Arm SPE has a dedicated PMU, so if the
PMU is detected, then we can assume the memory events are supported.

For the other cases, perf needs to further check whether the PMU
supports the specific memory events.

This patch is fine with me; you can ignore my comment about the
regression caused by this patch.

Thanks,
Leo

2023-12-13 13:57:58

by Ravi Bangoria

[permalink] [raw]
Subject: Re: [PATCH V2 4/5] perf mem: Clean up perf_mem_event__supported()

>>>> For some ARCHs, e.g., ARM and AMD, to get the availability of the
>>>> mem-events, perf checks the existence of a specific PMU. For the other
>>>> ARCHs, e.g., Intel and Power, perf has to check the existence of some
>>>> specific events.
>>>>
>>>> The current perf only iterates the mem-events-supported PMUs. It's not
>>>> required to check the existence of a specific PMU anymore.
>>>
>>> With this change, both Arm and AMD archs have no chance to detect if the
>>> hardware (or the device driver) is supported, and the tool will always
>>> assume the memory events exist on the system, right?
>>
>> Currently, Arm and AMD only check for the specific PMU. If the PMU is
>> detected, the memory events are supported. The patch set doesn't change
>> that. It just moves the check to perf_pmu__arch_init(). When the specific
>> PMU is initialized, the mem_events is assigned. You don't need to do a
>> runtime sysfs check. It should be an improvement for ARM and AMD.
>
> Okay, I understand now. Arm SPE has a dedicated PMU, so if the PMU is
> detected, we can assume the memory events are supported.

Same for AMD. If ibs_op// pmu is present, the mem event is supported.
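
That is, roughly the AMD counterpart of the same hook - a sketch only;
the helper and table names here are my assumptions:

	/* In the x86 perf_pmu__arch_init() (sketch) */
	if (x86__is_amd_cpu() && !strcmp(pmu->name, "ibs_op"))
		pmu->mem_events = perf_mem_events_amd;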

Thanks,
Ravi

2023-12-13 14:24:33

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH V2 1/5] perf mem: Add mem_events into the supported perf_pmu

On Mon, Dec 11, 2023 at 02:01:37PM -0500, Liang, Kan wrote:

[...]

> > I will hold off a bit on testing until this patch set addresses the
> > concern about the breakage issues on Arm64. Please check my review in
> > other replies.
>
> The reviews in the other replies don't look like they break any current
> usage on Arm64. I think the breakage issue is what you described in this
> patch, right?

I mentioned the breakage in patch 04, but I think that concern has been
dismissed.

> If we move the check of "arm_spe_0" to arch/arm/util/pmu.c, it seems we
> have to move the perf_mem_events_arm[] into arch/arm/util/mem-events.c
> as well. Is it OK?

No. For fixing the Arm64 build, please refer to:

https://termbin.com/0dkn

> I'm not familiar with ARM and have no idea how those events are
> organized under arm64 and arm. Could you please send a fix for the
> build failure on aarch64? I will fold it into the V3.

After applying the change in the above link on top of your patch set,
it builds successfully on my side. Hope it's helpful.

Thanks,
Leo


>
> Thanks,
> Kan
> >
> > Thanks,
> > Leo
> >

2023-12-13 16:18:04

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH V2 3/5] perf mem: Clean up perf_mem_events__name()



On 2023-12-13 8:33 a.m., Leo Yan wrote:
>>> I am a bit suspicious about whether we really need the field '.aux_event';
>>> it is only used for generating the event string.
>> No, it stores the event encoding for the extra event.
>> ARM doesn't need it, so it's 0.
> I searched a bit and confirmed '.aux_event' is only used in
> util/mem-events.c and for 'perf record'.
>
> I failed to connect the code with "it stores the event encoding for the
> extra event". Could you elaborate a bit on this?

The details of why the mem_load aux event was introduced can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=61b985e3e775a3a75fda04ce7ef1b1aefc4758bc

A mem_load_aux event is a new requirement for SPR. For the other Intel
platforms, a single mem_load event is good enough to collect the data
source information. But for SPR, we have to group both the mem_load
event and the mem_load_aux event when collecting the data source
information. In the group, the mem_load_aux event must be the leader
event. But for the sample-read case, only sampling the mem_load event
makes sense. So is_mem_loads_aux_event() was introduced to switch the
sampling event to the mem_load event. Here is the perf tool patch:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2a57d40832dc8366bc517bcbbfdb1d7fb583735b

The .aux_event field stores the event encoding of the mem_load_aux event.
If it is the leader of a sample-read group, we should use the second
event as the sampling event.
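
In code, the check can be as simple as comparing the leader's config
against the stored encoding - a minimal sketch (the exact helper in the
series may differ):

	/* tools/perf/util/mem-events.c (sketch) */
	bool is_mem_loads_aux_event(struct evsel *leader)
	{
		struct perf_pmu *pmu = leader->pmu;
		struct perf_mem_event *e;

		if (!pmu || !pmu->mem_events)
			return false;

		e = &pmu->mem_events[PERF_MEM_EVENTS__LOAD];
		if (!e->aux_event)
			return false;

		/* The group leader is the mem-loads-aux event, so
		 * sampling should be taken from the second event.
		 */
		return leader->core.attr.config == e->aux_event;
	}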

Thanks,
Kan

2023-12-13 16:21:14

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH V2 1/5] perf mem: Add mem_events into the supported perf_pmu



On 2023-12-13 9:24 a.m., Leo Yan wrote:
> On Mon, Dec 11, 2023 at 02:01:37PM -0500, Liang, Kan wrote:
>
> [...]
>
>>> I will hold off a bit on testing until this patch set addresses the
>>> concern about the breakage issues on Arm64. Please check my review in
>>> other replies.
>>
>> The reviews in the other replies don't look like they break any current
>> usage on Arm64. I think the breakage issue is what you described in this
>> patch, right?
>
> I mentioned the breakage in patch 04, but I think that concern has been
> dismissed.
>
>> If we move the check of "arm_spe_0" to arch/arm/util/pmu.c, it seems we
>> have to move the perf_mem_events_arm[] into arch/arm/util/mem-events.c
>> as well. Is it OK?
>
> No. For fixing the Arm64 build, please refer to:
>
> https://termbin.com/0dkn


That's great! Thanks a lot!

>
>> I'm not familiar with ARM and have no idea how those events are
>> organized under arm64 and arm. Could you please send a fix for the
>> build failure on aarch64? I will fold it into the V3.
>
> After applying the change in the above link on top of your patch set,
> it builds successfully on my side. Hope it's helpful.
>

Yes, that helps a lot. I will send the V3 shortly.

Thanks,
Kan

2023-12-13 17:33:53

by Ian Rogers

[permalink] [raw]
Subject: Re: [PATCH V2 3/5] perf mem: Clean up perf_mem_events__name()

On Wed, Dec 13, 2023 at 5:33 AM Leo Yan <[email protected]> wrote:
>
> On Mon, Dec 11, 2023 at 01:39:36PM -0500, Liang, Kan wrote:
> > On 2023-12-09 12:48 a.m., Leo Yan wrote:
> > > On Thu, Dec 07, 2023 at 11:23:36AM -0800, [email protected] wrote:
> > >> From: Kan Liang <[email protected]>
> > >>
> > >> Introduce a generic perf_mem_events__name(). Remove the ARCH-specific
> > >> one.
> > >
> > > Now memory event naming is arch dependent, this is because every arch
> > > has different memory PMU names, and it appends specific configurations
> > > (e.g. aarch64 appends 'ts_enable=1,...,min_latency=%u', and x86_64
> > > appends 'ldlat=%u', etc). This is not friendly for extension, e.g. it's
> > > impossible for users to specify any extra attributes for memory events.
> > >
> > > This patch tries to consolidate in a central place for generating memory
> > > event names, its approach is to add more flags to meet special usage
> > > cases, which means the code gets more complex (and more difficult to
> > > maintain later).
> > >
> > > I think we need to split the event naming into two levels: in
> > > util/mem-events.c, we can consider it as a common layer, and we maintain
> > > common options in it, e.g. 'latency', 'all-user', 'all-kernel',
> > > 'timestamp', 'physical_address', etc. In every arch's mem-events.c
> > > file, it converts the common options to arch specific configurations.
> > >
> > > In the end, this can avoid to add more and more flags into the
> > > structure perf_mem_event.
> >
> > The purpose of this cleanup patch set is to remove the unnecessary
> > __weak functions, and try to make the code more generic.
>
> I understand weak functions are not very good practice. The point is
> weak functions are used when some archs have implemented a feature but
> other archs have not.
>
> I can think of a potential case for using a central place to maintain
> the code for all archs - when we want to support cross analysis. But this
> patch series is not for supporting cross analysis; to be honest, I still
> don't see the benefit of this series, though I know you might try to
> support hybrid modes.

So thanks to Kan for doing this series and trying to clean the code
up. My complaint about the code is that it was overly hard wired.
Heck, we have event strings to parse that hard code PMU and event
names. In fixing hybrid my complaint was that we were adding yet more
complexity and as a lot of this was resting on printf format strings
it was hard to check that these were being used correctly. The
direction of this patch series I agree with.

Imo we should avoid weak definitions. Weak definitions are outside of
the C specification and have introduced bugs into the code base -
specifically a weak const array was having its size propagated into
code but then the linker later replaced that weak array. An
architecture #ifdef is better than a weak definition as the behavior
is defined and things blow up early rather than suffering from subtle
corruptions.
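
A tiny illustration of that failure mode (hypothetical files, just to
show the mechanism):

	/* a.c: weak definition; sizeof() is resolved at compile time */
	const int tbl[1] __attribute__((weak)) = { 0 };
	const unsigned int tbl_entries = sizeof(tbl) / sizeof(tbl[0]); /* 1 */

	/* b.c: strong definition that the linker actually keeps */
	const int tbl[4] = { 1, 2, 3, 4 };

	/* After linking, 'tbl' has 4 entries but 'tbl_entries' is
	 * still 1, so any loop bounded by it silently skips elements.
	 */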

The Linux kernel device driver is abstracting what the hardware is
doing and since the more recent changes the PMU abstraction in the
perf tool is trying to match that. If we need to interface with PMUs
differently on each architecture then something is wrong. It happens
that things are wrong and we need to work with that. For example, on
intel there are topdown events that introduce ordering issues. We have
default initialization functions for different PMUs. The perf_pmu
struct is created in perf_pmu__lookup and that always calls an
perf_pmu__arch_init where the name of the PMU is already set. In the
weak perf_pmu__arch_init we tweak the perf_pmu struct so that it will
behave correctly elsewhere in the code. These changes are paralleling
that. That said, the Intel perf_pmu__arch_init does strcmps against
"intel_pt" and "intel_bts"; does it really need to be arch specific
given it is already PMU specific? A situation we see in testing is
people trying to run user-space ISA emulators like qemu, say ARM on
x86; should the PMU setup for intel_pt and intel_bts be broken in
this environment? I think that as the names are very specific I'd
prefer if the code were outside of the tools/perf/arch directory.
There are cases with PMU names like "cpu" where we're probably going
to need ifdefs or runtime is_intel() calls.

Anyway, my point is that I think we should be moving away from arch
specific stuff, as this patch series tries, and that way we have the
best chance of changes benefitting users regardless of hardware. It
may be that to make all of this work properly we need to modify PMUs,
but that all seems good, such as adding the extended type support for
legacy events on ARM PMUs so that legacy events can work without a
fixed CPU. We haven't got core PMUs working properly, see the recent
perf top hybrid problem. There are obviously issues with uncore as,
for example, most memory controllers are replicated PMUs but some
kernel perf drivers select a memory controller via a config value. We
either need to change the drivers for some kind of uniformity or do
some kind of abstracting in the perf tool. I think we'll probably need
to do it in the tool and when doing that we really shouldn't be doing
it in an arch specific or weak function way.

Thanks,
Ian

> > The two flags have already covered all the current usage.
> > For now, it's good enough.
> >
> > I agree that if there were more flags, this would not be a perfect
> > solution. But we don't have a third flag right now. Could we clean it
> > up later, e.g., when introducing the third flag?
> >
> > I just don't want to make the patch bigger and bigger. Also, I don't
> > think we usually implement something just for future possibilities.
>
> Fine by me, but please address two main concerns in the next spin:
>
> - Fix the build failure in patch 01;
> - Fix the suspected regression on Arm64/AMD archs in patch 04. I will
> give a bit more explanation in another reply.
>
>
> [...]
>
> > >> -#define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }
> > >> +#define E(t, n, s, l, a) { .tag = t, .name = n, .sysfs_name = s, .ldlat = l, .aux_event = a }
> > >>
> > >> struct perf_mem_event perf_mem_events_arm[PERF_MEM_EVENTS__MAX] = {
> > >> - E("spe-load", "arm_spe_0/ts_enable=1,pa_enable=1,load_filter=1,store_filter=0,min_latency=%u/", "arm_spe_0"),
> > >> - E("spe-store", "arm_spe_0/ts_enable=1,pa_enable=1,load_filter=0,store_filter=1/", "arm_spe_0"),
> > >> - E("spe-ldst", "arm_spe_0/ts_enable=1,pa_enable=1,load_filter=1,store_filter=1,min_latency=%u/", "arm_spe_0"),
> > >> + E("spe-load", "%s/ts_enable=1,pa_enable=1,load_filter=1,store_filter=0,min_latency=%u/", "arm_spe_0", true, 0),
> > >> + E("spe-store", "%s/ts_enable=1,pa_enable=1,load_filter=0,store_filter=1/", "arm_spe_0", false, 0),
> > >> + E("spe-ldst", "%s/ts_enable=1,pa_enable=1,load_filter=1,store_filter=1,min_latency=%u/", "arm_spe_0", true, 0),
> > >
> > > Arm SPE is an AUX event; '1' should be set in the field '.aux_event'.
> >
> > It actually indicates whether an extra event is required alongside a
> > mem_loads event. I guess for ARM the answer is no.
>
> Thanks for confirmation.
>
> > I am a bit suspicious about whether we really need the field '.aux_event';
> > it is only used for generating the event string.
> >
> > No, it stores the event encoding for the extra event.
> > ARM doesn't need it, so it's 0.
>
> I searched a bit and confirmed '.aux_event' is only used in
> util/mem-events.c and for 'perf record'.
>
> I failed to connect the code with "it stores the event encoding for the
> extra event". Could you elaborate a bit on this?
>
> Thanks,
> Leo

2023-12-13 19:56:41

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH V2 0/5] Clean up perf mem



On 2023-12-13 4:51 a.m., Athira Rajeev wrote:
>
>
>> On 08-Dec-2023, at 2:01 AM, Arnaldo Carvalho de Melo <[email protected]> wrote:
>>
>> Em Thu, Dec 07, 2023 at 11:23:33AM -0800, [email protected] escreveu:
>>> From: Kan Liang <[email protected]>
>>>
>>> Changes since V1:
>>> - Fix strcmp of PMU name checking (Ravi)
>>> - Fix "/," typo (Ian)
>>> - Rename several functions with perf_pmu__mem_events prefix. (Ian)
>>> - Fold the header removal patch into the patch where the cleanups made.
>>> (Arnaldo)
>>> - Add reviewed-by and tested-by from Ian and Ravi
>>
>> It would be good to have a Tested-by from people working in all the
>> architectures affected, like we got from Ravi for AMD. Can we get those?
>>
>> I'm applying it locally for test building, will push to
>> perf-tools-next/tmp.perf-tools-next for a while, so there is some time
>> to test.
>>
>> ARM64 (Leo?) and ppc, for PPC... humm Ravi did it, who could test it now?
> Hi Arnaldo, Ravi
>
> Looking into this for testing on powerpc. Will update back.
>

Thanks Athira. I've sent out the latest V3. Please give it a try.

https://lore.kernel.org/lkml/[email protected]/

Thanks,
Kan

> Thanks
> Athira
>>
>> - Arnaldo
>>
>>> As discussed in the below thread, the patch set is to clean up perf mem.
>>> https://lore.kernel.org/lkml/[email protected]/
>>>
>>> Introduce generic functions perf_mem_events__ptr(),
>>> perf_mem_events__name() ,and is_mem_loads_aux_event() to replace the
>>> ARCH specific ones.
>>> Simplify the perf_mem_event__supported().
>>>
>>> Only keeps the ARCH-specific perf_mem_events array in the corresponding
>>> mem-events.c for each ARCH.
>>>
>>> There is no functional change.
>>>
>>> The patch set touches almost all the ARCHs, Intel, AMD, ARM, Power and
>>> etc. But I can only test it on two Intel platforms.
>>> Please give it try, if you have machines with other ARCHs.
>>>
>>> Here are the test results:
>>> Intel hybrid machine:
>>>
>>> $perf mem record -e list
>>> ldlat-loads : available
>>> ldlat-stores : available
>>>
>>> $perf mem record -e ldlat-loads -v --ldlat 50
>>> calling: record -e cpu_atom/mem-loads,ldlat=50/P -e cpu_core/mem-loads,ldlat=50/P
>>>
>>> $perf mem record -v
>>> calling: record -e cpu_atom/mem-loads,ldlat=30/P -e cpu_atom/mem-stores/P -e cpu_core/mem-loads,ldlat=30/P -e cpu_core/mem-stores/P
>>>
>>> $perf mem record -t store -v
>>> calling: record -e cpu_atom/mem-stores/P -e cpu_core/mem-stores/P
>>>
>>>
>>> Intel SPR:
>>> $perf mem record -e list
>>> ldlat-loads : available
>>> ldlat-stores : available
>>>
>>> $perf mem record -e ldlat-loads -v --ldlat 50
>>> calling: record -e {cpu/mem-loads-aux/,cpu/mem-loads,ldlat=50/}:P
>>>
>>> $perf mem record -v
>>> calling: record -e {cpu/mem-loads-aux/,cpu/mem-loads,ldlat=30/}:P -e cpu/mem-stores/P
>>>
>>> $perf mem record -t store -v
>>> calling: record -e cpu/mem-stores/P
>>>
>>> Kan Liang (5):
>>> perf mem: Add mem_events into the supported perf_pmu
>>> perf mem: Clean up perf_mem_events__ptr()
>>> perf mem: Clean up perf_mem_events__name()
>>> perf mem: Clean up perf_mem_event__supported()
>>> perf mem: Clean up is_mem_loads_aux_event()
>>>
>>> tools/perf/arch/arm64/util/mem-events.c | 36 +----
>>> tools/perf/arch/arm64/util/pmu.c | 6 +
>>> tools/perf/arch/powerpc/util/mem-events.c | 13 +-
>>> tools/perf/arch/powerpc/util/mem-events.h | 7 +
>>> tools/perf/arch/powerpc/util/pmu.c | 11 ++
>>> tools/perf/arch/s390/util/pmu.c | 3 +
>>> tools/perf/arch/x86/util/mem-events.c | 99 ++----------
>>> tools/perf/arch/x86/util/pmu.c | 11 ++
>>> tools/perf/builtin-c2c.c | 28 +++-
>>> tools/perf/builtin-mem.c | 28 +++-
>>> tools/perf/util/mem-events.c | 181 +++++++++++++---------
>>> tools/perf/util/mem-events.h | 15 +-
>>> tools/perf/util/pmu.c | 4 +-
>>> tools/perf/util/pmu.h | 7 +
>>> 14 files changed, 233 insertions(+), 216 deletions(-)
>>> create mode 100644 tools/perf/arch/powerpc/util/mem-events.h
>>> create mode 100644 tools/perf/arch/powerpc/util/pmu.c
>>>
>>> --
>>> 2.35.1
>>>
>>
>> --
>>
>> - Arnaldo
>
>
>

2023-12-18 03:21:22

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH V2 3/5] perf mem: Clean up perf_mem_events__name()

Hi Ian,

On Wed, Dec 13, 2023 at 09:33:24AM -0800, Ian Rogers wrote:

Sorry for the late response; I was on leave at the end of last week.

[...]

> > > The purpose of this cleanup patch set is to remove the unnecessary
> > > __weak functions, and try to make the code more generic.
> >
> > I understand weak functions are not very good practice. The point is
> > weak functions are used when some archs have implemented a feature but
> > other archs have not.
> >
> > I can think of a potential case for using a central place to maintain
> > the code for all archs - when we want to support cross analysis. But this
> > patch series is not for supporting cross analysis; to be honest, I still
> > don't see the benefit of this series, though I know you might try to
> > support hybrid modes.
>
> So thanks to Kan for doing this series and trying to clean the code
> up. My complaint about the code is that it was overly hard wired.
> Heck, we have event strings to parse that hard code PMU and event
> names. In fixing hybrid my complaint was that we were adding yet more
> complexity and as a lot of this was resting on printf format strings
> it was hard to check that these were being used correctly. The
> direction of this patch series I agree with.

I agreed as well ;)

> Imo we should avoid weak definitions. Weak definitions are outside of
> the C specification and have introduced bugs into the code base -
> specifically a weak const array was having its size propagated into
> code but then the linker later replaced that weak array. An
> architecture #ifdef is better than a weak definition as the behavior
> is defined and things blow up early rather than suffering from subtle
> corruptions.

Thanks a lot for sharing.

> The Linux kernel device driver is abstracting what the hardware is
> doing and since the more recent changes the PMU abstraction in the
> perf tool is trying to match that. If we need to interface with PMUs
> differently on each architecture then something is wrong. It happens
> that things are wrong and we need to work with that. For example, on
> intel there are topdown events that introduce ordering issues. We have
> default initialization functions for different PMUs. The perf_pmu
> struct is created in perf_pmu__lookup and that always calls an
> perf_pmu__arch_init where the name of the PMU is already set. In the
> weak perf_pmu__arch_init we tweak the perf_pmu struct so that it will
> behave correctly elsewhere in the code. These changes are paralleling
> that. That said, the Intel perf_pmu__arch_init does strcmps against
> "intel_pt" and "intel_bts"; does it really need to be arch specific
> given it is already PMU specific?

To be clear, I don't object to refactoring; I am just wondering if there
is a better approach.

Your above question is a good point. I admit the implementation in the
arch's perf_pmu__arch_init() is not good practice. IIUC, you are seeking
a general approach for dynamically probing PMUs (and the associated
events).

What I can think of is using a statically linked PMU descriptor + init
callback function, which is widely used in the Linux kernel for machine
initialization (refer to [1]).

Likewise, we can define a descriptor for every PMU and link the
descriptor into a data section, e.g.:

	static const struct perf_pmu __intel_pt_pmu
		__section(".pmu.info.init") = {
		.name = "intel_pt",
		.auxtrace = true,
		.selectable = true,
		.perf_event_attr_init_default = intel_pt_pmu_default_config,
		.mem_events = NULL,
		...
	};

As a result, perf_pmu__lookup() just iterates the descriptor array
stored in the data section ".pmu.info.init". To support more complex
cases, we can even add a callback pointer to the structure perf_pmu to
invoke PMU-specific initialization.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/include/asm/mach/arch.h#n78
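
A sketch of how the lookup side could walk such a table. One caveat:
the linker only auto-generates __start_*/__stop_* symbols when the
section name is a valid C identifier, so it would need to be e.g.
"pmu_info_init" rather than ".pmu.info.init":

	extern const struct perf_pmu __start_pmu_info_init[];
	extern const struct perf_pmu __stop_pmu_info_init[];

	static const struct perf_pmu *pmu_descriptor_find(const char *name)
	{
		const struct perf_pmu *p;

		/* Walk every descriptor placed in the section */
		for (p = __start_pmu_info_init; p < __stop_pmu_info_init; p++) {
			if (!strcmp(p->name, name))
				return p;
		}
		return NULL;
	}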

> A situation we see in testing is
> people trying to run user-space ISA emulators like qemu, say ARM on
> x86; should the PMU setup for intel_pt and intel_bts be broken in
> this environment?

This is a good case, and also a complex one.

> I think that as the names are very specific I'd
> prefer if the code were outside of the tools/perf/arch directory.

I am not sure I understand your meaning.

Anyway, let me expand a bit on this patch series. For instance, the
perf_pmu__mem_events_name() function, as a central place, generates
memory event names for different archs (and even with different
attributes). This means architecture-specific things are maintained in
the perf core layer.

Rather than adding a data pointer '.mem_events' into struct perf_pmu,
I'd like to add a new callback (say .mem_event_init()) to struct
perf_pmu; this function can return the memory event string.

In the end, we can decouple the perf core layer from arch-specific
code - and if we really want to support the cross-runtime case (e.g. an
Arm binary on an x86 machine), we can hook in via the linked descriptors
mentioned above.
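
As a rough shape - the callback name and signature below are only
illustrative, not a concrete proposal:

	struct perf_pmu {
		...
		/* PMU-specific hook: build the event string for one
		 * memory event, applying any extra attributes.
		 */
		const char *(*mem_event_name)(struct perf_pmu *pmu,
					      int mem_event_idx,
					      unsigned int ldlat);
	};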

> There are cases with PMU names like "cpu" where we're probably going
> to need ifdefs or runtime is_intel() calls.
>
> Anyway, my point is that I think we should be moving away from arch
> specific stuff, as this patch series tries, and that way we have the
> best chance of changes benefitting users regardless of hardware.

To be clear, I agree it would be great if we could build all archs into
a single perf binary to support cross runtime.

On the other hand, I don't think it's a good idea to totally remove arch
folders.

> It may be that to make all of this work properly we need to modify PMUs,
> but that all seems good, such as adding the extended type support for
> legacy events on ARM PMUs so that legacy events can work without a
> fixed CPU. We haven't got core PMUs working properly, see the recent
> perf top hybrid problem. There are obviously issues with uncore as,
> for example, most memory controllers are replicated PMUs but some
> kernel perf drivers select a memory controller via a config value. We
> either need to change the drivers for some kind of uniformity or do
> some kind of abstracting in the perf tool. I think we'll probably need
> to do it in the tool and when doing that we really shouldn't be doing
> it in an arch specific or weak function way.

Thanks for bringing this up. Now I understand your concerns a bit better.

Leo