2020-09-01 08:42:07

by Leo Yan

Subject: [PATCH v2 00/14] perf mem: Support AUX trace and Arm SPE

This patch set adds AUX trace support to the perf memory tool, with Arm
SPE as the first enabled hardware tracing source.

Patches 01 ~ 04 are preparation patches which mainly resolve an issue
with memory events: the existing code hard codes the memory events based
on the x86 and PowerPC architectures. Patches 01 ~ 04 extend support for
more flexible memory event names, and introduce weak functions so that
every architecture can define its own memory events structure and return
the event pointer and name respectively.
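The weak-function mechanism described above can be sketched as a minimal
standalone C program. The names below are simplified stand-ins, not the
actual perf code: a generic weak accessor returns entries from a static
table, and an architecture could link in a strong definition with the
same signature to override it.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct perf_mem_event. */
struct mem_event { const char *tag; };

#define MEM_EVENTS_MAX 2

/* Generic table, private to this file (cf. making perf_mem_events static). */
static struct mem_event generic_events[MEM_EVENTS_MAX] = {
	{ "ldlat-loads"  },
	{ "ldlat-stores" },
};

/* Weak default accessor: used unless an architecture provides a strong
 * definition of the same function in its own arch/.../mem-events.c. */
__attribute__((weak)) struct mem_event *mem_events_ptr(int i)
{
	if (i < 0 || i >= MEM_EVENTS_MAX)
		return NULL;	/* out of range: caller stops iterating */

	return &generic_events[i];
}
```

Compiled with gcc or clang, the weak definition is used whenever no
strong override is linked in, which is how an arch-specific table can
replace the generic one without touching the generic callers.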

Patches 05 and 06 extend the perf memory tool to support AUX trace, and
add a new itrace option 'M' for generating memory events.

Patches 07 ~ 13 support Arm SPE with the perf memory tool. First they
register SPE events for memory events, then they extend the SPE packet
to pass address info and operation types, and also set the 'data_src'
field so that the tool can display readable strings in the result.

Patch 14 updates the documentation to reflect the changes introduced to
support Arm SPE.

This patch set has been tested on the ARMv8 Hisilicon D06 platform and
verified on x86 to avoid causing any regression. Please note, this
patch set depends on the patch set "perf arm-spe: Refactor decoding &
dumping flow" [1].

The below commands run successfully on D06:

$ perf mem record -t ldst -- ~/false_sharing.exe 2
$ perf mem record -t load -- ~/false_sharing.exe 2
$ perf mem record -t store -- ~/false_sharing.exe 2
$ perf mem report
$ perf mem report --itrace=M

# Samples: 391K of event 'memory'
# Total weight : 391193
# Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked
#
# Overhead Samples Local Weight Memory access Symbol Shared Object Data Symbol Data Object Snoop TLB access Locked
# ........ ............ ............ ........................ ............................................... .................. ............................................................ ................. ............ ...................... ......
#
18.56% 72611 0 L1 or L1 miss [.] read_write_func false_sharing.exe [.] buf1+0x0 false_sharing.exe N/A Walker hit No
16.16% 63207 0 L1 or L1 hit [.] read_write_func false_sharing.exe [.] __do_global_dtors_aux_fini_array_entry+0x228 false_sharing.exe N/A Walker hit No
15.91% 62239 0 L1 or L1 hit [.] read_write_func false_sharing.exe [.] __do_global_dtors_aux_fini_array_entry+0x250 false_sharing.exe N/A Walker hit No
4.67% 18280 0 N/A [.] read_write_func false_sharing.exe [.] buf2+0x8 false_sharing.exe N/A Walker hit No
3.34% 13082 0 L1 or L1 hit [.] read_write_func false_sharing.exe [.] __do_global_dtors_aux_fini_array_entry+0x230 false_sharing.exe N/A Walker hit No
2.49% 9755 0 L1 or L1 hit [.] read_write_func false_sharing.exe [.] 0x0000aaaac23a3450 false_sharing.exe N/A Walker hit No
2.46% 9611 0 L1 or L1 hit [.] read_write_func false_sharing.exe [.] lock_thd_name+0x0 false_sharing.exe N/A Walker hit No
2.26% 8856 0 L1 or L1 hit [.] read_write_func false_sharing.exe [.] 0x0000aaaac23a3458 false_sharing.exe N/A Walker hit No
2.19% 8549 0 L1 or L1 miss [.] read_write_func false_sharing.exe [.] buf2+0x28 false_sharing.exe N/A Walker hit No

Changes from v1:
* Refined patch 02 to use perf_mem_events__ptr() to return the event
  pointer and check whether the pointer is NULL, and removed the
  condition check against PERF_MEM_EVENTS__MAX; (James Clark)
* Added new itrace option 'M' for memory events;
* Added patch 14 to update documentation.

[1] https://lore.kernel.org/patchwork/cover/1288406/


Leo Yan (14):
perf mem: Search event name with more flexible path
perf mem: Introduce weak function perf_mem_events__ptr()
perf mem: Support new memory event PERF_MEM_EVENTS__LOAD_STORE
perf mem: Only initialize memory event for recording
perf auxtrace: Add option '-M' for memory events
perf mem: Support AUX trace
perf mem: Support Arm SPE events
perf arm-spe: Enable attribution PERF_SAMPLE_DATA_SRC
perf arm-spe: Save memory addresses in packet
perf arm-spe: Store operation types in packet
perf arm-spe: Fill address info for samples
perf arm-spe: Synthesize memory event
perf arm-spe: Set sample's data source field
perf mem: Document options introduced by Arm SPE

tools/perf/Documentation/itrace.txt | 1 +
tools/perf/Documentation/perf-mem.txt | 10 +-
tools/perf/arch/arm64/util/Build | 2 +-
tools/perf/arch/arm64/util/mem-events.c | 46 ++++++
tools/perf/builtin-c2c.c | 23 ++-
tools/perf/builtin-mem.c | 73 ++++++++--
.../util/arm-spe-decoder/arm-spe-decoder.c | 15 ++
.../util/arm-spe-decoder/arm-spe-decoder.h | 8 ++
tools/perf/util/arm-spe.c | 132 +++++++++++++++---
tools/perf/util/auxtrace.c | 4 +
tools/perf/util/auxtrace.h | 2 +
tools/perf/util/mem-events.c | 41 ++++--
tools/perf/util/mem-events.h | 3 +-
13 files changed, 302 insertions(+), 58 deletions(-)
create mode 100644 tools/perf/arch/arm64/util/mem-events.c

--
2.20.1


2020-09-01 08:42:33

by Leo Yan

Subject: [PATCH v2 08/14] perf arm-spe: Enable attribution PERF_SAMPLE_DATA_SRC

This patch enables the sample attribute PERF_SAMPLE_DATA_SRC for the
perf data; when decoding the tracing data, it tells the tool that the
data contains memory samples.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/util/arm-spe.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 07232664c927..305ab725b3ba 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -810,7 +810,7 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
attr.type = PERF_TYPE_HARDWARE;
attr.sample_type = evsel->core.attr.sample_type & PERF_SAMPLE_MASK;
attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
- PERF_SAMPLE_PERIOD;
+ PERF_SAMPLE_PERIOD | PERF_SAMPLE_DATA_SRC;
if (spe->timeless_decoding)
attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
else
--
2.20.1

2020-09-01 08:43:00

by Leo Yan

Subject: [PATCH v2 09/14] perf arm-spe: Save memory addresses in packet

This patch saves the virtual and physical memory addresses in the
packet; the address info can be used for generating memory samples.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/util/arm-spe-decoder/arm-spe-decoder.c | 4 ++++
tools/perf/util/arm-spe-decoder/arm-spe-decoder.h | 2 ++
2 files changed, 6 insertions(+)

diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
index ae718e3419e3..1c430657939f 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
@@ -160,6 +160,10 @@ static int arm_spe_read_record(struct arm_spe_decoder *decoder)
decoder->record.from_ip = ip;
else if (idx == SPE_ADDR_PKT_HDR_INDEX_BRANCH)
decoder->record.to_ip = ip;
+ else if (idx == SPE_ADDR_PKT_HDR_INDEX_DATA_VIRT)
+ decoder->record.addr = ip;
+ else if (idx == SPE_ADDR_PKT_HDR_INDEX_DATA_PHYS)
+ decoder->record.phys_addr = ip;
break;
case ARM_SPE_COUNTER:
break;
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
index 24727b8ca7ff..31d1776785de 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
@@ -30,6 +30,8 @@ struct arm_spe_record {
u64 from_ip;
u64 to_ip;
u64 timestamp;
+ u64 addr;
+ u64 phys_addr;
};

struct arm_spe_insn;
--
2.20.1

2020-09-01 08:43:53

by Leo Yan

Subject: [PATCH v2 13/14] perf arm-spe: Set sample's data source field

The sample structure contains the field 'data_src', which describes the
details of a data operation, e.g. whether the operation is a load or a
store, on which cache level it hits, whether it involves snooping or
remote accessing, etc. In the end, 'data_src' is parsed by the perf
memory tool to display human readable strings.

This patch fills the 'data_src' field in the synthesized samples based
on the record type. The supported types are: Level 1 dcache miss, Level
1 dcache hit, last level cache miss, last level cache access, TLB miss,
TLB hit, and remote access to another socket.

Note, the current perf tool can display statistics for the L1/L2/L3
caches but it doesn't support the concept of a 'last level cache'. To
fit into the current implementation, the 'data_src' field uses the L3
cache for the last level cache.
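As a rough illustration of how 'data_src' packs this information into a
single value via bit-fields (the field widths and enum values below are
simplified stand-ins, not the real linux/perf_event.h layout):

```c
typedef unsigned long long u64;

/* Simplified sketch of union perf_mem_data_src: one u64 whose bit-field
 * views record the operation and the cache level + hit/miss status. */
union mem_data_src {
	u64 val;
	struct {
		u64 mem_op   : 5,	/* load/store */
		    mem_lvl  : 14,	/* cache level + hit/miss bits */
		    mem_rsvd : 45;	/* remainder, unused here */
	};
};

/* Illustrative values, not the kernel's PERF_MEM_* constants. */
enum { OP_LOAD = 1, OP_STORE = 2 };
enum { LVL_MISS = 1 << 0, LVL_HIT = 1 << 1, LVL_L3 = 1 << 2 };

/* Mirrors the patch's logic for one case: an SPE LLC miss is reported
 * as an L3 miss, because perf has no 'last level cache' level. */
static u64 synth_llc_miss(int is_load)
{
	union mem_data_src d = { 0 };

	d.mem_op  = is_load ? OP_LOAD : OP_STORE;
	d.mem_lvl = LVL_MISS | LVL_L3;

	return d.val;	/* stored into sample.data_src */
}
```

The tool side then reads the same bit-fields back to print strings such
as "L3 miss" in the report.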

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/util/arm-spe.c | 63 +++++++++++++++++++++++++++++++++------
1 file changed, 54 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 7f44ef8c89f1..142149f732b3 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -264,7 +264,7 @@ arm_spe_deliver_synth_event(struct arm_spe *spe,
}

static int arm_spe__synth_mem_sample(struct arm_spe_queue *speq,
- u64 spe_events_id)
+ u64 spe_events_id, u64 data_src)
{
struct arm_spe *spe = speq->spe;
struct arm_spe_record *record = &speq->decoder->record;
@@ -277,6 +277,7 @@ static int arm_spe__synth_mem_sample(struct arm_spe_queue *speq,
sample.stream_id = spe_events_id;
sample.addr = record->addr;
sample.phys_addr = record->phys_addr;
+ sample.data_src = data_src;

return arm_spe_deliver_synth_event(spe, speq, event, &sample);
}
@@ -311,21 +312,60 @@ static bool arm_spe__is_memory_event(enum arm_spe_sample_type type)
return false;
}

+static u64 arm_spe__synth_data_source(const struct arm_spe_record *record)
+{
+ union perf_mem_data_src data_src = { 0 };
+
+ if (record->op == ARM_SPE_LD)
+ data_src.mem_op = PERF_MEM_OP_LOAD;
+ else
+ data_src.mem_op = PERF_MEM_OP_STORE;
+
+ if (record->type & ARM_SPE_L1D_MISS) {
+ data_src.mem_lvl_num = PERF_MEM_LVLNUM_L1;
+ data_src.mem_lvl = PERF_MEM_LVL_MISS | PERF_MEM_LVL_L1;
+ } else if (record->type & ARM_SPE_L1D_ACCESS) {
+ data_src.mem_lvl_num = PERF_MEM_LVLNUM_L1;
+ data_src.mem_lvl = PERF_MEM_LVL_HIT | PERF_MEM_LVL_L1;
+ } else if (record->type & ARM_SPE_LLC_MISS) {
+ data_src.mem_lvl_num = PERF_MEM_LVLNUM_L3;
+ data_src.mem_lvl = PERF_MEM_LVL_MISS | PERF_MEM_LVL_L3;
+ } else if (record->type & ARM_SPE_LLC_ACCESS) {
+ data_src.mem_lvl_num = PERF_MEM_LVLNUM_L3;
+ data_src.mem_lvl = PERF_MEM_LVL_HIT | PERF_MEM_LVL_L3;
+ } else if (record->type & ARM_SPE_REMOTE_ACCESS) {
+ data_src.mem_lvl_num = PERF_MEM_LVLNUM_ANY_CACHE;
+ data_src.mem_lvl = PERF_MEM_LVL_HIT | PERF_MEM_LVL_REM_CCE1;
+ }
+
+ if (record->type & ARM_SPE_TLB_MISS)
+ data_src.mem_dtlb = PERF_MEM_TLB_WK | PERF_MEM_TLB_MISS;
+ else if (record->type & ARM_SPE_TLB_ACCESS)
+ data_src.mem_dtlb = PERF_MEM_TLB_WK | PERF_MEM_TLB_HIT;
+
+ return data_src.val;
+}
+
static int arm_spe_sample(struct arm_spe_queue *speq)
{
const struct arm_spe_record *record = &speq->decoder->record;
struct arm_spe *spe = speq->spe;
+ u64 data_src;
int err;

+ data_src = arm_spe__synth_data_source(record);
+
if (spe->sample_flc) {
if (record->type & ARM_SPE_L1D_MISS) {
- err = arm_spe__synth_mem_sample(speq, spe->l1d_miss_id);
+ err = arm_spe__synth_mem_sample(speq, spe->l1d_miss_id,
+ data_src);
if (err)
return err;
}

if (record->type & ARM_SPE_L1D_ACCESS) {
- err = arm_spe__synth_mem_sample(speq, spe->l1d_access_id);
+ err = arm_spe__synth_mem_sample(speq, spe->l1d_access_id,
+ data_src);
if (err)
return err;
}
@@ -333,13 +373,15 @@ static int arm_spe_sample(struct arm_spe_queue *speq)

if (spe->sample_llc) {
if (record->type & ARM_SPE_LLC_MISS) {
- err = arm_spe__synth_mem_sample(speq, spe->llc_miss_id);
+ err = arm_spe__synth_mem_sample(speq, spe->llc_miss_id,
+ data_src);
if (err)
return err;
}

if (record->type & ARM_SPE_LLC_ACCESS) {
- err = arm_spe__synth_mem_sample(speq, spe->llc_access_id);
+ err = arm_spe__synth_mem_sample(speq, spe->llc_access_id,
+ data_src);
if (err)
return err;
}
@@ -347,13 +389,15 @@ static int arm_spe_sample(struct arm_spe_queue *speq)

if (spe->sample_tlb) {
if (record->type & ARM_SPE_TLB_MISS) {
- err = arm_spe__synth_mem_sample(speq, spe->tlb_miss_id);
+ err = arm_spe__synth_mem_sample(speq, spe->tlb_miss_id,
+ data_src);
if (err)
return err;
}

if (record->type & ARM_SPE_TLB_ACCESS) {
- err = arm_spe__synth_mem_sample(speq, spe->tlb_access_id);
+ err = arm_spe__synth_mem_sample(speq, spe->tlb_access_id,
+ data_src);
if (err)
return err;
}
@@ -367,13 +411,14 @@ static int arm_spe_sample(struct arm_spe_queue *speq)

if (spe->sample_remote_access &&
(record->type & ARM_SPE_REMOTE_ACCESS)) {
- err = arm_spe__synth_mem_sample(speq, spe->remote_access_id);
+ err = arm_spe__synth_mem_sample(speq, spe->remote_access_id,
+ data_src);
if (err)
return err;
}

if (spe->sample_memory && arm_spe__is_memory_event(record->type)) {
- err = arm_spe__synth_mem_sample(speq, spe->memory_id);
+ err = arm_spe__synth_mem_sample(speq, spe->memory_id, data_src);
if (err)
return err;
}
--
2.20.1

2020-09-01 08:43:58

by Leo Yan

Subject: [PATCH v2 02/14] perf mem: Introduce weak function perf_mem_events__ptr()

Different architectures might use different events or different event
parameters for memory profiling; this patch introduces the weak
function perf_mem_events__ptr(), which allows returning an
architecture specific memory event.

After perf_mem_events__ptr() is introduced, the 'perf_mem_events' array
is only accessed through this new function, so mark the variable as
'static'; this allows each architecture to define its own memory event
array.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/builtin-c2c.c | 23 ++++++++++++++++-------
tools/perf/builtin-mem.c | 26 ++++++++++++++++++--------
tools/perf/util/mem-events.c | 26 +++++++++++++++++++-------
tools/perf/util/mem-events.h | 2 +-
4 files changed, 54 insertions(+), 23 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 5938b100eaf4..594ec6b015b5 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2914,6 +2914,7 @@ static int perf_c2c__record(int argc, const char **argv)
int ret;
bool all_user = false, all_kernel = false;
bool event_set = false;
+ struct perf_mem_event *e;
struct option options[] = {
OPT_CALLBACK('e', "event", &event_set, "event",
"event selector. Use 'perf mem record -e list' to list available events",
@@ -2941,30 +2942,38 @@ static int perf_c2c__record(int argc, const char **argv)
rec_argv[i++] = "record";

if (!event_set) {
- perf_mem_events[PERF_MEM_EVENTS__LOAD].record = true;
- perf_mem_events[PERF_MEM_EVENTS__STORE].record = true;
+ e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
+ e->record = true;
+
+ e = perf_mem_events__ptr(PERF_MEM_EVENTS__STORE);
+ e->record = true;
}

- if (perf_mem_events[PERF_MEM_EVENTS__LOAD].record)
+ e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
+ if (e->record)
rec_argv[i++] = "-W";

rec_argv[i++] = "-d";
rec_argv[i++] = "--phys-data";
rec_argv[i++] = "--sample-cpu";

- for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
- if (!perf_mem_events[j].record)
+ j = 0;
+ while ((e = perf_mem_events__ptr(j)) != NULL) {
+ if (!e->record) {
+ j++;
continue;
+ }

- if (!perf_mem_events[j].supported) {
+ if (!e->supported) {
pr_err("failed: event '%s' not supported\n",
- perf_mem_events[j].name);
+ perf_mem_events__name(j));
free(rec_argv);
return -1;
}

rec_argv[i++] = "-e";
rec_argv[i++] = perf_mem_events__name(j);
+ j++;
}

if (all_user)
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index 3523279af6af..070e0f1d3300 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -64,6 +64,7 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
const char **rec_argv;
int ret;
bool all_user = false, all_kernel = false;
+ struct perf_mem_event *e;
struct option options[] = {
OPT_CALLBACK('e', "event", &mem, "event",
"event selector. use 'perf mem record -e list' to list available events",
@@ -86,13 +87,18 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)

rec_argv[i++] = "record";

- if (mem->operation & MEM_OPERATION_LOAD)
- perf_mem_events[PERF_MEM_EVENTS__LOAD].record = true;
+ if (mem->operation & MEM_OPERATION_LOAD) {
+ e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
+ e->record = true;
+ }

- if (mem->operation & MEM_OPERATION_STORE)
- perf_mem_events[PERF_MEM_EVENTS__STORE].record = true;
+ if (mem->operation & MEM_OPERATION_STORE) {
+ e = perf_mem_events__ptr(PERF_MEM_EVENTS__STORE);
+ e->record = true;
+ }

- if (perf_mem_events[PERF_MEM_EVENTS__LOAD].record)
+ e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
+ if (e->record)
rec_argv[i++] = "-W";

rec_argv[i++] = "-d";
@@ -100,11 +106,14 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
if (mem->phys_addr)
rec_argv[i++] = "--phys-data";

- for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
- if (!perf_mem_events[j].record)
+ j = 0;
+ while ((e = perf_mem_events__ptr(j)) != NULL) {
+ if (!e->record) {
+ j++;
continue;
+ }

- if (!perf_mem_events[j].supported) {
+ if (!e->supported) {
pr_err("failed: event '%s' not supported\n",
perf_mem_events__name(j));
free(rec_argv);
@@ -113,6 +122,7 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)

rec_argv[i++] = "-e";
rec_argv[i++] = perf_mem_events__name(j);
+ j++;
}

if (all_user)
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 35c8d175a9d2..7a5a0d699e27 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -17,7 +17,7 @@ unsigned int perf_mem_events__loads_ldlat = 30;

#define E(t, n, s) { .tag = t, .name = n, .sysfs_name = s }

-struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
+static struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
E("ldlat-loads", "cpu/mem-loads,ldlat=%u/P", "cpu/events/mem-loads"),
E("ldlat-stores", "cpu/mem-stores/P", "cpu/events/mem-stores"),
};
@@ -28,19 +28,31 @@ struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX] = {
static char mem_loads_name[100];
static bool mem_loads_name__init;

+struct perf_mem_event * __weak perf_mem_events__ptr(int i)
+{
+ if (i >= PERF_MEM_EVENTS__MAX)
+ return NULL;
+
+ return &perf_mem_events[i];
+}
+
char * __weak perf_mem_events__name(int i)
{
+ struct perf_mem_event *e = perf_mem_events__ptr(i);
+
+ if (!e)
+ return NULL;
+
if (i == PERF_MEM_EVENTS__LOAD) {
if (!mem_loads_name__init) {
mem_loads_name__init = true;
scnprintf(mem_loads_name, sizeof(mem_loads_name),
- perf_mem_events[i].name,
- perf_mem_events__loads_ldlat);
+ e->name, perf_mem_events__loads_ldlat);
}
return mem_loads_name;
}

- return (char *)perf_mem_events[i].name;
+ return (char *)e->name;
}

int perf_mem_events__parse(const char *str)
@@ -61,7 +73,7 @@ int perf_mem_events__parse(const char *str)

while (tok) {
for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
- struct perf_mem_event *e = &perf_mem_events[j];
+ struct perf_mem_event *e = perf_mem_events__ptr(j);

if (strstr(e->tag, tok))
e->record = found = true;
@@ -90,7 +102,7 @@ int perf_mem_events__init(void)

for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
char path[PATH_MAX];
- struct perf_mem_event *e = &perf_mem_events[j];
+ struct perf_mem_event *e = perf_mem_events__ptr(j);
struct stat st;

scnprintf(path, PATH_MAX, "%s/devices/%s",
@@ -108,7 +120,7 @@ void perf_mem_events__list(void)
int j;

for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
- struct perf_mem_event *e = &perf_mem_events[j];
+ struct perf_mem_event *e = perf_mem_events__ptr(j);

fprintf(stderr, "%-13s%-*s%s\n",
e->tag,
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index 904dad34f7f7..726a9c8103e4 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -31,13 +31,13 @@ enum {
PERF_MEM_EVENTS__MAX,
};

-extern struct perf_mem_event perf_mem_events[PERF_MEM_EVENTS__MAX];
extern unsigned int perf_mem_events__loads_ldlat;

int perf_mem_events__parse(const char *str);
int perf_mem_events__init(void);

char *perf_mem_events__name(int i);
+struct perf_mem_event *perf_mem_events__ptr(int i);

void perf_mem_events__list(void);

--
2.20.1

2020-09-03 14:12:40

by Jiri Olsa

Subject: Re: [PATCH v2 02/14] perf mem: Introduce weak function perf_mem_events__ptr()

On Tue, Sep 01, 2020 at 09:38:03AM +0100, Leo Yan wrote:

SNIP

> @@ -2941,30 +2942,38 @@ static int perf_c2c__record(int argc, const char **argv)
> rec_argv[i++] = "record";
>
> if (!event_set) {
> - perf_mem_events[PERF_MEM_EVENTS__LOAD].record = true;
> - perf_mem_events[PERF_MEM_EVENTS__STORE].record = true;
> + e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
> + e->record = true;
> +
> + e = perf_mem_events__ptr(PERF_MEM_EVENTS__STORE);
> + e->record = true;
> }
>
> - if (perf_mem_events[PERF_MEM_EVENTS__LOAD].record)
> + e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
> + if (e->record)
> rec_argv[i++] = "-W";
>
> rec_argv[i++] = "-d";
> rec_argv[i++] = "--phys-data";
> rec_argv[i++] = "--sample-cpu";
>
> - for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
> - if (!perf_mem_events[j].record)
> + j = 0;
> + while ((e = perf_mem_events__ptr(j)) != NULL) {
> + if (!e->record) {

you could keep the above 'for loop' in here, it seems better
than taking care of j++

> + j++;
> continue;
> + }
>
> - if (!perf_mem_events[j].supported) {
> + if (!e->supported) {
> pr_err("failed: event '%s' not supported\n",
> - perf_mem_events[j].name);
> + perf_mem_events__name(j));
> free(rec_argv);
> return -1;
> }
>
> rec_argv[i++] = "-e";
> rec_argv[i++] = perf_mem_events__name(j);
> + j++;
> }
>
> if (all_user)

SNIP

> @@ -100,11 +106,14 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
> if (mem->phys_addr)
> rec_argv[i++] = "--phys-data";
>
> - for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
> - if (!perf_mem_events[j].record)
> + j = 0;
> + while ((e = perf_mem_events__ptr(j)) != NULL) {
> + if (!e->record) {

same here

thanks,
jirka

> + j++;
> continue;
> + }
>
> - if (!perf_mem_events[j].supported) {
> + if (!e->supported) {
> pr_err("failed: event '%s' not supported\n",
> perf_mem_events__name(j));
> free(rec_argv);
> @@ -113,6 +122,7 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
>
> rec_argv[i++] = "-e";
> rec_argv[i++] = perf_mem_events__name(j);
> + j++;

SNIP

2020-09-04 00:37:12

by Leo Yan

Subject: Re: [PATCH v2 02/14] perf mem: Introduce weak function perf_mem_events__ptr()

Hi Jiri,

On Thu, Sep 03, 2020 at 03:50:54PM +0200, Jiri Olsa wrote:
> On Tue, Sep 01, 2020 at 09:38:03AM +0100, Leo Yan wrote:
>
> SNIP
>
> > @@ -2941,30 +2942,38 @@ static int perf_c2c__record(int argc, const char **argv)
> > rec_argv[i++] = "record";
> >
> > if (!event_set) {
> > - perf_mem_events[PERF_MEM_EVENTS__LOAD].record = true;
> > - perf_mem_events[PERF_MEM_EVENTS__STORE].record = true;
> > + e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
> > + e->record = true;
> > +
> > + e = perf_mem_events__ptr(PERF_MEM_EVENTS__STORE);
> > + e->record = true;
> > }
> >
> > - if (perf_mem_events[PERF_MEM_EVENTS__LOAD].record)
> > + e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
> > + if (e->record)
> > rec_argv[i++] = "-W";
> >
> > rec_argv[i++] = "-d";
> > rec_argv[i++] = "--phys-data";
> > rec_argv[i++] = "--sample-cpu";
> >
> > - for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
> > - if (!perf_mem_events[j].record)
> > + j = 0;
> > + while ((e = perf_mem_events__ptr(j)) != NULL) {
> > + if (!e->record) {
>
> you could keep the above 'for loop' in here, it seems better
> than taking care of j++

Actually in patch v1 I did it this way :) I followed James' suggestion
to encapsulate PERF_MEM_EVENTS__MAX into perf_mem_events__ptr(), so
that builtin-mem.c and builtin-c2c.c don't need to use
PERF_MEM_EVENTS__MAX in the loop and only need to check whether the
pointer returned from perf_mem_events__ptr() is NULL.

How about change as below?

for (j = 0; (e = perf_mem_events__ptr(j)) != NULL; j++) {
[...]
}

If you still think this is not good, I will change back to the old
code style in the next spin.
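The proposed loop can be sketched standalone to show it visits every
entry: the condition, and therefore the assignment to 'e', is evaluated
before each iteration, including the first. The names here are
hypothetical stand-ins, not the actual perf code:

```c
#include <stddef.h>

/* Stand-in for perf_mem_events__ptr(): returns NULL past the end of
 * the table, so NULL acts as the loop's termination sentinel. */
static const char *events[] = { "load", "store", "ldst" };

static const char *event_ptr(int i)
{
	if (i < 0 || i >= 3)
		return NULL;

	return events[i];
}

/* The assignment-in-condition form: 'e' is assigned before the body
 * ever runs, the loop walks all three entries, and 'j++' is handled by
 * the for statement rather than by hand inside the body. */
static int count_events(void)
{
	const char *e;
	int j, n = 0;

	for (j = 0; (e = event_ptr(j)) != NULL; j++)
		n++;

	return n;
}
```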

Thanks for reviewing!

Leo

> > + j++;
> > continue;
> > + }
> >
> > - if (!perf_mem_events[j].supported) {
> > + if (!e->supported) {
> > pr_err("failed: event '%s' not supported\n",
> > - perf_mem_events[j].name);
> > + perf_mem_events__name(j));
> > free(rec_argv);
> > return -1;
> > }
> >
> > rec_argv[i++] = "-e";
> > rec_argv[i++] = perf_mem_events__name(j);
> > + j++;
> > }
> >
> > if (all_user)
>
> SNIP
>
> > @@ -100,11 +106,14 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
> > if (mem->phys_addr)
> > rec_argv[i++] = "--phys-data";
> >
> > - for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
> > - if (!perf_mem_events[j].record)
> > + j = 0;
> > + while ((e = perf_mem_events__ptr(j)) != NULL) {
> > + if (!e->record) {
>
> same here
>
> thanks,
> jirka
>
> > + j++;
> > continue;
> > + }
> >
> > - if (!perf_mem_events[j].supported) {
> > + if (!e->supported) {
> > pr_err("failed: event '%s' not supported\n",
> > perf_mem_events__name(j));
> > free(rec_argv);
> > @@ -113,6 +122,7 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem)
> >
> > rec_argv[i++] = "-e";
> > rec_argv[i++] = perf_mem_events__name(j);
> > + j++;
>
> SNIP
>

2020-09-04 15:54:01

by Jiri Olsa

Subject: Re: [PATCH v2 02/14] perf mem: Introduce weak function perf_mem_events__ptr()

On Fri, Sep 04, 2020 at 08:34:47AM +0800, Leo Yan wrote:
> Hi Jiri,
>
> On Thu, Sep 03, 2020 at 03:50:54PM +0200, Jiri Olsa wrote:
> > On Tue, Sep 01, 2020 at 09:38:03AM +0100, Leo Yan wrote:
> >
> > SNIP
> >
> > > @@ -2941,30 +2942,38 @@ static int perf_c2c__record(int argc, const char **argv)
> > > rec_argv[i++] = "record";
> > >
> > > if (!event_set) {
> > > - perf_mem_events[PERF_MEM_EVENTS__LOAD].record = true;
> > > - perf_mem_events[PERF_MEM_EVENTS__STORE].record = true;
> > > + e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
> > > + e->record = true;
> > > +
> > > + e = perf_mem_events__ptr(PERF_MEM_EVENTS__STORE);
> > > + e->record = true;
> > > }
> > >
> > > - if (perf_mem_events[PERF_MEM_EVENTS__LOAD].record)
> > > + e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
> > > + if (e->record)
> > > rec_argv[i++] = "-W";
> > >
> > > rec_argv[i++] = "-d";
> > > rec_argv[i++] = "--phys-data";
> > > rec_argv[i++] = "--sample-cpu";
> > >
> > > - for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
> > > - if (!perf_mem_events[j].record)
> > > + j = 0;
> > > + while ((e = perf_mem_events__ptr(j)) != NULL) {
> > > + if (!e->record) {
> >
> > you could keep the above 'for loop' in here, it seems better
> > than taking care of j++
>
> Actually in patch v1 I did this way :) I followed James' suggestion to
> encapsulate PERF_MEM_EVENTS__MAX into perf_mem_events__ptr(), thus
> builtin-mem.c and buildin-c2c.c are not necessary to use
> PERF_MEM_EVENTS__MAX in the loop and only needs to detect if the
> pointer is NULL or not when return from perf_mem_events__ptr().

ah because u added that load_store event

>
> How about change as below?
>
> for (j = 0; (e = perf_mem_events__ptr(j)) != NULL; j++) {
> [...]

will this work? e will be NULL for first iteration no?

there are still other for loops with PERF_MEM_EVENTS__MAX used
in the patch.. you overload the perf_mem_events access for arm,
and add missing load_store NULL item to generic version, so there's
always PERF_MEM_EVENTS__MAX items in the array

can we just use the current for loop and check for e->tag != NULL
or any other field

thanks,
jirka
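The suggested shape can be sketched with hypothetical stand-in names:
keep the bounded for loop and skip entries whose 'tag' is NULL, since
the array would then always hold PERF_MEM_EVENTS__MAX entries.

```c
#include <stddef.h>

#define MEM_EVENTS_MAX 3

struct mem_event { const char *tag; };

/* Hypothetical table: the load_store slot exists on every arch, but a
 * generic build leaves its tag NULL, so the array length is fixed at
 * MEM_EVENTS_MAX and a NULL tag marks an unsupported entry. */
static struct mem_event mem_events[MEM_EVENTS_MAX] = {
	{ .tag = "ldlat-loads"  },
	{ .tag = "ldlat-stores" },
	{ .tag = NULL },		/* load_store: not wired up here */
};

/* Bounded for loop with a tag check, instead of a while loop that must
 * advance j by hand before each 'continue'. */
static int count_supported(void)
{
	int j, n = 0;

	for (j = 0; j < MEM_EVENTS_MAX; j++) {
		if (!mem_events[j].tag)
			continue;

		n++;
	}

	return n;
}
```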

2020-09-07 08:19:32

by Leo Yan

Subject: Re: [PATCH v2 02/14] perf mem: Introduce weak function perf_mem_events__ptr()

On Fri, Sep 04, 2020 at 05:52:51PM +0200, Jiri Olsa wrote:
> On Fri, Sep 04, 2020 at 08:34:47AM +0800, Leo Yan wrote:
> > Hi Jiri,
> >
> > On Thu, Sep 03, 2020 at 03:50:54PM +0200, Jiri Olsa wrote:
> > > On Tue, Sep 01, 2020 at 09:38:03AM +0100, Leo Yan wrote:
> > >
> > > SNIP
> > >
> > > > @@ -2941,30 +2942,38 @@ static int perf_c2c__record(int argc, const char **argv)
> > > > rec_argv[i++] = "record";
> > > >
> > > > if (!event_set) {
> > > > - perf_mem_events[PERF_MEM_EVENTS__LOAD].record = true;
> > > > - perf_mem_events[PERF_MEM_EVENTS__STORE].record = true;
> > > > + e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
> > > > + e->record = true;
> > > > +
> > > > + e = perf_mem_events__ptr(PERF_MEM_EVENTS__STORE);
> > > > + e->record = true;
> > > > }
> > > >
> > > > - if (perf_mem_events[PERF_MEM_EVENTS__LOAD].record)
> > > > + e = perf_mem_events__ptr(PERF_MEM_EVENTS__LOAD);
> > > > + if (e->record)
> > > > rec_argv[i++] = "-W";
> > > >
> > > > rec_argv[i++] = "-d";
> > > > rec_argv[i++] = "--phys-data";
> > > > rec_argv[i++] = "--sample-cpu";
> > > >
> > > > - for (j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
> > > > - if (!perf_mem_events[j].record)
> > > > + j = 0;
> > > > + while ((e = perf_mem_events__ptr(j)) != NULL) {
> > > > + if (!e->record) {
> > >
> > > you could keep the above 'for loop' in here, it seems better
> > > than taking care of j++
> >
> > Actually in patch v1 I did this way :) I followed James' suggestion to
> > encapsulate PERF_MEM_EVENTS__MAX into perf_mem_events__ptr(), thus
> > builtin-mem.c and buildin-c2c.c are not necessary to use
> > PERF_MEM_EVENTS__MAX in the loop and only needs to detect if the
> > pointer is NULL or not when return from perf_mem_events__ptr().
>
> ah because u added that load_store event

Yes.

> >
> > How about change as below?
> >
> > for (j = 0; (e = perf_mem_events__ptr(j)) != NULL; j++) {
> > [...]
>
> will this work? e will be NULL for first iteration no?
>
> there are still other for loops with PERF_MEM_EVENTS__MAX used
> in the patch.. you overload the perf_mem_events access for arm,
> and add missing load_store NULL item to generic version, so there's
> always PERF_MEM_EVENTS__MAX items in the array

Yes, exactly.

> can we just use the current for loop and check for e->tag != NULL
> or any other field

Understood, that is more direct; I will keep the current code and
check the 'e->record' field.

Thanks,
Leo