2022-04-26 21:57:57

by Ali Saidi

[permalink] [raw]
Subject: [PATCH v4 0/4] perf: arm-spe: Decode SPE source and use for perf c2c

When synthesizing data from SPE, augment the type with source information
for Arm Neoverse cores so we can detect situations like cache line
contention and transfers on Arm platforms.

This change enables future changes to c2c on a system with SPE where lines that
are shared among multiple cores show up in perf c2c output.

Changes in v7:
* Minor change requested by Leo Yan

Changes in v6:
* Drop changes to c2c command which will come from Leo Yan

Changes in v5:
* Add a new snooping type to disambiguate cache-to-cache transfers where
we don't know if the data is clean or dirty.
* Set snoop flags on all the data-source cases
* Special case stores as we have no information on them

Changes in v4:
* Bring-in the kernel's arch/arm64/include/asm/cputype.h into tools/
* Add neoverse-v1 to the neoverse cores list

Ali Saidi (4):
tools: arm64: Import cputype.h
perf arm-spe: Use SPE data source for neoverse cores
perf mem: Support mem_lvl_num in c2c command
perf mem: Support HITM for when mem_lvl_num is any

tools/arch/arm64/include/asm/cputype.h | 258 ++++++++++++++++++
.../util/arm-spe-decoder/arm-spe-decoder.c | 1 +
.../util/arm-spe-decoder/arm-spe-decoder.h | 12 +
tools/perf/util/arm-spe.c | 110 +++++++-
tools/perf/util/mem-events.c | 20 +-
5 files changed, 383 insertions(+), 18 deletions(-)
create mode 100644 tools/arch/arm64/include/asm/cputype.h

--
2.32.0


2022-04-26 23:36:46

by Ali Saidi

[permalink] [raw]
Subject: [PATCH v7 4/5] perf arm-spe: Don't set data source if it's not a memory operation

From: Leo Yan <[email protected]>

Besides memory load and store operations, Arm SPE records can also
describe other operation types, but when setting the data source field
the current code assumes a record is either a load or a store
operation. This leads to memory samples being wrongly synthesized.

This patch strictly checks the record operation type and sets the data
source only for the operation types ARM_SPE_LD and ARM_SPE_ST;
otherwise, it returns zero for the data source. Since memory samples
are then synthesized only when the data source is non-zero, the function
arm_spe__is_memory_event() is no longer needed and is removed.

Signed-off-by: Leo Yan <[email protected]>
Reviewed-by: Ali Saidi <[email protected]>
Tested-by: Ali Saidi <[email protected]>
---
tools/perf/util/arm-spe.c | 22 ++++++++--------------
1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index d2b64e3f588b..e032efc03274 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -387,26 +387,16 @@ static int arm_spe__synth_instruction_sample(struct arm_spe_queue *speq,
 	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
 }

-#define SPE_MEM_TYPE	(ARM_SPE_L1D_ACCESS | ARM_SPE_L1D_MISS | \
-			 ARM_SPE_LLC_ACCESS | ARM_SPE_LLC_MISS | \
-			 ARM_SPE_REMOTE_ACCESS)
-
-static bool arm_spe__is_memory_event(enum arm_spe_sample_type type)
-{
-	if (type & SPE_MEM_TYPE)
-		return true;
-
-	return false;
-}
-
 static u64 arm_spe__synth_data_source(const struct arm_spe_record *record)
 {
 	union perf_mem_data_src data_src = { 0 };

 	if (record->op == ARM_SPE_LD)
 		data_src.mem_op = PERF_MEM_OP_LOAD;
-	else
+	else if (record->op == ARM_SPE_ST)
 		data_src.mem_op = PERF_MEM_OP_STORE;
+	else
+		return 0;

 	if (record->type & (ARM_SPE_LLC_ACCESS | ARM_SPE_LLC_MISS)) {
 		data_src.mem_lvl = PERF_MEM_LVL_L3;
@@ -510,7 +500,11 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
 			return err;
 	}

-	if (spe->sample_memory && arm_spe__is_memory_event(record->type)) {
+	/*
+	 * When data_src is zero it means the record is not a memory operation,
+	 * skip to synthesize memory sample for this case.
+	 */
+	if (spe->sample_memory && data_src) {
 		err = arm_spe__synth_mem_sample(speq, spe->memory_id, data_src);
 		if (err)
 			return err;
--
2.32.0
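
The gating described in the commit message above can be illustrated in isolation. The following is only a minimal sketch, not the perf code: the sketch_* names, the enum and the SKETCH_MEM_OP_* values are hypothetical stand-ins for ARM_SPE_LD/ARM_SPE_ST and PERF_MEM_OP_LOAD/PERF_MEM_OP_STORE. It assumes, as the patch does, that a load or store never encodes to a zero data source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the SPE decoder's operation types. */
enum sketch_spe_op {
	SKETCH_SPE_LD = 1,	/* load */
	SKETCH_SPE_ST,		/* store */
	SKETCH_SPE_OTHER,	/* any non-memory record, e.g. a branch */
};

/* Illustrative mem_op encodings; not the real PERF_MEM_OP_* values. */
#define SKETCH_MEM_OP_LOAD	0x1u
#define SKETCH_MEM_OP_STORE	0x2u

struct sketch_spe_record {
	enum sketch_spe_op op;
};

/* Return a non-zero data source only for loads and stores. */
uint64_t sketch_synth_data_source(const struct sketch_spe_record *record)
{
	assert(record != NULL);

	if (record->op == SKETCH_SPE_LD)
		return SKETCH_MEM_OP_LOAD;
	if (record->op == SKETCH_SPE_ST)
		return SKETCH_MEM_OP_STORE;

	return 0;	/* not a memory operation */
}

/* A memory sample is wanted only when sampling is on and data_src is non-zero. */
bool sketch_wants_mem_sample(bool sample_memory, uint64_t data_src)
{
	return sample_memory && data_src != 0;
}
```

With this shape the caller needs no separate "is this a memory event" predicate: a record whose op is neither load nor store yields 0 and is skipped by the same test that checks whether memory sampling is enabled.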

2022-04-27 10:20:47

by Ali Saidi

[permalink] [raw]
Subject: [PATCH v7 3/5] perf mem: Print snoop peer flag

From: Leo Yan <[email protected]>

Since the PERF_MEM_SNOOPX_PEER flag is a new snoop type, print it when
it is set.

Before:
memstress 3603 [020] 122.463754: 1 l1d-miss: 8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK N/A aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
memstress 3603 [020] 122.463754: 1 l1d-access: 8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK N/A aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
memstress 3603 [020] 122.463754: 1 llc-miss: 8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK N/A aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
memstress 3603 [020] 122.463754: 1 llc-access: 8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK N/A aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
memstress 3603 [020] 122.463754: 1 tlb-access: 8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK N/A aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
memstress 3603 [020] 122.463754: 1 memory: 8688000842 |OP LOAD|LVL L3 or L3 hit|SNP N/A|TLB Walker hit|LCK No|BLK N/A aaaac17c3e88 [unknown] (/home/ubuntu/memstress)

After:

memstress 3603 [020] 122.463754: 1 l1d-miss: 8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK N/A aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
memstress 3603 [020] 122.463754: 1 l1d-access: 8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK N/A aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
memstress 3603 [020] 122.463754: 1 llc-miss: 8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK N/A aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
memstress 3603 [020] 122.463754: 1 llc-access: 8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK N/A aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
memstress 3603 [020] 122.463754: 1 tlb-access: 8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK N/A aaaac17c3e88 [unknown] (/home/ubuntu/memstress)
memstress 3603 [020] 122.463754: 1 memory: 8688000842 |OP LOAD|LVL L3 or L3 hit|SNP Peer|TLB Walker hit|LCK No|BLK N/A aaaac17c3e88 [unknown] (/home/ubuntu/memstress)

Signed-off-by: Leo Yan <[email protected]>
Reviewed-by: Ali Saidi <[email protected]>
Tested-by: Ali Saidi <[email protected]>
---
tools/perf/util/mem-events.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index efaf263464b9..db5225caaabe 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -410,6 +410,11 @@ static const char * const snoop_access[] = {
 	"HitM",
 };

+static const char * const snoopx_access[] = {
+	"Fwd",
+	"Peer",
+};
+
 int perf_mem__snp_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
 {
 	size_t i, l = 0;
@@ -430,13 +435,20 @@ int perf_mem__snp_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
 		}
 		l += scnprintf(out + l, sz - l, snoop_access[i]);
 	}
-	if (mem_info &&
-	    (mem_info->data_src.mem_snoopx & PERF_MEM_SNOOPX_FWD)) {
+
+	m = 0;
+	if (mem_info)
+		m = mem_info->data_src.mem_snoopx;
+
+	for (i = 0; m && i < ARRAY_SIZE(snoopx_access); i++, m >>= 1) {
+		if (!(m & 0x1))
+			continue;
+
 		if (l) {
 			strcat(out, " or ");
 			l += 4;
 		}
-		l += scnprintf(out + l, sz - l, "Fwd");
+		l += scnprintf(out + l, sz - l, snoopx_access[i]);
 	}

 	if (*out == '\0')
--
2.32.0
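
For readers who want to try the table-driven printing outside of perf, here is a rough userspace sketch of the same loop. It is an approximation under stated assumptions: sketch_snoopx_to_string() and SKETCH_ARRAY_SIZE() are hypothetical names, standard snprintf() replaces the kernel-style scnprintf(), and the " or " separator is folded into the format string instead of using strcat(). Only the name table ("Fwd", "Peer") and the bit-walking order come from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Bit 0 is "Fwd", bit 1 is "Peer", matching the table in the patch. */
static const char * const sketch_snoopx_names[] = {
	"Fwd",
	"Peer",
};

/*
 * Write the names of the bits set in snoopx into out, joined by " or ".
 * Returns the number of characters stored, not counting the terminator.
 */
size_t sketch_snoopx_to_string(char *out, size_t sz, uint64_t snoopx)
{
	size_t i, l = 0;

	assert(out != NULL && sz > 0);
	out[0] = '\0';

	for (i = 0; snoopx && i < SKETCH_ARRAY_SIZE(sketch_snoopx_names);
	     i++, snoopx >>= 1) {
		int n;

		if (!(snoopx & 0x1))
			continue;

		n = snprintf(out + l, sz - l, "%s%s",
			     l ? " or " : "", sketch_snoopx_names[i]);
		if (n < 0)
			break;
		if ((size_t)n >= sz - l) {
			l = sz - 1;	/* output was truncated */
			break;
		}
		l += (size_t)n;
	}

	return l;
}
```

Calling it with 0x2 fills the buffer with "Peer", and 0x3 gives "Fwd or Peer". Adding a further snoopx flag would then only need a new entry in the name table.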

2022-04-27 10:28:24

by Ali Saidi

[permalink] [raw]
Subject: [PATCH v7 1/5] perf: Add SNOOP_PEER flag to perf mem data struct

Add a flag to the perf mem data struct to signal that a request caused a
cache-to-cache transfer of a line from a peer of the requestor and
wasn't sourced from a lower cache level. The line being moved from one
peer cache to another has latency and performance implications. On Arm64
Neoverse systems the data source can indicate a cache-to-cache transfer
but not whether the line is dirty or clean, so instead of overloading
HITM, define a new flag that indicates this type of transfer.

Signed-off-by: Ali Saidi <[email protected]>
Reviewed-by: Leo Yan <[email protected]>
---
include/uapi/linux/perf_event.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index d37629dbad72..7b88bfd097dc 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1310,7 +1310,7 @@ union perf_mem_data_src {
#define PERF_MEM_SNOOP_SHIFT 19

#define PERF_MEM_SNOOPX_FWD 0x01 /* forward */
-/* 1 free */
+#define PERF_MEM_SNOOPX_PEER 0x02 /* xfer from peer */
#define PERF_MEM_SNOOPX_SHIFT 38

/* locked instruction */
--
2.32.0
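
As a quick illustration of where the new bit lands in the 64-bit data source value, the sketch below sets and tests the peer flag on a raw u64. The SKETCH_* constants copy the values shown in the hunk above (0x01, 0x02, shift 38); the helper names are hypothetical, and the two-bit mask reflects my reading that mem_snoopx is a two-bit field, which matches the "1 free" comment the patch replaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values mirror PERF_MEM_SNOOPX_FWD, _PEER and _SHIFT from the hunk above. */
#define SKETCH_SNOOPX_FWD	0x01
#define SKETCH_SNOOPX_PEER	0x02
#define SKETCH_SNOOPX_SHIFT	38
#define SKETCH_SNOOPX_MASK	0x03	/* assumed two-bit field */

/* Return val with the "transfer from peer" bit set. */
uint64_t sketch_set_snoopx_peer(uint64_t val)
{
	return val | ((uint64_t)SKETCH_SNOOPX_PEER << SKETCH_SNOOPX_SHIFT);
}

/* Extract the snoopx field from a raw data source value. */
uint64_t sketch_get_snoopx(uint64_t val)
{
	return (val >> SKETCH_SNOOPX_SHIFT) & SKETCH_SNOOPX_MASK;
}

/* True when the sample was satisfied by a peer cache transfer. */
bool sketch_is_peer_xfer(uint64_t val)
{
	return (sketch_get_snoopx(val) & SKETCH_SNOOPX_PEER) != 0;
}
```

Because PEER uses its own bit rather than reusing HITM, a consumer can tell "came from a peer cache, dirty state unknown" apart from a confirmed hit-modified snoop.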

2022-04-27 10:46:56

by Ali Saidi

[permalink] [raw]
Subject: [PATCH v7 2/5] perf tools: sync addition of PERF_MEM_SNOOPX_PEER

Add a flag to the perf mem data struct to signal that a request caused a
cache-to-cache transfer of a line from a peer of the requestor and
wasn't sourced from a lower cache level. The line being moved from one
peer cache to another has latency and performance implications. On Arm64
Neoverse systems the data source can indicate a cache-to-cache transfer
but not whether the line is dirty or clean, so instead of overloading
HITM, define a new flag that indicates this type of transfer.

Signed-off-by: Ali Saidi <[email protected]>
Reviewed-by: Leo Yan <[email protected]>
---
tools/include/uapi/linux/perf_event.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index d37629dbad72..7b88bfd097dc 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -1310,7 +1310,7 @@ union perf_mem_data_src {
#define PERF_MEM_SNOOP_SHIFT 19

#define PERF_MEM_SNOOPX_FWD 0x01 /* forward */
-/* 1 free */
+#define PERF_MEM_SNOOPX_PEER 0x02 /* xfer from peer */
#define PERF_MEM_SNOOPX_SHIFT 38

/* locked instruction */
--
2.32.0

2022-05-03 17:28:19

by German Gomez

[permalink] [raw]
Subject: Re: [PATCH v7 4/5] perf arm-spe: Don't set data source if it's not a memory operation


On 26/04/2022 14:59, Ali Saidi wrote:
> From: Leo Yan <[email protected]>
>
> Besides memory load and store operations, Arm SPE records can also
> describe other operation types, but when setting the data source field
> the current code assumes a record is either a load or a store
> operation. This leads to memory samples being wrongly synthesized.
>
> This patch strictly checks the record operation type and sets the data
> source only for the operation types ARM_SPE_LD and ARM_SPE_ST;
> otherwise, it returns zero for the data source. Since memory samples
> are then synthesized only when the data source is non-zero, the function
> arm_spe__is_memory_event() is no longer needed and is removed.
>
> Signed-off-by: Leo Yan <[email protected]>
> Reviewed-by: Ali Saidi <[email protected]>
> Tested-by: Ali Saidi <[email protected]>

I think the Fixes tag is missing, right?

Fixes: e55ed3423c1b ("perf arm-spe: Synthesize memory event")
Reviewed-by: German Gomez <[email protected]>

Thanks,
German

> ---
> tools/perf/util/arm-spe.c | 22 ++++++++--------------
> 1 file changed, 8 insertions(+), 14 deletions(-)
>
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index d2b64e3f588b..e032efc03274 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -387,26 +387,16 @@ static int arm_spe__synth_instruction_sample(struct arm_spe_queue *speq,
>  	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
>  }
>
> -#define SPE_MEM_TYPE	(ARM_SPE_L1D_ACCESS | ARM_SPE_L1D_MISS | \
> -			 ARM_SPE_LLC_ACCESS | ARM_SPE_LLC_MISS | \
> -			 ARM_SPE_REMOTE_ACCESS)
> -
> -static bool arm_spe__is_memory_event(enum arm_spe_sample_type type)
> -{
> -	if (type & SPE_MEM_TYPE)
> -		return true;
> -
> -	return false;
> -}
> -
>  static u64 arm_spe__synth_data_source(const struct arm_spe_record *record)
>  {
>  	union perf_mem_data_src data_src = { 0 };
>
>  	if (record->op == ARM_SPE_LD)
>  		data_src.mem_op = PERF_MEM_OP_LOAD;
> -	else
> +	else if (record->op == ARM_SPE_ST)
>  		data_src.mem_op = PERF_MEM_OP_STORE;
> +	else
> +		return 0;
>
>  	if (record->type & (ARM_SPE_LLC_ACCESS | ARM_SPE_LLC_MISS)) {
>  		data_src.mem_lvl = PERF_MEM_LVL_L3;
> @@ -510,7 +500,11 @@ static int arm_spe_sample(struct arm_spe_queue *speq)
>  			return err;
>  	}
>
> -	if (spe->sample_memory && arm_spe__is_memory_event(record->type)) {
> +	/*
> +	 * When data_src is zero it means the record is not a memory operation,
> +	 * skip to synthesize memory sample for this case.
> +	 */
> +	if (spe->sample_memory && data_src) {
>  		err = arm_spe__synth_mem_sample(speq, spe->memory_id, data_src);
>  		if (err)
>  			return err;