2022-06-06 05:16:56

by Leo Yan

Subject: [PATCH v5 00/17] perf c2c: Support data source and display for Arm64

Arm64 Neoverse CPUs support the data source field in Arm SPE trace,
which allows us to detect cache line contention and transfers.

This patch set includes Ali's patch set v9 "perf: arm-spe: Decode SPE
source and use for perf c2c" [1] and is rebased on the latest perf core
branch, at commit 1bcca2b1bd67 ("perf vendor events intel: Update
metrics for Alderlake").

Patches 01-05 come from Ali's patch set, which supports the data source
for Arm SPE on Neoverse cores.

Patches 06-17 come from patch set v4, which adds perf c2c peer display
support for Arm64 [2].

This patch set has been verified for both x86 perf memory events and Arm
SPE events.

[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://lore.kernel.org/lkml/[email protected]/

Changes from v4:
* Included Ali's patch set for adding data source in Arm SPE samples;
* Added Ian's ACK and Ali's review and test tags;
* Updated the documentation for the default peer display on Arm64 (Ali).

Changes from v3:
* Changed to display remote and local peer accesses (Joe);
* Fixed the usage info for display types (Joe);
* Do not display HITM dimensions when using the 'peer' display, and do
not show any 'peer' dimensions in the HITM display (James);
* Split to smaller patches for adding dimensions of peer operations;
* Updated documentation to reflect the latest GUI and stdio.

Changes from v2:
* Updated patch 04 to account metrics for both cache level and ld_peer
for PEER flag;
* Updated the documentation for metric 'rmt_hit', which is accounted for
all remote accesses (including remote DRAM and any upward caches).

Changes from v1:
* Updated patches 01, 02 and 03 to support 'N/A' metrics for store
operations, so that they align with the patch set [1] for store samples.


Ali Saidi (3):
perf: Add SNOOP_PEER flag to perf mem data struct
perf tools: sync addition of PERF_MEM_SNOOPX_PEER
perf arm-spe: Use SPE data source for neoverse cores

Leo Yan (14):
perf mem: Print snoop peer flag
perf arm-spe: Don't set data source if it's not a memory operation
perf mem: Add statistics for peer snooping
perf c2c: Output statistics for peer snooping
perf c2c: Add dimensions for peer load operations
perf c2c: Add dimensions of peer metrics for cache line view
perf c2c: Add mean dimensions for peer operations
perf c2c: Use explicit names for display macros
perf c2c: Rename dimension from 'percent_hitm' to
'percent_costly_snoop'
perf c2c: Refactor node header
perf c2c: Refactor display string
perf c2c: Sort on peer snooping for load operations
perf c2c: Use 'peer' as default display for Arm64
perf c2c: Update documentation for new display option 'peer'

include/uapi/linux/perf_event.h | 2 +-
tools/include/uapi/linux/perf_event.h | 2 +-
tools/perf/Documentation/perf-c2c.txt | 31 +-
tools/perf/builtin-c2c.c | 454 ++++++++++++++----
.../util/arm-spe-decoder/arm-spe-decoder.c | 1 +
.../util/arm-spe-decoder/arm-spe-decoder.h | 12 +
tools/perf/util/arm-spe.c | 140 +++++-
tools/perf/util/mem-events.c | 46 +-
tools/perf/util/mem-events.h | 3 +
9 files changed, 550 insertions(+), 141 deletions(-)

--
2.25.1


2022-06-06 05:32:04

by Leo Yan

Subject: [PATCH v5 14/17] perf c2c: Refactor display string

The display type is shown by combining an entry of the display string
array with the suffix string "HITMs", which makes it hard to extend the
display to other sorting types (e.g. peer operations).

This patch moves the suffix string "HITMs" into the display string array
for the HITM types, so that a new display type is not forced to output
the string "HITMs".

Signed-off-by: Leo Yan <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Tested-by: Ali Saidi <[email protected]>
Reviewed-by: Ali Saidi <[email protected]>
---
tools/perf/builtin-c2c.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 49a9b8480b41..8b7c1fd35380 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -122,9 +122,9 @@ enum {
};

static const char *display_str[DISPLAY_MAX] = {
- [DISPLAY_LCL_HITM] = "Local",
- [DISPLAY_RMT_HITM] = "Remote",
- [DISPLAY_TOT_HITM] = "Total",
+ [DISPLAY_LCL_HITM] = "Local HITMs",
+ [DISPLAY_RMT_HITM] = "Remote HITMs",
+ [DISPLAY_TOT_HITM] = "Total HITMs",
};

static const struct option c2c_options[] = {
@@ -2489,7 +2489,7 @@ static void print_c2c_info(FILE *out, struct perf_session *session)
fprintf(out, "%-36s: %s\n", first ? " Events" : "", evsel__name(evsel));
first = false;
}
- fprintf(out, " Cachelines sort on : %s HITMs\n",
+ fprintf(out, " Cachelines sort on : %s\n",
display_str[c2c.display]);
fprintf(out, " Cacheline data grouping : %s\n", c2c.cl_sort);
}
@@ -2646,7 +2646,7 @@ static int perf_c2c_browser__title(struct hist_browser *browser,
{
scnprintf(bf, size,
"Shared Data Cache Line Table "
- "(%lu entries, sorted on %s HITMs)",
+ "(%lu entries, sorted on %s)",
browser->nr_non_filtered_entries,
display_str[c2c.display]);
return 0;
--
2.25.1

2022-06-06 05:33:43

by Leo Yan

Subject: [PATCH v5 01/17] perf: Add SNOOP_PEER flag to perf mem data struct

From: Ali Saidi <[email protected]>

Add a flag to the perf mem data struct to signal that a request caused a
cache-to-cache transfer of a line from a peer of the requestor and
wasn't sourced from a lower cache level. The line being moved from one
peer cache to another has latency and performance implications. On Arm64
Neoverse systems the data source can indicate a cache-to-cache transfer
but not whether the line is dirty or clean, so instead of overloading
HITM, define a new flag that indicates this type of transfer.
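
As a minimal sketch of how a consumer might test the new bit, assuming
the two SNOOPX values shown in the hunk below: the SKETCH_* names and the
helper is_peer_transfer() are hypothetical and only mirror the values;
real code would use PERF_MEM_SNOOPX_PEER from <linux/perf_event.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the hunk; the SKETCH_ prefix marks them as local stand-ins. */
#define SKETCH_SNOOPX_FWD	0x01	/* forward */
#define SKETCH_SNOOPX_PEER	0x02	/* xfer from peer */

/*
 * Hypothetical helper: returns true when the extended snoop field of a
 * sample reports a transfer from a peer cache. The flag says nothing
 * about whether the line was dirty or clean.
 */
static bool is_peer_transfer(uint64_t mem_snoopx)
{
	return (mem_snoopx & SKETCH_SNOOPX_PEER) != 0;
}
```

Because FWD and PEER are independent bits, a sample may carry both, and
the helper only looks at the PEER bit.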

Signed-off-by: Ali Saidi <[email protected]>
Reviewed-by: Leo Yan <[email protected]>
Reviewed-by: Kajol Jain <[email protected]>
---
include/uapi/linux/perf_event.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index d37629dbad72..7b88bfd097dc 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -1310,7 +1310,7 @@ union perf_mem_data_src {
#define PERF_MEM_SNOOP_SHIFT 19

#define PERF_MEM_SNOOPX_FWD 0x01 /* forward */
-/* 1 free */
+#define PERF_MEM_SNOOPX_PEER 0x02 /* xfer from peer */
#define PERF_MEM_SNOOPX_SHIFT 38

/* locked instruction */
--
2.25.1

2022-06-06 05:42:58

by Leo Yan

Subject: [PATCH v5 06/17] perf mem: Add statistics for peer snooping

The flag PERF_MEM_SNOOPX_PEER was added to mark a cache line that is
snooped from a peer cache; the line can come from a peer core, a peer
cluster, or a remote NUMA node.

This patch adds statistics for the flag PERF_MEM_SNOOPX_PEER. Note that
PERF_MEM_SNOOPX_PEER is treated as auxiliary information that works
together with the cache level statistics. Therefore, when the flag is
set, a load operation is accounted in both the cache level metrics
(e.g. ld_l2hit, ld_llchit) and the peer related metrics.

Three new metrics are introduced: 'lcl_peer' counts local cache
accesses, 'rmt_peer' counts remote accesses (including remote DRAM and
any caches in the remote node), and 'tot_peer' is the sum of 'lcl_peer'
and 'rmt_peer'.
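
The double accounting can be sketched as follows. This is a simplified
illustration, not the code in c2c_decode_stats(): struct peer_stats, enum
load_src and account_load() are hypothetical names, and the sketch leaves
out the HIT/HITM snoop handling that the real function performs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct c2c_stats. */
struct peer_stats {
	uint32_t ld_l2hit, ld_llchit, rmt_hit;
	uint32_t lcl_peer, rmt_peer, tot_peer;
};

/* Hypothetical, simplified view of where a load was served from. */
enum load_src { SRC_L2, SRC_L3, SRC_REMOTE };

static void account_load(struct peer_stats *st, enum load_src src, bool peer)
{
	switch (src) {
	case SRC_L2:
		st->ld_l2hit++;			/* cache level metric */
		if (peer) {			/* plus the peer metrics */
			st->lcl_peer++;
			st->tot_peer++;
		}
		break;
	case SRC_L3:
		st->ld_llchit++;
		if (peer) {
			st->lcl_peer++;
			st->tot_peer++;
		}
		break;
	case SRC_REMOTE:
		if (peer) {
			st->rmt_hit++;		/* remote access is also a remote hit */
			st->rmt_peer++;
			st->tot_peer++;
		}
		break;
	}
}
```

Every peer load bumps 'tot_peer' together with exactly one of 'lcl_peer'
or 'rmt_peer', so 'tot_peer' always equals their sum.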

Signed-off-by: Leo Yan <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Tested-by: Ali Saidi <[email protected]>
Reviewed-by: Ali Saidi <[email protected]>
---
tools/perf/util/mem-events.c | 28 +++++++++++++++++++++++++---
tools/perf/util/mem-events.h | 3 +++
2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 5dca1882c284..764883183519 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -525,6 +525,7 @@ int c2c_decode_stats(struct c2c_stats *stats, struct mem_info *mi)
u64 op = data_src->mem_op;
u64 lvl = data_src->mem_lvl;
u64 snoop = data_src->mem_snoop;
+ u64 snoopx = data_src->mem_snoopx;
u64 lock = data_src->mem_lock;
u64 blk = data_src->mem_blk;
/*
@@ -544,6 +545,12 @@ do { \
stats->tot_hitm++; \
} while (0)

+#define PEER_INC(__f) \
+do { \
+ stats->__f++; \
+ stats->tot_peer++; \
+} while (0)
+
#define P(a, b) PERF_MEM_##a##_##b

stats->nr_entries++;
@@ -567,12 +574,20 @@ do { \
if (lvl & P(LVL, IO)) stats->ld_io++;
if (lvl & P(LVL, LFB)) stats->ld_fbhit++;
if (lvl & P(LVL, L1 )) stats->ld_l1hit++;
- if (lvl & P(LVL, L2 )) stats->ld_l2hit++;
+ if (lvl & P(LVL, L2)) {
+ stats->ld_l2hit++;
+
+ if (snoopx & P(SNOOPX, PEER))
+ PEER_INC(lcl_peer);
+ }
if (lvl & P(LVL, L3 )) {
if (snoop & P(SNOOP, HITM))
HITM_INC(lcl_hitm);
else
stats->ld_llchit++;
+
+ if (snoopx & P(SNOOPX, PEER))
+ PEER_INC(lcl_peer);
}

if (lvl & P(LVL, LOC_RAM)) {
@@ -597,10 +612,14 @@ do { \
if ((lvl & P(LVL, REM_CCE1)) ||
(lvl & P(LVL, REM_CCE2)) ||
mrem) {
- if (snoop & P(SNOOP, HIT))
+ if (snoop & P(SNOOP, HIT)) {
stats->rmt_hit++;
- else if (snoop & P(SNOOP, HITM))
+ } else if (snoop & P(SNOOP, HITM)) {
HITM_INC(rmt_hitm);
+ } else if (snoopx & P(SNOOPX, PEER)) {
+ stats->rmt_hit++;
+ PEER_INC(rmt_peer);
+ }
}

if ((lvl & P(LVL, MISS)))
@@ -664,6 +683,9 @@ void c2c_add_stats(struct c2c_stats *stats, struct c2c_stats *add)
stats->lcl_hitm += add->lcl_hitm;
stats->rmt_hitm += add->rmt_hitm;
stats->tot_hitm += add->tot_hitm;
+ stats->lcl_peer += add->lcl_peer;
+ stats->rmt_peer += add->rmt_peer;
+ stats->tot_peer += add->tot_peer;
stats->rmt_hit += add->rmt_hit;
stats->lcl_dram += add->lcl_dram;
stats->rmt_dram += add->rmt_dram;
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index 8a8b568baeee..12372309d60e 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -78,6 +78,9 @@ struct c2c_stats {
u32 lcl_hitm; /* count of loads with local HITM */
u32 rmt_hitm; /* count of loads with remote HITM */
u32 tot_hitm; /* count of loads with local and remote HITM */
+ u32 lcl_peer; /* count of loads with local peer cache */
+ u32 rmt_peer; /* count of loads with remote peer cache */
+ u32 tot_peer; /* count of loads with local and remote peer cache */
u32 rmt_hit; /* count of loads with remote hit clean; */
u32 lcl_dram; /* count of loads miss to local DRAM */
u32 rmt_dram; /* count of loads miss to remote DRAM */
--
2.25.1

2022-06-06 05:53:51

by Leo Yan

Subject: [PATCH v5 17/17] perf c2c: Update documentation for new display option 'peer'

This patch updates the documentation to describe the new display option
'peer'.

Signed-off-by: Leo Yan <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Reviewed-by: Ali Saidi <[email protected]>
---
tools/perf/Documentation/perf-c2c.txt | 31 +++++++++++++++++++++------
1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
index 6f69173731aa..f1f7ae6b08d1 100644
--- a/tools/perf/Documentation/perf-c2c.txt
+++ b/tools/perf/Documentation/perf-c2c.txt
@@ -109,7 +109,9 @@ REPORT OPTIONS

-d::
--display::
- Switch to HITM type (rmt, lcl) to display and sort on. Total HITMs as default.
+ Switch to HITM type (rmt, lcl) or peer snooping type (peer) to display
+ and sort on. Total HITMs (tot) as default, except Arm64 uses peer mode
+ as default.

--stitch-lbr::
Show callgraph with stitched LBRs, which may have more complete
@@ -174,12 +176,18 @@ For each cacheline in the 1) list we display following data:
Cacheline
- cacheline address (hex number)

- Rmt/Lcl Hitm
+ Rmt/Lcl Hitm (Display with HITM types)
- cacheline percentage of all Remote/Local HITM accesses

- LLC Load Hitm - Total, LclHitm, RmtHitm
+ Peer Snoop (Display with peer type)
+ - cacheline percentage of all peer accesses
+
+ LLC Load Hitm - Total, LclHitm, RmtHitm (For display with HITM types)
- count of Total/Local/Remote load HITMs

+ Load Peer - Total, Local, Remote (For display with peer type)
+ - count of Total/Local/Remote load from peer cache or DRAM
+
Total records
- sum of all cachelines accesses

@@ -201,16 +209,21 @@ For each cacheline in the 1) list we display following data:
- count of LLC load accesses, includes LLC hits and LLC HITMs

RMT Load Hit - RmtHit, RmtHitm
- - count of remote load accesses, includes remote hits and remote HITMs
+ - count of remote load accesses, includes remote hits and remote HITMs;
+ on Arm neoverse cores, RmtHit is used to account remote accesses,
+ includes remote DRAM or any upward cache level in remote node

Load Dram - Lcl, Rmt
- count of local and remote DRAM accesses

For each offset in the 2) list we display following data:

- HITM - Rmt, Lcl
+ HITM - Rmt, Lcl (Display with HITM types)
- % of Remote/Local HITM accesses for given offset within cacheline

+ Peer Snoop - Rmt, Lcl (Display with peer type)
+ - % of Remote/Local peer accesses for given offset within cacheline
+
Store Refs - L1 Hit, L1 Miss, N/A
- % of store accesses that hit L1, missed L1 and N/A (no available) memory
level for given offset within cacheline
@@ -227,9 +240,12 @@ For each offset in the 2) list we display following data:
Code address
- code address responsible for the accesses

- cycles - rmt hitm, lcl hitm, load
+ cycles - rmt hitm, lcl hitm, load (Display with HITM types)
- sum of cycles for given accesses - Remote/Local HITM and generic load

+ cycles - rmt peer, lcl peer, load (Display with peer type)
+ - sum of cycles for given accesses - Remote/Local peer load and generic load
+
cpu cnt
- number of cpus that participated on the access

@@ -251,7 +267,8 @@ The 'Node' field displays nodes that accesses given cacheline
offset. Its output comes in 3 flavors:
- node IDs separated by ','
- node IDs with stats for each ID, in following format:
- Node{cpus %hitms %stores}
+ Node{cpus %hitms %stores} (Display with HITM types)
+ Node{cpus %peers %stores} (Display with peer type)
- node IDs with list of affected CPUs in following format:
Node{cpu list}

--
2.25.1

2022-06-06 05:54:49

by Leo Yan

Subject: [PATCH v5 07/17] perf c2c: Output statistics for peer snooping

This patch outputs peer snooping statistics for the whole set of trace
events and for the global shared cache lines.

Signed-off-by: Leo Yan <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Tested-by: Ali Saidi <[email protected]>
Reviewed-by: Ali Saidi <[email protected]>
---
tools/perf/builtin-c2c.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 4898ee57d156..37bebeb6c11b 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2202,6 +2202,8 @@ static void print_c2c__display_stats(FILE *out)
fprintf(out, " Load LLC Misses : %10d\n", llc_misses);
fprintf(out, " Load access blocked by data : %10d\n", stats->blk_data);
fprintf(out, " Load access blocked by address : %10d\n", stats->blk_addr);
+ fprintf(out, " Load HIT Local Peer : %10d\n", stats->lcl_peer);
+ fprintf(out, " Load HIT Remote Peer : %10d\n", stats->rmt_peer);
fprintf(out, " LLC Misses to Local DRAM : %10.1f%%\n", ((double)stats->lcl_dram/(double)llc_misses) * 100.);
fprintf(out, " LLC Misses to Remote DRAM : %10.1f%%\n", ((double)stats->rmt_dram/(double)llc_misses) * 100.);
fprintf(out, " LLC Misses to Remote cache (HIT) : %10.1f%%\n", ((double)stats->rmt_hit /(double)llc_misses) * 100.);
@@ -2230,6 +2232,7 @@ static void print_shared_cacheline_info(FILE *out)
fprintf(out, " L1D hits on shared lines : %10d\n", stats->ld_l1hit);
fprintf(out, " L2D hits on shared lines : %10d\n", stats->ld_l2hit);
fprintf(out, " LLC hits on shared lines : %10d\n", stats->ld_llchit + stats->lcl_hitm);
+ fprintf(out, " Load hits on peer cache or nodes : %10d\n", stats->lcl_peer + stats->rmt_peer);
fprintf(out, " Locked Access on shared lines : %10d\n", stats->locks);
fprintf(out, " Blocked Access on shared lines : %10d\n", stats->blk_data + stats->blk_addr);
fprintf(out, " Store HITs on shared lines : %10d\n", stats->store);
--
2.25.1

2022-06-06 06:15:48

by Leo Yan

Subject: [PATCH v5 13/17] perf c2c: Refactor node header

The node header array contains 3 items, one for each of the 3 flavors of
node access info. To extend sorting to other snooping types rather than
always sticking to HITMs, the second header string
"Node{cpus %hitms %stores}" needs to be adjustable (e.g. changed to
"Node{cpus %peers %stores}").

For this reason, this patch changes the node header array into three
flat variables and uses a switch-case in function setup_nodes_header(),
which makes it easier to alter the header string.
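
A minimal sketch of why the switch-case helps, assuming the header
choice later depends on the display type as patch 15/17 of this series
does: node_header_text() and its peer_display parameter are hypothetical
and return plain strings instead of struct c2c_header.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: picks the header text for a node info flavor
 * (0, 1 or 2). Only flavor 1 depends on the display type.
 */
static const char *node_header_text(int node_info, bool peer_display)
{
	switch (node_info) {
	case 0:
		return "Node";
	case 1:
		return peer_display ? "Node{cpus %peers %stores}"
				    : "Node{cpus %hitms %stores}";
	case 2:
		return "Node{cpu list}";
	default:
		return NULL;	/* unknown flavor: no header */
	}
}
```

With a flat array indexed by node_info, the flavor 1 entry could not vary
with the display type without a second table.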

Signed-off-by: Leo Yan <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Tested-by: Ali Saidi <[email protected]>
Reviewed-by: Ali Saidi <[email protected]>
---
tools/perf/builtin-c2c.c | 26 +++++++++++++++++++-------
1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 66ff834516a2..49a9b8480b41 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -1723,12 +1723,6 @@ static struct c2c_dimension dim_dso = {
.se = &sort_dso,
};

-static struct c2c_header header_node[3] = {
- HEADER_LOW("Node"),
- HEADER_LOW("Node{cpus %hitms %stores}"),
- HEADER_LOW("Node{cpu list}"),
-};
-
static struct c2c_dimension dim_node = {
.name = "node",
.cmp = empty_cmp,
@@ -2229,9 +2223,27 @@ static int resort_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
return 0;
}

+static struct c2c_header header_node_0 = HEADER_LOW("Node");
+static struct c2c_header header_node_1 = HEADER_LOW("Node{cpus %hitms %stores}");
+static struct c2c_header header_node_2 = HEADER_LOW("Node{cpu list}");
+
static void setup_nodes_header(void)
{
- dim_node.header = header_node[c2c.node_info];
+ switch (c2c.node_info) {
+ case 0:
+ dim_node.header = header_node_0;
+ break;
+ case 1:
+ dim_node.header = header_node_1;
+ break;
+ case 2:
+ dim_node.header = header_node_2;
+ break;
+ default:
+ break;
+ }
+
+ return;
}

static int setup_nodes(struct perf_session *session)
--
2.25.1

2022-06-06 06:17:17

by Leo Yan

Subject: [PATCH v5 15/17] perf c2c: Sort on peer snooping for load operations

This patch adds a new display option 'peer', so that the output can be
sorted on cache hits from peer snooping.
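
The mapping from the '-d' argument to a display type and then to a sort
key can be sketched as below. The enum values and key strings follow the
diff in this patch; parse_display() and display_sort_key() are
hypothetical helpers that condense logic spread over setup_display() and
perf_c2c__report().

```c
#include <stddef.h>
#include <string.h>

enum {
	DISPLAY_LCL_HITM,
	DISPLAY_RMT_HITM,
	DISPLAY_TOT_HITM,
	DISPLAY_SNP_PEER,
	DISPLAY_MAX,
};

/* Hypothetical helper: NULL selects "tot"; unknown strings give -1. */
static int parse_display(const char *str)
{
	const char *display = str ? str : "tot";

	if (!strcmp(display, "tot"))
		return DISPLAY_TOT_HITM;
	if (!strcmp(display, "rmt"))
		return DISPLAY_RMT_HITM;
	if (!strcmp(display, "lcl"))
		return DISPLAY_LCL_HITM;
	if (!strcmp(display, "peer"))
		return DISPLAY_SNP_PEER;
	return -1;
}

/* Hypothetical helper: sort key used for the cache line table. */
static const char *display_sort_key(int display)
{
	switch (display) {
	case DISPLAY_TOT_HITM: return "tot_hitm";
	case DISPLAY_RMT_HITM: return "rmt_hitm";
	case DISPLAY_LCL_HITM: return "lcl_hitm";
	case DISPLAY_SNP_PEER: return "tot_peer";
	default:               return NULL;
	}
}
```

In this patch the default remains "tot"; switching the default to 'peer'
on Arm64 is left to a later patch in the series.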

When displaying with option 'peer', the "Shared Data Cache Line Table"
and the "Shared Cache Line Distribution Pareto" are both sorted on the
metric "tot_peer".

As a result, we get the following 'peer' display:

# perf c2c report -d peer --coalesce tid,pid,iaddr,dso -N --stdio

=================================================
Shared Data Cache Line Table
=================================================
#
# ----------- Cacheline ---------- Peer ------- Load Peer ------- Total Total Total --------- Stores -------- ----- Core Load Hit ----- - LLC Load Hit -- - RMT Load Hit -- --- Load Dram ----
# Index Address Node PA cnt Snoop Total Local Remote records Loads Stores L1Hit L1Miss N/A FB L1 L2 LclHit LclHitm RmtHit RmtHitm Lcl Rmt
# ..... .................. .... ...... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ........ ....... ........ ....... ........ ........
#
0 0xaaaac17d6000 N/A 0 100.00% 99 99 0 18851 18851 0 0 0 0 0 18752 0 99 0 0 0 0 0

=================================================
Shared Cache Line Distribution Pareto
=================================================
#
# -- Peer Snoop -- ------- Store Refs ------ --------- Data address --------- ---------- cycles ---------- Total cpu Shared
# Num Rmt Lcl L1 Hit L1 Miss N/A Offset Node PA cnt Pid Tid Code address rmt peer lcl peer load records cnt Symbol Object Source:Line Node{cpus %peers %stores}
# ..... ....... ....... ....... ....... ....... .................. .... ...... ....... ................. .................. ........ ........ ........ ....... ........ ...................... ................ ............... ....
#
----------------------------------------------------------------------
0 0 99 0 0 0 0xaaaac17d6000
----------------------------------------------------------------------
0.00% 3.03% 0.00% 0.00% 0.00% 0x20 N/A 0 3603 3603:memstress 0xaaaac17c25ac 0 376 41 9314 2 [.] 0x00000000000025ac memstress memstress[25ac] 0{ 2 100.0% n/a}
0.00% 3.03% 0.00% 0.00% 0.00% 0x20 N/A 0 3603 3606:memstress 0xaaaac17c25ac 0 375 44 9155 1 [.] 0x00000000000025ac memstress memstress[25ac] 0{ 1 100.0% n/a}
0.00% 48.48% 0.00% 0.00% 0.00% 0x29 N/A 0 3603 3606:memstress 0xaaaac17c3e88 0 180 170 65 1 [.] 0x0000000000003e88 memstress memstress[3e88] 0{ 1 100.0% n/a}
0.00% 45.45% 0.00% 0.00% 0.00% 0x29 N/A 0 3603 3603:memstress 0xaaaac17c3e88 0 180 175 70 2 [.] 0x0000000000003e88 memstress memstress[3e88] 0{ 2 100.0% n/a}

Signed-off-by: Leo Yan <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Tested-by: Ali Saidi <[email protected]>
Reviewed-by: Ali Saidi <[email protected]>
---
tools/perf/builtin-c2c.c | 135 ++++++++++++++++++++++++++++-----------
1 file changed, 99 insertions(+), 36 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 8b7c1fd35380..f7a961e55a92 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -118,6 +118,7 @@ enum {
DISPLAY_LCL_HITM,
DISPLAY_RMT_HITM,
DISPLAY_TOT_HITM,
+ DISPLAY_SNP_PEER,
DISPLAY_MAX,
};

@@ -125,6 +126,7 @@ static const char *display_str[DISPLAY_MAX] = {
[DISPLAY_LCL_HITM] = "Local HITMs",
[DISPLAY_RMT_HITM] = "Remote HITMs",
[DISPLAY_TOT_HITM] = "Total HITMs",
+ [DISPLAY_SNP_PEER] = "Peer Snoop",
};

static const struct option c2c_options[] = {
@@ -822,6 +824,11 @@ static double percent_costly_snoop(struct c2c_hist_entry *c2c_he)
case DISPLAY_TOT_HITM:
st = stats->tot_hitm;
tot = total->tot_hitm;
+ break;
+ case DISPLAY_SNP_PEER:
+ st = stats->tot_peer;
+ tot = total->tot_peer;
+ break;
default:
break;
}
@@ -1229,6 +1236,10 @@ node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
ret = display_metrics(hpp, stats->tot_hitm,
c2c_he->stats.tot_hitm);
break;
+ case DISPLAY_SNP_PEER:
+ ret = display_metrics(hpp, stats->tot_peer,
+ c2c_he->stats.tot_peer);
+ break;
default:
break;
}
@@ -1609,6 +1620,7 @@ static struct c2c_header percent_costly_snoop_header[] = {
[DISPLAY_LCL_HITM] = HEADER_BOTH("Lcl", "Hitm"),
[DISPLAY_RMT_HITM] = HEADER_BOTH("Rmt", "Hitm"),
[DISPLAY_TOT_HITM] = HEADER_BOTH("Tot", "Hitm"),
+ [DISPLAY_SNP_PEER] = HEADER_BOTH("Peer", "Snoop"),
};

static struct c2c_dimension dim_percent_costly_snoop = {
@@ -2107,6 +2119,10 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
he->filtered = filter_display(c2c_he->stats.tot_hitm,
stats->tot_hitm);
break;
+ case DISPLAY_SNP_PEER:
+ he->filtered = filter_display(c2c_he->stats.tot_peer,
+ stats->tot_peer);
+ break;
default:
break;
}
@@ -2135,6 +2151,8 @@ static inline bool is_valid_hist_entry(struct hist_entry *he)
case DISPLAY_TOT_HITM:
has_record = !!c2c_he->stats.tot_hitm;
break;
+ case DISPLAY_SNP_PEER:
+ has_record = !!c2c_he->stats.tot_peer;
default:
break;
}
@@ -2224,7 +2242,10 @@ static int resort_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
}

static struct c2c_header header_node_0 = HEADER_LOW("Node");
-static struct c2c_header header_node_1 = HEADER_LOW("Node{cpus %hitms %stores}");
+static struct c2c_header header_node_1_hitms_stores =
+ HEADER_LOW("Node{cpus %hitms %stores}");
+static struct c2c_header header_node_1_peers_stores =
+ HEADER_LOW("Node{cpus %peers %stores}");
static struct c2c_header header_node_2 = HEADER_LOW("Node{cpu list}");

static void setup_nodes_header(void)
@@ -2234,7 +2255,10 @@ static void setup_nodes_header(void)
dim_node.header = header_node_0;
break;
case 1:
- dim_node.header = header_node_1;
+ if (c2c.display == DISPLAY_SNP_PEER)
+ dim_node.header = header_node_1_peers_stores;
+ else
+ dim_node.header = header_node_1_hitms_stores;
break;
case 2:
dim_node.header = header_node_2;
@@ -2308,13 +2332,14 @@ static int setup_nodes(struct perf_session *session)
}

#define HAS_HITMS(__h) ((__h)->stats.lcl_hitm || (__h)->stats.rmt_hitm)
+#define HAS_PEER(__h) ((__h)->stats.lcl_peer || (__h)->stats.rmt_peer)

static int resort_shared_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
{
struct c2c_hist_entry *c2c_he;
c2c_he = container_of(he, struct c2c_hist_entry, he);

- if (HAS_HITMS(c2c_he)) {
+ if (HAS_HITMS(c2c_he) || HAS_PEER(c2c_he)) {
c2c.shared_clines++;
c2c_add_stats(&c2c.shared_clines_stats, &c2c_he->stats);
}
@@ -2447,13 +2472,22 @@ static void print_pareto(FILE *out)
int ret;
const char *cl_output;

- cl_output = "cl_num,"
- "cl_rmt_hitm,"
- "cl_lcl_hitm,"
- "cl_stores_l1hit,"
- "cl_stores_l1miss,"
- "cl_stores_na,"
- "dcacheline";
+ if (c2c.display != DISPLAY_SNP_PEER)
+ cl_output = "cl_num,"
+ "cl_rmt_hitm,"
+ "cl_lcl_hitm,"
+ "cl_stores_l1hit,"
+ "cl_stores_l1miss,"
+ "cl_stores_na,"
+ "dcacheline";
+ else
+ cl_output = "cl_num,"
+ "cl_rmt_peer,"
+ "cl_lcl_peer,"
+ "cl_stores_l1hit,"
+ "cl_stores_l1miss,"
+ "cl_stores_na,"
+ "dcacheline";

perf_hpp_list__init(&hpp_list);
ret = hpp_list__parse(&hpp_list, cl_output, NULL);
@@ -2852,6 +2886,8 @@ static int setup_display(const char *str)
c2c.display = DISPLAY_RMT_HITM;
else if (!strcmp(display, "lcl"))
c2c.display = DISPLAY_LCL_HITM;
+ else if (!strcmp(display, "peer"))
+ c2c.display = DISPLAY_SNP_PEER;
else {
pr_err("failed: unknown display type: %s\n", str);
return -1;
@@ -2898,10 +2934,12 @@ static int build_cl_output(char *cl_sort, bool no_source)
}

if (asprintf(&c2c.cl_output,
- "%s%s%s%s%s%s%s%s%s%s",
+ "%s%s%s%s%s%s%s%s%s%s%s%s",
c2c.use_stdio ? "cl_num_empty," : "",
- "percent_rmt_hitm,"
- "percent_lcl_hitm,"
+ c2c.display == DISPLAY_SNP_PEER ? "percent_rmt_peer,"
+ "percent_lcl_peer," :
+ "percent_rmt_hitm,"
+ "percent_lcl_hitm,",
"percent_stores_l1hit,"
"percent_stores_l1miss,"
"percent_stores_na,"
@@ -2909,8 +2947,10 @@ static int build_cl_output(char *cl_sort, bool no_source)
add_pid ? "pid," : "",
add_tid ? "tid," : "",
add_iaddr ? "iaddr," : "",
- "mean_rmt,"
- "mean_lcl,"
+ c2c.display == DISPLAY_SNP_PEER ? "mean_rmt_peer,"
+ "mean_lcl_peer," :
+ "mean_rmt,"
+ "mean_lcl,",
"mean_load,"
"tot_recs,"
"cpucnt,",
@@ -2931,6 +2971,7 @@ static int build_cl_output(char *cl_sort, bool no_source)
static int setup_coalesce(const char *coalesce, bool no_source)
{
const char *c = coalesce ?: coalesce_default;
+ const char *sort_str = NULL;

if (asprintf(&c2c.cl_sort, "offset,%s", c) < 0)
return -ENOMEM;
@@ -2938,12 +2979,16 @@ static int setup_coalesce(const char *coalesce, bool no_source)
if (build_cl_output(c2c.cl_sort, no_source))
return -1;

- if (asprintf(&c2c.cl_resort, "offset,%s",
- c2c.display == DISPLAY_TOT_HITM ?
- "tot_hitm" :
- c2c.display == DISPLAY_RMT_HITM ?
- "rmt_hitm,lcl_hitm" :
- "lcl_hitm,rmt_hitm") < 0)
+ if (c2c.display == DISPLAY_TOT_HITM)
+ sort_str = "tot_hitm";
+ else if (c2c.display == DISPLAY_RMT_HITM)
+ sort_str = "rmt_hitm,lcl_hitm";
+ else if (c2c.display == DISPLAY_LCL_HITM)
+ sort_str = "lcl_hitm,rmt_hitm";
+ else if (c2c.display == DISPLAY_SNP_PEER)
+ sort_str = "tot_peer";
+
+ if (asprintf(&c2c.cl_resort, "offset,%s", sort_str) < 0)
return -ENOMEM;

pr_debug("coalesce sort fields: %s\n", c2c.cl_sort);
@@ -2989,7 +3034,7 @@ static int perf_c2c__report(int argc, const char **argv)
"print_type,threshold[,print_limit],order,sort_key[,branch],value",
callchain_help, &parse_callchain_opt,
callchain_default_opt),
- OPT_STRING('d', "display", &display, "Switch HITM output type", "lcl,rmt"),
+ OPT_STRING('d', "display", &display, "Switch HITM output type", "tot,lcl,rmt,peer"),
OPT_STRING('c', "coalesce", &coalesce, "coalesce fields",
"coalesce fields: pid,tid,iaddr,dso"),
OPT_BOOLEAN('f', "force", &symbol_conf.force, "don't complain, do it"),
@@ -3084,20 +3129,36 @@ static int perf_c2c__report(int argc, const char **argv)
goto out_mem2node;
}

- output_str = "cl_idx,"
- "dcacheline,"
- "dcacheline_node,"
- "dcacheline_count,"
- "percent_costly_snoop,"
- "tot_hitm,lcl_hitm,rmt_hitm,"
- "tot_recs,"
- "tot_loads,"
- "tot_stores,"
- "stores_l1hit,stores_l1miss,stores_na,"
- "ld_fbhit,ld_l1hit,ld_l2hit,"
- "ld_lclhit,lcl_hitm,"
- "ld_rmthit,rmt_hitm,"
- "dram_lcl,dram_rmt";
+ if (c2c.display != DISPLAY_SNP_PEER)
+ output_str = "cl_idx,"
+ "dcacheline,"
+ "dcacheline_node,"
+ "dcacheline_count,"
+ "percent_costly_snoop,"
+ "tot_hitm,lcl_hitm,rmt_hitm,"
+ "tot_recs,"
+ "tot_loads,"
+ "tot_stores,"
+ "stores_l1hit,stores_l1miss,stores_na,"
+ "ld_fbhit,ld_l1hit,ld_l2hit,"
+ "ld_lclhit,lcl_hitm,"
+ "ld_rmthit,rmt_hitm,"
+ "dram_lcl,dram_rmt";
+ else
+ output_str = "cl_idx,"
+ "dcacheline,"
+ "dcacheline_node,"
+ "dcacheline_count,"
+ "percent_costly_snoop,"
+ "tot_peer,lcl_peer,rmt_peer,"
+ "tot_recs,"
+ "tot_loads,"
+ "tot_stores,"
+ "stores_l1hit,stores_l1miss,stores_na,"
+ "ld_fbhit,ld_l1hit,ld_l2hit,"
+ "ld_lclhit,lcl_hitm,"
+ "ld_rmthit,rmt_hitm,"
+ "dram_lcl,dram_rmt";

if (c2c.display == DISPLAY_TOT_HITM)
sort_str = "tot_hitm";
@@ -3105,6 +3166,8 @@ static int perf_c2c__report(int argc, const char **argv)
sort_str = "rmt_hitm";
else if (c2c.display == DISPLAY_LCL_HITM)
sort_str = "lcl_hitm";
+ else if (c2c.display == DISPLAY_SNP_PEER)
+ sort_str = "tot_peer";

c2c_hists__reinit(&c2c.hists, output_str, sort_str);

--
2.25.1

2022-06-06 06:20:58

by Leo Yan

Subject: [PATCH v5 11/17] perf c2c: Use explicit names for display macros

The perf c2c tool relies heavily on the HITM snoop type to detect false
sharing of cache lines; unfortunately, HITM is not supported on some
architectures.

Essentially, the perf c2c tool looks for very costly snooping operations
caused by false sharing, so it does not have to stick to HITM tags and
can use other snooping types (e.g. SNOOPX_PEER).

For this reason, this patch renames the HITM related display macros with
the suffix '_HITM', so that they stay distinct when display types for
other snooping types are added later.

Signed-off-by: Leo Yan <[email protected]>
Acked-by: Ian Rogers <[email protected]>
Tested-by: Ali Saidi <[email protected]>
Reviewed-by: Ali Saidi <[email protected]>
---
tools/perf/builtin-c2c.c | 58 ++++++++++++++++++++--------------------
1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 8dd9218a052f..cbeb1878a71c 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -115,16 +115,16 @@ struct perf_c2c {
};

enum {
- DISPLAY_LCL,
- DISPLAY_RMT,
- DISPLAY_TOT,
+ DISPLAY_LCL_HITM,
+ DISPLAY_RMT_HITM,
+ DISPLAY_TOT_HITM,
DISPLAY_MAX,
};

static const char *display_str[DISPLAY_MAX] = {
- [DISPLAY_LCL] = "Local",
- [DISPLAY_RMT] = "Remote",
- [DISPLAY_TOT] = "Total",
+ [DISPLAY_LCL_HITM] = "Local",
+ [DISPLAY_RMT_HITM] = "Remote",
+ [DISPLAY_TOT_HITM] = "Total",
};

static const struct option c2c_options[] = {
@@ -811,15 +811,15 @@ static double percent_hitm(struct c2c_hist_entry *c2c_he)
total = &hists->stats;

switch (c2c.display) {
- case DISPLAY_RMT:
+ case DISPLAY_RMT_HITM:
st = stats->rmt_hitm;
tot = total->rmt_hitm;
break;
- case DISPLAY_LCL:
+ case DISPLAY_LCL_HITM:
st = stats->lcl_hitm;
tot = total->lcl_hitm;
break;
- case DISPLAY_TOT:
+ case DISPLAY_TOT_HITM:
st = stats->tot_hitm;
tot = total->tot_hitm;
default:
@@ -1217,15 +1217,15 @@ node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
advance_hpp(hpp, ret);

switch (c2c.display) {
- case DISPLAY_RMT:
+ case DISPLAY_RMT_HITM:
ret = display_metrics(hpp, stats->rmt_hitm,
c2c_he->stats.rmt_hitm);
break;
- case DISPLAY_LCL:
+ case DISPLAY_LCL_HITM:
ret = display_metrics(hpp, stats->lcl_hitm,
c2c_he->stats.lcl_hitm);
break;
- case DISPLAY_TOT:
+ case DISPLAY_TOT_HITM:
ret = display_metrics(hpp, stats->tot_hitm,
c2c_he->stats.tot_hitm);
break;
@@ -1606,9 +1606,9 @@ static struct c2c_dimension dim_tot_loads = {
};

static struct c2c_header percent_hitm_header[] = {
- [DISPLAY_LCL] = HEADER_BOTH("Lcl", "Hitm"),
- [DISPLAY_RMT] = HEADER_BOTH("Rmt", "Hitm"),
- [DISPLAY_TOT] = HEADER_BOTH("Tot", "Hitm"),
+ [DISPLAY_LCL_HITM] = HEADER_BOTH("Lcl", "Hitm"),
+ [DISPLAY_RMT_HITM] = HEADER_BOTH("Rmt", "Hitm"),
+ [DISPLAY_TOT_HITM] = HEADER_BOTH("Tot", "Hitm"),
};

static struct c2c_dimension dim_percent_hitm = {
@@ -2101,15 +2101,15 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
c2c_he = container_of(he, struct c2c_hist_entry, he);

switch (c2c.display) {
- case DISPLAY_LCL:
+ case DISPLAY_LCL_HITM:
he->filtered = filter_display(c2c_he->stats.lcl_hitm,
stats->lcl_hitm);
break;
- case DISPLAY_RMT:
+ case DISPLAY_RMT_HITM:
he->filtered = filter_display(c2c_he->stats.rmt_hitm,
stats->rmt_hitm);
break;
- case DISPLAY_TOT:
+ case DISPLAY_TOT_HITM:
he->filtered = filter_display(c2c_he->stats.tot_hitm,
stats->tot_hitm);
break;
@@ -2132,13 +2132,13 @@ static inline bool is_valid_hist_entry(struct hist_entry *he)
return true;

switch (c2c.display) {
- case DISPLAY_LCL:
+ case DISPLAY_LCL_HITM:
has_record = !!c2c_he->stats.lcl_hitm;
break;
- case DISPLAY_RMT:
+ case DISPLAY_RMT_HITM:
has_record = !!c2c_he->stats.rmt_hitm;
break;
- case DISPLAY_TOT:
+ case DISPLAY_TOT_HITM:
has_record = !!c2c_he->stats.tot_hitm;
break;
default:
@@ -2835,11 +2835,11 @@ static int setup_display(const char *str)
const char *display = str ?: "tot";

if (!strcmp(display, "tot"))
- c2c.display = DISPLAY_TOT;
+ c2c.display = DISPLAY_TOT_HITM;
else if (!strcmp(display, "rmt"))
- c2c.display = DISPLAY_RMT;
+ c2c.display = DISPLAY_RMT_HITM;
else if (!strcmp(display, "lcl"))
- c2c.display = DISPLAY_LCL;
+ c2c.display = DISPLAY_LCL_HITM;
else {
pr_err("failed: unknown display type: %s\n", str);
return -1;
@@ -2927,9 +2927,9 @@ static int setup_coalesce(const char *coalesce, bool no_source)
return -1;

if (asprintf(&c2c.cl_resort, "offset,%s",
- c2c.display == DISPLAY_TOT ?
+ c2c.display == DISPLAY_TOT_HITM ?
"tot_hitm" :
- c2c.display == DISPLAY_RMT ?
+ c2c.display == DISPLAY_RMT_HITM ?
"rmt_hitm,lcl_hitm" :
"lcl_hitm,rmt_hitm") < 0)
return -ENOMEM;
@@ -3087,11 +3087,11 @@ static int perf_c2c__report(int argc, const char **argv)
"ld_rmthit,rmt_hitm,"
"dram_lcl,dram_rmt";

- if (c2c.display == DISPLAY_TOT)
+ if (c2c.display == DISPLAY_TOT_HITM)
sort_str = "tot_hitm";
- else if (c2c.display == DISPLAY_RMT)
+ else if (c2c.display == DISPLAY_RMT_HITM)
sort_str = "rmt_hitm";
- else if (c2c.display == DISPLAY_LCL)
+ else if (c2c.display == DISPLAY_LCL_HITM)
sort_str = "lcl_hitm";

c2c_hists__reinit(&c2c.hists, output_str, sort_str);
--
2.25.1
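
For readers who want to see the renamed display types outside the diff, here is a minimal standalone sketch of the display-type plumbing after this patch. The enum and the string table mirror the hunks above. parse_display(), display_name() and display_sort_key() are hypothetical helpers written only for this example; in builtin-c2c.c the equivalent logic lives inside setup_display() and perf_c2c__report(). Treat it as a sketch, not the actual perf code.

```c
#include <stddef.h>
#include <string.h>

/* Explicit HITM names, mirroring the enum in the patch above. */
enum {
	DISPLAY_LCL_HITM,
	DISPLAY_RMT_HITM,
	DISPLAY_TOT_HITM,
	DISPLAY_MAX,
};

static const char *display_str[DISPLAY_MAX] = {
	[DISPLAY_LCL_HITM] = "Local",
	[DISPLAY_RMT_HITM] = "Remote",
	[DISPLAY_TOT_HITM] = "Total",
};

/*
 * Hypothetical helper: map a display argument to a display type.
 * NULL selects the "tot" default, as in setup_display(); an unknown
 * string yields -1.
 */
static int parse_display(const char *str)
{
	const char *display = str ? str : "tot";

	if (!strcmp(display, "tot"))
		return DISPLAY_TOT_HITM;
	if (!strcmp(display, "rmt"))
		return DISPLAY_RMT_HITM;
	if (!strcmp(display, "lcl"))
		return DISPLAY_LCL_HITM;
	return -1;
}

/* Hypothetical helper: human-readable name, or NULL if out of range. */
static const char *display_name(int display)
{
	if (display < 0 || display >= DISPLAY_MAX)
		return NULL;
	return display_str[display];
}

/* Hypothetical helper: sort key chosen for each display type. */
static const char *display_sort_key(int display)
{
	switch (display) {
	case DISPLAY_TOT_HITM:
		return "tot_hitm";
	case DISPLAY_RMT_HITM:
		return "rmt_hitm";
	case DISPLAY_LCL_HITM:
		return "lcl_hitm";
	default:
		return NULL;
	}
}
```

With the _HITM suffix in place, a later display type that is not HITM-based can be added before DISPLAY_MAX without the existing names becoming ambiguous.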

2022-07-20 18:51:24

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH v5 01/17] perf: Add SNOOP_PEER flag to perf mem data struct

Em Sat, Jun 04, 2022 at 12:28:04PM +0800, Leo Yan escreveu:
> From: Ali Saidi <[email protected]>
>
> Add a flag to the perf mem data struct to signal that a request caused a
> cache-to-cache transfer of a line from a peer of the requestor and
> wasn't sourced from a lower cache level. The line being moved from one
> peer cache to another has latency and performance implications. On Arm64
> Neoverse systems the data source can indicate a cache-to-cache transfer
> but not if the line is dirty or clean, so instead of overloading HITM
> define a new flag that indicates this type of transfer.
>
> Signed-off-by: Ali Saidi <[email protected]>
> Reviewed-by: Leo Yan <[email protected]>
> Reviewed-by: Kajol Jain<[email protected]>

Hey, any news about this going upstream? PeterZ?

- Arnaldo

> ---
> include/uapi/linux/perf_event.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index d37629dbad72..7b88bfd097dc 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -1310,7 +1310,7 @@ union perf_mem_data_src {
> #define PERF_MEM_SNOOP_SHIFT 19
>
> #define PERF_MEM_SNOOPX_FWD 0x01 /* forward */
> -/* 1 free */
> +#define PERF_MEM_SNOOPX_PEER 0x02 /* xfer from peer */
> #define PERF_MEM_SNOOPX_SHIFT 38
>
> /* locked instruction */
> --
> 2.25.1

--

- Arnaldo

2022-07-20 18:53:37

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH v5 01/17] perf: Add SNOOP_PEER flag to perf mem data struct

Em Wed, Jul 20, 2022 at 03:45:51PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Sat, Jun 04, 2022 at 12:28:04PM +0800, Leo Yan escreveu:
> > From: Ali Saidi <[email protected]>
> >
> > Add a flag to the perf mem data struct to signal that a request caused a
> > cache-to-cache transfer of a line from a peer of the requestor and
> > wasn't sourced from a lower cache level. The line being moved from one
> > peer cache to another has latency and performance implications. On Arm64
> > Neoverse systems the data source can indicate a cache-to-cache transfer
> > but not if the line is dirty or clean, so instead of overloading HITM
> > define a new flag that indicates this type of transfer.
> >
> > Signed-off-by: Ali Saidi <[email protected]>
> > Reviewed-by: Leo Yan <[email protected]>
> > Reviewed-by: Kajol Jain<[email protected]>
>
> Hey, any news about this going upstream? PeterZ?

Just took a look and it isn't in tip/master.

- Arnaldo

> > ---
> > include/uapi/linux/perf_event.h | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> > index d37629dbad72..7b88bfd097dc 100644
> > --- a/include/uapi/linux/perf_event.h
> > +++ b/include/uapi/linux/perf_event.h
> > @@ -1310,7 +1310,7 @@ union perf_mem_data_src {
> > #define PERF_MEM_SNOOP_SHIFT 19
> >
> > #define PERF_MEM_SNOOPX_FWD 0x01 /* forward */
> > -/* 1 free */
> > +#define PERF_MEM_SNOOPX_PEER 0x02 /* xfer from peer */
> > #define PERF_MEM_SNOOPX_SHIFT 38
> >
> > /* locked instruction */
> > --
> > 2.25.1
>
> --
>
> - Arnaldo

--

- Arnaldo

2022-07-21 00:31:00

by Leo Yan

Subject: Re: [PATCH v5 01/17] perf: Add SNOOP_PEER flag to perf mem data struct

On Wed, Jul 20, 2022 at 03:46:49PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Jul 20, 2022 at 03:45:51PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Sat, Jun 04, 2022 at 12:28:04PM +0800, Leo Yan escreveu:
> > > From: Ali Saidi <[email protected]>
> > >
> > > Add a flag to the perf mem data struct to signal that a request caused a
> > > cache-to-cache transfer of a line from a peer of the requestor and
> > > wasn't sourced from a lower cache level. The line being moved from one
> > > peer cache to another has latency and performance implications. On Arm64
> > > Neoverse systems the data source can indicate a cache-to-cache transfer
> > > but not if the line is dirty or clean, so instead of overloading HITM
> > > define a new flag that indicates this type of transfer.
> > >
> > > Signed-off-by: Ali Saidi <[email protected]>
> > > Reviewed-by: Leo Yan <[email protected]>
> > > Reviewed-by: Kajol Jain<[email protected]>
> >
> > Hey, any news about this going upstream? PeterZ?
>
> Just took a look and it isn't in tip/master.

Yeah, this patch has not been picked up by the maintainers yet.

I confirmed that this patch applies cleanly on the tip/master
branch. Peter.Z, could you pick up this patch?

Thanks,
Leo

2022-08-10 14:16:40

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH v5 00/17] perf c2c: Support data source and display for Arm64

Em Sat, Jun 04, 2022 at 12:28:03PM +0800, Leo Yan escreveu:
> Arm64 Neoverse CPUs support data source in Arm SPE trace; this allows
> us to detect cache line contention and transfers.
>
> This patch set includes Ali's patch set v9 "perf: arm-spe: Decode SPE
> source and use for perf c2c" [1] and is rebased on the latest perf core
> branch with latest commit 1bcca2b1bd67 ("perf vendor events intel:
> Update metrics for Alderlake").
>
> Patches 01-05 come from Ali's patch set to support data source for Arm
> SPE for Neoverse cores.

Leo, please drop the change to the kernel's perf_event.h from the first
patch. I see it doesn't affect the kernel right now, as it is only used
when synthesizing perf records from hw trace data, and we haven't received
any review comments from Peter Zijlstra (I think he is on vacation).

Also please refresh it:

⬢[acme@toolbox perf]$ git am ./v5_20220604_leo_yan_perf_c2c_support_data_source_and_display_for_arm64.mbx
Applying: perf: Add SNOOP_PEER flag to perf mem data struct
Applying: perf tools: sync addition of PERF_MEM_SNOOPX_PEER
Applying: perf mem: Print snoop peer flag
Applying: perf arm-spe: Don't set data source if it's not a memory operation
error: patch failed: tools/perf/util/arm-spe.c:387
error: tools/perf/util/arm-spe.c: patch does not apply
Patch failed at 0004 perf arm-spe: Don't set data source if it's not a memory operation
hint: Use 'git am --show-current-patch=diff' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
⬢[acme@toolbox perf]$

- Arnaldo

> Patches 06-17 are the patches from patch set v4 to support perf c2c peer
> display for Arm64 [2].
>
> This patch set has been verified for both x86 perf memory events and Arm
> SPE events.
>
> [1] https://lore.kernel.org/lkml/[email protected]/
> [2] https://lore.kernel.org/lkml/[email protected]/
>
> Changes from v4:
> * Included Ali's patch set for adding data source in Arm SPE samples;
> * Added Ian's ACK and Ali's review and test tags;
> * Updated documentation for the default peer display for Arm64 (Ali).
>
> Changes from v3:
> * Changed to display remote and local peer accesses (Joe);
> * Fixed the usage info for display types (Joe);
> * Do not display HITM dimensions when using the 'peer' display, and the
> HITM display doesn't show any 'peer' dimensions (James);
> * Split to smaller patches for adding dimensions of peer operations;
> * Updated documentation to reflect the latest GUI and stdio.
>
> Changes from v2:
> * Updated patch 04 to account metrics for both cache level and ld_peer
> for PEER flag;
> * Updated documentation for metric 'rmt_hit', which accounts for all
> remote accesses (including remote DRAM and any upward caches).
>
> Changes from v1:
> * Updated patches 01, 02 and 03 to support 'N/A' metrics for store
> operations, so they align with the patch set [1] for store samples.
>
>
> Ali Saidi (3):
> perf: Add SNOOP_PEER flag to perf mem data struct
> perf tools: sync addition of PERF_MEM_SNOOPX_PEER
> perf arm-spe: Use SPE data source for neoverse cores
>
> Leo Yan (14):
> perf mem: Print snoop peer flag
> perf arm-spe: Don't set data source if it's not a memory operation
> perf mem: Add statistics for peer snooping
> perf c2c: Output statistics for peer snooping
> perf c2c: Add dimensions for peer load operations
> perf c2c: Add dimensions of peer metrics for cache line view
> perf c2c: Add mean dimensions for peer operations
> perf c2c: Use explicit names for display macros
> perf c2c: Rename dimension from 'percent_hitm' to
> 'percent_costly_snoop'
> perf c2c: Refactor node header
> perf c2c: Refactor display string
> perf c2c: Sort on peer snooping for load operations
> perf c2c: Use 'peer' as default display for Arm64
> perf c2c: Update documentation for new display option 'peer'
>
> include/uapi/linux/perf_event.h | 2 +-
> tools/include/uapi/linux/perf_event.h | 2 +-
> tools/perf/Documentation/perf-c2c.txt | 31 +-
> tools/perf/builtin-c2c.c | 454 ++++++++++++++----
> .../util/arm-spe-decoder/arm-spe-decoder.c | 1 +
> .../util/arm-spe-decoder/arm-spe-decoder.h | 12 +
> tools/perf/util/arm-spe.c | 140 +++++-
> tools/perf/util/mem-events.c | 46 +-
> tools/perf/util/mem-events.h | 3 +
> 9 files changed, 550 insertions(+), 141 deletions(-)
>
> --
> 2.25.1

--

- Arnaldo

2022-08-11 07:05:31

by Leo Yan

Subject: Re: [PATCH v5 00/17] perf c2c: Support data source and display for Arm64

Hi Arnaldo,

On Wed, Aug 10, 2022 at 10:37:32AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Sat, Jun 04, 2022 at 12:28:03PM +0800, Leo Yan escreveu:
> > Arm64 Neoverse CPUs support data source in Arm SPE trace; this allows
> > us to detect cache line contention and transfers.
> >
> > This patch set includes Ali's patch set v9 "perf: arm-spe: Decode SPE
> > source and use for perf c2c" [1] and is rebased on the latest perf core
> > branch with latest commit 1bcca2b1bd67 ("perf vendor events intel:
> > Update metrics for Alderlake").
> >
> > Patches 01-05 come from Ali's patch set to support data source for Arm
> > SPE for Neoverse cores.
>
> Leo, please drop the change to the kernel's perf_event.h from the first
> patch. I see it doesn't affect the kernel right now, as it is only used
> when synthesizing perf records from hw trace data, and we haven't received
> any review comments from Peter Zijlstra (I think he is on vacation).

Done! The new patch set is at the link below; it drops the patch for
the kernel's perf_event.h:
https://lore.kernel.org/lkml/[email protected]/

One question: should I later continue to upstream the first patch for
syncing the kernel header perf_event.h after Peter.Z comes back?


> Also please refresh it:
>
> ⬢[acme@toolbox perf]$ git am ./v5_20220604_leo_yan_perf_c2c_support_data_source_and_display_for_arm64.mbx
> Applying: perf: Add SNOOP_PEER flag to perf mem data struct
> Applying: perf tools: sync addition of PERF_MEM_SNOOPX_PEER
> Applying: perf mem: Print snoop peer flag
> Applying: perf arm-spe: Don't set data source if it's not a memory operation
> error: patch failed: tools/perf/util/arm-spe.c:387
> error: tools/perf/util/arm-spe.c: patch does not apply
> Patch failed at 0004 perf arm-spe: Don't set data source if it's not a memory operation
> hint: Use 'git am --show-current-patch=diff' to see the failed patch
> When you have resolved this problem, run "git am --continue".
> If you prefer to skip this patch, run "git am --skip" instead.
> To restore the original branch and stop patching, run "git am --abort".
> ⬢[acme@toolbox perf]$

To fix the merge conflict in the new patch set, I also dropped the
patch "perf arm-spe: Don't set data source if it's not a memory
operation", since it has already been merged into the mainline kernel.

Note: while verifying the patch set, I found a compilation error, so I
sent a separate patch to fix it:
https://lore.kernel.org/lkml/[email protected]/

Thanks a lot for continuously tracking this series.

Leo

2022-08-12 13:15:22

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH v5 00/17] perf c2c: Support data source and display for Arm64

Em Thu, Aug 11, 2022 at 02:41:22PM +0800, Leo Yan escreveu:
> Hi Arnaldo,
>
> On Wed, Aug 10, 2022 at 10:37:32AM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Sat, Jun 04, 2022 at 12:28:03PM +0800, Leo Yan escreveu:
> > > Arm64 Neoverse CPUs support data source in Arm SPE trace; this allows
> > > us to detect cache line contention and transfers.
> > >
> > > This patch set includes Ali's patch set v9 "perf: arm-spe: Decode SPE
> > > source and use for perf c2c" [1] and is rebased on the latest perf core
> > > branch with latest commit 1bcca2b1bd67 ("perf vendor events intel:
> > > Update metrics for Alderlake").
> > >
> > > Patches 01-05 come from Ali's patch set to support data source for Arm
> > > SPE for Neoverse cores.
> >
> > Leo, please drop the change to the kernel's perf_event.h from the first
> > patch. I see it doesn't affect the kernel right now, as it is only used
> > when synthesizing perf records from hw trace data, and we haven't received
> > any review comments from Peter Zijlstra (I think he is on vacation).
>
> Done! The new patch set is at the link below; it drops the patch for
> the kernel's perf_event.h:
> https://lore.kernel.org/lkml/[email protected]/
>
> One question: should I later continue to upstream the first patch for
> syncing the kernel header perf_event.h after Peter.Z comes back?

Yes, and we may have to backtrack and find some other way to implement
this if he is opposed, as in the past he didn't like the
perf_event_header.type namespace being used by userspace-only records such
as PERF_RECORD_FINISHED_ROUND, PERF_RECORD_COMPRESSED, etc.

In this case it's different. I think it's ok, as we already have
PERF_MEM_SNOOPX_FWD, and PERF_MEM_SNOOPX_PEER will probably be emitted by
some of the architectures, from the kernel, right?

- Arnaldo

> > Also please refresh it:
> >
> > ⬢[acme@toolbox perf]$ git am ./v5_20220604_leo_yan_perf_c2c_support_data_source_and_display_for_arm64.mbx
> > Applying: perf: Add SNOOP_PEER flag to perf mem data struct
> > Applying: perf tools: sync addition of PERF_MEM_SNOOPX_PEER
> > Applying: perf mem: Print snoop peer flag
> > Applying: perf arm-spe: Don't set data source if it's not a memory operation
> > error: patch failed: tools/perf/util/arm-spe.c:387
> > error: tools/perf/util/arm-spe.c: patch does not apply
> > Patch failed at 0004 perf arm-spe: Don't set data source if it's not a memory operation
> > hint: Use 'git am --show-current-patch=diff' to see the failed patch
> > When you have resolved this problem, run "git am --continue".
> > If you prefer to skip this patch, run "git am --skip" instead.
> > To restore the original branch and stop patching, run "git am --abort".
> > ⬢[acme@toolbox perf]$
>
> To fix the merge conflict in the new patch set, I also dropped the
> patch "perf arm-spe: Don't set data source if it's not a memory
> operation", since it has already been merged into the mainline kernel.

Thanks.

> Note: while verifying the patch set, I found a compilation error, so I
> sent a separate patch to fix it:
> https://lore.kernel.org/lkml/[email protected]/

I think we have a good enough band-aid in place: with that additional -I
for arm-spe.o only, it builds on all the build containers I have; the ones
failing do so for unrelated reasons:

[perfbuilder@five ~]$ export BUILD_TARBALL=http://192.168.86.14/perf/perf-5.19.0.tar.xz
[perfbuilder@five ~]$ time dm
1 131.61 almalinux:8 : Ok gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4) , clang version 12.0.1 (Red Hat 12.0.1-4.module_el8.5.0+1025+93159d6c)
2 124.99 almalinux:9 : Ok gcc (GCC) 11.2.1 20220127 (Red Hat 11.2.1-9) , clang version 13.0.1 (Red Hat 13.0.1-1.el9)
3 29.18 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0 , clang version 13.0.1 (Red Hat 13.0.1-1.el9)
4 124.11 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0 , Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0)
5 133.59 alpine:3.11 : Ok gcc (Alpine 9.3.0) 9.3.0 , Alpine clang version 9.0.0 (https://git.alpinelinux.org/aports f7f0d2c2b8bcd6a5843401a9a702029556492689) (based on LLVM 9.0.0)
6 128.81 alpine:3.12 : Ok gcc (Alpine 9.3.0) 9.3.0 , Alpine clang version 10.0.0 (https://gitlab.alpinelinux.org/alpine/aports.git 7445adce501f8473efdb93b17b5eaf2f1445ed4c)
7 134.70 alpine:3.13 : Ok gcc (Alpine 10.2.1_pre1) 10.2.1 20201203 , Alpine clang version 10.0.1
8 135.21 alpine:3.14 : Ok gcc (Alpine 10.3.1_git20210424) 10.3.1 20210424 , Alpine clang version 11.1.0
9 137.62 alpine:3.15 : Ok gcc (Alpine 10.3.1_git20211027) 10.3.1 20211027 , Alpine clang version 12.0.1
10 127.40 alpine:3.16 : Ok gcc (Alpine 11.2.1_git20220219) 11.2.1 20220219 , Alpine clang version 13.0.1
11 129.80 alpine:edge : Ok gcc (Alpine 11.2.1_git20220219) 11.2.1 20220219 , Alpine clang version 14.0.6
12 25.85 alt:p8 : Ok x86_64-alt-linux-gcc (GCC) 5.3.1 20151207 (ALT p8 5.3.1-alt3.M80P.1) , Alpine clang version 14.0.6
13 83.87 alt:p9 : Ok x86_64-alt-linux-gcc (GCC) 8.4.1 20200305 (ALT p9 8.4.1-alt0.p9.1) , clang version 10.0.0
14 95.90 alt:p10 : Ok x86_64-alt-linux-gcc (GCC) 10.3.1 20210703 (ALT Sisyphus 10.3.1-alt2) , clang version 11.0.1
15 96.50 alt:sisyphus : FAIL gcc version 12.1.1 20220518 (ALT Sisyphus 12.1.1-alt1) (GCC)
/usr/lib/llvm-13.0/include/clang/AST/DeclBase.h: In instantiation of 'void clang::DeclContext::filtered_decl_iterator<SpecificDecl, Acceptable>::SkipToNextDecl() [with SpecificDecl = clang::ObjCPropertyDecl; bool (SpecificDecl::* Acceptable)() const = &clang::ObjCPropertyDecl::isInstanceProperty]':
16 108.04 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-15) , clang version 11.1.0 (Amazon Linux 2 11.1.0-1.amzn2.0.2)
17 130.10 amazonlinux:devel : Ok gcc (GCC) 11.3.1 20220421 (Red Hat 11.3.1-2) , clang version 12.0.1 (Fedora 12.0.1-1.amzn2022)
18 138.10 archlinux:base : Ok gcc (GCC) 12.1.1 20220730 , clang version 14.0.6
19 107.93 centos:8 : Ok gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1) , clang version 11.0.1 (Red Hat 11.0.1-1.module_el8.4.0+966+2995ef20)
20 117.97 centos:stream : Ok gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-15) , clang version 14.0.0 (Red Hat 14.0.0-1.module_el8.7.0+1142+5343df54)
21 35.30 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 12.1.1 20220803 releases/gcc-12.1.0-322-g3df2f03587 , clang version 14.0.0 (Red Hat 14.0.0-1.module_el8.7.0+1142+5343df54)
22 22.85 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516 , clang version 14.0.0 (Red Hat 14.0.0-1.module_el8.7.0+1142+5343df54)
23 89.39 debian:10 : Ok gcc (Debian 8.3.0-6) 8.3.0 , Debian clang version 11.0.1-2~deb10u1
24 107.93 debian:11 : Ok gcc (Debian 10.2.1-6) 10.2.1 20210110 , Debian clang version 11.0.1-2
25 129.52 debian:experimental : Ok gcc (Debian 12.1.0-7) 12.1.0 , Debian clang version 14.0.6-2
26 26.47 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 11.3.0-3) 11.3.0
27 22.25 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 11.2.0-18) 11.2.0
28 23.86 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 10.2.1-6) 10.2.1 20210110
29 24.56 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 11.2.0-18) 11.2.0
30 24.75 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
31 26.36 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1)
32 19.74 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710
33 26.27 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1)
34 27.17 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2)
35 27.67 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6)
36 111.72 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2) , clang version 6.0.1 (tags/RELEASE_601/final)
37 117.14 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2) , clang version 7.0.1 (Fedora 7.0.1-6.fc29)
38 121.34 fedora:30 : Ok gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2) , clang version 8.0.0 (Fedora 8.0.0-3.fc30)
39 121.06 fedora:31 : Ok gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2) , clang version 9.0.1 (Fedora 9.0.1-4.fc31)
40 103.71 fedora:32 : Ok gcc (GCC) 10.3.1 20210422 (Red Hat 10.3.1-1) , clang version 10.0.1 (Fedora 10.0.1-3.fc32)
41 124.68 fedora:33 : Ok gcc (GCC) 10.3.1 20210422 (Red Hat 10.3.1-1) , clang version 11.0.0 (Fedora 11.0.0-3.fc33)
42 134.50 fedora:34 : Ok gcc (GCC) 11.3.1 20220421 (Red Hat 11.3.1-2) , clang version 12.0.1 (Fedora 12.0.1-1.fc34)
43 21.65 fedora:34-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225
44 19.84 fedora:34-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225
45 139.75 fedora:35 : Ok gcc (GCC) 11.3.1 20220421 (Red Hat 11.3.1-2) , clang version 13.0.0 (Fedora 13.0.0-3.fc35)
46 140.06 fedora:36 : Ok gcc (GCC) 12.1.1 20220507 (Red Hat 12.1.1-1) , clang version 14.0.0 (Fedora 14.0.0-1.fc36)
47 140.95 fedora:37 : Ok gcc (GCC) 12.1.1 20220628 (Red Hat 12.1.1-3) , clang version 14.0.5 (Fedora 14.0.5-5.fc37)
48 140.23 fedora:rawhide : Ok gcc (GCC) 12.1.1 20220628 (Red Hat 12.1.1-3) , clang version 14.0.5 (Fedora 14.0.5-5.fc37)
49 116.96 gentoo-stage3:latest : Ok gcc (Gentoo 11.2.0 p1) 11.2.0 , clang version 13.0.0
50 30.88 mageia:7 : Ok gcc (Mageia 8.4.0-1.mga7) 8.4.0 , clang version 13.0.0
51 10.61 mageia:8 : FAIL gcc version 10.4.0 (Mageia 10.4.0-3.mga8)
ImportError: No module named setuptools
cp: cannot stat '/tmp/build/perf/python_ext_build/lib/perf*.so': No such file or directory
52 131.50 manjaro:base : Ok gcc (GCC) 11.1.0 , clang version 13.0.0
53 7.18 openmandriva:4.2 : FAIL gcc version 11.2.0 20210728 (OpenMandriva) (GCC)
In file included from builtin-bench.c:22:
bench/bench.h:68:19: error: conflicting types for 'pthread_attr_setaffinity_np'; have 'int(pthread_attr_t *, size_t, cpu_set_t *)' {aka 'int(pthread_attr_t *, long unsigned int, cpu_set_t *)'}
68 | static inline int pthread_attr_setaffinity_np(pthread_attr_t *attr __maybe_unused,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
54 7.08 openmandriva:cooker : FAIL gcc version 11.2.0 20210728 (OpenMandriva) (GCC)
In file included from builtin-bench.c:22:
bench/bench.h:68:19: error: conflicting types for 'pthread_attr_setaffinity_np'; have 'int(pthread_attr_t *, size_t, cpu_set_t *)' {aka 'int(pthread_attr_t *, long unsigned int, cpu_set_t *)'}
68 | static inline int pthread_attr_setaffinity_np(pthread_attr_t *attr __maybe_unused,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
55 41.21 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190905 [gcc-7-branch revision 275407] , OpenMandriva 13.0.0-1 clang version 13.0.0 (/builddir/build/BUILD/llvm-project-13.0.0.src/clang 69c9d9094dd9c820a6ba8cad88f5901643d8f257)
56 137.78 opensuse:15.1 : Ok gcc (SUSE Linux) 7.5.0 , clang version 7.0.1 (tags/RELEASE_701/final 349238)
57 131.97 opensuse:15.2 : Ok gcc (SUSE Linux) 7.5.0 , clang version 9.0.1
58 150.03 opensuse:15.3 : Ok gcc (SUSE Linux) 7.5.0 , clang version 11.0.1
59 152.63 opensuse:15.4 : Ok gcc (SUSE Linux) 7.5.0 , clang version 13.0.1
60 181.35 opensuse:tumbleweed : Ok gcc (SUSE Linux) 12.1.1 20220629 [revision 7811663964aa7e31c3939b859bbfa2e16919639f] , clang version 14.0.6
61 124.89 oraclelinux:8 : Ok gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4.0.1) , clang version 12.0.1 (Red Hat 12.0.1-4.0.1.module+el8.5.0+20428+2b4ecd47)
62 118.96 oraclelinux:9 : Ok gcc (GCC) 11.2.1 20220127 (Red Hat 11.2.1-9.4.0.2) , clang version 13.0.1 (Red Hat 13.0.1-1.0.1.el9)
63 116.66 rockylinux:8 : Ok gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10) , clang version 13.0.1 (Red Hat 13.0.1-2.module+el8.6.0+987+d36ea6a1)
64 121.77 rockylinux:9 : Ok gcc (GCC) 11.2.1 20220127 (Red Hat 11.2.1-9) , clang version 13.0.1 (Red Hat 13.0.1-1.el9)
65 24.75 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609 , clang version 13.0.1 (Red Hat 13.0.1-1.el9)
66 20.94 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
67 20.75 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
68 20.95 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
69 20.85 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
70 98.89 ubuntu:18.04 : Ok gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 , clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
71 22.45 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
72 22.55 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
73 18.33 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
74 21.75 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
75 23.36 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
76 23.36 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
77 110.06 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
78 20.55 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
79 21.56 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
80 20.14 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
81 30.08 ubuntu:20.04 : Ok gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
82 24.37 ubuntu:20.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0
83 121.26 ubuntu:21.04 : Ok gcc (Ubuntu 10.3.0-1ubuntu1) 10.3.0 , Ubuntu clang version 12.0.0-3ubuntu1~21.04.2
84 123.37 ubuntu:21.10 : Ok gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0 , Ubuntu clang version 13.0.0-2
85 141.34 ubuntu:22.04 : Ok gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0 , Ubuntu clang version 14.0.0-1ubuntu1
86 131.33 ubuntu:22.10 : Ok gcc (Ubuntu 11.3.0-5ubuntu1) 11.3.0 , Ubuntu clang version 14.0.6-2
BUILD_TARBALL_HEAD=e754dd7e8be86e1adc9d4d13fb1105b848c11752
87 6976.47

real 117m55.077s
user 1m18.659s
sys 0m53.139s
[perfbuilder@five ~]$



> Thanks a lot for continuously tracking this series.

You're welcome.

- Arnaldo

2022-08-12 15:41:24

by Leo Yan

Subject: Re: [PATCH v5 00/17] perf c2c: Support data source and display for Arm64

On Fri, Aug 12, 2022 at 09:43:07AM -0300, Arnaldo Carvalho de Melo wrote:

[...]

> > One question: should I later continue to upstream the first patch for
> > syncing the kernel header perf_event.h after Peter.Z comes back?
>
> Yes, and we may have to backtrack and find some other way to implement
> this if he is opposed, as in the past he didn't like the
> perf_event_header.type namespace being used by userspace-only records such
> as PERF_RECORD_FINISHED_ROUND, PERF_RECORD_COMPRESSED, etc.
>
> In this case it's different. I think it's ok, as we already have
> PERF_MEM_SNOOPX_FWD, and PERF_MEM_SNOOPX_PEER will probably be emitted by
> some of the architectures, from the kernel, right?

Yes, as far as I know x86 generates memory samples from the kernel, and
SNOOPX_PEER can be a useful snooping flag for other archs.

As a last resort, if SNOOPX_PEER is rejected, we can fall back to using an
existing flag (like reusing PERF_MEM_SNOOPX_FWD), though this would be
ambiguous for expressing the memory operations on Arm64.

Thanks,
Leo
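
To make the PEER versus FWD distinction discussed above concrete, here is a minimal sketch of how the extended snoop bits could be tested in a raw 64-bit data_src value. The three macros are copied from the perf_event.h hunk quoted earlier in the thread. The helper names and the 2-bit field mask are assumptions made for this example and are not code from the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the quoted include/uapi/linux/perf_event.h hunk. */
#define PERF_MEM_SNOOPX_FWD	0x01 /* forward */
#define PERF_MEM_SNOOPX_PEER	0x02 /* xfer from peer */
#define PERF_MEM_SNOOPX_SHIFT	38

/* Assumption for this sketch: the extended snoop field is 2 bits wide. */
#define SNOOPX_FIELD_MASK	0x3ULL

/* Hypothetical helper: extract the extended snoop bits from data_src. */
static uint64_t snoopx_bits(uint64_t data_src)
{
	return (data_src >> PERF_MEM_SNOOPX_SHIFT) & SNOOPX_FIELD_MASK;
}

/* Hypothetical helper: true if the sample was a peer cache transfer. */
static bool is_peer_transfer(uint64_t data_src)
{
	return (snoopx_bits(data_src) & PERF_MEM_SNOOPX_PEER) != 0;
}

/* Hypothetical helper: true if the sample carries the forward flag. */
static bool is_forward(uint64_t data_src)
{
	return (snoopx_bits(data_src) & PERF_MEM_SNOOPX_FWD) != 0;
}

/* Hypothetical helper: set the peer flag, as a synthesizer might. */
static uint64_t mark_peer(uint64_t data_src)
{
	return data_src | ((uint64_t)PERF_MEM_SNOOPX_PEER << PERF_MEM_SNOOPX_SHIFT);
}
```

Because PEER and FWD occupy separate bits, a consumer can tell the two apart; reusing PERF_MEM_SNOOPX_FWD for Arm64 peer transfers would collapse them into one indistinguishable case, which is the ambiguity mentioned above.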