2022-04-27 16:21:08

by Leo Yan

Subject: [PATCH v1 00/11] perf c2c: Support display for Arm64

Arm64 Neoverse CPUs can report the data source in Arm SPE trace, which
allows us to detect cache line contention and transfers.

There are two main differences between the x86 and Arm64 architectures
for the data source:

The first is that the data source in Arm64 Neoverse CPUs cannot provide
the cache level info for store operations, so this patch set adds
support for 'any level' store operations, used when the exact cache
level info is missing.

Another difference is that Arm SPE trace data cannot provide a
'HITM'-like snooping flag. Ali Saidi has a patch set "perf: arm-spe:
Decode SPE source and use for perf c2c" [1] that introduces the 'peer'
flag and synthesizes memory samples with it. This patch set finishes the
second half by enabling the 'perf c2c' tool to consume the 'peer' flag,
so it adds an extra 'peer' display mode.
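
As a rough illustration of the two differences above, the sketch below
sorts one decoded memory sample into the counters this series relies on.
The counter names (st_l1hit, st_l1miss, st_anylvl, ld_peer) follow the
metrics used in the patches; the 'sample_info' structure, its fields and
account_sample() are hypothetical simplifications and do not match the
real perf data structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one decoded memory sample. */
struct sample_info {
	bool is_store;     /* store operation, otherwise a load */
	bool level_known;  /* false when the data source lacks the cache level */
	bool l1_hit;       /* only meaningful when level_known is true */
	bool snoop_peer;   /* load data came from a peer cache */
};

/* Counter names mirror the metrics used in this series. */
struct counters {
	uint32_t st_l1hit;
	uint32_t st_l1miss;
	uint32_t st_anylvl;
	uint32_t ld_peer;
	uint32_t load;
};

/* Account one sample; stores without level info fall into 'any level'. */
static void account_sample(struct counters *c, const struct sample_info *s)
{
	if (s->is_store) {
		if (!s->level_known)
			c->st_anylvl++;
		else if (s->l1_hit)
			c->st_l1hit++;
		else
			c->st_l1miss++;
		return;
	}

	c->load++;
	if (s->snoop_peer)
		c->ld_peer++;
}
```

With such a split, an Arm SPE store without level info lands in
st_anylvl rather than being miscounted as an L1 hit or miss.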

Patches 01, 02 and 03 support the 'Any Lvl' cache level for store
operations.

Patches 04 and 05 add statistics and dimensions for memory samples with
the peer flag.

Patches 06, 07 and 08 are refactoring; they switch the code to more
general naming so that display modes are easier to extend and are not
strictly bound to HITM tags.

Patches 09, 10 and 11 add the 'peer' display mode, update the
documentation, and make 'peer' the default display mode on Arm64.

This patch set has been verified for both x86 and Arm64 memory samples.

The display result with x86 memory samples:

=================================================
Shared Data Cache Line Table
=================================================
#
# ----------- Cacheline ---------- Tot ------- Load Hitm ------- Snoop Total Total Total --------- Stores -------- ----- Core Load Hit ----- - LLC Load Hit -- - RMT Load Hit -- --- Load Dram ----
# Index Address Node PA cnt Hitm Total LclHitm RmtHitm Peer records Loads Stores L1Hit L1Miss Anylvl FB L1 L2 LclHit LclHitm RmtHit RmtHitm Lcl Rmt
# ..... .................. .... ...... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ........ ....... ........ ....... ........ ........
#
0 0x55c8971f0080 0 1967 66.14% 252 252 0 0 6044 3550 2494 2024 470 0 528 2672 78 20 252 0 0 0 0
1 0x55c8971f00c0 0 1 33.86% 129 129 0 0 914 914 0 0 0 0 272 374 52 87 129 0 0 0 0

=================================================
Shared Cache Line Distribution Pareto
=================================================
#
# ----- HITM ----- Snoop ------- Store Refs ------ --------- Data address --------- --------------- cycles --------------- Total cpu Shared
# Num RmtHitm LclHitm Peer L1 Hit L1 Miss Any Lvl Offset Node PA cnt Code address rmt hitm lcl hitm load peer records cnt Symbol Object Source:Line Node
# ..... ....... ....... ....... ....... ....... ....... .................. .... ...... .................. ........ ........ ........ ........ ....... ........ ...................... ................. ....................... ....
#
-------------------------------------------------------------------------------
0 0 252 0 2024 470 0 0x55c8971f0080
-------------------------------------------------------------------------------
0.00% 12.30% 0.00% 0.00% 0.00% 0.00% 0x0 0 1 0x55c8971ed3e9 0 1313 863 0 1222 3 [.] 0x00000000000013e9 false_sharing.exe false_sharing.exe[13e9] 0
0.00% 0.79% 0.00% 90.51% 0.00% 0.00% 0x0 0 1 0x55c8971ed3e2 0 1800 878 0 3029 3 [.] 0x00000000000013e2 false_sharing.exe false_sharing.exe[13e2] 0
0.00% 0.00% 0.00% 9.49% 100.00% 0.00% 0x0 0 1 0x55c8971ed3f4 0 0 0 0 662 3 [.] 0x00000000000013f4 false_sharing.exe false_sharing.exe[13f4] 0
0.00% 86.90% 0.00% 0.00% 0.00% 0.00% 0x20 0 1 0x55c8971ed447 0 141 103 0 1131 2 [.] 0x0000000000001447 false_sharing.exe false_sharing.exe[1447] 0

-------------------------------------------------------------------------------
1 0 129 0 0 0 0 0x55c8971f00c0
-------------------------------------------------------------------------------
0.00% 100.00% 0.00% 0.00% 0.00% 0.00% 0x20 0 1 0x55c8971ed455 0 88 94 0 914 2 [.] 0x0000000000001455 false_sharing.exe false_sharing.exe[1455] 0


The display result with Arm SPE memory samples:

=================================================
Shared Data Cache Line Table
=================================================
#
# ----------- Cacheline ---------- Snoop ------- Load Hitm ------- Snoop Total Total Total --------- Stores -------- ----- Core Load Hit ----- - LLC Load Hit -- - RMT Load Hit -- --- Load Dram ----
# Index Address Node PA cnt Peer Total LclHitm RmtHitm Peer records Loads Stores L1Hit L1Miss Anylvl FB L1 L2 LclHit LclHitm RmtHit RmtHitm Lcl Rmt
# ..... .................. .... ...... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ........ ....... ........ ....... ........ ........
#
0 0xaaaac17d6000 N/A 0 100.00% 0 0 0 99 18851 18851 0 0 0 0 0 18752 0 0 0 0 0 0 0

=================================================
Shared Cache Line Distribution Pareto
=================================================
#
# ----- HITM ----- Snoop ------- Store Refs ------ --------- Data address --------- --------------- cycles --------------- Total cpu Shared
# Num RmtHitm LclHitm Peer L1 Hit L1 Miss Any Lvl Offset Node PA cnt Code address rmt hitm lcl hitm load peer records cnt Symbol Object Source:Line Node
# ..... ....... ....... ....... ....... ....... ....... .................. .... ...... .................. ........ ........ ........ ........ ....... ........ ...................... ................ ............... ....
#
-------------------------------------------------------------------------------
0 0 0 99 0 0 0 0xaaaac17d6000
-------------------------------------------------------------------------------
0.00% 0.00% 6.06% 0.00% 0.00% 0.00% 0x20 N/A 0 0xaaaac17c25ac 0 0 43 375 18469 2 [.] 0x00000000000025ac memstress memstress[25ac] 0
0.00% 0.00% 93.94% 0.00% 0.00% 0.00% 0x29 N/A 0 0xaaaac17c3e88 0 0 173 180 135 2 [.] 0x0000000000003e88 memstress memstress[3e88] 0

[1] https://lore.kernel.org/lkml/[email protected]/


Leo Yan (11):
perf mem: Add any cache level statistics for store operation
perf c2c: Add dimensions for 'anylvl' metrics of store operation
perf c2c: Update documentation for store metric 'Any Lvl'
perf mem: Add statistics for peer snooping
perf c2c: Add dimensions for peer load operations
perf c2c: Use explicit names for display macros
perf c2c: Rename dimension from 'percent_hitm' to
'percent_costly_snoop'
perf c2c: Refactor node header
perf c2c: Sort on peer snooping for load operations
perf c2c: Update documentation for new display option 'peer'
perf c2c: Use 'peer' as default display for Arm64

tools/perf/Documentation/perf-c2c.txt | 30 ++-
tools/perf/builtin-c2c.c | 363 ++++++++++++++++++++------
tools/perf/util/mem-events.c | 15 +-
tools/perf/util/mem-events.h | 2 +
4 files changed, 324 insertions(+), 86 deletions(-)

--
2.25.1


2022-04-27 16:21:34

by Leo Yan

Subject: [PATCH v1 07/11] perf c2c: Rename dimension from 'percent_hitm' to 'percent_costly_snoop'

Use a more general name for the main sort dimension, so that sorting is
not tied to the HITM snoop type and can be extended to support other
costly snooping operations. Rename the dimension to
'percent_costly_snoop' and give the related functions the same prefix.

Signed-off-by: Leo Yan <[email protected]>
Tested-by: Ali Saidi <[email protected]>
---
tools/perf/builtin-c2c.c | 40 ++++++++++++++++++++--------------------
1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index e4697cdbdfc2..b90696ebfbc9 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -794,7 +794,7 @@ percent_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
return hpp_color_scnprintf(hpp, "%*.2f%%", width - 1, per);
}

-static double percent_hitm(struct c2c_hist_entry *c2c_he)
+static double percent_costly_snoop(struct c2c_hist_entry *c2c_he)
{
struct c2c_hists *hists;
struct c2c_stats *stats;
@@ -834,8 +834,8 @@ static double percent_hitm(struct c2c_hist_entry *c2c_he)
})

static int
-percent_hitm_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
- struct hist_entry *he)
+percent_costly_snoop_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+ struct hist_entry *he)
{
struct c2c_hist_entry *c2c_he;
int width = c2c_width(fmt, hpp, he->hists);
@@ -843,20 +843,20 @@ percent_hitm_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
double per;

c2c_he = container_of(he, struct c2c_hist_entry, he);
- per = percent_hitm(c2c_he);
+ per = percent_costly_snoop(c2c_he);
return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
}

static int
-percent_hitm_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
- struct hist_entry *he)
+percent_costly_snoop_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+ struct hist_entry *he)
{
- return percent_color(fmt, hpp, he, percent_hitm);
+ return percent_color(fmt, hpp, he, percent_costly_snoop);
}

static int64_t
-percent_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
- struct hist_entry *left, struct hist_entry *right)
+percent_costly_snoop_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
+ struct hist_entry *left, struct hist_entry *right)
{
struct c2c_hist_entry *c2c_left;
struct c2c_hist_entry *c2c_right;
@@ -866,8 +866,8 @@ percent_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
c2c_left = container_of(left, struct c2c_hist_entry, he);
c2c_right = container_of(right, struct c2c_hist_entry, he);

- per_left = percent_hitm(c2c_left);
- per_right = percent_hitm(c2c_right);
+ per_left = percent_costly_snoop(c2c_left);
+ per_right = percent_costly_snoop(c2c_right);

return per_left - per_right;
}
@@ -1544,17 +1544,17 @@ static struct c2c_dimension dim_tot_loads = {
.width = 7,
};

-static struct c2c_header percent_hitm_header[] = {
+static struct c2c_header percent_costly_snoop_header[] = {
[DISPLAY_LCL_HITM] = HEADER_BOTH("Lcl", "Hitm"),
[DISPLAY_RMT_HITM] = HEADER_BOTH("Rmt", "Hitm"),
[DISPLAY_TOT_HITM] = HEADER_BOTH("Tot", "Hitm"),
};

-static struct c2c_dimension dim_percent_hitm = {
- .name = "percent_hitm",
- .cmp = percent_hitm_cmp,
- .entry = percent_hitm_entry,
- .color = percent_hitm_color,
+static struct c2c_dimension dim_percent_costly_snoop = {
+ .name = "percent_costly_snoop",
+ .cmp = percent_costly_snoop_cmp,
+ .entry = percent_costly_snoop_entry,
+ .color = percent_costly_snoop_color,
.width = 7,
};

@@ -1763,7 +1763,7 @@ static struct c2c_dimension *dimensions[] = {
&dim_ld_rmthit,
&dim_tot_recs,
&dim_tot_loads,
- &dim_percent_hitm,
+ &dim_percent_costly_snoop,
&dim_percent_rmt_hitm,
&dim_percent_lcl_hitm,
&dim_percent_ld_peer,
@@ -2665,7 +2665,7 @@ static int ui_quirks(void)
nodestr = "CL";
}

- dim_percent_hitm.header = percent_hitm_header[c2c.display];
+ dim_percent_costly_snoop.header = percent_costly_snoop_header[c2c.display];

/* Fix the zero line for dcacheline column. */
buf = fill_line("Cacheline", dim_dcacheline.width +
@@ -2993,7 +2993,7 @@ static int perf_c2c__report(int argc, const char **argv)
"dcacheline,"
"dcacheline_node,"
"dcacheline_count,"
- "percent_hitm,"
+ "percent_costly_snoop,"
"tot_hitm,lcl_hitm,rmt_hitm,"
"ld_peer,"
"tot_recs,"
--
2.25.1

2022-04-27 16:21:43

by Leo Yan

Subject: [PATCH v1 06/11] perf c2c: Use explicit names for display macros

The perf c2c tool assumes that it can rely on the HITM snoop type to
detect false cache sharing; unfortunately, HITM is not supported on
some architectures.

Essentially, the perf c2c tool wants to find the very costly snooping
operations caused by false cache sharing, which means it does not have
to stick to HITM tags and can explore other snooping types
(e.g. SNOOPX_PEER).

For this reason, this patch renames the HITM related display macros with
the suffix '_HITM', so they stay distinct when display types for other
snooping types are added later.
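
For readers unfamiliar with how the display type is used, here is a
minimal sketch of the per-cacheline percentage that the selected display
type drives. The enum values are the renamed macros from this patch; the
'snoop_stats' structure and percent_for_display() are hypothetical
stand-ins for the real c2c statistics and helper.

```c
#include <stdint.h>

enum display_mode {
	DISPLAY_LCL_HITM,
	DISPLAY_RMT_HITM,
	DISPLAY_TOT_HITM,
	DISPLAY_MAX,
};

/* Hypothetical subset of the per-cacheline / global counters. */
struct snoop_stats {
	uint32_t lcl_hitm;
	uint32_t rmt_hitm;
	uint32_t tot_hitm;
};

/* Share of the selected snoop type that one cacheline accounts for. */
static double percent_for_display(enum display_mode mode,
				  const struct snoop_stats *stats,
				  const struct snoop_stats *total)
{
	uint32_t st = 0, tot = 0;

	switch (mode) {
	case DISPLAY_RMT_HITM:
		st = stats->rmt_hitm;
		tot = total->rmt_hitm;
		break;
	case DISPLAY_LCL_HITM:
		st = stats->lcl_hitm;
		tot = total->lcl_hitm;
		break;
	case DISPLAY_TOT_HITM:
		st = stats->tot_hitm;
		tot = total->tot_hitm;
		break;
	default:
		break;
	}

	/* Avoid dividing by zero when no sample of this type exists. */
	return tot ? (double)st * 100.0 / tot : 0.0;
}
```

Using the x86 sample output from the cover letter, a cacheline with 252
of the 381 total HITMs yields about 66.14% in this sketch.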

Signed-off-by: Leo Yan <[email protected]>
Tested-by: Ali Saidi <[email protected]>
---
tools/perf/builtin-c2c.c | 58 ++++++++++++++++++++--------------------
1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index cef6513012e2..e4697cdbdfc2 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -114,16 +114,16 @@ struct perf_c2c {
};

enum {
- DISPLAY_LCL,
- DISPLAY_RMT,
- DISPLAY_TOT,
+ DISPLAY_LCL_HITM,
+ DISPLAY_RMT_HITM,
+ DISPLAY_TOT_HITM,
DISPLAY_MAX,
};

static const char *display_str[DISPLAY_MAX] = {
- [DISPLAY_LCL] = "Local",
- [DISPLAY_RMT] = "Remote",
- [DISPLAY_TOT] = "Total",
+ [DISPLAY_LCL_HITM] = "Local",
+ [DISPLAY_RMT_HITM] = "Remote",
+ [DISPLAY_TOT_HITM] = "Total",
};

static const struct option c2c_options[] = {
@@ -807,15 +807,15 @@ static double percent_hitm(struct c2c_hist_entry *c2c_he)
total = &hists->stats;

switch (c2c.display) {
- case DISPLAY_RMT:
+ case DISPLAY_RMT_HITM:
st = stats->rmt_hitm;
tot = total->rmt_hitm;
break;
- case DISPLAY_LCL:
+ case DISPLAY_LCL_HITM:
st = stats->lcl_hitm;
tot = total->lcl_hitm;
break;
- case DISPLAY_TOT:
+ case DISPLAY_TOT_HITM:
st = stats->tot_hitm;
tot = total->tot_hitm;
default:
@@ -1181,15 +1181,15 @@ node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
advance_hpp(hpp, ret);

switch (c2c.display) {
- case DISPLAY_RMT:
+ case DISPLAY_RMT_HITM:
ret = display_metrics(hpp, stats->rmt_hitm,
c2c_he->stats.rmt_hitm);
break;
- case DISPLAY_LCL:
+ case DISPLAY_LCL_HITM:
ret = display_metrics(hpp, stats->lcl_hitm,
c2c_he->stats.lcl_hitm);
break;
- case DISPLAY_TOT:
+ case DISPLAY_TOT_HITM:
ret = display_metrics(hpp, stats->tot_hitm,
c2c_he->stats.tot_hitm);
break;
@@ -1545,9 +1545,9 @@ static struct c2c_dimension dim_tot_loads = {
};

static struct c2c_header percent_hitm_header[] = {
- [DISPLAY_LCL] = HEADER_BOTH("Lcl", "Hitm"),
- [DISPLAY_RMT] = HEADER_BOTH("Rmt", "Hitm"),
- [DISPLAY_TOT] = HEADER_BOTH("Tot", "Hitm"),
+ [DISPLAY_LCL_HITM] = HEADER_BOTH("Lcl", "Hitm"),
+ [DISPLAY_RMT_HITM] = HEADER_BOTH("Rmt", "Hitm"),
+ [DISPLAY_TOT_HITM] = HEADER_BOTH("Tot", "Hitm"),
};

static struct c2c_dimension dim_percent_hitm = {
@@ -2018,15 +2018,15 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
c2c_he = container_of(he, struct c2c_hist_entry, he);

switch (c2c.display) {
- case DISPLAY_LCL:
+ case DISPLAY_LCL_HITM:
he->filtered = filter_display(c2c_he->stats.lcl_hitm,
stats->lcl_hitm);
break;
- case DISPLAY_RMT:
+ case DISPLAY_RMT_HITM:
he->filtered = filter_display(c2c_he->stats.rmt_hitm,
stats->rmt_hitm);
break;
- case DISPLAY_TOT:
+ case DISPLAY_TOT_HITM:
he->filtered = filter_display(c2c_he->stats.tot_hitm,
stats->tot_hitm);
break;
@@ -2049,13 +2049,13 @@ static inline bool is_valid_hist_entry(struct hist_entry *he)
return true;

switch (c2c.display) {
- case DISPLAY_LCL:
+ case DISPLAY_LCL_HITM:
has_record = !!c2c_he->stats.lcl_hitm;
break;
- case DISPLAY_RMT:
+ case DISPLAY_RMT_HITM:
has_record = !!c2c_he->stats.rmt_hitm;
break;
- case DISPLAY_TOT:
+ case DISPLAY_TOT_HITM:
has_record = !!c2c_he->stats.tot_hitm;
break;
default:
@@ -2752,11 +2752,11 @@ static int setup_display(const char *str)
const char *display = str ?: "tot";

if (!strcmp(display, "tot"))
- c2c.display = DISPLAY_TOT;
+ c2c.display = DISPLAY_TOT_HITM;
else if (!strcmp(display, "rmt"))
- c2c.display = DISPLAY_RMT;
+ c2c.display = DISPLAY_RMT_HITM;
else if (!strcmp(display, "lcl"))
- c2c.display = DISPLAY_LCL;
+ c2c.display = DISPLAY_LCL_HITM;
else {
pr_err("failed: unknown display type: %s\n", str);
return -1;
@@ -2846,9 +2846,9 @@ static int setup_coalesce(const char *coalesce, bool no_source)
return -1;

if (asprintf(&c2c.cl_resort, "offset,%s",
- c2c.display == DISPLAY_TOT ?
+ c2c.display == DISPLAY_TOT_HITM ?
"tot_hitm" :
- c2c.display == DISPLAY_RMT ?
+ c2c.display == DISPLAY_RMT_HITM ?
"rmt_hitm,lcl_hitm" :
"lcl_hitm,rmt_hitm") < 0)
return -ENOMEM;
@@ -3005,11 +3005,11 @@ static int perf_c2c__report(int argc, const char **argv)
"ld_rmthit,rmt_hitm,"
"dram_lcl,dram_rmt";

- if (c2c.display == DISPLAY_TOT)
+ if (c2c.display == DISPLAY_TOT_HITM)
sort_str = "tot_hitm";
- else if (c2c.display == DISPLAY_RMT)
+ else if (c2c.display == DISPLAY_RMT_HITM)
sort_str = "rmt_hitm";
- else if (c2c.display == DISPLAY_LCL)
+ else if (c2c.display == DISPLAY_LCL_HITM)
sort_str = "lcl_hitm";

c2c_hists__reinit(&c2c.hists, output_str, sort_str);
--
2.25.1

2022-04-27 16:21:50

by Leo Yan

Subject: [PATCH v1 08/11] perf c2c: Refactor node header

The node header array contains 3 items, one for each of the 3 flavors
of node access info. To extend sorting to other snooping types instead
of always sticking to HITMs, the second header string
"Node{cpus %hitms %stores}" needs to be adjustable (e.g. changed to
"Node{cpus %peers %stores}").

For this reason, this patch changes the node header array into three
flat variables and uses a switch-case in setup_nodes_header(), which
makes it easier to alter the header string.

Signed-off-by: Leo Yan <[email protected]>
Tested-by: Ali Saidi <[email protected]>
---
tools/perf/builtin-c2c.c | 26 +++++++++++++++++++-------
1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index b90696ebfbc9..52542cfec80c 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -1653,12 +1653,6 @@ static struct c2c_dimension dim_dso = {
.se = &sort_dso,
};

-static struct c2c_header header_node[3] = {
- HEADER_LOW("Node"),
- HEADER_LOW("Node{cpus %hitms %stores}"),
- HEADER_LOW("Node{cpu list}"),
-};
-
static struct c2c_dimension dim_node = {
.name = "node",
.cmp = empty_cmp,
@@ -2146,9 +2140,27 @@ static int resort_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
return 0;
}

+static struct c2c_header header_node_0 = HEADER_LOW("Node");
+static struct c2c_header header_node_1 = HEADER_LOW("Node{cpus %hitms %stores}");
+static struct c2c_header header_node_2 = HEADER_LOW("Node{cpu list}");
+
static void setup_nodes_header(void)
{
- dim_node.header = header_node[c2c.node_info];
+ switch (c2c.node_info) {
+ case 0:
+ dim_node.header = header_node_0;
+ break;
+ case 1:
+ dim_node.header = header_node_1;
+ break;
+ case 2:
+ dim_node.header = header_node_2;
+ break;
+ default:
+ break;
+ }
+
+ return;
}

static int setup_nodes(struct perf_session *session)
--
2.25.1

2022-04-27 16:22:00

by Leo Yan

Subject: [PATCH v1 10/11] perf c2c: Update documentation for new display option 'peer'

Since the new display option 'peer' has been introduced, update the
documentation to reflect it.

Signed-off-by: Leo Yan <[email protected]>
Tested-by: Ali Saidi <[email protected]>
---
tools/perf/Documentation/perf-c2c.txt | 22 +++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
index b39e3f3df272..aa560ce1a192 100644
--- a/tools/perf/Documentation/perf-c2c.txt
+++ b/tools/perf/Documentation/perf-c2c.txt
@@ -109,7 +109,8 @@ REPORT OPTIONS

-d::
--display::
- Switch to HITM type (rmt, lcl) to display and sort on. Total HITMs as default.
+ Switch to HITM type (rmt, lcl) or peer snooping type (peer) to display
+ and sort on. Total HITMs (tot) as default.

--stitch-lbr::
Show callgraph with stitched LBRs, which may have more complete
@@ -174,12 +175,18 @@ For each cacheline in the 1) list we display following data:
Cacheline
- cacheline address (hex number)

- Rmt/Lcl Hitm
+ Rmt/Lcl Hitm (For display with HITM types)
- cacheline percentage of all Remote/Local HITM accesses

+ Snoop Peer (For display with peer type)
+ - cacheline percentage of peer access
+
LLC Load Hitm - Total, LclHitm, RmtHitm
- count of Total/Local/Remote load HITMs

+ Snoop Peer
+ - count of peer access
+
Total records
- sum of all cachelines accesses

@@ -211,6 +218,9 @@ For each offset in the 2) list we display following data:
HITM - Rmt, Lcl
- % of Remote/Local HITM accesses for given offset within cacheline

+ Snoop Peer
+ - % of peer accesses for given offset within cacheline
+
Store Refs - L1 Hit, L1 Miss, Any Lvl
- % of store accesses that hit L1, missed L1 and any cache level for given
offset within cacheline
@@ -227,8 +237,9 @@ For each offset in the 2) list we display following data:
Code address
- code address responsible for the accesses

- cycles - rmt hitm, lcl hitm, load
- - sum of cycles for given accesses - Remote/Local HITM and generic load
+ cycles - rmt hitm, lcl hitm, load, peer
+ - sum of cycles for given accesses - Remote/Local HITM, generic load and
+ peer access

cpu cnt
- number of cpus that participated on the access
@@ -251,7 +262,8 @@ The 'Node' field displays nodes that accesses given cacheline
offset. Its output comes in 3 flavors:
- node IDs separated by ','
- node IDs with stats for each ID, in following format:
- Node{cpus %hitms %stores}
+ Node{cpus %hitms %stores} (For display with HITM types)
+ Node{cpus %peers %stores} (For display with "peer" type)
- node IDs with list of affected CPUs in following format:
Node{cpu list}

--
2.25.1

2022-04-27 16:22:01

by Leo Yan

Subject: [PATCH v1 03/11] perf c2c: Update documentation for store metric 'Any Lvl'

The 'Any Lvl' metric is added for store operations, update documentation
to reflect changes in the report table.

Signed-off-by: Leo Yan <[email protected]>
Tested-by: Ali Saidi <[email protected]>
---
tools/perf/Documentation/perf-c2c.txt | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
index 3b6a2c84ea02..b39e3f3df272 100644
--- a/tools/perf/Documentation/perf-c2c.txt
+++ b/tools/perf/Documentation/perf-c2c.txt
@@ -189,9 +189,10 @@ For each cacheline in the 1) list we display following data:
Total stores
- sum of all store accesses

- Store Reference - L1Hit, L1Miss
+ Store Reference - L1Hit, L1Miss, AnyLvl
L1Hit - store accesses that hit L1
L1Miss - store accesses that missed L1
+ AnyLvl - store accesses which is possible to hit any cache level

Core Load Hit - FB, L1, L2
- count of load hits in FB (Fill Buffer), L1 and L2 cache
@@ -210,8 +211,9 @@ For each offset in the 2) list we display following data:
HITM - Rmt, Lcl
- % of Remote/Local HITM accesses for given offset within cacheline

- Store Refs - L1 Hit, L1 Miss
- - % of store accesses that hit/missed L1 for given offset within cacheline
+ Store Refs - L1 Hit, L1 Miss, Any Lvl
+ - % of store accesses that hit L1, missed L1 and any cache level for given
+ offset within cacheline

Data address - Offset
- offset address
--
2.25.1

2022-04-27 16:22:10

by Leo Yan

Subject: [PATCH v1 09/11] perf c2c: Sort on peer snooping for load operations

In addition to the three existing display options 'tot', 'rmt' and
'lcl', this patch adds a new option 'peer' so that the report can be
sorted on cache hits from peer snooping.

With the option 'peer', both the "Shared Data Cache Line Table" and the
"Shared Cache Line Distribution Pareto" are sorted on the metric
"ld_peer". As a result, we get the 'peer' display below:

# perf c2c report -d peer --coalesce tid,pid,iaddr,dso -N --stdio

[...]

=================================================
Shared Data Cache Line Table
=================================================
#
# ----------- Cacheline ---------- Snoop ------- Load Hitm ------- Snoop Total Total Total --------- Stores -------- ----- Core Load Hit ----- - LLC Load Hit -- - RMT Load Hit -- --- Load Dram ----
# Index Address Node PA cnt Peer Total LclHitm RmtHitm Peer records Loads Stores L1Hit L1Miss Anylvl FB L1 L2 LclHit LclHitm RmtHit RmtHitm Lcl Rmt
# ..... .................. .... ...... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ........ ....... ........ ....... ........ ........
#
0 0xaaaac17d6000 N/A 0 100.00% 0 0 0 99 18851 18851 0 0 0 0 0 18752 0 0 0 0 0 0 0

=================================================
Shared Cache Line Distribution Pareto
=================================================
#
# ----- HITM ----- Snoop ------- Store Refs ------ --------- Data address --------- --------------- cycles --------------- Total cpu Shared
# Num RmtHitm LclHitm Peer L1 Hit L1 Miss Any Lvl Offset Node PA cnt Pid Tid Code address rmt hitm lcl hitm load peer records cnt Symbol Object Source:Line Node{cpus %peers %stores}
# ..... ....... ....... ....... ....... ....... ....... .................. .... ...... ....... ................. .................. ........ ........ ........ ........ ....... ........ ...................... ................ ............... ....
#
-------------------------------------------------------------------------------
0 0 0 99 0 0 0 0xaaaac17d6000
-------------------------------------------------------------------------------
0.00% 0.00% 3.03% 0.00% 0.00% 0.00% 0x20 N/A 0 3603 3603:memstress 0xaaaac17c25ac 0 0 41 376 9314 2 [.] 0x00000000000025ac memstress memstress[25ac] 0{ 2 100.0% n/a}
0.00% 0.00% 3.03% 0.00% 0.00% 0.00% 0x20 N/A 0 3603 3606:memstress 0xaaaac17c25ac 0 0 44 375 9155 1 [.] 0x00000000000025ac memstress memstress[25ac] 0{ 1 100.0% n/a}
0.00% 0.00% 48.48% 0.00% 0.00% 0.00% 0x29 N/A 0 3603 3606:memstress 0xaaaac17c3e88 0 0 170 180 65 1 [.] 0x0000000000003e88 memstress memstress[3e88] 0{ 1 100.0% n/a}
0.00% 0.00% 45.45% 0.00% 0.00% 0.00% 0x29 N/A 0 3603 3603:memstress 0xaaaac17c3e88 0 0 175 180 70 2 [.] 0x0000000000003e88 memstress memstress[3e88] 0{ 2 100.0% n/a}

[...]
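
The mapping from the '-d' argument to a display mode and its sort key
can be summarized by the table-driven sketch below. This is not how the
patch implements it (the patch extends the existing if/else chains); the
'display_option' structure and find_display_option() are hypothetical
and only condense the option names, enum values and sort keys that
appear in this patch.

```c
#include <stddef.h>
#include <string.h>

enum display_mode {
	DISPLAY_LCL_HITM,
	DISPLAY_RMT_HITM,
	DISPLAY_TOT_HITM,
	DISPLAY_SNP_PEER,
	DISPLAY_MAX,
};

/* Hypothetical lookup entry: option name, display mode, sort key. */
struct display_option {
	const char *name;
	enum display_mode mode;
	const char *sort_key;
};

static const struct display_option display_options[] = {
	{ "tot",  DISPLAY_TOT_HITM, "tot_hitm" },
	{ "rmt",  DISPLAY_RMT_HITM, "rmt_hitm" },
	{ "lcl",  DISPLAY_LCL_HITM, "lcl_hitm" },
	{ "peer", DISPLAY_SNP_PEER, "ld_peer"  },
};

/* Return the matching entry, or NULL for an unknown display type. */
static const struct display_option *find_display_option(const char *str)
{
	const char *name = str ? str : "tot"; /* "tot" stays the default */
	size_t i;

	for (i = 0; i < sizeof(display_options) / sizeof(display_options[0]); i++) {
		if (!strcmp(name, display_options[i].name))
			return &display_options[i];
	}

	return NULL;
}
```

A table like this keeps the option name, the display mode and the sort
key in one place, at the cost of diverging from the existing code style
in builtin-c2c.c.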

Signed-off-by: Leo Yan <[email protected]>
Tested-by: Ali Saidi <[email protected]>
---
tools/perf/builtin-c2c.c | 63 +++++++++++++++++++++++++++++++---------
1 file changed, 49 insertions(+), 14 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 52542cfec80c..bd4516e486c0 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -117,13 +117,15 @@ enum {
DISPLAY_LCL_HITM,
DISPLAY_RMT_HITM,
DISPLAY_TOT_HITM,
+ DISPLAY_SNP_PEER,
DISPLAY_MAX,
};

static const char *display_str[DISPLAY_MAX] = {
- [DISPLAY_LCL_HITM] = "Local",
- [DISPLAY_RMT_HITM] = "Remote",
- [DISPLAY_TOT_HITM] = "Total",
+ [DISPLAY_LCL_HITM] = "Local HITMs",
+ [DISPLAY_RMT_HITM] = "Remote HITMs",
+ [DISPLAY_TOT_HITM] = "Total HITMs",
+ [DISPLAY_SNP_PEER] = "Snoop Peers",
};

static const struct option c2c_options[] = {
@@ -818,6 +820,11 @@ static double percent_costly_snoop(struct c2c_hist_entry *c2c_he)
case DISPLAY_TOT_HITM:
st = stats->tot_hitm;
tot = total->tot_hitm;
+ break;
+ case DISPLAY_SNP_PEER:
+ st = stats->ld_peer;
+ tot = total->ld_peer;
+ break;
default:
break;
}
@@ -1193,6 +1200,10 @@ node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
ret = display_metrics(hpp, stats->tot_hitm,
c2c_he->stats.tot_hitm);
break;
+ case DISPLAY_SNP_PEER:
+ ret = display_metrics(hpp, stats->ld_peer,
+ c2c_he->stats.ld_peer);
+ break;
default:
break;
}
@@ -1548,6 +1559,7 @@ static struct c2c_header percent_costly_snoop_header[] = {
[DISPLAY_LCL_HITM] = HEADER_BOTH("Lcl", "Hitm"),
[DISPLAY_RMT_HITM] = HEADER_BOTH("Rmt", "Hitm"),
[DISPLAY_TOT_HITM] = HEADER_BOTH("Tot", "Hitm"),
+ [DISPLAY_SNP_PEER] = HEADER_BOTH("Snoop", "Peer"),
};

static struct c2c_dimension dim_percent_costly_snoop = {
@@ -2024,6 +2036,10 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
he->filtered = filter_display(c2c_he->stats.tot_hitm,
stats->tot_hitm);
break;
+ case DISPLAY_SNP_PEER:
+ he->filtered = filter_display(c2c_he->stats.ld_peer,
+ stats->ld_peer);
+ break;
default:
break;
}
@@ -2052,6 +2068,8 @@ static inline bool is_valid_hist_entry(struct hist_entry *he)
case DISPLAY_TOT_HITM:
has_record = !!c2c_he->stats.tot_hitm;
break;
+ case DISPLAY_SNP_PEER:
+ has_record = !!c2c_he->stats.ld_peer;
default:
break;
}
@@ -2141,7 +2159,10 @@ static int resort_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
}

static struct c2c_header header_node_0 = HEADER_LOW("Node");
-static struct c2c_header header_node_1 = HEADER_LOW("Node{cpus %hitms %stores}");
+static struct c2c_header header_node_1_hitms_stores =
+ HEADER_LOW("Node{cpus %hitms %stores}");
+static struct c2c_header header_node_1_peers_stores =
+ HEADER_LOW("Node{cpus %peers %stores}");
static struct c2c_header header_node_2 = HEADER_LOW("Node{cpu list}");

static void setup_nodes_header(void)
@@ -2151,7 +2172,10 @@ static void setup_nodes_header(void)
dim_node.header = header_node_0;
break;
case 1:
- dim_node.header = header_node_1;
+ if (c2c.display == DISPLAY_SNP_PEER)
+ dim_node.header = header_node_1_peers_stores;
+ else
+ dim_node.header = header_node_1_hitms_stores;
break;
case 2:
dim_node.header = header_node_2;
@@ -2225,13 +2249,15 @@ static int setup_nodes(struct perf_session *session)
}

#define HAS_HITMS(__h) ((__h)->stats.lcl_hitm || (__h)->stats.rmt_hitm)
+#define HAS_PEER(__h) ((__h)->stats.ld_peer)

static int resort_shared_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
{
struct c2c_hist_entry *c2c_he;
c2c_he = container_of(he, struct c2c_hist_entry, he);

- if (HAS_HITMS(c2c_he)) {
+ if ((c2c.display != DISPLAY_SNP_PEER && HAS_HITMS(c2c_he)) ||
+ (c2c.display == DISPLAY_SNP_PEER && HAS_PEER(c2c_he))) {
c2c.shared_clines++;
c2c_add_stats(&c2c.shared_clines_stats, &c2c_he->stats);
}
@@ -2406,7 +2432,7 @@ static void print_c2c_info(FILE *out, struct perf_session *session)
fprintf(out, "%-36s: %s\n", first ? " Events" : "", evsel__name(evsel));
first = false;
}
- fprintf(out, " Cachelines sort on : %s HITMs\n",
+ fprintf(out, " Cachelines sort on : %s\n",
display_str[c2c.display]);
fprintf(out, " Cacheline data grouping : %s\n", c2c.cl_sort);
}
@@ -2563,7 +2589,7 @@ static int perf_c2c_browser__title(struct hist_browser *browser,
{
scnprintf(bf, size,
"Shared Data Cache Line Table "
- "(%lu entries, sorted on %s HITMs)",
+ "(%lu entries, sorted on %s)",
browser->nr_non_filtered_entries,
display_str[c2c.display]);
return 0;
@@ -2769,6 +2795,8 @@ static int setup_display(const char *str)
c2c.display = DISPLAY_RMT_HITM;
else if (!strcmp(display, "lcl"))
c2c.display = DISPLAY_LCL_HITM;
+ else if (!strcmp(display, "peer"))
+ c2c.display = DISPLAY_SNP_PEER;
else {
pr_err("failed: unknown display type: %s\n", str);
return -1;
@@ -2850,6 +2878,7 @@ static int build_cl_output(char *cl_sort, bool no_source)
static int setup_coalesce(const char *coalesce, bool no_source)
{
const char *c = coalesce ?: coalesce_default;
+ const char *sort_str = NULL;

if (asprintf(&c2c.cl_sort, "offset,%s", c) < 0)
return -ENOMEM;
@@ -2857,12 +2886,16 @@ static int setup_coalesce(const char *coalesce, bool no_source)
if (build_cl_output(c2c.cl_sort, no_source))
return -1;

- if (asprintf(&c2c.cl_resort, "offset,%s",
- c2c.display == DISPLAY_TOT_HITM ?
- "tot_hitm" :
- c2c.display == DISPLAY_RMT_HITM ?
- "rmt_hitm,lcl_hitm" :
- "lcl_hitm,rmt_hitm") < 0)
+ if (c2c.display == DISPLAY_TOT_HITM)
+ sort_str = "tot_hitm";
+ else if (c2c.display == DISPLAY_RMT_HITM)
+ sort_str = "rmt_hitm,lcl_hitm";
+ else if (c2c.display == DISPLAY_LCL_HITM)
+ sort_str = "lcl_hitm,rmt_hitm";
+ else if (c2c.display == DISPLAY_SNP_PEER)
+ sort_str = "ld_peer";
+
+ if (asprintf(&c2c.cl_resort, "offset,%s", sort_str) < 0)
return -ENOMEM;

pr_debug("coalesce sort fields: %s\n", c2c.cl_sort);
@@ -3023,6 +3056,8 @@ static int perf_c2c__report(int argc, const char **argv)
sort_str = "rmt_hitm";
else if (c2c.display == DISPLAY_LCL_HITM)
sort_str = "lcl_hitm";
+ else if (c2c.display == DISPLAY_SNP_PEER)
+ sort_str = "ld_peer";

c2c_hists__reinit(&c2c.hists, output_str, sort_str);

--
2.25.1

2022-04-27 16:22:11

by Leo Yan

Subject: [PATCH v1 05/11] perf c2c: Add dimensions for peer load operations

This patch adds dimensions for peer load operations, including a
dimension for the total statistics of the metric 'ld_peer' and
dimensions for the single cache line view.

As with the HITM metrics, this patch also adds a dimension for the mean
value of peer load operations.

Signed-off-by: Leo Yan <[email protected]>
Tested-by: Ali Saidi <[email protected]>
---
tools/perf/builtin-c2c.c | 93 +++++++++++++++++++++++++++++++++++++---
1 file changed, 88 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 9ef439610a2b..cef6513012e2 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -55,6 +55,7 @@ struct c2c_hists {
struct compute_stats {
struct stats lcl_hitm;
struct stats rmt_hitm;
+ struct stats ld_peer;
struct stats load;
};

@@ -154,6 +155,7 @@ static void *c2c_he_zalloc(size_t size)

init_stats(&c2c_he->cstats.lcl_hitm);
init_stats(&c2c_he->cstats.rmt_hitm);
+ init_stats(&c2c_he->cstats.ld_peer);
init_stats(&c2c_he->cstats.load);

return &c2c_he->he;
@@ -253,6 +255,8 @@ static void compute_stats(struct c2c_hist_entry *c2c_he,
update_stats(&cstats->rmt_hitm, weight);
else if (stats->lcl_hitm)
update_stats(&cstats->lcl_hitm, weight);
+ else if (stats->ld_peer)
+ update_stats(&cstats->ld_peer, weight);
else if (stats->load)
update_stats(&cstats->load, weight);
}
@@ -658,6 +662,7 @@ STAT_FN(ld_fbhit)
STAT_FN(ld_l1hit)
STAT_FN(ld_l2hit)
STAT_FN(ld_llchit)
+STAT_FN(ld_peer)
STAT_FN(rmt_hit)

static uint64_t total_records(struct c2c_stats *stats)
@@ -674,7 +679,8 @@ static uint64_t total_records(struct c2c_stats *stats)
stats->ld_l1hit +
stats->ld_l2hit +
stats->ld_llchit +
- stats->lcl_hitm;
+ stats->lcl_hitm +
+ stats->ld_peer;

total = ldcnt +
stats->st_l1hit +
@@ -730,7 +736,8 @@ static uint64_t total_loads(struct c2c_stats *stats)
stats->ld_l1hit +
stats->ld_l2hit +
stats->ld_llchit +
- stats->lcl_hitm;
+ stats->lcl_hitm +
+ stats->ld_peer;

return ldcnt;
}
@@ -899,6 +906,7 @@ static double percent_ ## __f(struct c2c_hist_entry *c2c_he) \

PERCENT_FN(rmt_hitm)
PERCENT_FN(lcl_hitm)
+PERCENT_FN(ld_peer)
PERCENT_FN(st_l1hit)
PERCENT_FN(st_l1miss)
PERCENT_FN(st_anylvl)
@@ -965,6 +973,37 @@ percent_lcl_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
return per_left - per_right;
}

+static int
+percent_ld_peer_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+ struct hist_entry *he)
+{
+ int width = c2c_width(fmt, hpp, he->hists);
+ double per = PERCENT(he, ld_peer);
+ char buf[10];
+
+ return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
+}
+
+static int
+percent_ld_peer_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+ struct hist_entry *he)
+{
+ return percent_color(fmt, hpp, he, percent_ld_peer);
+}
+
+static int64_t
+percent_ld_peer_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
+ struct hist_entry *left, struct hist_entry *right)
+{
+ double per_left;
+ double per_right;
+
+ per_left = PERCENT(left, ld_peer);
+ per_right = PERCENT(right, ld_peer);
+
+ return per_left - per_right;
+}
+
static int
percent_stores_l1hit_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
struct hist_entry *he)
@@ -1213,6 +1252,7 @@ __func(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp, struct hist_entry *he) \
MEAN_ENTRY(mean_rmt_entry, rmt_hitm);
MEAN_ENTRY(mean_lcl_entry, lcl_hitm);
MEAN_ENTRY(mean_load_entry, load);
+MEAN_ENTRY(mean_peer_entry, ld_peer);

static int
cpucnt_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
@@ -1360,6 +1400,14 @@ static struct c2c_dimension dim_rmt_hitm = {
.width = 7,
};

+static struct c2c_dimension dim_ld_peer = {
+ .header = HEADER_BOTH("Snoop", "Peer"),
+ .name = "ld_peer",
+ .cmp = ld_peer_cmp,
+ .entry = ld_peer_entry,
+ .width = 7,
+};
+
static struct c2c_dimension dim_cl_rmt_hitm = {
.header = HEADER_SPAN("----- HITM -----", "Rmt", 1),
.name = "cl_rmt_hitm",
@@ -1376,6 +1424,14 @@ static struct c2c_dimension dim_cl_lcl_hitm = {
.width = 7,
};

+static struct c2c_dimension dim_cl_ld_peer = {
+ .header = HEADER_BOTH("Snoop", "Peer"),
+ .name = "cl_ld_peer",
+ .cmp = ld_peer_cmp,
+ .entry = ld_peer_entry,
+ .width = 7,
+};
+
static struct c2c_dimension dim_tot_stores = {
.header = HEADER_BOTH("Total", "Stores"),
.name = "tot_stores",
@@ -1520,6 +1576,15 @@ static struct c2c_dimension dim_percent_lcl_hitm = {
.width = 7,
};

+static struct c2c_dimension dim_percent_ld_peer = {
+ .header = HEADER_BOTH("Snoop", "Peer"),
+ .name = "percent_ld_peer",
+ .cmp = percent_ld_peer_cmp,
+ .entry = percent_ld_peer_entry,
+ .color = percent_ld_peer_color,
+ .width = 7,
+};
+
static struct c2c_dimension dim_percent_stores_l1hit = {
.header = HEADER_SPAN("------- Store Refs ------", "L1 Hit", 2),
.name = "percent_stores_l1hit",
@@ -1602,7 +1667,7 @@ static struct c2c_dimension dim_node = {
};

static struct c2c_dimension dim_mean_rmt = {
- .header = HEADER_SPAN("---------- cycles ----------", "rmt hitm", 2),
+ .header = HEADER_SPAN("--------------- cycles ---------------", "rmt hitm", 3),
.name = "mean_rmt",
.cmp = empty_cmp,
.entry = mean_rmt_entry,
@@ -1625,6 +1690,14 @@ static struct c2c_dimension dim_mean_load = {
.width = 8,
};

+static struct c2c_dimension dim_mean_peer = {
+ .header = HEADER_SPAN_LOW("peer"),
+ .name = "mean_peer",
+ .cmp = empty_cmp,
+ .entry = mean_peer_entry,
+ .width = 8,
+};
+
static struct c2c_dimension dim_cpucnt = {
.header = HEADER_BOTH("cpu", "cnt"),
.name = "cpucnt",
@@ -1672,8 +1745,10 @@ static struct c2c_dimension *dimensions[] = {
&dim_tot_hitm,
&dim_lcl_hitm,
&dim_rmt_hitm,
+ &dim_ld_peer,
&dim_cl_lcl_hitm,
&dim_cl_rmt_hitm,
+ &dim_cl_ld_peer,
&dim_tot_stores,
&dim_stores_l1hit,
&dim_stores_l1miss,
@@ -1691,6 +1766,7 @@ static struct c2c_dimension *dimensions[] = {
&dim_percent_hitm,
&dim_percent_rmt_hitm,
&dim_percent_lcl_hitm,
+ &dim_percent_ld_peer,
&dim_percent_stores_l1hit,
&dim_percent_stores_l1miss,
&dim_percent_stores_anylvl,
@@ -1704,6 +1780,7 @@ static struct c2c_dimension *dimensions[] = {
&dim_mean_rmt,
&dim_mean_lcl,
&dim_mean_load,
+ &dim_mean_peer,
&dim_cpucnt,
&dim_srcline,
&dim_dcacheline_idx,
@@ -2192,6 +2269,7 @@ static void print_c2c__display_stats(FILE *out)
fprintf(out, " Load L1D hit : %10d\n", stats->ld_l1hit);
fprintf(out, " Load L2D hit : %10d\n", stats->ld_l2hit);
fprintf(out, " Load LLC hit : %10d\n", stats->ld_llchit + stats->lcl_hitm);
+ fprintf(out, " Load HIT Peer : %10d\n", stats->ld_peer);
fprintf(out, " Load Local HITM : %10d\n", stats->lcl_hitm);
fprintf(out, " Load Remote HITM : %10d\n", stats->rmt_hitm);
fprintf(out, " Load Remote HIT : %10d\n", stats->rmt_hit);
@@ -2229,6 +2307,7 @@ static void print_shared_cacheline_info(FILE *out)
fprintf(out, " Fill Buffer Hits on shared lines : %10d\n", stats->ld_fbhit);
fprintf(out, " L1D hits on shared lines : %10d\n", stats->ld_l1hit);
fprintf(out, " L2D hits on shared lines : %10d\n", stats->ld_l2hit);
+ fprintf(out, " Load HITs on peer cache lines : %10d\n", stats->ld_peer);
fprintf(out, " LLC hits on shared lines : %10d\n", stats->ld_llchit + stats->lcl_hitm);
fprintf(out, " Locked Access on shared lines : %10d\n", stats->locks);
fprintf(out, " Blocked Access on shared lines : %10d\n", stats->blk_data + stats->blk_addr);
@@ -2257,10 +2336,10 @@ static void print_cacheline(struct c2c_hists *c2c_hists,
fprintf(out, "\n");
}

- fprintf(out, " ----------------------------------------------------------------------\n");
+ fprintf(out, " -------------------------------------------------------------------------------\n");
__hist_entry__snprintf(he_cl, &hpp, hpp_list);
fprintf(out, "%s\n", bf);
- fprintf(out, " ----------------------------------------------------------------------\n");
+ fprintf(out, " -------------------------------------------------------------------------------\n");

hists__fprintf(&c2c_hists->hists, false, 0, 0, 0, out, false);
}
@@ -2275,6 +2354,7 @@ static void print_pareto(FILE *out)
cl_output = "cl_num,"
"cl_rmt_hitm,"
"cl_lcl_hitm,"
+ "cl_ld_peer,"
"cl_stores_l1hit,"
"cl_stores_l1miss,"
"cl_stores_anylvl,"
@@ -2727,6 +2807,7 @@ static int build_cl_output(char *cl_sort, bool no_source)
c2c.use_stdio ? "cl_num_empty," : "",
"percent_rmt_hitm,"
"percent_lcl_hitm,"
+ "percent_ld_peer,"
"percent_stores_l1hit,"
"percent_stores_l1miss,"
"percent_stores_anylvl,"
@@ -2737,6 +2818,7 @@ static int build_cl_output(char *cl_sort, bool no_source)
"mean_rmt,"
"mean_lcl,"
"mean_load,"
+ "mean_peer,"
"tot_recs,"
"cpucnt,",
add_sym ? "symbol," : "",
@@ -2913,6 +2995,7 @@ static int perf_c2c__report(int argc, const char **argv)
"dcacheline_count,"
"percent_hitm,"
"tot_hitm,lcl_hitm,rmt_hitm,"
+ "ld_peer,"
"tot_recs,"
"tot_loads,"
"tot_stores,"
--
2.25.1
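
The 'mean_peer' dimension in the patch above relies on a per-entry running mean of sample weights. A minimal standalone sketch of that idea follows; the type and function names (mean_stats, mean_stats_init, mean_stats_update) are hypothetical stand-ins for illustration and are not perf's 'struct stats' API.

```c
#include <stdint.h>

/* Hypothetical stand-in for a stats accumulator; only the mean is tracked. */
struct mean_stats {
	uint64_t n;     /* number of samples seen so far */
	double   mean;  /* running mean of the sample weights */
};

static void mean_stats_init(struct mean_stats *s)
{
	s->n = 0;
	s->mean = 0.0;
}

/* Incremental update: mean += (val - mean) / n, no sample storage needed. */
static void mean_stats_update(struct mean_stats *s, uint64_t val)
{
	s->n++;
	s->mean += ((double)val - s->mean) / (double)s->n;
}
```

Keeping one such accumulator per category (remote HITM, local HITM, peer, plain load) is what lets the "cycles" columns report a separate mean for each.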

2022-04-27 16:22:18

by Leo Yan

Subject: [PATCH v1 11/11] perf c2c: Use 'peer' as default display for Arm64

Since the Arm64 arch doesn't support HITM flags, use 'peer' as the default
display type if the user doesn't specify one; for other arches, 'tot'
remains the default display type.

Suggested-by: Ali Saidi <[email protected]>
Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/builtin-c2c.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index bd4516e486c0..c944631fc505 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2787,7 +2787,7 @@ static int setup_callchain(struct evlist *evlist)

static int setup_display(const char *str)
{
- const char *display = str ?: "tot";
+ const char *display = str;

if (!strcmp(display, "tot"))
c2c.display = DISPLAY_TOT_HITM;
@@ -2973,9 +2973,6 @@ static int perf_c2c__report(int argc, const char **argv)
data.path = input_name;
data.force = symbol_conf.force;

- err = setup_display(display);
- if (err)
- goto out;

err = setup_coalesce(coalesce, no_source);
if (err) {
@@ -2996,6 +2993,22 @@ static int perf_c2c__report(int argc, const char **argv)
goto out;
}

+ /*
+ * Use the 'tot' as default display type if user doesn't specify it;
+ * since Arm64 platform doesn't support HITMs flag, use 'peer' as the
+ * default display type.
+ */
+ if (!display) {
+ if (!strcmp(perf_env__arch(&session->header.env), "arm64"))
+ display = "peer";
+ else
+ display = "tot";
+ }
+
+ err = setup_display(display);
+ if (err)
+ goto out_session;
+
session->itrace_synth_opts = &itrace_synth_opts;

err = setup_nodes(session);
--
2.25.1
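
The default selection described in the patch above can be sketched as a small pure function. This is an illustration only: pick_display is a hypothetical helper, and the NULL check on the architecture string is an added assumption that the patch itself does not make.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: choose the display type.
 * 'user' is the display string given by the user (NULL if none),
 * 'arch' is the architecture name recorded with the perf data.
 */
static const char *pick_display(const char *user, const char *arch)
{
	if (user)
		return user;            /* an explicit choice always wins */
	if (arch && !strcmp(arch, "arm64"))
		return "peer";          /* no HITM info on Arm64 */
	return "tot";                   /* default for other arches */
}
```

Because the architecture comes from the recorded data rather than from the host, the choice has to be made after the session is opened, which is why the patch moves the setup_display() call.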

2022-04-27 16:22:17

by Leo Yan

Subject: [PATCH v1 04/11] perf mem: Add statistics for peer snooping

Since the flag PERF_MEM_SNOOPX_PEER has been added to support cache
snooping from a peer core or cluster, this patch adds statistics for
this new flag.

Signed-off-by: Leo Yan <[email protected]>
Tested-by: Ali Saidi <[email protected]>
---
tools/perf/util/mem-events.c | 11 ++++++++++-
tools/perf/util/mem-events.h | 1 +
2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index bfbac365e1e4..2086f067359b 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -525,6 +525,7 @@ int c2c_decode_stats(struct c2c_stats *stats, struct mem_info *mi)
u64 op = data_src->mem_op;
u64 lvl = data_src->mem_lvl;
u64 snoop = data_src->mem_snoop;
+ u64 snoopx = data_src->mem_snoopx;
u64 lock = data_src->mem_lock;
u64 blk = data_src->mem_blk;
u64 lvl_num = data_src->mem_lvl_num;
@@ -568,10 +569,17 @@ do { \
if (lvl & P(LVL, IO)) stats->ld_io++;
if (lvl & P(LVL, LFB)) stats->ld_fbhit++;
if (lvl & P(LVL, L1 )) stats->ld_l1hit++;
- if (lvl & P(LVL, L2 )) stats->ld_l2hit++;
+ if (lvl & P(LVL, L2)) {
+ if (snoopx & P(SNOOPX, PEER))
+ stats->ld_peer++;
+ else
+ stats->ld_l2hit++;
+ }
if (lvl & P(LVL, L3 )) {
if (snoop & P(SNOOP, HITM))
HITM_INC(lcl_hitm);
+ else if (snoopx & P(SNOOPX, PEER))
+ stats->ld_peer++;
else
stats->ld_llchit++;
}
@@ -662,6 +670,7 @@ void c2c_add_stats(struct c2c_stats *stats, struct c2c_stats *add)
stats->ld_l1hit += add->ld_l1hit;
stats->ld_l2hit += add->ld_l2hit;
stats->ld_llchit += add->ld_llchit;
+ stats->ld_peer += add->ld_peer;
stats->lcl_hitm += add->lcl_hitm;
stats->rmt_hitm += add->rmt_hitm;
stats->tot_hitm += add->tot_hitm;
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index e0e8057c52e8..7b6c74e74354 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -75,6 +75,7 @@ struct c2c_stats {
u32 ld_l1hit; /* count of loads that hit L1D */
u32 ld_l2hit; /* count of loads that hit L2D */
u32 ld_llchit; /* count of loads that hit LLC */
+ u32 ld_peer; /* count of loads that hit peer core or cluster cache */
u32 lcl_hitm; /* count of loads with local HITM */
u32 rmt_hitm; /* count of loads with remote HITM */
u32 tot_hitm; /* count of loads with local and remote HITM */
--
2.25.1
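
The load classification added by the patch above can be sketched in isolation as follows. The flag values and the load_counts structure are hypothetical placeholders (the real bits come from the perf_event UAPI header), and the sketch omits the total HITM counter that the real HITM_INC() macro also updates.

```c
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
enum {
	LVL_L2      = 1u << 0,
	LVL_L3      = 1u << 1,
	SNOOP_HITM  = 1u << 0,
	SNOOPX_PEER = 1u << 0,
};

struct load_counts {
	uint32_t ld_l2hit;
	uint32_t ld_llchit;
	uint32_t ld_peer;
	uint32_t lcl_hitm;
};

/* Classify one load sample by cache level and snoop flags. */
static void count_load(struct load_counts *c, uint32_t lvl,
		       uint32_t snoop, uint32_t snoopx)
{
	if (lvl & LVL_L2) {
		if (snoopx & SNOOPX_PEER)
			c->ld_peer++;           /* L2 data came from a peer */
		else
			c->ld_l2hit++;
	}
	if (lvl & LVL_L3) {
		if (snoop & SNOOP_HITM)
			c->lcl_hitm++;          /* HITM takes precedence */
		else if (snoopx & SNOOPX_PEER)
			c->ld_peer++;
		else
			c->ld_llchit++;
	}
}
```

Note the ordering at L3: a sample carrying both HITM and peer flags is counted as a local HITM, so the two counters never double-count one sample.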

2022-04-27 16:23:20

by Leo Yan

Subject: [PATCH v1 02/11] perf c2c: Add dimensions for 'anylvl' metrics of store operation

Now that we have the 'st_anylvl' statistics for store operations, add
dimensions for the 'anylvl' metrics and the associated percentage
calculation for the single cache line view.

Signed-off-by: Leo Yan <[email protected]>
Tested-by: Ali Saidi <[email protected]>
---
tools/perf/builtin-c2c.c | 80 ++++++++++++++++++++++++++++++++++++----
1 file changed, 73 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index fbbed434014f..9ef439610a2b 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -653,6 +653,7 @@ STAT_FN(lcl_hitm)
STAT_FN(store)
STAT_FN(st_l1hit)
STAT_FN(st_l1miss)
+STAT_FN(st_anylvl)
STAT_FN(ld_fbhit)
STAT_FN(ld_l1hit)
STAT_FN(ld_l2hit)
@@ -677,7 +678,8 @@ static uint64_t total_records(struct c2c_stats *stats)

total = ldcnt +
stats->st_l1hit +
- stats->st_l1miss;
+ stats->st_l1miss +
+ stats->st_anylvl;

return total;
}
@@ -899,6 +901,7 @@ PERCENT_FN(rmt_hitm)
PERCENT_FN(lcl_hitm)
PERCENT_FN(st_l1hit)
PERCENT_FN(st_l1miss)
+PERCENT_FN(st_anylvl)

static int
percent_rmt_hitm_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
@@ -1024,6 +1027,37 @@ percent_stores_l1miss_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
return per_left - per_right;
}

+static int
+percent_stores_anylvl_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+ struct hist_entry *he)
+{
+ int width = c2c_width(fmt, hpp, he->hists);
+ double per = PERCENT(he, st_anylvl);
+ char buf[10];
+
+ return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
+}
+
+static int
+percent_stores_anylvl_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+ struct hist_entry *he)
+{
+ return percent_color(fmt, hpp, he, percent_st_anylvl);
+}
+
+static int64_t
+percent_stores_anylvl_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
+ struct hist_entry *left, struct hist_entry *right)
+{
+ double per_left;
+ double per_right;
+
+ per_left = PERCENT(left, st_anylvl);
+ per_right = PERCENT(right, st_anylvl);
+
+ return per_left - per_right;
+}
+
STAT_FN(lcl_dram)
STAT_FN(rmt_dram)

@@ -1351,7 +1385,7 @@ static struct c2c_dimension dim_tot_stores = {
};

static struct c2c_dimension dim_stores_l1hit = {
- .header = HEADER_SPAN("---- Stores ----", "L1Hit", 1),
+ .header = HEADER_SPAN("--------- Stores --------", "L1Hit", 2),
.name = "stores_l1hit",
.cmp = st_l1hit_cmp,
.entry = st_l1hit_entry,
@@ -1366,8 +1400,16 @@ static struct c2c_dimension dim_stores_l1miss = {
.width = 7,
};

+static struct c2c_dimension dim_stores_anylvl = {
+ .header = HEADER_SPAN_LOW("Anylvl"),
+ .name = "stores_anylvl",
+ .cmp = st_anylvl_cmp,
+ .entry = st_anylvl_entry,
+ .width = 7,
+};
+
static struct c2c_dimension dim_cl_stores_l1hit = {
- .header = HEADER_SPAN("-- Store Refs --", "L1 Hit", 1),
+ .header = HEADER_SPAN("------- Store Refs ------", "L1 Hit", 2),
.name = "cl_stores_l1hit",
.cmp = st_l1hit_cmp,
.entry = st_l1hit_entry,
@@ -1382,6 +1424,14 @@ static struct c2c_dimension dim_cl_stores_l1miss = {
.width = 7,
};

+static struct c2c_dimension dim_cl_stores_anylvl = {
+ .header = HEADER_SPAN_LOW("Any Lvl"),
+ .name = "cl_stores_anylvl",
+ .cmp = st_anylvl_cmp,
+ .entry = st_anylvl_entry,
+ .width = 7,
+};
+
static struct c2c_dimension dim_ld_fbhit = {
.header = HEADER_SPAN("----- Core Load Hit -----", "FB", 2),
.name = "ld_fbhit",
@@ -1471,7 +1521,7 @@ static struct c2c_dimension dim_percent_lcl_hitm = {
};

static struct c2c_dimension dim_percent_stores_l1hit = {
- .header = HEADER_SPAN("-- Store Refs --", "L1 Hit", 1),
+ .header = HEADER_SPAN("------- Store Refs ------", "L1 Hit", 2),
.name = "percent_stores_l1hit",
.cmp = percent_stores_l1hit_cmp,
.entry = percent_stores_l1hit_entry,
@@ -1488,6 +1538,15 @@ static struct c2c_dimension dim_percent_stores_l1miss = {
.width = 7,
};

+static struct c2c_dimension dim_percent_stores_anylvl = {
+ .header = HEADER_SPAN_LOW("Any Lvl"),
+ .name = "percent_stores_anylvl",
+ .cmp = percent_stores_anylvl_cmp,
+ .entry = percent_stores_anylvl_entry,
+ .color = percent_stores_anylvl_color,
+ .width = 7,
+};
+
static struct c2c_dimension dim_dram_lcl = {
.header = HEADER_SPAN("--- Load Dram ----", "Lcl", 1),
.name = "dram_lcl",
@@ -1618,8 +1677,10 @@ static struct c2c_dimension *dimensions[] = {
&dim_tot_stores,
&dim_stores_l1hit,
&dim_stores_l1miss,
+ &dim_stores_anylvl,
&dim_cl_stores_l1hit,
&dim_cl_stores_l1miss,
+ &dim_cl_stores_anylvl,
&dim_ld_fbhit,
&dim_ld_l1hit,
&dim_ld_l2hit,
@@ -1632,6 +1693,7 @@ static struct c2c_dimension *dimensions[] = {
&dim_percent_lcl_hitm,
&dim_percent_stores_l1hit,
&dim_percent_stores_l1miss,
+ &dim_percent_stores_anylvl,
&dim_dram_lcl,
&dim_dram_rmt,
&dim_pid,
@@ -2149,6 +2211,7 @@ static void print_c2c__display_stats(FILE *out)
fprintf(out, " Store - no mapping : %10d\n", stats->st_noadrs);
fprintf(out, " Store L1D Hit : %10d\n", stats->st_l1hit);
fprintf(out, " Store L1D Miss : %10d\n", stats->st_l1miss);
+ fprintf(out, " Store Any cache Level : %10d\n", stats->st_anylvl);
fprintf(out, " No Page Map Rejects : %10d\n", stats->nomap);
fprintf(out, " Unable to parse data source : %10d\n", stats->noparse);
}
@@ -2171,6 +2234,7 @@ static void print_shared_cacheline_info(FILE *out)
fprintf(out, " Blocked Access on shared lines : %10d\n", stats->blk_data + stats->blk_addr);
fprintf(out, " Store HITs on shared lines : %10d\n", stats->store);
fprintf(out, " Store L1D hits on shared lines : %10d\n", stats->st_l1hit);
+ fprintf(out, " Store Any cache Level : %10d\n", stats->st_anylvl);
fprintf(out, " Total Merged records : %10d\n", hitm_cnt + stats->store);
}

@@ -2193,10 +2257,10 @@ static void print_cacheline(struct c2c_hists *c2c_hists,
fprintf(out, "\n");
}

- fprintf(out, " -------------------------------------------------------------\n");
+ fprintf(out, " ----------------------------------------------------------------------\n");
__hist_entry__snprintf(he_cl, &hpp, hpp_list);
fprintf(out, "%s\n", bf);
- fprintf(out, " -------------------------------------------------------------\n");
+ fprintf(out, " ----------------------------------------------------------------------\n");

hists__fprintf(&c2c_hists->hists, false, 0, 0, 0, out, false);
}
@@ -2213,6 +2277,7 @@ static void print_pareto(FILE *out)
"cl_lcl_hitm,"
"cl_stores_l1hit,"
"cl_stores_l1miss,"
+ "cl_stores_anylvl,"
"dcacheline";

perf_hpp_list__init(&hpp_list);
@@ -2664,6 +2729,7 @@ static int build_cl_output(char *cl_sort, bool no_source)
"percent_lcl_hitm,"
"percent_stores_l1hit,"
"percent_stores_l1miss,"
+ "percent_stores_anylvl,"
"offset,offset_node,dcacheline_count,",
add_pid ? "pid," : "",
add_tid ? "tid," : "",
@@ -2850,7 +2916,7 @@ static int perf_c2c__report(int argc, const char **argv)
"tot_recs,"
"tot_loads,"
"tot_stores,"
- "stores_l1hit,stores_l1miss,"
+ "stores_l1hit,stores_l1miss,stores_anylvl,"
"ld_fbhit,ld_l1hit,ld_l2hit,"
"ld_lclhit,lcl_hitm,"
"ld_rmthit,rmt_hitm,"
--
2.25.1
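
The percentage shown in the single cache line view can be sketched as below, assuming it is an entry's count relative to the total for the whole cache line. The function is a standalone illustration, not a copy of the helper in builtin-c2c.c.

```c
#include <stdint.h>

/* Share of 'count' in 'total' as a percentage; 0 when total is 0. */
static double percent(uint32_t count, uint32_t total)
{
	return total ? 100.0 * (double)count / (double)total : 0.0;
}
```

For example, an offset with 1 'any level' store on a cache line that saw 4 such stores would show 25%.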

2022-04-27 16:24:09

by Leo Yan

Subject: [PATCH v1 01/11] perf mem: Add any cache level statistics for store operation

Sometimes we don't know exactly which cache level a memory store
operation happens on, so the memory level flag PERF_MEM_LVLNUM_ANY_CACHE
is set for this case. One use case is Arm SPE trace data, which sets this
flag for all store operations because it lacks sufficient cache level
info.

This patch adds a new item "st_anylvl" to structure c2c_stats to support
any cache level statistics for store operations.

Signed-off-by: Leo Yan <[email protected]>
Tested-by: Ali Saidi <[email protected]>
---
tools/perf/util/mem-events.c | 4 ++++
tools/perf/util/mem-events.h | 1 +
2 files changed, 5 insertions(+)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index db5225caaabe..bfbac365e1e4 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -527,6 +527,7 @@ int c2c_decode_stats(struct c2c_stats *stats, struct mem_info *mi)
u64 snoop = data_src->mem_snoop;
u64 lock = data_src->mem_lock;
u64 blk = data_src->mem_blk;
+ u64 lvl_num = data_src->mem_lvl_num;
/*
* Skylake might report unknown remote level via this
* bit, consider it when evaluating remote HITMs.
@@ -621,6 +622,8 @@ do { \
}
if (lvl & P(LVL, MISS))
if (lvl & P(LVL, L1)) stats->st_l1miss++;
+ if (lvl_num == P(LVLNUM, ANY_CACHE))
+ stats->st_anylvl++;
} else {
/* unparsable data_src? */
stats->noparse++;
@@ -647,6 +650,7 @@ void c2c_add_stats(struct c2c_stats *stats, struct c2c_stats *add)
stats->st_noadrs += add->st_noadrs;
stats->st_l1hit += add->st_l1hit;
stats->st_l1miss += add->st_l1miss;
+ stats->st_anylvl += add->st_anylvl;
stats->load += add->load;
stats->ld_excl += add->ld_excl;
stats->ld_shared += add->ld_shared;
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index 916242f8020a..e0e8057c52e8 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -63,6 +63,7 @@ struct c2c_stats {
u32 st_noadrs; /* cacheable store with no address */
u32 st_l1hit; /* count of stores that hit L1D */
u32 st_l1miss; /* count of stores that miss L1D */
+ u32 st_anylvl; /* count of stores with any cache level */
u32 load; /* count of all loads in trace */
u32 ld_excl; /* exclusive loads, rmt/lcl DRAM - snp none/miss */
u32 ld_shared; /* shared loads, rmt/lcl DRAM - snp hit */
--
2.25.1
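
The store accounting added by the patch above can be sketched as follows. The level numbers and the store_counts structure are hypothetical placeholders; the real PERF_MEM_LVLNUM_* values are defined in the perf_event UAPI header.

```c
#include <stdint.h>

/* Hypothetical level numbers, for illustration only. */
enum { LVLNUM_L1 = 1, LVLNUM_ANY_CACHE = 2 };

struct store_counts {
	uint32_t store;      /* all stores */
	uint32_t st_anylvl;  /* stores whose cache level is unknown */
};

/* Count one store sample; lvl_num is a number, not a bitmask. */
static void count_store(struct store_counts *c, uint32_t lvl_num)
{
	c->store++;
	if (lvl_num == LVLNUM_ANY_CACHE)
		c->st_anylvl++;
}
```

The comparison uses '==' rather than '&' because the level number field holds a single enumerated value, unlike the older level field, which is a set of flag bits.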