2020-10-15 14:52:45

by Leo Yan

[permalink] [raw]
Subject: [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load

If the memory event doesn't contain HITM tag (like Arm SPE), it cannot
rely on HITM display to report cache false sharing. Alternatively, we
can use the LLC access and multi-threads info to locate the potential
false sharing's data address, and if we connect with source code and
analyze the multi-threads' execution timing, if can conclude load and
store the same cache line at the meantime, thus this can be helpful for
resolve the cache false sharing issue.

This patch set is to enable the display with sorting on LLC load
accesses; it adds dimensions for total LLC hit and LLC load accesses,
and these dimensions are used for shared cache line table and pareto.

This patch set is dependend on the patch set "perf c2c: Refine the
organization of metrics" [1].

[1] https://lore.kernel.org/patchwork/cover/1321499/

With this patch set, we can get display 'llc' as follows:

# perf c2c report -d llc --coalesce tid,pid,iaddr,dso --stdio

[...]

=================================================
Shared Data Cache Line Table
=================================================
#
# ----------- Cacheline ---------- LLC Hit LLC Hit Total Total Total ---- Stores ---- ----- Core Load Hit ----- - LLC Load Hit -- - RMT Load Hit -- --- Load Dram ----
# Index Address Node PA cnt Pct Total records Loads Stores L1Hit L1Miss FB L1 L2 LclHit LclHitm RmtHit RmtHitm Lcl Rmt
# ..... .................. .... ...... ....... ........ ....... ....... ....... ....... ....... ....... ....... ....... ........ ....... ........ ....... ........ ........
#
0 0x563b01e83100 0 1401 65.32% 648 7011 3738 3273 2582 691 515 2516 59 143 505 0 0 0 0
1 0x563b01e830c0 0 1 26.51% 263 400 400 0 0 0 130 3 4 262 1 0 0 0 0
2 0x563b01e83080 0 1 7.76% 77 650 650 0 0 0 180 348 45 14 63 0 0 0 0
3 0xffff88c3d74e82c0 0 1 0.10% 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0
4 0xffffa587c11e38c0 N/A 0 0.10% 1 2 1 1 1 0 0 0 0 1 0 0 0 0 0
5 0xffffffffbd5e6fc0 0 1 0.10% 1 1 1 0 0 0 0 0 0 0 1 0 0 0 0
6 0x7f90a4d6c2c0 0 1 0.10% 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0

=================================================
Shared Cache Line Distribution Pareto
=================================================
#
# ---- LLC LD ---- -- Store Refs -- --------- Data address --------- ---------- cycles ---------- Total cpu Shared
# Num LclHit LclHitm L1 Hit L1 Miss Offset Node PA cnt Pid Tid Code address rmt hitm lcl hitm load records cnt Symbol Object Source:Line Node
# ..... ....... ....... ....... ....... .................. .... ...... ....... .................. .................. ........ ........ ........ ....... ........ ................... ................. ........................... ....
#
-------------------------------------------------------------
0 143 505 2582 691 0x563b01e83100
-------------------------------------------------------------
96.50% 7.72% 46.79% 0.00% 0x0 0 1 14100 14102:lock_th 0x563b01c81c16 0 1949 1331 1876 1 [.] read_write_func false_sharing.exe false_sharing_example.c:145 0
0.00% 35.05% 0.00% 0.00% 0x0 0 1 14100 14102:lock_th 0x563b01c81c1d 0 2651 975 748 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
0.00% 30.89% 0.00% 0.00% 0x0 0 1 14100 14103:lock_th 0x563b01c81c1d 0 1425 1003 762 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
2.10% 7.52% 49.19% 0.00% 0x0 0 1 14100 14103:lock_th 0x563b01c81c16 0 1585 1053 2037 1 [.] read_write_func false_sharing.exe false_sharing_example.c:145 0
0.00% 0.00% 2.52% 44.86% 0x0 0 1 14100 14102:lock_th 0x563b01c81c28 0 0 0 375 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
0.00% 0.00% 1.51% 55.14% 0x0 0 1 14100 14103:lock_th 0x563b01c81c28 0 0 0 420 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
1.40% 12.87% 0.00% 0.00% 0x20 0 1 14100 14104:reader_thd 0x563b01c81c73 0 166 99 417 1 [.] read_write_func false_sharing.exe false_sharing_example.c:155 0
0.00% 5.94% 0.00% 0.00% 0x20 0 1 14100 14105:reader_thd 0x563b01c81c73 0 144 85 376 1 [.] read_write_func false_sharing.exe false_sharing_example.c:155 0

[...]


Leo Yan (8):
perf mem: Add structure field c2c_stats::tot_llchit
perf c2c: Add dimensions for total LLC hit
perf c2c: Add dimensions for LLC load hit
perf c2c: Change to general naming for macros
perf c2c: Rename for shared cache line stats
perf c2c: Refactor hist entry validation
perf c2c: Add option '-d llc' for sorting with LLC load
perf c2c: Update documentation for display option 'llc'

tools/perf/Documentation/perf-c2c.txt | 18 +-
tools/perf/builtin-c2c.c | 333 +++++++++++++++++++++-----
tools/perf/util/mem-events.c | 3 +
tools/perf/util/mem-events.h | 1 +
4 files changed, 286 insertions(+), 69 deletions(-)

--
2.17.1


2020-10-15 14:53:11

by Leo Yan

[permalink] [raw]
Subject: [PATCH v1 4/8] perf c2c: Change to general naming for macros

The display and filter macros are named with suffix "_HITM", it is bound
to HITM metrics and isn't general if we want to sort cache lines with
other metrics (like LLC load hit) rather than HITM metrics.

This patch changes to use more general naming for macros:

s/DISPLAY_HITM/DISPLAY_ENTRY
s/FILTER_HITM/FILTER_ENTRY

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/builtin-c2c.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 3f6271a779c4..59f141adea3e 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -1176,7 +1176,7 @@ node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
ret = scnprintf(hpp->buf, hpp->size, "%2d{%2d ", node, num);
advance_hpp(hpp, ret);

- #define DISPLAY_HITM(__h) \
+ #define DISPLAY_ENTRY(__h) \
if (c2c_he->stats.__h> 0) { \
ret = scnprintf(hpp->buf, hpp->size, "%5.1f%% ", \
percent(stats->__h, c2c_he->stats.__h));\
@@ -1186,18 +1186,18 @@ node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,

switch (c2c.display) {
case DISPLAY_RMT:
- DISPLAY_HITM(rmt_hitm);
+ DISPLAY_ENTRY(rmt_hitm);
break;
case DISPLAY_LCL:
- DISPLAY_HITM(lcl_hitm);
+ DISPLAY_ENTRY(lcl_hitm);
break;
case DISPLAY_TOT:
- DISPLAY_HITM(tot_hitm);
+ DISPLAY_ENTRY(tot_hitm);
default:
break;
}

- #undef DISPLAY_HITM
+ #undef DISPLAY_ENTRY

advance_hpp(hpp, ret);

@@ -1984,7 +1984,7 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)

c2c_he = container_of(he, struct c2c_hist_entry, he);

-#define FILTER_HITM(__h) \
+#define FILTER_ENTRY(__h) \
if (stats->__h) { \
ld_dist = ((double)c2c_he->stats.__h / stats->__h); \
if (ld_dist < DISPLAY_LINE_LIMIT) \
@@ -1995,18 +1995,18 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)

switch (c2c.display) {
case DISPLAY_LCL:
- FILTER_HITM(lcl_hitm);
+ FILTER_ENTRY(lcl_hitm);
break;
case DISPLAY_RMT:
- FILTER_HITM(rmt_hitm);
+ FILTER_ENTRY(rmt_hitm);
break;
case DISPLAY_TOT:
- FILTER_HITM(tot_hitm);
+ FILTER_ENTRY(tot_hitm);
default:
break;
}

-#undef FILTER_HITM
+#undef FILTER_ENTRY

return he->filtered == 0;
}
--
2.17.1

2020-10-15 14:53:32

by Leo Yan

[permalink] [raw]
Subject: [PATCH v1 1/8] perf mem: Add structure field c2c_stats::tot_llchit

Add new field c2c_stats::tot_llchit to count total number for LLC hit:

c2c_stats::tot_llchit = c2c_stats::lcl_hitm + c2c_stats::ld_llchit

This is the preparation for additional sorting with total LLC hit, and
will be used in perf c2c report in following patches.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/util/mem-events.c | 3 +++
tools/perf/util/mem-events.h | 1 +
2 files changed, 4 insertions(+)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index ea0af0bc4314..0ad27bef0698 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -371,6 +371,8 @@ do { \
HITM_INC(lcl_hitm);
else
stats->ld_llchit++;
+
+ stats->tot_llchit++;
}

if (lvl & P(LVL, LOC_RAM)) {
@@ -455,6 +457,7 @@ void c2c_add_stats(struct c2c_stats *stats, struct c2c_stats *add)
stats->ld_fbhit += add->ld_fbhit;
stats->ld_l1hit += add->ld_l1hit;
stats->ld_l2hit += add->ld_l2hit;
+ stats->tot_llchit += add->tot_llchit;
stats->ld_llchit += add->ld_llchit;
stats->lcl_hitm += add->lcl_hitm;
stats->rmt_hitm += add->rmt_hitm;
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index 904dad34f7f7..a1fa1c312ddb 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -68,6 +68,7 @@ struct c2c_stats {
u32 ld_fbhit; /* count of loads hitting Fill Buffer */
u32 ld_l1hit; /* count of loads that hit L1D */
u32 ld_l2hit; /* count of loads that hit L2D */
+ u32 tot_llchit; /* count of all loads that hit LLC */
u32 ld_llchit; /* count of loads that hit LLC */
u32 lcl_hitm; /* count of loads with local HITM */
u32 rmt_hitm; /* count of loads with remote HITM */
--
2.17.1

2020-10-15 14:54:41

by Leo Yan

[permalink] [raw]
Subject: [PATCH v1 5/8] perf c2c: Rename for shared cache line stats

In the structure perf_c2c for shared cache line stats, replace
"hitm_stats" with "shared_clines_stats", and rename function
resort_hitm_cb() to resort_shared_cl_cb().

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/builtin-c2c.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 59f141adea3e..7962cc70b855 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -97,8 +97,8 @@ struct perf_c2c {
bool symbol_full;
bool stitch_lbr;

- /* HITM shared clines stats */
- struct c2c_stats hitm_stats;
+ /* Shared clines stats */
+ struct c2c_stats shared_clines_stats;
int shared_clines;

int display;
@@ -2084,7 +2084,7 @@ static int resort_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
{
struct c2c_hist_entry *c2c_he;
struct c2c_hists *c2c_hists;
- bool display = he__display(he, &c2c.hitm_stats);
+ bool display = he__display(he, &c2c.shared_clines_stats);

c2c_he = container_of(he, struct c2c_hist_entry, he);
c2c_hists = c2c_he->hists;
@@ -2171,14 +2171,14 @@ static int setup_nodes(struct perf_session *session)

#define HAS_HITMS(__h) ((__h)->stats.lcl_hitm || (__h)->stats.rmt_hitm)

-static int resort_hitm_cb(struct hist_entry *he, void *arg __maybe_unused)
+static int resort_shared_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
{
struct c2c_hist_entry *c2c_he;
c2c_he = container_of(he, struct c2c_hist_entry, he);

if (HAS_HITMS(c2c_he)) {
c2c.shared_clines++;
- c2c_add_stats(&c2c.hitm_stats, &c2c_he->stats);
+ c2c_add_stats(&c2c.shared_clines_stats, &c2c_he->stats);
}

return 0;
@@ -2249,7 +2249,7 @@ static void print_c2c__display_stats(FILE *out)

static void print_shared_cacheline_info(FILE *out)
{
- struct c2c_stats *stats = &c2c.hitm_stats;
+ struct c2c_stats *stats = &c2c.shared_clines_stats;
int hitm_cnt = stats->lcl_hitm + stats->rmt_hitm;

fprintf(out, "=================================================\n");
@@ -2942,7 +2942,7 @@ static int perf_c2c__report(int argc, const char **argv)
ui_progress__init(&prog, c2c.hists.hists.nr_entries, "Sorting...");

hists__collapse_resort(&c2c.hists.hists, NULL);
- hists__output_resort_cb(&c2c.hists.hists, &prog, resort_hitm_cb);
+ hists__output_resort_cb(&c2c.hists.hists, &prog, resort_shared_cl_cb);
hists__iterate_cb(&c2c.hists.hists, resort_cl_cb);

ui_progress__finish();
--
2.17.1

2020-10-15 14:54:43

by Leo Yan

[permalink] [raw]
Subject: [PATCH v1 8/8] perf c2c: Update documentation for display option 'llc'

Since the new display option 'llc' is introduced, this patch is to
update the documentation to reflect it.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/Documentation/perf-c2c.txt | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
index c81d72e3eecf..eadce62ecd28 100644
--- a/tools/perf/Documentation/perf-c2c.txt
+++ b/tools/perf/Documentation/perf-c2c.txt
@@ -109,7 +109,8 @@ REPORT OPTIONS

-d::
--display::
- Switch to HITM type (rmt, lcl) to display and sort on. Total HITMs as default.
+ Switch to HITM type (rmt, lcl) or LLC load access (llc) to display
+ and sort on. Total HITMs as default.

--stitch-lbr::
Show callgraph with stitched LBRs, which may have more complete
@@ -174,12 +175,18 @@ For each cacheline in the 1) list we display following data:
Cacheline
- cacheline address (hex number)

- Rmt/Lcl Hitm
+ Rmt/Lcl Hitm (For display with HITM types)
- cacheline percentage of all Remote/Local HITM accesses

- LLC Load Hitm - Total, LclHitm, RmtHitm
+ LLC Load Hitm - Total, LclHitm, RmtHitm (For display with HITM types)
- count of Total/Local/Remote load HITMs

+ LLC Hit Pct (For display 'llc')
+ - cacheline percentage of all LLC load accesses
+
+ LLC Hit Total (For display 'llc')
+ - sum of all LLC load accesses
+
Total records
- sum of all cachelines accesses

@@ -207,9 +214,12 @@ For each cacheline in the 1) list we display following data:

For each offset in the 2) list we display following data:

- HITM - Rmt, Lcl
+ HITM - Rmt, Lcl (For display with HITM types)
- % of Remote/Local HITM accesses for given offset within cacheline

+ LLC LD - LclHit, LclHitm (For display 'llc')
+ - % of LLC hits and LLC HITMs accesses for given offset within cacheline
+
Store Refs - L1 Hit, L1 Miss
- % of store accesses that hit/missed L1 for given offset within cacheline

--
2.17.1

2020-10-15 14:55:17

by Leo Yan

[permalink] [raw]
Subject: [PATCH v1 6/8] perf c2c: Refactor hist entry validation

This patch has no any functionality change but only refactors hist entry
validation for cacheline resorting.

It renames function "valid_hitm_or_store()" to "is_valid_hist_entry()",
changes return type from integer type to bool type. In the function,
it uses switch-case instead of ternary operators for better readability.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/builtin-c2c.c | 30 +++++++++++++++++++++++-------
1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 7962cc70b855..d15a6220bfd0 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2011,16 +2011,32 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
return he->filtered == 0;
}

-static inline int valid_hitm_or_store(struct hist_entry *he)
+static inline bool is_valid_hist_entry(struct hist_entry *he)
{
struct c2c_hist_entry *c2c_he;
- bool has_hitm;
+ bool has_record = false;

c2c_he = container_of(he, struct c2c_hist_entry, he);
- has_hitm = c2c.display == DISPLAY_TOT ? c2c_he->stats.tot_hitm :
- c2c.display == DISPLAY_LCL ? c2c_he->stats.lcl_hitm :
- c2c_he->stats.rmt_hitm;
- return has_hitm || c2c_he->stats.store;
+
+ /* It's a valid entry if contains stores */
+ if (c2c_he->stats.store)
+ return true;
+
+ switch (c2c.display) {
+ case DISPLAY_LCL:
+ has_record = !!c2c_he->stats.lcl_hitm;
+ break;
+ case DISPLAY_RMT:
+ has_record = !!c2c_he->stats.rmt_hitm;
+ break;
+ case DISPLAY_TOT:
+ has_record = !!c2c_he->stats.tot_hitm;
+ break;
+ default:
+ break;
+ }
+
+ return has_record;
}

static void set_node_width(struct c2c_hist_entry *c2c_he, int len)
@@ -2074,7 +2090,7 @@ static int filter_cb(struct hist_entry *he, void *arg __maybe_unused)

calc_width(c2c_he);

- if (!valid_hitm_or_store(he))
+ if (!is_valid_hist_entry(he))
he->filtered = HIST_FILTER__C2C;

return 0;
--
2.17.1

2020-10-15 14:55:24

by Leo Yan

[permalink] [raw]
Subject: [PATCH v1 2/8] perf c2c: Add dimensions for total LLC hit

Since Arm64 SPE trace data doesn't support HITM, we still want to
explore "perf c2c" tool to analyze cache false sharing. If without HITM
tag, the tool cannot give out accurate result for cache false sharing,
a candidate solution is to sort the LLC hit and connect with the info of
multiple threads, e.g. if multiple threads hit the LLC on the same
cacheline for many times, this hints that it's likely to cause false
sharing issue. Though this solution is not perfect due to lacking HITM
tag, it's pragmatic for detecting false sharing.

To support the sorting with LLC hit, this patch adds dimensions for
total LLC hit and the associated percentage calculation.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/builtin-c2c.c | 76 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 76 insertions(+)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 9c2183957c50..5fb77fcd3c9c 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -651,6 +651,7 @@ STAT_FN(ld_l1hit)
STAT_FN(ld_l2hit)
STAT_FN(ld_llchit)
STAT_FN(rmt_hit)
+STAT_FN(tot_llchit)

static uint64_t total_records(struct c2c_stats *stats)
{
@@ -856,6 +857,62 @@ percent_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
return per_left - per_right;
}

+static double percent_llchit(struct c2c_hist_entry *c2c_he)
+{
+ struct c2c_hists *hists;
+ struct c2c_stats *stats;
+ struct c2c_stats *total;
+ int tot = 0, st = 0;
+
+ hists = container_of(c2c_he->he.hists, struct c2c_hists, hists);
+ stats = &c2c_he->stats;
+ total = &hists->stats;
+
+ st = stats->tot_llchit;
+ tot = total->tot_llchit;
+
+ return tot ? (double) st * 100 / tot : 0;
+}
+
+static int
+percent_llchit_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+ struct hist_entry *he)
+{
+ struct c2c_hist_entry *c2c_he;
+ int width = c2c_width(fmt, hpp, he->hists);
+ char buf[10];
+ double per;
+
+ c2c_he = container_of(he, struct c2c_hist_entry, he);
+ per = percent_llchit(c2c_he);
+ return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
+}
+
+static int
+percent_llchit_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+ struct hist_entry *he)
+{
+ return percent_color(fmt, hpp, he, percent_llchit);
+}
+
+static int64_t
+percent_llchit_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
+ struct hist_entry *left, struct hist_entry *right)
+{
+ struct c2c_hist_entry *c2c_left;
+ struct c2c_hist_entry *c2c_right;
+ double per_left;
+ double per_right;
+
+ c2c_left = container_of(left, struct c2c_hist_entry, he);
+ c2c_right = container_of(right, struct c2c_hist_entry, he);
+
+ per_left = percent_llchit(c2c_left);
+ per_right = percent_llchit(c2c_right);
+
+ return per_left - per_right;
+}
+
static struct c2c_stats *he_stats(struct hist_entry *he)
{
struct c2c_hist_entry *c2c_he;
@@ -1392,6 +1449,14 @@ static struct c2c_dimension dim_ld_l2hit = {
.width = 7,
};

+static struct c2c_dimension dim_tot_llchit = {
+ .header = HEADER_BOTH("LLC Hit", "Total"),
+ .name = "tot_llchit",
+ .cmp = tot_llchit_cmp,
+ .entry = tot_llchit_entry,
+ .width = 8,
+};
+
static struct c2c_dimension dim_ld_llchit = {
.header = HEADER_SPAN("- LLC Load Hit --", "LclHit", 1),
.name = "ld_lclhit",
@@ -1438,6 +1503,15 @@ static struct c2c_dimension dim_percent_hitm = {
.width = 7,
};

+static struct c2c_dimension dim_percent_llchit = {
+ .header = HEADER_BOTH("LLC Hit", "Pct"),
+ .name = "percent_llchit",
+ .cmp = percent_llchit_cmp,
+ .entry = percent_llchit_entry,
+ .color = percent_llchit_color,
+ .width = 7,
+};
+
static struct c2c_dimension dim_percent_rmt_hitm = {
.header = HEADER_SPAN("----- HITM -----", "RmtHitm", 1),
.name = "percent_rmt_hitm",
@@ -1611,9 +1685,11 @@ static struct c2c_dimension *dimensions[] = {
&dim_ld_l2hit,
&dim_ld_llchit,
&dim_ld_rmthit,
+ &dim_tot_llchit,
&dim_tot_recs,
&dim_tot_loads,
&dim_percent_hitm,
+ &dim_percent_llchit,
&dim_percent_rmt_hitm,
&dim_percent_lcl_hitm,
&dim_percent_stores_l1hit,
--
2.17.1

2020-10-15 14:56:44

by Leo Yan

[permalink] [raw]
Subject: [PATCH v1 7/8] perf c2c: Add option '-d llc' for sorting with LLC load

Except the existed three display options 'tot', 'rmt', 'lcl', this patch
adds option 'llc' so that can sort on LLC load metrics. The new
introduced option can work as a choice if the memory event doesn't
contain HITM tags.

For the display with option 'llc', both the "Shared Data Cache Line
Table" and "Shared Cache Line Distribution Pareto" have difference
comparing to other three display options.

For the "Shared Data Cache Line Table", instead of sorting HITM metrics,
it sorts with the LLC hit metrics "tot_llchit". In this case, users
should be interested in LLC related statistics, so the dimensions of
total LLC hit is used to replace HITM related dimensions.

For Pareto, every single cache line shows the metrics "cl_llc_hit"
instead of "cl_rmt_hitm", and the single cache line view is sorted by
metrics "tot_llchit".

As result, we can get the 'llc' display as follows:

# perf c2c report -d llc --coalesce tid,pid,iaddr,dso --stdio

[...]

=================================================
Shared Data Cache Line Table
=================================================
#
# ----------- Cacheline ---------- LLC Hit LLC Hit Total Total Total ---- Stores ---- ----- Core Load Hit ----- - LLC Load Hit -- - RMT Load Hit -- --- Load Dram ----
# Index Address Node PA cnt Pct Total records Loads Stores L1Hit L1Miss FB L1 L2 LclHit LclHitm RmtHit RmtHitm Lcl Rmt
# ..... .................. .... ...... ....... ........ ....... ....... ....... ....... ....... ....... ....... ....... ........ ....... ........ ....... ........ ........
#
0 0x563b01e83100 0 1401 65.32% 648 7011 3738 3273 2582 691 515 2516 59 143 505 0 0 0 0
1 0x563b01e830c0 0 1 26.51% 263 400 400 0 0 0 130 3 4 262 1 0 0 0 0
2 0x563b01e83080 0 1 7.76% 77 650 650 0 0 0 180 348 45 14 63 0 0 0 0
3 0xffff88c3d74e82c0 0 1 0.10% 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0
4 0xffffa587c11e38c0 N/A 0 0.10% 1 2 1 1 1 0 0 0 0 1 0 0 0 0 0
5 0xffffffffbd5e6fc0 0 1 0.10% 1 1 1 0 0 0 0 0 0 0 1 0 0 0 0
6 0x7f90a4d6c2c0 0 1 0.10% 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0

=================================================
Shared Cache Line Distribution Pareto
=================================================
#
# ---- LLC LD ---- -- Store Refs -- --------- Data address --------- ---------- cycles ---------- Total cpu Shared
# Num LclHit LclHitm L1 Hit L1 Miss Offset Node PA cnt Pid Tid Code address rmt hitm lcl hitm load records cnt Symbol Object Source:Line Node
# ..... ....... ....... ....... ....... .................. .... ...... ....... .................. .................. ........ ........ ........ ....... ........ ................... ................. ........................... ....
#
-------------------------------------------------------------
0 143 505 2582 691 0x563b01e83100
-------------------------------------------------------------
96.50% 7.72% 46.79% 0.00% 0x0 0 1 14100 14102:lock_th 0x563b01c81c16 0 1949 1331 1876 1 [.] read_write_func false_sharing.exe false_sharing_example.c:145 0
0.00% 35.05% 0.00% 0.00% 0x0 0 1 14100 14102:lock_th 0x563b01c81c1d 0 2651 975 748 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
0.00% 30.89% 0.00% 0.00% 0x0 0 1 14100 14103:lock_th 0x563b01c81c1d 0 1425 1003 762 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
2.10% 7.52% 49.19% 0.00% 0x0 0 1 14100 14103:lock_th 0x563b01c81c16 0 1585 1053 2037 1 [.] read_write_func false_sharing.exe false_sharing_example.c:145 0
0.00% 0.00% 2.52% 44.86% 0x0 0 1 14100 14102:lock_th 0x563b01c81c28 0 0 0 375 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
0.00% 0.00% 1.51% 55.14% 0x0 0 1 14100 14103:lock_th 0x563b01c81c28 0 0 0 420 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
1.40% 12.87% 0.00% 0.00% 0x20 0 1 14100 14104:reader_thd 0x563b01c81c73 0 166 99 417 1 [.] read_write_func false_sharing.exe false_sharing_example.c:155 0
0.00% 5.94% 0.00% 0.00% 0x20 0 1 14100 14105:reader_thd 0x563b01c81c73 0 144 85 376 1 [.] read_write_func false_sharing.exe false_sharing_example.c:155 0

[...]

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/builtin-c2c.c | 142 ++++++++++++++++++++++++++++-----------
1 file changed, 101 insertions(+), 41 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index d15a6220bfd0..0147c23ff918 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -113,13 +113,15 @@ enum {
DISPLAY_LCL,
DISPLAY_RMT,
DISPLAY_TOT,
+ DISPLAY_LLC,
DISPLAY_MAX,
};

static const char *display_str[DISPLAY_MAX] = {
- [DISPLAY_LCL] = "Local",
- [DISPLAY_RMT] = "Remote",
- [DISPLAY_TOT] = "Total",
+ [DISPLAY_LCL] = "Local HITMs",
+ [DISPLAY_RMT] = "Remote HITMs",
+ [DISPLAY_TOT] = "Total HITMs",
+ [DISPLAY_LLC] = "LLC Access",
};

static const struct option c2c_options[] = {
@@ -1193,6 +1195,10 @@ node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
break;
case DISPLAY_TOT:
DISPLAY_ENTRY(tot_hitm);
+ break;
+ case DISPLAY_LLC:
+ DISPLAY_ENTRY(tot_llchit);
+ break;
default:
break;
}
@@ -1533,6 +1539,7 @@ static struct c2c_header percent_hitm_header[] = {
[DISPLAY_LCL] = HEADER_BOTH("Lcl", "Hitm"),
[DISPLAY_RMT] = HEADER_BOTH("Rmt", "Hitm"),
[DISPLAY_TOT] = HEADER_BOTH("Tot", "Hitm"),
+ [DISPLAY_LLC] = HEADER_BOTH("LLC", "Hit"),
};

static struct c2c_dimension dim_percent_hitm = {
@@ -2002,6 +2009,10 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
break;
case DISPLAY_TOT:
FILTER_ENTRY(tot_hitm);
+ break;
+ case DISPLAY_LLC:
+ FILTER_ENTRY(tot_llchit);
+ break;
default:
break;
}
@@ -2032,6 +2043,9 @@ static inline bool is_valid_hist_entry(struct hist_entry *he)
case DISPLAY_TOT:
has_record = !!c2c_he->stats.tot_hitm;
break;
+ case DISPLAY_LLC:
+ has_record = !!c2c_he->stats.tot_llchit;
+ break;
default:
break;
}
@@ -2186,17 +2200,20 @@ static int setup_nodes(struct perf_session *session)
}

#define HAS_HITMS(__h) ((__h)->stats.lcl_hitm || (__h)->stats.rmt_hitm)
+#define HAS_LLCHIT(__h) ((__h)->stats.tot_llchit)

static int resort_shared_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
{
struct c2c_hist_entry *c2c_he;
c2c_he = container_of(he, struct c2c_hist_entry, he);

- if (HAS_HITMS(c2c_he)) {
+ if (c2c.display == DISPLAY_LLC && HAS_LLCHIT(c2c_he)) {
+ c2c.shared_clines++;
+ c2c_add_stats(&c2c.shared_clines_stats, &c2c_he->stats);
+ } else if (HAS_HITMS(c2c_he)) {
c2c.shared_clines++;
c2c_add_stats(&c2c.shared_clines_stats, &c2c_he->stats);
}
-
return 0;
}

@@ -2315,16 +2332,26 @@ static void print_pareto(FILE *out)
struct perf_hpp_list hpp_list;
struct rb_node *nd;
int ret;
+ const char *cl_output;
+
+ if (c2c.display == DISPLAY_TOT || c2c.display == DISPLAY_LCL ||
+ c2c.display == DISPLAY_RMT)
+ cl_output = "cl_num,"
+ "cl_rmt_hitm,"
+ "cl_lcl_hitm,"
+ "cl_stores_l1hit,"
+ "cl_stores_l1miss,"
+ "dcacheline";
+ else /* c2c.display == DISPLAY_LLC */
+ cl_output = "cl_num,"
+ "cl_llc_hit,"
+ "cl_lcl_hitm,"
+ "cl_stores_l1hit,"
+ "cl_stores_l1miss,"
+ "dcacheline";

perf_hpp_list__init(&hpp_list);
- ret = hpp_list__parse(&hpp_list,
- "cl_num,"
- "cl_rmt_hitm,"
- "cl_lcl_hitm,"
- "cl_stores_l1hit,"
- "cl_stores_l1miss,"
- "dcacheline",
- NULL);
+ ret = hpp_list__parse(&hpp_list, cl_output, NULL);

if (WARN_ONCE(ret, "failed to setup sort entries\n"))
return;
@@ -2357,7 +2384,7 @@ static void print_c2c_info(FILE *out, struct perf_session *session)
fprintf(out, "%-36s: %s\n", first ? " Events" : "", evsel__name(evsel));
first = false;
}
- fprintf(out, " Cachelines sort on : %s HITMs\n",
+ fprintf(out, " Cachelines sort on : %s\n",
display_str[c2c.display]);
fprintf(out, " Cacheline data grouping : %s\n", c2c.cl_sort);
}
@@ -2514,7 +2541,7 @@ static int perf_c2c_browser__title(struct hist_browser *browser,
{
scnprintf(bf, size,
"Shared Data Cache Line Table "
- "(%lu entries, sorted on %s HITMs)",
+ "(%lu entries, sorted on %s)",
browser->nr_non_filtered_entries,
display_str[c2c.display]);
return 0;
@@ -2720,6 +2747,8 @@ static int setup_display(const char *str)
c2c.display = DISPLAY_RMT;
else if (!strcmp(display, "lcl"))
c2c.display = DISPLAY_LCL;
+ else if (!strcmp(display, "llc"))
+ c2c.display = DISPLAY_LLC;
else {
pr_err("failed: unknown display type: %s\n", str);
return -1;
@@ -2766,9 +2795,10 @@ static int build_cl_output(char *cl_sort, bool no_source)
}

if (asprintf(&c2c.cl_output,
- "%s%s%s%s%s%s%s%s%s%s",
+ "%s%s%s%s%s%s%s%s%s%s%s",
c2c.use_stdio ? "cl_num_empty," : "",
- "percent_rmt_hitm,"
+ c2c.display == DISPLAY_LLC ? "percent_llc_hit," :
+ "percent_rmt_hitm,",
"percent_lcl_hitm,"
"percent_stores_l1hit,"
"percent_stores_l1miss,"
@@ -2798,6 +2828,7 @@ static int build_cl_output(char *cl_sort, bool no_source)
static int setup_coalesce(const char *coalesce, bool no_source)
{
const char *c = coalesce ?: coalesce_default;
+ const char *sort_str;

if (asprintf(&c2c.cl_sort, "offset,%s", c) < 0)
return -ENOMEM;
@@ -2805,12 +2836,16 @@ static int setup_coalesce(const char *coalesce, bool no_source)
if (build_cl_output(c2c.cl_sort, no_source))
return -1;

- if (asprintf(&c2c.cl_resort, "offset,%s",
- c2c.display == DISPLAY_TOT ?
- "tot_hitm" :
- c2c.display == DISPLAY_RMT ?
- "rmt_hitm,lcl_hitm" :
- "lcl_hitm,rmt_hitm") < 0)
+ if (c2c.display == DISPLAY_TOT)
+ sort_str = "tot_hitm";
+ else if (c2c.display == DISPLAY_RMT)
+ sort_str = "rmt_hitm,lcl_hitm";
+ else if (c2c.display == DISPLAY_LCL)
+ sort_str = "lcl_hitm,rmt_hitm";
+ else if (c2c.display == DISPLAY_LLC)
+ sort_str = "tot_llchit";
+
+ if (asprintf(&c2c.cl_resort, "offset,%s", sort_str) < 0)
return -ENOMEM;

pr_debug("coalesce sort fields: %s\n", c2c.cl_sort);
@@ -2862,6 +2897,7 @@ static int perf_c2c__report(int argc, const char **argv)
OPT_END()
};
int err = 0;
+ const char *output_str, *sort_str;

argc = parse_options(argc, argv, options, report_c2c_usage,
PARSE_OPT_STOP_AT_NON_OPTION);
@@ -2936,24 +2972,48 @@ static int perf_c2c__report(int argc, const char **argv)
goto out_mem2node;
}

- c2c_hists__reinit(&c2c.hists,
- "cl_idx,"
- "dcacheline,"
- "dcacheline_node,"
- "dcacheline_count,"
- "percent_hitm,"
- "tot_hitm,lcl_hitm,rmt_hitm,"
- "tot_recs,"
- "tot_loads,"
- "tot_stores,"
- "stores_l1hit,stores_l1miss,"
- "ld_fbhit,ld_l1hit,ld_l2hit,"
- "ld_lclhit,lcl_hitm,"
- "ld_rmthit,rmt_hitm,"
- "dram_lcl,dram_rmt",
- c2c.display == DISPLAY_TOT ? "tot_hitm" :
- c2c.display == DISPLAY_LCL ? "lcl_hitm" : "rmt_hitm"
- );
+ if (c2c.display == DISPLAY_TOT || c2c.display == DISPLAY_LCL ||
+ c2c.display == DISPLAY_RMT)
+ output_str = "cl_idx,"
+ "dcacheline,"
+ "dcacheline_node,"
+ "dcacheline_count,"
+ "percent_hitm,"
+ "tot_hitm,lcl_hitm,rmt_hitm,"
+ "tot_recs,"
+ "tot_loads,"
+ "tot_stores,"
+ "stores_l1hit,stores_l1miss,"
+ "ld_fbhit,ld_l1hit,ld_l2hit,"
+ "ld_lclhit,lcl_hitm,"
+ "ld_rmthit,rmt_hitm,"
+ "dram_lcl,dram_rmt";
+ else /* c2c.display == DISPLAY_LLC */
+ output_str = "cl_idx,"
+ "dcacheline,"
+ "dcacheline_node,"
+ "dcacheline_count,"
+ "percent_llchit,"
+ "tot_llchit,"
+ "tot_recs,"
+ "tot_loads,"
+ "tot_stores,"
+ "stores_l1hit,stores_l1miss,"
+ "ld_fbhit,ld_l1hit,ld_l2hit,"
+ "ld_lclhit,lcl_hitm,"
+ "ld_rmthit,rmt_hitm,"
+ "dram_lcl,dram_rmt";
+
+ if (c2c.display == DISPLAY_TOT)
+ sort_str = "tot_hitm";
+ else if (c2c.display == DISPLAY_RMT)
+ sort_str = "rmt_hitm";
+ else if (c2c.display == DISPLAY_LCL)
+ sort_str = "lcl_hitm";
+ else if (c2c.display == DISPLAY_LLC)
+ sort_str = "tot_llchit";
+
+ c2c_hists__reinit(&c2c.hists, output_str, sort_str);

ui_progress__init(&prog, c2c.hists.hists.nr_entries, "Sorting...");

--
2.17.1

2020-10-15 15:14:26

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load

Em Thu, Oct 15, 2020 at 03:50:33PM +0100, Leo Yan escreveu:
> If the memory event doesn't contain HITM tag (like Arm SPE), it cannot
> rely on HITM display to report cache false sharing. Alternatively, we
> can use the LLC access and multi-threads info to locate the potential
> false sharing's data address, and if we connect with source code and
> analyze the multi-threads' execution timing, if can conclude load and
> store the same cache line at the meantime, thus this can be helpful for
> resolve the cache false sharing issue.
>
> This patch set is to enable the display with sorting on LLC load
> accesses; it adds dimensions for total LLC hit and LLC load accesses,
> and these dimensions are used for shared cache line table and pareto.
>
> This patch set is dependend on the patch set "perf c2c: Refine the
> organization of metrics" [1].
>
> [1] https://lore.kernel.org/patchwork/cover/1321499/

Ok, that one is applied and will appear publicly as soon as it goes thru
my usual set of build tests.

- Arnaldo

> With this patch set, we can get display 'llc' as follows:
>
> # perf c2c report -d llc --coalesce tid,pid,iaddr,dso --stdio
>
> [...]
>
> =================================================
> Shared Data Cache Line Table
> =================================================
> #
> # ----------- Cacheline ---------- LLC Hit LLC Hit Total Total Total ---- Stores ---- ----- Core Load Hit ----- - LLC Load Hit -- - RMT Load Hit -- --- Load Dram ----
> # Index Address Node PA cnt Pct Total records Loads Stores L1Hit L1Miss FB L1 L2 LclHit LclHitm RmtHit RmtHitm Lcl Rmt
> # ..... .................. .... ...... ....... ........ ....... ....... ....... ....... ....... ....... ....... ....... ........ ....... ........ ....... ........ ........
> #
> 0 0x563b01e83100 0 1401 65.32% 648 7011 3738 3273 2582 691 515 2516 59 143 505 0 0 0 0
> 1 0x563b01e830c0 0 1 26.51% 263 400 400 0 0 0 130 3 4 262 1 0 0 0 0
> 2 0x563b01e83080 0 1 7.76% 77 650 650 0 0 0 180 348 45 14 63 0 0 0 0
> 3 0xffff88c3d74e82c0 0 1 0.10% 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0
> 4 0xffffa587c11e38c0 N/A 0 0.10% 1 2 1 1 1 0 0 0 0 1 0 0 0 0 0
> 5 0xffffffffbd5e6fc0 0 1 0.10% 1 1 1 0 0 0 0 0 0 0 1 0 0 0 0
> 6 0x7f90a4d6c2c0 0 1 0.10% 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0
>
> =================================================
> Shared Cache Line Distribution Pareto
> =================================================
> #
> # ---- LLC LD ---- -- Store Refs -- --------- Data address --------- ---------- cycles ---------- Total cpu Shared
> # Num LclHit LclHitm L1 Hit L1 Miss Offset Node PA cnt Pid Tid Code address rmt hitm lcl hitm load records cnt Symbol Object Source:Line Node
> # ..... ....... ....... ....... ....... .................. .... ...... ....... .................. .................. ........ ........ ........ ....... ........ ................... ................. ........................... ....
> #
> -------------------------------------------------------------
> 0 143 505 2582 691 0x563b01e83100
> -------------------------------------------------------------
> 96.50% 7.72% 46.79% 0.00% 0x0 0 1 14100 14102:lock_th 0x563b01c81c16 0 1949 1331 1876 1 [.] read_write_func false_sharing.exe false_sharing_example.c:145 0
> 0.00% 35.05% 0.00% 0.00% 0x0 0 1 14100 14102:lock_th 0x563b01c81c1d 0 2651 975 748 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
> 0.00% 30.89% 0.00% 0.00% 0x0 0 1 14100 14103:lock_th 0x563b01c81c1d 0 1425 1003 762 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
> 2.10% 7.52% 49.19% 0.00% 0x0 0 1 14100 14103:lock_th 0x563b01c81c16 0 1585 1053 2037 1 [.] read_write_func false_sharing.exe false_sharing_example.c:145 0
> 0.00% 0.00% 2.52% 44.86% 0x0 0 1 14100 14102:lock_th 0x563b01c81c28 0 0 0 375 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
> 0.00% 0.00% 1.51% 55.14% 0x0 0 1 14100 14103:lock_th 0x563b01c81c28 0 0 0 420 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
> 1.40% 12.87% 0.00% 0.00% 0x20 0 1 14100 14104:reader_thd 0x563b01c81c73 0 166 99 417 1 [.] read_write_func false_sharing.exe false_sharing_example.c:155 0
> 0.00% 5.94% 0.00% 0.00% 0x20 0 1 14100 14105:reader_thd 0x563b01c81c73 0 144 85 376 1 [.] read_write_func false_sharing.exe false_sharing_example.c:155 0
>
> [...]
>
>
> Leo Yan (8):
> perf mem: Add structure field c2c_stats::tot_llchit
> perf c2c: Add dimensions for total LLC hit
> perf c2c: Add dimensions for LLC load hit
> perf c2c: Change to general naming for macros
> perf c2c: Rename for shared cache line stats
> perf c2c: Refactor hist entry validation
> perf c2c: Add option '-d llc' for sorting with LLC load
> perf c2c: Update documentation for display option 'llc'
>
> tools/perf/Documentation/perf-c2c.txt | 18 +-
> tools/perf/builtin-c2c.c | 333 +++++++++++++++++++++-----
> tools/perf/util/mem-events.c | 3 +
> tools/perf/util/mem-events.h | 1 +
> 4 files changed, 286 insertions(+), 69 deletions(-)
>
> --
> 2.17.1
>

--

- Arnaldo

2020-10-15 15:17:08

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load

On Thu, Oct 15, 2020 at 12:05:06PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Oct 15, 2020 at 03:50:33PM +0100, Leo Yan escreveu:
> > If the memory event doesn't contain HITM tag (like Arm SPE), it cannot
> > rely on HITM display to report cache false sharing. Alternatively, we
> > can use the LLC access and multi-threads info to locate the potential
> > false sharing's data address, and if we connect with source code and
> > analyze the multi-threads' execution timing, if can conclude load and
> > store the same cache line at the meantime, thus this can be helpful for
> > resolve the cache false sharing issue.
> >
> > This patch set is to enable the display with sorting on LLC load
> > accesses; it adds dimensions for total LLC hit and LLC load accesses,
> > and these dimensions are used for shared cache line table and pareto.
> >
> > This patch set is dependend on the patch set "perf c2c: Refine the
> > organization of metrics" [1].
> >
> > [1] https://lore.kernel.org/patchwork/cover/1321499/
>
> Ok, that one is applied and will appear publicly as soon as it goes thru
> my usual set of build tests.

Thank you, Arnaldo!

Leo

2020-10-15 18:28:57

by Leo Yan

[permalink] [raw]
Subject: [PATCH v1 3/8] perf c2c: Add dimensions for LLC load hit

Add dimensions for LLC load hit and its associated percentage
calculation, which is to be displayed in the single cache line output.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/builtin-c2c.c | 51 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 5fb77fcd3c9c..3f6271a779c4 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -947,6 +947,7 @@ static double percent_ ## __f(struct c2c_hist_entry *c2c_he) \

PERCENT_FN(rmt_hitm)
PERCENT_FN(lcl_hitm)
+PERCENT_FN(ld_llchit)
PERCENT_FN(st_l1hit)
PERCENT_FN(st_l1miss)

@@ -1012,6 +1013,37 @@ percent_lcl_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
return per_left - per_right;
}

+static int
+percent_llc_hit_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+ struct hist_entry *he)
+{
+ int width = c2c_width(fmt, hpp, he->hists);
+ double per = PERCENT(he, ld_llchit);
+ char buf[10];
+
+ return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
+}
+
+static int
+percent_llc_hit_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+ struct hist_entry *he)
+{
+ return percent_color(fmt, hpp, he, percent_ld_llchit);
+}
+
+static int64_t
+percent_llc_hit_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
+ struct hist_entry *left, struct hist_entry *right)
+{
+ double per_left;
+ double per_right;
+
+ per_left = PERCENT(left, ld_llchit);
+ per_right = PERCENT(right, ld_llchit);
+
+ return per_left - per_right;
+}
+
static int
percent_stores_l1hit_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
struct hist_entry *he)
@@ -1377,6 +1409,14 @@ static struct c2c_dimension dim_cl_rmt_hitm = {
.width = 7,
};

+static struct c2c_dimension dim_cl_llc_hit = {
+ .header = HEADER_SPAN("--- LLC Load ---", "LclHit", 1),
+ .name = "cl_llc_hit",
+ .cmp = ld_llchit_cmp,
+ .entry = ld_llchit_entry,
+ .width = 7,
+};
+
static struct c2c_dimension dim_cl_lcl_hitm = {
.header = HEADER_SPAN_LOW("Lcl"),
.name = "cl_lcl_hitm",
@@ -1530,6 +1570,15 @@ static struct c2c_dimension dim_percent_lcl_hitm = {
.width = 7,
};

+static struct c2c_dimension dim_percent_llc_hit = {
+ .header = HEADER_SPAN("---- LLC LD ----", "LclHit", 1),
+ .name = "percent_llc_hit",
+ .cmp = percent_llc_hit_cmp,
+ .entry = percent_llc_hit_entry,
+ .color = percent_llc_hit_color,
+ .width = 7,
+};
+
static struct c2c_dimension dim_percent_stores_l1hit = {
.header = HEADER_SPAN("-- Store Refs --", "L1 Hit", 1),
.name = "percent_stores_l1hit",
@@ -1673,6 +1722,7 @@ static struct c2c_dimension *dimensions[] = {
&dim_tot_hitm,
&dim_lcl_hitm,
&dim_rmt_hitm,
+ &dim_cl_llc_hit,
&dim_cl_lcl_hitm,
&dim_cl_rmt_hitm,
&dim_tot_stores,
@@ -1692,6 +1742,7 @@ static struct c2c_dimension *dimensions[] = {
&dim_percent_llchit,
&dim_percent_rmt_hitm,
&dim_percent_lcl_hitm,
+ &dim_percent_llc_hit,
&dim_percent_stores_l1hit,
&dim_percent_stores_l1miss,
&dim_dram_lcl,
--
2.17.1

2020-10-20 09:55:20

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v1 7/8] perf c2c: Add option '-d llc' for sorting with LLC load

On Thu, Oct 15, 2020 at 03:50:40PM +0100, Leo Yan wrote:

SNIP

> @@ -1533,6 +1539,7 @@ static struct c2c_header percent_hitm_header[] = {
> [DISPLAY_LCL] = HEADER_BOTH("Lcl", "Hitm"),
> [DISPLAY_RMT] = HEADER_BOTH("Rmt", "Hitm"),
> [DISPLAY_TOT] = HEADER_BOTH("Tot", "Hitm"),
> + [DISPLAY_LLC] = HEADER_BOTH("LLC", "Hit"),
> };
>
> static struct c2c_dimension dim_percent_hitm = {
> @@ -2002,6 +2009,10 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
> break;
> case DISPLAY_TOT:
> FILTER_ENTRY(tot_hitm);
> + break;
> + case DISPLAY_LLC:
> + FILTER_ENTRY(tot_llchit);
> + break;
> default:
> break;
> }
> @@ -2032,6 +2043,9 @@ static inline bool is_valid_hist_entry(struct hist_entry *he)
> case DISPLAY_TOT:
> has_record = !!c2c_he->stats.tot_hitm;
> break;
> + case DISPLAY_LLC:
> + has_record = !!c2c_he->stats.tot_llchit;
> + break;
> default:
> break;
> }

there's one more switch with c2c.display in percent_hitm function,
where you did not add DISPLAY_LLC case.. I guess it should not ever
because we will not use that column in llc display mode, but we
should add at least some warning or that

SNIP

> - "cl_rmt_hitm,"
> - "cl_lcl_hitm,"
> - "cl_stores_l1hit,"
> - "cl_stores_l1miss,"
> - "dcacheline",
> - NULL);
> + ret = hpp_list__parse(&hpp_list, cl_output, NULL);
>
> if (WARN_ONCE(ret, "failed to setup sort entries\n"))
> return;
> @@ -2357,7 +2384,7 @@ static void print_c2c_info(FILE *out, struct perf_session *session)
> fprintf(out, "%-36s: %s\n", first ? " Events" : "", evsel__name(evsel));
> first = false;
> }
> - fprintf(out, " Cachelines sort on : %s HITMs\n",
> + fprintf(out, " Cachelines sort on : %s\n",
> display_str[c2c.display]);
> fprintf(out, " Cacheline data grouping : %s\n", c2c.cl_sort);
> }
> @@ -2514,7 +2541,7 @@ static int perf_c2c_browser__title(struct hist_browser *browser,
> {
> scnprintf(bf, size,
> "Shared Data Cache Line Table "
> - "(%lu entries, sorted on %s HITMs)",
> + "(%lu entries, sorted on %s)",
> browser->nr_non_filtered_entries,
> display_str[c2c.display]);
> return 0;
> @@ -2720,6 +2747,8 @@ static int setup_display(const char *str)
> c2c.display = DISPLAY_RMT;
> else if (!strcmp(display, "lcl"))
> c2c.display = DISPLAY_LCL;
> + else if (!strcmp(display, "llc"))
> + c2c.display = DISPLAY_LLC;

please update man page with this

> else {
> pr_err("failed: unknown display type: %s\n", str);
> return -1;
> @@ -2766,9 +2795,10 @@ static int build_cl_output(char *cl_sort, bool no_source)
> }
>
> if (asprintf(&c2c.cl_output,
> - "%s%s%s%s%s%s%s%s%s%s",
> + "%s%s%s%s%s%s%s%s%s%s%s",

why is there extra '%s' when we did not add new argument.. ?

SNIP

> + "ld_fbhit,ld_l1hit,ld_l2hit,"
> + "ld_lclhit,lcl_hitm,"
> + "ld_rmthit,rmt_hitm,"
> + "dram_lcl,dram_rmt";
> + else /* c2c.display == DISPLAY_LLC */
> + output_str = "cl_idx,"
> + "dcacheline,"
> + "dcacheline_node,"
> + "dcacheline_count,"
> + "percent_llchit,"
> + "tot_llchit,"
> + "tot_recs,"
> + "tot_loads,"
> + "tot_stores,"
> + "stores_l1hit,stores_l1miss,"
> + "ld_fbhit,ld_l1hit,ld_l2hit,"
> + "ld_lclhit,lcl_hitm,"
> + "ld_rmthit,rmt_hitm,"
> + "dram_lcl,dram_rmt";
> +
> + if (c2c.display == DISPLAY_TOT)
> + sort_str = "tot_hitm";
> + else if (c2c.display == DISPLAY_RMT)
> + sort_str = "rmt_hitm";
> + else if (c2c.display == DISPLAY_LCL)
> + sort_str = "lcl_hitm";
> + else if (c2c.display == DISPLAY_LLC)
> + sort_str = "tot_llchit";
> +

could you please split addition of output_str/sort_str into separate
patch and then add DISPLAY_LLC support? it'd ease up review

perhaps include also print_pareto changes in it

thanks,
jirka

> + c2c_hists__reinit(&c2c.hists, output_str, sort_str);
>
> ui_progress__init(&prog, c2c.hists.hists.nr_entries, "Sorting...");
>
> --
> 2.17.1
>

2020-10-20 09:55:46

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v1 7/8] perf c2c: Add option '-d llc' for sorting with LLC load

On Thu, Oct 15, 2020 at 03:50:40PM +0100, Leo Yan wrote:
> Except the existed three display options 'tot', 'rmt', 'lcl', this patch
> adds option 'llc' so that can sort on LLC load metrics. The new
> introduced option can work as a choice if the memory event doesn't
> contain HITM tags.
>
> For the display with option 'llc', both the "Shared Data Cache Line
> Table" and "Shared Cache Line Distribution Pareto" have difference
> comparing to other three display options.
>
> For the "Shared Data Cache Line Table", instead of sorting HITM metrics,
> it sorts with the LLC hit metrics "tot_llchit". In this case, users
> should be interested in LLC related statistics, so the dimensions of
> total LLC hit is used to replace HITM related dimensions.
>
> For Pareto, every single cache line shows the metrics "cl_llc_hit"
> instead of "cl_rmt_hitm", and the single cache line view is sorted by
> metrics "tot_llchit".

hi,
I'm getting compilation error on newer gcc:

CC builtin-c2c.o
builtin-c2c.c: In function ‘perf_c2c__report’:
builtin-c2c.c:1979:9: error: ‘sort_str’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
1979 | return hpp_list__parse(&c2c_hists->list, output, sort);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
builtin-c2c.c:2900:27: note: ‘sort_str’ was declared here
2900 | const char *output_str, *sort_str;
| ^~~~~~~~


jirka

2020-10-20 09:56:05

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v1 8/8] perf c2c: Update documentation for display option 'llc'

On Thu, Oct 15, 2020 at 03:50:41PM +0100, Leo Yan wrote:
> Since the new display option 'llc' is introduced, this patch is to
> update the documentation to reflect it.
>
> Signed-off-by: Leo Yan <[email protected]>
> ---
> tools/perf/Documentation/perf-c2c.txt | 18 ++++++++++++++----
> 1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
> index c81d72e3eecf..eadce62ecd28 100644
> --- a/tools/perf/Documentation/perf-c2c.txt
> +++ b/tools/perf/Documentation/perf-c2c.txt
> @@ -109,7 +109,8 @@ REPORT OPTIONS
>
> -d::
> --display::
> - Switch to HITM type (rmt, lcl) to display and sort on. Total HITMs as default.
> + Switch to HITM type (rmt, lcl) or LLC load access (llc) to display
> + and sort on. Total HITMs as default.

ok, forget the doc update I asked in previous reply ;-)

thanks,
jirka

>
> --stitch-lbr::
> Show callgraph with stitched LBRs, which may have more complete
> @@ -174,12 +175,18 @@ For each cacheline in the 1) list we display following data:
> Cacheline
> - cacheline address (hex number)
>
> - Rmt/Lcl Hitm
> + Rmt/Lcl Hitm (For display with HITM types)
> - cacheline percentage of all Remote/Local HITM accesses
>
> - LLC Load Hitm - Total, LclHitm, RmtHitm
> + LLC Load Hitm - Total, LclHitm, RmtHitm (For display with HITM types)
> - count of Total/Local/Remote load HITMs
>
> + LLC Hit Pct (For display 'llc')
> + - cacheline percentage of all LLC load accesses
> +
> + LLC Hit Total (For display 'llc')
> + - sum of all LLC load accesses
> +
> Total records
> - sum of all cachelines accesses
>
> @@ -207,9 +214,12 @@ For each cacheline in the 1) list we display following data:
>
> For each offset in the 2) list we display following data:
>
> - HITM - Rmt, Lcl
> + HITM - Rmt, Lcl (For display with HITM types)
> - % of Remote/Local HITM accesses for given offset within cacheline
>
> + LLC LD - LclHit, LclHitm (For display 'llc')
> + - % of LLC hits and LLC HITMs accesses for given offset within cacheline
> +
> Store Refs - L1 Hit, L1 Miss
> - % of store accesses that hit/missed L1 for given offset within cacheline
>
> --
> 2.17.1
>

2020-10-20 10:16:31

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH v1 7/8] perf c2c: Add option '-d llc' for sorting with LLC load

Hi Jiri,

On Tue, Oct 20, 2020 at 09:25:53AM +0200, Jiri Olsa wrote:
> On Thu, Oct 15, 2020 at 03:50:40PM +0100, Leo Yan wrote:
>
> SNIP
>
> > @@ -1533,6 +1539,7 @@ static struct c2c_header percent_hitm_header[] = {
> > [DISPLAY_LCL] = HEADER_BOTH("Lcl", "Hitm"),
> > [DISPLAY_RMT] = HEADER_BOTH("Rmt", "Hitm"),
> > [DISPLAY_TOT] = HEADER_BOTH("Tot", "Hitm"),
> > + [DISPLAY_LLC] = HEADER_BOTH("LLC", "Hit"),
> > };
> >
> > static struct c2c_dimension dim_percent_hitm = {
> > @@ -2002,6 +2009,10 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
> > break;
> > case DISPLAY_TOT:
> > FILTER_ENTRY(tot_hitm);
> > + break;
> > + case DISPLAY_LLC:
> > + FILTER_ENTRY(tot_llchit);
> > + break;
> > default:
> > break;
> > }
> > @@ -2032,6 +2043,9 @@ static inline bool is_valid_hist_entry(struct hist_entry *he)
> > case DISPLAY_TOT:
> > has_record = !!c2c_he->stats.tot_hitm;
> > break;
> > + case DISPLAY_LLC:
> > + has_record = !!c2c_he->stats.tot_llchit;
> > + break;
> > default:
> > break;
> > }
>
> there's one more switch with c2c.display in percent_hitm function,
> where you did not add DISPLAY_LLC case.. I guess it should not ever
> because we will not use that column in llc display mode, but we
> should add at least some warning or that

Exactly, for "DISPLAY_LLC" case, it will not run in the function
percent_hitm(); will add warning for that.

> SNIP
>
> > - "cl_rmt_hitm,"
> > - "cl_lcl_hitm,"
> > - "cl_stores_l1hit,"
> > - "cl_stores_l1miss,"
> > - "dcacheline",
> > - NULL);
> > + ret = hpp_list__parse(&hpp_list, cl_output, NULL);
> >
> > if (WARN_ONCE(ret, "failed to setup sort entries\n"))
> > return;
> > @@ -2357,7 +2384,7 @@ static void print_c2c_info(FILE *out, struct perf_session *session)
> > fprintf(out, "%-36s: %s\n", first ? " Events" : "", evsel__name(evsel));
> > first = false;
> > }
> > - fprintf(out, " Cachelines sort on : %s HITMs\n",
> > + fprintf(out, " Cachelines sort on : %s\n",
> > display_str[c2c.display]);
> > fprintf(out, " Cacheline data grouping : %s\n", c2c.cl_sort);
> > }
> > @@ -2514,7 +2541,7 @@ static int perf_c2c_browser__title(struct hist_browser *browser,
> > {
> > scnprintf(bf, size,
> > "Shared Data Cache Line Table "
> > - "(%lu entries, sorted on %s HITMs)",
> > + "(%lu entries, sorted on %s)",
> > browser->nr_non_filtered_entries,
> > display_str[c2c.display]);
> > return 0;
> > @@ -2720,6 +2747,8 @@ static int setup_display(const char *str)
> > c2c.display = DISPLAY_RMT;
> > else if (!strcmp(display, "lcl"))
> > c2c.display = DISPLAY_LCL;
> > + else if (!strcmp(display, "llc"))
> > + c2c.display = DISPLAY_LLC;
>
> please update man page with this
>
> > else {
> > pr_err("failed: unknown display type: %s\n", str);
> > return -1;
> > @@ -2766,9 +2795,10 @@ static int build_cl_output(char *cl_sort, bool no_source)
> > }
> >
> > if (asprintf(&c2c.cl_output,
> > - "%s%s%s%s%s%s%s%s%s%s",
> > + "%s%s%s%s%s%s%s%s%s%s%s",
>
> why is there extra '%s' when we did not add new argument.. ?

This is deliberate. The change is as below:

if (asprintf(&c2c.cl_output,
- "%s%s%s%s%s%s%s%s%s%s",
+ "%s%s%s%s%s%s%s%s%s%s%s",
c2c.use_stdio ? "cl_num_empty," : "",
- "percent_rmt_hitm,"
+ c2c.display == DISPLAY_LLC ? "percent_llc_hit," :
+ "percent_rmt_hitm,",
"percent_lcl_hitm,"

In the old code the string "percent_rmt_hitm," is merged with later
lines (the second string is "percent_lcl_hitm,") into single string.

In this patch, it needs to check condition c2c.display and pass
different string ("percent_llc_hit," vs "percent_rmt_hitm,"), thus
the string ("percent_llc_hit," or "percent_rmt_hitm,") is passed
independently, it's _NOT_ jointed with sequnetial lines.

> SNIP
>
> > + "ld_fbhit,ld_l1hit,ld_l2hit,"
> > + "ld_lclhit,lcl_hitm,"
> > + "ld_rmthit,rmt_hitm,"
> > + "dram_lcl,dram_rmt";
> > + else /* c2c.display == DISPLAY_LLC */
> > + output_str = "cl_idx,"
> > + "dcacheline,"
> > + "dcacheline_node,"
> > + "dcacheline_count,"
> > + "percent_llchit,"
> > + "tot_llchit,"
> > + "tot_recs,"
> > + "tot_loads,"
> > + "tot_stores,"
> > + "stores_l1hit,stores_l1miss,"
> > + "ld_fbhit,ld_l1hit,ld_l2hit,"
> > + "ld_lclhit,lcl_hitm,"
> > + "ld_rmthit,rmt_hitm,"
> > + "dram_lcl,dram_rmt";
> > +
> > + if (c2c.display == DISPLAY_TOT)
> > + sort_str = "tot_hitm";
> > + else if (c2c.display == DISPLAY_RMT)
> > + sort_str = "rmt_hitm";
> > + else if (c2c.display == DISPLAY_LCL)
> > + sort_str = "lcl_hitm";
> > + else if (c2c.display == DISPLAY_LLC)
> > + sort_str = "tot_llchit";
> > +
>
> could you please split addition of output_str/sort_str into separate
> patch and then add DISPLAY_LLC support? it'd ease up review
>
> perhaps include also print_pareto changes in it

Will do.

Thanks a lot for reviewing,
Leo

2020-10-20 10:19:33

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH v1 7/8] perf c2c: Add option '-d llc' for sorting with LLC load

On Tue, Oct 20, 2020 at 09:26:03AM +0200, Jiri Olsa wrote:
> On Thu, Oct 15, 2020 at 03:50:40PM +0100, Leo Yan wrote:
> > Except the existed three display options 'tot', 'rmt', 'lcl', this patch
> > adds option 'llc' so that can sort on LLC load metrics. The new
> > introduced option can work as a choice if the memory event doesn't
> > contain HITM tags.
> >
> > For the display with option 'llc', both the "Shared Data Cache Line
> > Table" and "Shared Cache Line Distribution Pareto" have difference
> > comparing to other three display options.
> >
> > For the "Shared Data Cache Line Table", instead of sorting HITM metrics,
> > it sorts with the LLC hit metrics "tot_llchit". In this case, users
> > should be interested in LLC related statistics, so the dimensions of
> > total LLC hit is used to replace HITM related dimensions.
> >
> > For Pareto, every single cache line shows the metrics "cl_llc_hit"
> > instead of "cl_rmt_hitm", and the single cache line view is sorted by
> > metrics "tot_llchit".
>
> hi,
> I'm getting compilation error on newer gcc:
>
> CC builtin-c2c.o
> builtin-c2c.c: In function ‘perf_c2c__report’:
> builtin-c2c.c:1979:9: error: ‘sort_str’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
> 1979 | return hpp_list__parse(&c2c_hists->list, output, sort);
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> builtin-c2c.c:2900:27: note: ‘sort_str’ was declared here
> 2900 | const char *output_str, *sort_str;
> | ^~~~~~~~

Sorry for this building error. I didn't detect this error on both
Arm64 and x86; will fix it.

Thanks,
Leo

2020-10-20 10:23:41

by Leo Yan

[permalink] [raw]
Subject: Re: [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load

On Tue, Oct 20, 2020 at 05:13:01PM +0900, Namhyung Kim wrote:
> Hello,
>
> On Thu, Oct 15, 2020 at 11:51 PM Leo Yan <[email protected]> wrote:
> >
> > If the memory event doesn't contain HITM tag (like Arm SPE), it cannot
> > rely on HITM display to report cache false sharing. Alternatively, we
> > can use the LLC access and multi-threads info to locate the potential
> > false sharing's data address, and if we connect with source code and
> > analyze the multi-threads' execution timing, if can conclude load and
> > store the same cache line at the meantime, thus this can be helpful for
> > resolve the cache false sharing issue.
> >
> > This patch set is to enable the display with sorting on LLC load
> > accesses; it adds dimensions for total LLC hit and LLC load accesses,
> > and these dimensions are used for shared cache line table and pareto.
> >
> > This patch set is dependend on the patch set "perf c2c: Refine the
> > organization of metrics" [1].
> >
> > [1] https://lore.kernel.org/patchwork/cover/1321499/
> >
> > With this patch set, we can get display 'llc' as follows:
> >
> > # perf c2c report -d llc --coalesce tid,pid,iaddr,dso --stdio
>
> I'm not sure if you ran the test on x86 or ARM.
> IIUC ARM should have 0 local hitm, right?

Yes, on Arm64 the local HITM and remote HITM both are zeros. Below is
the testing result on x86.

Thanks,
Leo

> > [...]
> >
> > =================================================
> > Shared Data Cache Line Table
> > =================================================
> > #
> > # ----------- Cacheline ---------- LLC Hit LLC Hit Total Total Total ---- Stores ---- ----- Core Load Hit ----- - LLC Load Hit -- - RMT Load Hit -- --- Load Dram ----
> > # Index Address Node PA cnt Pct Total records Loads Stores L1Hit L1Miss FB L1 L2 LclHit LclHitm RmtHit RmtHitm Lcl Rmt
> > # ..... .................. .... ...... ....... ........ ....... ....... ....... ....... ....... ....... ....... ....... ........ ....... ........ ....... ........ ........
> > #
> > 0 0x563b01e83100 0 1401 65.32% 648 7011 3738 3273 2582 691 515 2516 59 143 505 0 0 0 0
> > 1 0x563b01e830c0 0 1 26.51% 263 400 400 0 0 0 130 3 4 262 1 0 0 0 0
> > 2 0x563b01e83080 0 1 7.76% 77 650 650 0 0 0 180 348 45 14 63 0 0 0 0
> > 3 0xffff88c3d74e82c0 0 1 0.10% 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0
> > 4 0xffffa587c11e38c0 N/A 0 0.10% 1 2 1 1 1 0 0 0 0 1 0 0 0 0 0
> > 5 0xffffffffbd5e6fc0 0 1 0.10% 1 1 1 0 0 0 0 0 0 0 1 0 0 0 0
> > 6 0x7f90a4d6c2c0 0 1 0.10% 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0
> >
> > =================================================
> > Shared Cache Line Distribution Pareto
> > =================================================
> > #
> > # ---- LLC LD ---- -- Store Refs -- --------- Data address --------- ---------- cycles ---------- Total cpu Shared
> > # Num LclHit LclHitm L1 Hit L1 Miss Offset Node PA cnt Pid Tid Code address rmt hitm lcl hitm load records cnt Symbol Object Source:Line Node
> > # ..... ....... ....... ....... ....... .................. .... ...... ....... .................. .................. ........ ........ ........ ....... ........ ................... ................. ........................... ....
> > #
> > -------------------------------------------------------------
> > 0 143 505 2582 691 0x563b01e83100
> > -------------------------------------------------------------
> > 96.50% 7.72% 46.79% 0.00% 0x0 0 1 14100 14102:lock_th 0x563b01c81c16 0 1949 1331 1876 1 [.] read_write_func false_sharing.exe false_sharing_example.c:145 0
> > 0.00% 35.05% 0.00% 0.00% 0x0 0 1 14100 14102:lock_th 0x563b01c81c1d 0 2651 975 748 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
> > 0.00% 30.89% 0.00% 0.00% 0x0 0 1 14100 14103:lock_th 0x563b01c81c1d 0 1425 1003 762 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
> > 2.10% 7.52% 49.19% 0.00% 0x0 0 1 14100 14103:lock_th 0x563b01c81c16 0 1585 1053 2037 1 [.] read_write_func false_sharing.exe false_sharing_example.c:145 0
> > 0.00% 0.00% 2.52% 44.86% 0x0 0 1 14100 14102:lock_th 0x563b01c81c28 0 0 0 375 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
> > 0.00% 0.00% 1.51% 55.14% 0x0 0 1 14100 14103:lock_th 0x563b01c81c28 0 0 0 420 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
> > 1.40% 12.87% 0.00% 0.00% 0x20 0 1 14100 14104:reader_thd 0x563b01c81c73 0 166 99 417 1 [.] read_write_func false_sharing.exe false_sharing_example.c:155 0
> > 0.00% 5.94% 0.00% 0.00% 0x20 0 1 14100 14105:reader_thd 0x563b01c81c73 0 144 85 376 1 [.] read_write_func false_sharing.exe false_sharing_example.c:155 0
> >
> > [...]
> >
> >
> > Leo Yan (8):
> > perf mem: Add structure field c2c_stats::tot_llchit
> > perf c2c: Add dimensions for total LLC hit
> > perf c2c: Add dimensions for LLC load hit
> > perf c2c: Change to general naming for macros
> > perf c2c: Rename for shared cache line stats
> > perf c2c: Refactor hist entry validation
> > perf c2c: Add option '-d llc' for sorting with LLC load
> > perf c2c: Update documentation for display option 'llc'
> >
> > tools/perf/Documentation/perf-c2c.txt | 18 +-
> > tools/perf/builtin-c2c.c | 333 +++++++++++++++++++++-----
> > tools/perf/util/mem-events.c | 3 +
> > tools/perf/util/mem-events.h | 1 +
> > 4 files changed, 286 insertions(+), 69 deletions(-)
> >
> > --
> > 2.17.1
> >

2020-10-20 20:02:24

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load

Hello,

On Thu, Oct 15, 2020 at 11:51 PM Leo Yan <[email protected]> wrote:
>
> If the memory event doesn't contain HITM tag (like Arm SPE), it cannot
> rely on HITM display to report cache false sharing. Alternatively, we
> can use the LLC access and multi-threads info to locate the potential
> false sharing's data address, and if we connect with source code and
> analyze the multi-threads' execution timing, if can conclude load and
> store the same cache line at the meantime, thus this can be helpful for
> resolve the cache false sharing issue.
>
> This patch set is to enable the display with sorting on LLC load
> accesses; it adds dimensions for total LLC hit and LLC load accesses,
> and these dimensions are used for shared cache line table and pareto.
>
> This patch set is dependend on the patch set "perf c2c: Refine the
> organization of metrics" [1].
>
> [1] https://lore.kernel.org/patchwork/cover/1321499/
>
> With this patch set, we can get display 'llc' as follows:
>
> # perf c2c report -d llc --coalesce tid,pid,iaddr,dso --stdio

I'm not sure if you ran the test on x86 or ARM.
IIUC ARM should have 0 local hitm, right?

Thanks
Namhyung

>
> [...]
>
> =================================================
> Shared Data Cache Line Table
> =================================================
> #
> # ----------- Cacheline ---------- LLC Hit LLC Hit Total Total Total ---- Stores ---- ----- Core Load Hit ----- - LLC Load Hit -- - RMT Load Hit -- --- Load Dram ----
> # Index Address Node PA cnt Pct Total records Loads Stores L1Hit L1Miss FB L1 L2 LclHit LclHitm RmtHit RmtHitm Lcl Rmt
> # ..... .................. .... ...... ....... ........ ....... ....... ....... ....... ....... ....... ....... ....... ........ ....... ........ ....... ........ ........
> #
> 0 0x563b01e83100 0 1401 65.32% 648 7011 3738 3273 2582 691 515 2516 59 143 505 0 0 0 0
> 1 0x563b01e830c0 0 1 26.51% 263 400 400 0 0 0 130 3 4 262 1 0 0 0 0
> 2 0x563b01e83080 0 1 7.76% 77 650 650 0 0 0 180 348 45 14 63 0 0 0 0
> 3 0xffff88c3d74e82c0 0 1 0.10% 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0
> 4 0xffffa587c11e38c0 N/A 0 0.10% 1 2 1 1 1 0 0 0 0 1 0 0 0 0 0
> 5 0xffffffffbd5e6fc0 0 1 0.10% 1 1 1 0 0 0 0 0 0 0 1 0 0 0 0
> 6 0x7f90a4d6c2c0 0 1 0.10% 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0
>
> =================================================
> Shared Cache Line Distribution Pareto
> =================================================
> #
> # ---- LLC LD ---- -- Store Refs -- --------- Data address --------- ---------- cycles ---------- Total cpu Shared
> # Num LclHit LclHitm L1 Hit L1 Miss Offset Node PA cnt Pid Tid Code address rmt hitm lcl hitm load records cnt Symbol Object Source:Line Node
> # ..... ....... ....... ....... ....... .................. .... ...... ....... .................. .................. ........ ........ ........ ....... ........ ................... ................. ........................... ....
> #
> -------------------------------------------------------------
> 0 143 505 2582 691 0x563b01e83100
> -------------------------------------------------------------
> 96.50% 7.72% 46.79% 0.00% 0x0 0 1 14100 14102:lock_th 0x563b01c81c16 0 1949 1331 1876 1 [.] read_write_func false_sharing.exe false_sharing_example.c:145 0
> 0.00% 35.05% 0.00% 0.00% 0x0 0 1 14100 14102:lock_th 0x563b01c81c1d 0 2651 975 748 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
> 0.00% 30.89% 0.00% 0.00% 0x0 0 1 14100 14103:lock_th 0x563b01c81c1d 0 1425 1003 762 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
> 2.10% 7.52% 49.19% 0.00% 0x0 0 1 14100 14103:lock_th 0x563b01c81c16 0 1585 1053 2037 1 [.] read_write_func false_sharing.exe false_sharing_example.c:145 0
> 0.00% 0.00% 2.52% 44.86% 0x0 0 1 14100 14102:lock_th 0x563b01c81c28 0 0 0 375 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
> 0.00% 0.00% 1.51% 55.14% 0x0 0 1 14100 14103:lock_th 0x563b01c81c28 0 0 0 420 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
> 1.40% 12.87% 0.00% 0.00% 0x20 0 1 14100 14104:reader_thd 0x563b01c81c73 0 166 99 417 1 [.] read_write_func false_sharing.exe false_sharing_example.c:155 0
> 0.00% 5.94% 0.00% 0.00% 0x20 0 1 14100 14105:reader_thd 0x563b01c81c73 0 144 85 376 1 [.] read_write_func false_sharing.exe false_sharing_example.c:155 0
>
> [...]
>
>
> Leo Yan (8):
> perf mem: Add structure field c2c_stats::tot_llchit
> perf c2c: Add dimensions for total LLC hit
> perf c2c: Add dimensions for LLC load hit
> perf c2c: Change to general naming for macros
> perf c2c: Rename for shared cache line stats
> perf c2c: Refactor hist entry validation
> perf c2c: Add option '-d llc' for sorting with LLC load
> perf c2c: Update documentation for display option 'llc'
>
> tools/perf/Documentation/perf-c2c.txt | 18 +-
> tools/perf/builtin-c2c.c | 333 +++++++++++++++++++++-----
> tools/perf/util/mem-events.c | 3 +
> tools/perf/util/mem-events.h | 1 +
> 4 files changed, 286 insertions(+), 69 deletions(-)
>
> --
> 2.17.1
>

2020-10-22 09:18:48

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v1 7/8] perf c2c: Add option '-d llc' for sorting with LLC load

On Tue, Oct 20, 2020 at 04:08:39PM +0800, Leo Yan wrote:

SNIP

> >
> > please update man page with this
> >
> > > else {
> > > pr_err("failed: unknown display type: %s\n", str);
> > > return -1;
> > > @@ -2766,9 +2795,10 @@ static int build_cl_output(char *cl_sort, bool no_source)
> > > }
> > >
> > > if (asprintf(&c2c.cl_output,
> > > - "%s%s%s%s%s%s%s%s%s%s",
> > > + "%s%s%s%s%s%s%s%s%s%s%s",
> >
> > why is there extra '%s' when we did not add new argument.. ?
>
> This is deliberate. The change is as below:
>
> if (asprintf(&c2c.cl_output,
> - "%s%s%s%s%s%s%s%s%s%s",
> + "%s%s%s%s%s%s%s%s%s%s%s",
> c2c.use_stdio ? "cl_num_empty," : "",
> - "percent_rmt_hitm,"
> + c2c.display == DISPLAY_LLC ? "percent_llc_hit," :
> + "percent_rmt_hitm,",
> "percent_lcl_hitm,"
>
> In the old code the string "percent_rmt_hitm," is merged with later
> lines (the second string is "percent_lcl_hitm,") into single string.
>
> In this patch, it needs to check condition c2c.display and pass
> different string ("percent_llc_hit," vs "percent_rmt_hitm,"), thus
> the string ("percent_llc_hit," or "percent_rmt_hitm,") is passed
> independently, it's _NOT_ jointed with sequnetial lines.

ah right, it's now 2 arguments instead of one

thanks,
jirka