2021-01-14 15:48:34

by Leo Yan

[permalink] [raw]
Subject: [PATCH v4 0/6] perf c2c: Code refactoring

This patch series is for several minor code refactoring, which is
extracted from the patch series "perf c2c: Sort cacheline with all
loads" [1].

There has a known issue for Arm SPE store operations and Arm SPE is
the only consumer for soring with all loads, this is the reason in this
series drops the changes for dimensions and sorting, and only extracts
the patches related with code refactoring. So this series doesn't
introduce any functionality change.

The patches have been tested on x86_64 and compared the result before
and after applying the patches, and confirmed no difference for the
output result.

Changes from v3:
* Refined patch 03/06 to remove unnecessary parentheses and test and
return early in the function filter_display() (Joe Perches);
* Added new patch 04/06 to make argument type as u32 for percent().

Changes from v2:
* Changed to use static functions to replace macros (Namhyung);
* Added Jiri's Ack tags in the unchanged patches;
* Minor improvement in the commit logs.

[1] https://lore.kernel.org/patchwork/cover/1353064/


Leo Yan (6):
perf c2c: Rename for shared cache line stats
perf c2c: Refactor hist entry validation
perf c2c: Refactor display filter
perf c2c: Fix argument type for percent()
perf c2c: Refactor node display
perf c2c: Add local variables for output metrics

tools/perf/builtin-c2c.c | 168 +++++++++++++++++++++++----------------
1 file changed, 99 insertions(+), 69 deletions(-)

--
2.25.1


2021-01-14 15:49:35

by Leo Yan

[permalink] [raw]
Subject: [PATCH v4 1/6] perf c2c: Rename for shared cache line stats

For shared cache line statistics, it relies on HITM. We can use more
general naming rather than only binding to HITM, so replace "hitm_stats"
with "shared_clines_stats" in structure perf_c2c, and rename function
resort_hitm_cb() to resort_shared_cl_cb().

Signed-off-by: Leo Yan <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
---
tools/perf/builtin-c2c.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index c5babeaa3b38..2d0c71300dbf 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -97,8 +97,8 @@ struct perf_c2c {
bool symbol_full;
bool stitch_lbr;

- /* HITM shared clines stats */
- struct c2c_stats hitm_stats;
+ /* Shared cache line stats */
+ struct c2c_stats shared_clines_stats;
int shared_clines;

int display;
@@ -1961,7 +1961,7 @@ static int resort_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
{
struct c2c_hist_entry *c2c_he;
struct c2c_hists *c2c_hists;
- bool display = he__display(he, &c2c.hitm_stats);
+ bool display = he__display(he, &c2c.shared_clines_stats);

c2c_he = container_of(he, struct c2c_hist_entry, he);
c2c_hists = c2c_he->hists;
@@ -2048,14 +2048,14 @@ static int setup_nodes(struct perf_session *session)

#define HAS_HITMS(__h) ((__h)->stats.lcl_hitm || (__h)->stats.rmt_hitm)

-static int resort_hitm_cb(struct hist_entry *he, void *arg __maybe_unused)
+static int resort_shared_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
{
struct c2c_hist_entry *c2c_he;
c2c_he = container_of(he, struct c2c_hist_entry, he);

if (HAS_HITMS(c2c_he)) {
c2c.shared_clines++;
- c2c_add_stats(&c2c.hitm_stats, &c2c_he->stats);
+ c2c_add_stats(&c2c.shared_clines_stats, &c2c_he->stats);
}

return 0;
@@ -2126,7 +2126,7 @@ static void print_c2c__display_stats(FILE *out)

static void print_shared_cacheline_info(FILE *out)
{
- struct c2c_stats *stats = &c2c.hitm_stats;
+ struct c2c_stats *stats = &c2c.shared_clines_stats;
int hitm_cnt = stats->lcl_hitm + stats->rmt_hitm;

fprintf(out, "=================================================\n");
@@ -2827,7 +2827,7 @@ static int perf_c2c__report(int argc, const char **argv)
ui_progress__init(&prog, c2c.hists.hists.nr_entries, "Sorting...");

hists__collapse_resort(&c2c.hists.hists, NULL);
- hists__output_resort_cb(&c2c.hists.hists, &prog, resort_hitm_cb);
+ hists__output_resort_cb(&c2c.hists.hists, &prog, resort_shared_cl_cb);
hists__iterate_cb(&c2c.hists.hists, resort_cl_cb);

ui_progress__finish();
--
2.25.1

2021-01-14 15:49:43

by Leo Yan

[permalink] [raw]
Subject: [PATCH v4 2/6] perf c2c: Refactor hist entry validation

This patch has no any functionality changes but refactors hist entry
validation for cache line resorting.

It renames function "valid_hitm_or_store()" to "is_valid_hist_entry()",
changes return type from integer type to bool type. In the function,
it uses switch-case instead of ternary operators, which is easier
to extend for more display types.

Signed-off-by: Leo Yan <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
---
tools/perf/builtin-c2c.c | 30 +++++++++++++++++++++++-------
1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 2d0c71300dbf..bc2ee84298ff 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -1888,16 +1888,32 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
return he->filtered == 0;
}

-static inline int valid_hitm_or_store(struct hist_entry *he)
+static inline bool is_valid_hist_entry(struct hist_entry *he)
{
struct c2c_hist_entry *c2c_he;
- bool has_hitm;
+ bool has_record = false;

c2c_he = container_of(he, struct c2c_hist_entry, he);
- has_hitm = c2c.display == DISPLAY_TOT ? c2c_he->stats.tot_hitm :
- c2c.display == DISPLAY_LCL ? c2c_he->stats.lcl_hitm :
- c2c_he->stats.rmt_hitm;
- return has_hitm || c2c_he->stats.store;
+
+ /* It's a valid entry if contains stores */
+ if (c2c_he->stats.store)
+ return true;
+
+ switch (c2c.display) {
+ case DISPLAY_LCL:
+ has_record = !!c2c_he->stats.lcl_hitm;
+ break;
+ case DISPLAY_RMT:
+ has_record = !!c2c_he->stats.rmt_hitm;
+ break;
+ case DISPLAY_TOT:
+ has_record = !!c2c_he->stats.tot_hitm;
+ break;
+ default:
+ break;
+ }
+
+ return has_record;
}

static void set_node_width(struct c2c_hist_entry *c2c_he, int len)
@@ -1951,7 +1967,7 @@ static int filter_cb(struct hist_entry *he, void *arg __maybe_unused)

calc_width(c2c_he);

- if (!valid_hitm_or_store(he))
+ if (!is_valid_hist_entry(he))
he->filtered = HIST_FILTER__C2C;

return 0;
--
2.25.1

2021-01-14 15:50:09

by Leo Yan

[permalink] [raw]
Subject: [PATCH v4 3/6] perf c2c: Refactor display filter

When sort on the respective metrics (lcl_hitm, rmt_hitm, tot_hitm),
macro FILTER_HITM is to filter out the cache line entries if its
overhead is less than 1%.

This patch introduces static function filter_display() to replace macro;
and refines its parameter with more flexbile way, rather than passing
field name, it changes to pass the cache line's statistic value and the
sum value.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/builtin-c2c.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index bc2ee84298ff..7aaccea00883 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -1851,40 +1851,40 @@ static int c2c_hists__reinit(struct c2c_hists *c2c_hists,

#define DISPLAY_LINE_LIMIT 0.001

+static u8 filter_display(u32 val, u32 sum)
+{
+ if (sum == 0 || ((double)val / sum) < DISPLAY_LINE_LIMIT)
+ return HIST_FILTER__C2C;
+
+ return 0;
+}
+
static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
{
struct c2c_hist_entry *c2c_he;
- double ld_dist;

if (c2c.show_all)
return true;

c2c_he = container_of(he, struct c2c_hist_entry, he);

-#define FILTER_HITM(__h) \
- if (stats->__h) { \
- ld_dist = ((double)c2c_he->stats.__h / stats->__h); \
- if (ld_dist < DISPLAY_LINE_LIMIT) \
- he->filtered = HIST_FILTER__C2C; \
- } else { \
- he->filtered = HIST_FILTER__C2C; \
- }
-
switch (c2c.display) {
case DISPLAY_LCL:
- FILTER_HITM(lcl_hitm);
+ he->filtered = filter_display(c2c_he->stats.lcl_hitm,
+ stats->lcl_hitm);
break;
case DISPLAY_RMT:
- FILTER_HITM(rmt_hitm);
+ he->filtered = filter_display(c2c_he->stats.rmt_hitm,
+ stats->rmt_hitm);
break;
case DISPLAY_TOT:
- FILTER_HITM(tot_hitm);
+ he->filtered = filter_display(c2c_he->stats.tot_hitm,
+ stats->tot_hitm);
+ break;
default:
break;
}

-#undef FILTER_HITM
-
return he->filtered == 0;
}

--
2.25.1

2021-01-14 15:50:17

by Leo Yan

[permalink] [raw]
Subject: [PATCH v4 4/6] perf c2c: Fix argument type for percent()

For percent(), its arguments are defined as integer type; this is not
consistent with its consumers which pass arguments with u32 type.

Thus this patch makes argument type as u32 for percent().

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/builtin-c2c.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 7aaccea00883..7702f9599162 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -876,7 +876,7 @@ static struct c2c_stats *total_stats(struct hist_entry *he)
return &hists->stats;
}

-static double percent(int st, int tot)
+static double percent(u32 st, u32 tot)
{
return tot ? 100. * (double) st / (double) tot : 0;
}
--
2.25.1

2021-01-14 15:50:32

by Leo Yan

[permalink] [raw]
Subject: [PATCH v4 6/6] perf c2c: Add local variables for output metrics

This patch adds several local variables:

"cl_output": pointer for outputting single cache line metrics;
"output_str": pointer for outputting cache line metrics;
"sort_str": pointer to the sorting metrics.

This can improve readability for the code and it's more flexible when
later extend to different strings for the output metrics.

Signed-off-by: Leo Yan <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
---
tools/perf/builtin-c2c.c | 59 ++++++++++++++++++++++------------------
1 file changed, 33 insertions(+), 26 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 62213bef7b98..d247f9878948 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2199,16 +2199,17 @@ static void print_pareto(FILE *out)
struct perf_hpp_list hpp_list;
struct rb_node *nd;
int ret;
+ const char *cl_output;
+
+ cl_output = "cl_num,"
+ "cl_rmt_hitm,"
+ "cl_lcl_hitm,"
+ "cl_stores_l1hit,"
+ "cl_stores_l1miss,"
+ "dcacheline";

perf_hpp_list__init(&hpp_list);
- ret = hpp_list__parse(&hpp_list,
- "cl_num,"
- "cl_rmt_hitm,"
- "cl_lcl_hitm,"
- "cl_stores_l1hit,"
- "cl_stores_l1miss,"
- "dcacheline",
- NULL);
+ ret = hpp_list__parse(&hpp_list, cl_output, NULL);

if (WARN_ONCE(ret, "failed to setup sort entries\n"))
return;
@@ -2752,6 +2753,7 @@ static int perf_c2c__report(int argc, const char **argv)
OPT_END()
};
int err = 0;
+ const char *output_str, *sort_str = NULL;

argc = parse_options(argc, argv, options, report_c2c_usage,
PARSE_OPT_STOP_AT_NON_OPTION);
@@ -2828,24 +2830,29 @@ static int perf_c2c__report(int argc, const char **argv)
goto out_mem2node;
}

- c2c_hists__reinit(&c2c.hists,
- "cl_idx,"
- "dcacheline,"
- "dcacheline_node,"
- "dcacheline_count,"
- "percent_hitm,"
- "tot_hitm,lcl_hitm,rmt_hitm,"
- "tot_recs,"
- "tot_loads,"
- "tot_stores,"
- "stores_l1hit,stores_l1miss,"
- "ld_fbhit,ld_l1hit,ld_l2hit,"
- "ld_lclhit,lcl_hitm,"
- "ld_rmthit,rmt_hitm,"
- "dram_lcl,dram_rmt",
- c2c.display == DISPLAY_TOT ? "tot_hitm" :
- c2c.display == DISPLAY_LCL ? "lcl_hitm" : "rmt_hitm"
- );
+ output_str = "cl_idx,"
+ "dcacheline,"
+ "dcacheline_node,"
+ "dcacheline_count,"
+ "percent_hitm,"
+ "tot_hitm,lcl_hitm,rmt_hitm,"
+ "tot_recs,"
+ "tot_loads,"
+ "tot_stores,"
+ "stores_l1hit,stores_l1miss,"
+ "ld_fbhit,ld_l1hit,ld_l2hit,"
+ "ld_lclhit,lcl_hitm,"
+ "ld_rmthit,rmt_hitm,"
+ "dram_lcl,dram_rmt";
+
+ if (c2c.display == DISPLAY_TOT)
+ sort_str = "tot_hitm";
+ else if (c2c.display == DISPLAY_RMT)
+ sort_str = "rmt_hitm";
+ else if (c2c.display == DISPLAY_LCL)
+ sort_str = "lcl_hitm";
+
+ c2c_hists__reinit(&c2c.hists, output_str, sort_str);

ui_progress__init(&prog, c2c.hists.hists.nr_entries, "Sorting...");

--
2.25.1

2021-01-14 15:50:34

by Leo Yan

[permalink] [raw]
Subject: [PATCH v4 5/6] perf c2c: Refactor node display

The macro DISPLAY_HITM() is used to calculate HITM percentage introduced
by every node and it's shown for the node info.

This patch introduces the static function display_metrics() to replace
the macro, and the parameters are refined for passing the metric's
statistic value and the sum value.

Signed-off-by: Leo Yan <[email protected]>
---
tools/perf/builtin-c2c.c | 33 ++++++++++++++++++++-------------
1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 7702f9599162..62213bef7b98 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -1048,6 +1048,19 @@ empty_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
return 0;
}

+static int display_metrics(struct perf_hpp *hpp, u32 val, u32 sum)
+{
+ int ret;
+
+ if (sum != 0)
+ ret = scnprintf(hpp->buf, hpp->size, "%5.1f%% ",
+ percent(val, sum));
+ else
+ ret = scnprintf(hpp->buf, hpp->size, "%6s ", "n/a");
+
+ return ret;
+}
+
static int
node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
struct hist_entry *he)
@@ -1091,29 +1104,23 @@ node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
ret = scnprintf(hpp->buf, hpp->size, "%2d{%2d ", node, num);
advance_hpp(hpp, ret);

- #define DISPLAY_HITM(__h) \
- if (c2c_he->stats.__h> 0) { \
- ret = scnprintf(hpp->buf, hpp->size, "%5.1f%% ", \
- percent(stats->__h, c2c_he->stats.__h));\
- } else { \
- ret = scnprintf(hpp->buf, hpp->size, "%6s ", "n/a"); \
- }
-
switch (c2c.display) {
case DISPLAY_RMT:
- DISPLAY_HITM(rmt_hitm);
+ ret = display_metrics(hpp, stats->rmt_hitm,
+ c2c_he->stats.rmt_hitm);
break;
case DISPLAY_LCL:
- DISPLAY_HITM(lcl_hitm);
+ ret = display_metrics(hpp, stats->lcl_hitm,
+ c2c_he->stats.lcl_hitm);
break;
case DISPLAY_TOT:
- DISPLAY_HITM(tot_hitm);
+ ret = display_metrics(hpp, stats->tot_hitm,
+ c2c_he->stats.tot_hitm);
+ break;
default:
break;
}

- #undef DISPLAY_HITM
-
advance_hpp(hpp, ret);

if (c2c_he->stats.store > 0) {
--
2.25.1

2021-01-15 08:38:46

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH v4 0/6] perf c2c: Code refactoring

Hello,

On Fri, Jan 15, 2021 at 12:46 AM Leo Yan <[email protected]> wrote:
>
> This patch series is for several minor code refactoring, which is
> extracted from the patch series "perf c2c: Sort cacheline with all
> loads" [1].
>
> There has a known issue for Arm SPE store operations and Arm SPE is
> the only consumer for soring with all loads, this is the reason in this
> series drops the changes for dimensions and sorting, and only extracts
> the patches related with code refactoring. So this series doesn't
> introduce any functionality change.
>
> The patches have been tested on x86_64 and compared the result before
> and after applying the patches, and confirmed no difference for the
> output result.
>
> Changes from v3:
> * Refined patch 03/06 to remove unnecessary parentheses and test and
> return early in the function filter_display() (Joe Perches);
> * Added new patch 04/06 to make argument type as u32 for percent().
>
> Changes from v2:
> * Changed to use static functions to replace macros (Namhyung);
> * Added Jiri's Ack tags in the unchanged patches;
> * Minor improvement in the commit logs.
>
> [1] https://lore.kernel.org/patchwork/cover/1353064/
>
>
> Leo Yan (6):
> perf c2c: Rename for shared cache line stats
> perf c2c: Refactor hist entry validation
> perf c2c: Refactor display filter
> perf c2c: Fix argument type for percent()
> perf c2c: Refactor node display
> perf c2c: Add local variables for output metrics

Acked-by: Namhyung Kim <[email protected]>

Thanks,
Namhyung

2021-01-15 08:50:10

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v4 0/6] perf c2c: Code refactoring

On Thu, Jan 14, 2021 at 11:46:40PM +0800, Leo Yan wrote:
> This patch series is for several minor code refactoring, which is
> extracted from the patch series "perf c2c: Sort cacheline with all
> loads" [1].
>
> There has a known issue for Arm SPE store operations and Arm SPE is
> the only consumer for soring with all loads, this is the reason in this
> series drops the changes for dimensions and sorting, and only extracts
> the patches related with code refactoring. So this series doesn't
> introduce any functionality change.
>
> The patches have been tested on x86_64 and compared the result before
> and after applying the patches, and confirmed no difference for the
> output result.
>
> Changes from v3:
> * Refined patch 03/06 to remove unnecessary parentheses and test and
> return early in the function filter_display() (Joe Perches);
> * Added new patch 04/06 to make argument type as u32 for percent().

Acked-by: Jiri Olsa <[email protected]>

thanks,
jirka

>
> Changes from v2:
> * Changed to use static functions to replace macros (Namhyung);
> * Added Jiri's Ack tags in the unchanged patches;
> * Minor improvement in the commit logs.
>
> [1] https://lore.kernel.org/patchwork/cover/1353064/
>
>
> Leo Yan (6):
> perf c2c: Rename for shared cache line stats
> perf c2c: Refactor hist entry validation
> perf c2c: Refactor display filter
> perf c2c: Fix argument type for percent()
> perf c2c: Refactor node display
> perf c2c: Add local variables for output metrics
>
> tools/perf/builtin-c2c.c | 168 +++++++++++++++++++++++----------------
> 1 file changed, 99 insertions(+), 69 deletions(-)
>
> --
> 2.25.1
>