2020-03-13 07:12:26

by Jin Yao

[permalink] [raw]
Subject: [PATCH v2 00/14] perf: Stream comparison

Sometimes, a small change in a hot function reducing the cycles of
this function, but the overall workload doesn't get faster. It is
interesting where the cycles are moved to.

What it would like is to diff before/after streams. A stream we think
is a callchain which is aggregated by the branch records from samples.

By browsing the hot streams, we can understand the hot code flow.
By comparing the cycles variation of same streams between old perf
data and new perf data, we can understand if the cycles are moved to
the unchanged code.

The before stream is the stream before source code changed
(e.g. streams in perf.data.old). The after stream is the stream
after source code changed (e.g. streams in perf.data).

Diffing before/after streams compares all streams (or compares top
N hot streams) between two perf data files.

If all entries of one stream in perf.data.old are fully matched with
all entries of another stream in perf.data, we think these two streams
are matched otherwise the streams are not matched.

For example,

cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
-------------------------- --------------------------
main div.c:39 main div.c:39
main div.c:44 main div.c:44

It looks that two streams are matched and we can see for the same
streams the cycles are equal and the callchain hit percents are
slightly changed. That's expected in the normal range.

But that's not always true if source code is changed in perf.data
(e.g. source line div.c:39 is changed). If the source line is changed,
they are different streams, we can't compare them. We will think the
stream in perf.data is a new stream.

The challenge is how to identify the changed source lines. The basic
idea is to use linux command "diff" to compare the source file A and
source file A* line by line (assume A is used in perf.data.old
and A* is updated in perf.data). According to "diff" output, we can
create a source line mapping table.

For example,

Execute 'diff ./before/div.c ./after/div.c'

25c25
< i = rand() % 2;
---
> i = rand() % 4;
39c39
< for (i = 0; i < 2000000000; i++) {
---
> for (i = 0; i < 20000000001; i++) {

div.c (after -> before) lines mapping:
0 -> 0
1 -> 1
2 -> 2
3 -> 3
4 -> 4
5 -> 5
6 -> 6
7 -> 7
8 -> 8
9 -> 9
...
24 -> 24
25 -> -1
26 -> 26
27 -> 27
28 -> 28
29 -> 29
30 -> 30
31 -> 31
32 -> 32
33 -> 33
34 -> 34
35 -> 35
36 -> 36
37 -> 37
38 -> 38
39 -> -1
40 -> 40
...

From the table, we can easily know div.c:39 is source line changed.
(mapped to -1). So these two streams are not matched.

Besides the hot streams comparison, this patch series also support
the top N hottest blocks comparison.

It's also useful to figure out the top N hottest blocks from old perf
data file and figure out the top N hottest blocks from new perf data file,
and then compare them for the cycles diff. It can let us easily know
how many cycles are moved from one block to another block.

Now let's see examples.

perf record -b ... Generate perf.data.old with branch data
perf record -b ... Generate perf.data with branch data
perf diff --stream --percent-limit 2

[ Matched hot chains between old perf data and new perf data) ]

hot chain pair 1:
cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
--------------------------- --------------------------
main div.c:39 main div.c:39
main div.c:44 main div.c:44

hot chain pair 2:
cycles: 35, hits: 21.43% cycles: 33, hits: 19.37%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380
__random_r random_r.c:357 __random_r random_r.c:357
__random random.c:293 __random random.c:293
__random random.c:293 __random random.c:293
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:288 __random random.c:288
rand rand.c:27 rand rand.c:27
rand rand.c:26 rand rand.c:26
rand@plt rand@plt
rand@plt rand@plt
compute_flag div.c:25 compute_flag div.c:25
compute_flag div.c:22 compute_flag div.c:22
main div.c:40 main div.c:40
main div.c:40 main div.c:40
main div.c:39 main div.c:39

hot chain pair 3:
cycles: 18, hits: 6.10% cycles: 19, hits: 6.51%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380
__random_r random_r.c:357 __random_r random_r.c:357
__random random.c:293 __random random.c:293
__random random.c:293 __random random.c:293
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:288 __random random.c:288
rand rand.c:27 rand rand.c:27
rand rand.c:26 rand rand.c:26
rand@plt rand@plt
rand@plt rand@plt
compute_flag div.c:25 compute_flag div.c:25
compute_flag div.c:22 compute_flag div.c:22
main div.c:40 main div.c:40

hot chain pair 4:
cycles: 9, hits: 5.95% cycles: 8, hits: 5.03%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380

[ Hot chains in old perf data but source line changed (*) in new perf data ]

[ Hot chains in old perf data only ]

hot chain 1:
cycles: 2, hits: 4.08%
--------------------------
main div.c:42
compute_flag div.c:28

[ Hot chains in new perf data only ]

hot chain 1:
cycles: 36, hits: 3.36%
--------------------------
__random_r random_r.c:357
__random random.c:293
__random random.c:293
__random random.c:291
__random random.c:291
__random random.c:291
__random random.c:288
rand rand.c:27
rand rand.c:26
rand@plt
rand@plt
compute_flag div.c:25
compute_flag div.c:22
main div.c:40
main div.c:40

Ignore the rightmost columns such as '[Program Block Range]' and 'Shared Object' for saving space

# Output based on old perf data:
#
# Sampled Cycles% Avg Cycles New Stream Diff(cycles%,cycles) New Stream Sampled Cycles% New Stream Avg Cycles
# ............... .......... ............................... .......................... .....................
#
25.20% 18 -0.36%, -1 - -
15.24% 7 -0.45%, 0 - -
5.07% 2 0.09%, 0 - -
4.84% 2 0.26%, 0 - -
4.72% 2 0.30%, 0 - -
3.91% 1 0.29%, 0 - -
3.05% 1 0.11%, 0 - -
2.90% 1 0.08%, 0 - -
2.71% 1 -0.11%, 0 - -
2.44% 1 0.09%, 0 - -
2.35% 1 -0.09%, 0 - -
2.27% 1 0.15%, 0 - -
2.27% 1 0.06%, 0 - -
2.17% 1 0.09%, 0 - -

If we enable the source line comparison, the output might be different.

perf diff --stream --before ./before --after ./after

[ Matched hot chains between old perf data and new perf data) ]

hot chain pair 1:
cycles: 18, hits: 6.10% cycles: 19, hits: 6.51%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380
__random_r random_r.c:357 __random_r random_r.c:357
__random random.c:293 __random random.c:293
__random random.c:293 __random random.c:293
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:288 __random random.c:288
rand rand.c:27 rand rand.c:27
rand rand.c:26 rand rand.c:26
rand@plt rand@plt
rand@plt rand@plt
compute_flag div.c:25 compute_flag div.c:25
compute_flag div.c:22 compute_flag div.c:22
main div.c:40 main div.c:40

hot chain pair 2:
cycles: 9, hits: 5.95% cycles: 8, hits: 5.03%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380

[ Hot chains in old perf data but source line changed (*) in new perf data ]

hot chain pair 1:
cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
--------------------------- --------------------------
main div.c:39 main div.c:39*
main div.c:44 main div.c:44

hot chain pair 2:
cycles: 35, hits: 21.43% cycles: 33, hits: 19.37%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380
__random_r random_r.c:357 __random_r random_r.c:357
__random random.c:293 __random random.c:293
__random random.c:293 __random random.c:293
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:288 __random random.c:288
rand rand.c:27 rand rand.c:27
rand rand.c:26 rand rand.c:26
rand@plt rand@plt
rand@plt rand@plt
compute_flag div.c:25 compute_flag div.c:25
compute_flag div.c:22 compute_flag div.c:22
main div.c:40 main div.c:40
main div.c:40 main div.c:40
main div.c:39 main div.c:39*

[ Hot chains in old perf data only ]

hot chain 1:
cycles: 2, hits: 4.08%
--------------------------
main div.c:42
compute_flag div.c:28

[ Hot chains in new perf data only ]

hot chain 1:
cycles: 36, hits: 3.36%
--------------------------
__random_r random_r.c:357
__random random.c:293
__random random.c:293
__random random.c:291
__random random.c:291
__random random.c:291
__random random.c:288
rand rand.c:27
rand rand.c:26
rand@plt
rand@plt
compute_flag div.c:25
compute_flag div.c:22
main div.c:40
main div.c:40

# Output based on old perf data:
#
# Sampled Cycles% Avg Cycles New Stream Diff(cycles%,cycles) New Stream Sampled Cycles% New Stream Avg Cycles
# ............... .......... ............................... .......................... .....................
#
25.20% 18 [block changed in new stream] 24.84% 17
15.24% 7 -0.45%, 0 - -
5.07% 2 0.09%, 0 - -
4.84% 2 0.26%, 0 - -
4.72% 2 0.30%, 0 - -
3.91% 1 0.29%, 0 - -
3.05% 1 0.11%, 0 - -
2.90% 1 0.08%, 0 - -
2.71% 1 -0.11%, 0 - -
2.44% 1 0.09%, 0 - -
2.35% 1 -0.09%, 0 - -
2.27% 1 0.15%, 0 - -
2.27% 1 0.06%, 0 - -
2.17% 1 0.09%, 0 - -

Sometime some changes are not reflected in the source code,
e.g. changing the compiler option. So for this, we can't get
the changes by diffing the source code lines.

This patch series also introduces a new perf-diff option "--changed-func".
It passes the names of changed functions then perf-diff can know what
functions are changed.

For example,
perf diff --stream --changed-func main --changed-func rand

NOTE:
-----
1. For the patches:

perf util: Create source line mapping table
perf util: Create streams for managing top N hottest callchains
perf util: Return per-event callchain streams
perf util: Compare two streams
perf util: Calculate the sum of all streams hits
perf util: Report hot streams
perf diff: Support hot streams comparison

These patches support the hot stream comparison.

2. For the patches:
perf util: Add new block info functions for top N hot blocks comparison
perf util: Add new block info fmts for showing hot blocks comparison
perf util: Enable block source line comparison
perf diff: support hot blocks comparison

These patches support the hot blocks comparison.

3. For the patches
perf util: Filter out streams by name of changed functions
perf util: Filter out blocks by name of changed functions
perf diff: Filter out streams by changed functions

These patches support a user specified function name list which let
perf-diff know these functions are changed.

v2:
---
Refine the codes for following patches:
perf util: Create source line mapping table
perf util: Create streams for managing top N hottest callchains
perf util: Calculate the sum of all streams hits
perf util: Add new block info functions for top N hot blocks comparison

Jin Yao (14):
perf util: Create source line mapping table
perf util: Create streams for managing top N hottest callchains
perf util: Return per-event callchain streams
perf util: Compare two streams
perf util: Calculate the sum of all streams hits
perf util: Report hot streams
perf diff: Support hot streams comparison
perf util: Add new block info functions for top N hot blocks
comparison
perf util: Add new block info fmts for showing hot blocks comparison
perf util: Enable block source line comparison
perf diff: support hot blocks comparison
perf util: Filter out streams by name of changed functions
perf util: Filter out blocks by name of changed functions
perf diff: Filter out streams by changed functions

tools/perf/Documentation/perf-diff.txt | 19 +
tools/perf/builtin-diff.c | 324 ++++++++++++---
tools/perf/util/Build | 1 +
tools/perf/util/block-info.c | 433 ++++++++++++++++++-
tools/perf/util/block-info.h | 38 +-
tools/perf/util/callchain.c | 517 +++++++++++++++++++++++
tools/perf/util/callchain.h | 34 ++
tools/perf/util/srclist.c | 555 +++++++++++++++++++++++++
tools/perf/util/srclist.h | 74 ++++
9 files changed, 1935 insertions(+), 60 deletions(-)
create mode 100644 tools/perf/util/srclist.c
create mode 100644 tools/perf/util/srclist.h

--
2.17.1


2020-03-13 07:12:31

by Jin Yao

[permalink] [raw]
Subject: [PATCH v2 02/14] perf util: Create streams for managing top N hottest callchains

We think the stream is a callchain which is aggregated by the LBR
records from samples. By browsing the stream, we can understand
the code flow.

The struct callchain_node represents one callchain and we use the
callchain_node->hit to measure the hot level of this callchain.
Higher is hotter.

Since in perf data file, there may be many callchains so we just
need to focus on the top N hottest callchains. N is a user defined
parameter or just a predefined default value.

This patch saves the top N hottest callchains in 'struct stream_node'
type array, which is defined in a per event 'struct callchain_streams'.

So now we can get the per-event top N hottest callchains.

v2:
---
Use zfree in free_evsel_streams().

Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/util/callchain.c | 122 ++++++++++++++++++++++++++++++++++++
tools/perf/util/callchain.h | 16 +++++
2 files changed, 138 insertions(+)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 818aa4efd386..0c028caaeb19 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -31,6 +31,7 @@
#include "callchain.h"
#include "branch.h"
#include "symbol.h"
+#include "evlist.h"
#include "../perf.h"

#define CALLCHAIN_PARAM_DEFAULT \
@@ -1599,3 +1600,124 @@ void callchain_cursor_reset(struct callchain_cursor *cursor)
for (node = cursor->first; node != NULL; node = node->next)
map__zput(node->ms.map);
}
+
+static void free_evsel_streams(struct callchain_streams *callchain_streams,
+ int nr_evsel)
+{
+ for (int i = 0; i < nr_evsel; i++)
+ zfree(&callchain_streams[i].streams);
+
+ free(callchain_streams);
+}
+
+static struct callchain_streams *create_evsel_streams(int nr_evsel,
+ int nr_streams_max)
+{
+ struct callchain_streams *callchain_streams;
+
+ callchain_streams = calloc(nr_evsel, sizeof(struct callchain_streams));
+ if (!callchain_streams)
+ return NULL;
+
+ for (int i = 0; i < nr_evsel; i++) {
+ struct callchain_streams *s = &callchain_streams[i];
+
+ s->streams = calloc(nr_streams_max, sizeof(struct stream_node));
+ if (!s->streams)
+ goto err;
+
+ s->nr_streams_max = nr_streams_max;
+ s->evsel_idx = -1;
+ }
+
+ return callchain_streams;
+
+err:
+ free_evsel_streams(callchain_streams, nr_evsel);
+ return NULL;
+}
+
+/*
+ * The cnodes with high hit number are hot callchains.
+ */
+static void set_hot_cnode(struct callchain_streams *s,
+ struct callchain_node *cnode)
+{
+ int i, idx = 0;
+ u64 hit;
+
+ if (s->nr_streams < s->nr_streams_max) {
+ i = s->nr_streams;
+ s->streams[i].cnode = cnode;
+ s->nr_streams++;
+ return;
+ }
+
+ /*
+ * Since only a few number of hot streams, so only use simple
+ * way to find the cnode with smallest hit number and replace.
+ */
+ hit = (s->streams[0].cnode)->hit;
+ for (i = 1; i < s->nr_streams; i++) {
+ if ((s->streams[i].cnode)->hit < hit) {
+ hit = (s->streams[i].cnode)->hit;
+ idx = i;
+ }
+ }
+
+ if (cnode->hit > hit)
+ s->streams[idx].cnode = cnode;
+}
+
+static void update_hot_streams(struct hist_entry *he,
+ struct callchain_streams *s)
+{
+ struct rb_root *root = &he->sorted_chain;
+ struct rb_node *rb_node = rb_first(root);
+ struct callchain_node *node;
+
+ while (rb_node) {
+ node = rb_entry(rb_node, struct callchain_node, rb_node);
+ set_hot_cnode(s, node);
+ rb_node = rb_next(rb_node);
+ }
+}
+
+static void get_hot_streams(struct hists *hists,
+ struct callchain_streams *s)
+{
+ struct rb_node *next = rb_first_cached(&hists->entries);
+
+ while (next) {
+ struct hist_entry *he;
+
+ he = rb_entry(next, struct hist_entry, rb_node);
+ update_hot_streams(he, s);
+ next = rb_next(&he->rb_node);
+ }
+}
+
+struct callchain_streams *callchain_evsel_streams_create(struct evlist *evlist,
+ int nr_streams_max,
+ int *nr_evsel_streams)
+{
+ int nr_evsel = evlist->core.nr_entries, i = 0;
+ struct callchain_streams *callchain_streams;
+ struct evsel *pos;
+
+ callchain_streams = create_evsel_streams(nr_evsel, nr_streams_max);
+ if (!callchain_streams)
+ return NULL;
+
+ evlist__for_each_entry(evlist, pos) {
+ struct hists *hists = evsel__hists(pos);
+
+ hists__output_resort(hists, NULL);
+ get_hot_streams(hists, &callchain_streams[i]);
+ callchain_streams[i].evsel_idx = pos->idx;
+ i++;
+ }
+
+ *nr_evsel_streams = nr_evsel;
+ return callchain_streams;
+}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 706bb7bbe1e1..5852990cdf60 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -13,6 +13,7 @@ struct ip_callchain;
struct map;
struct perf_sample;
struct thread;
+struct evlist;

#define HELP_PAD "\t\t\t\t"

@@ -159,6 +160,17 @@ struct callchain_cursor {
struct callchain_cursor_node *curr;
};

+struct stream_node {
+ struct callchain_node *cnode;
+};
+
+struct callchain_streams {
+ struct stream_node *streams;
+ int nr_streams_max;
+ int nr_streams;
+ int evsel_idx;
+};
+
extern __thread struct callchain_cursor callchain_cursor;

static inline void callchain_init(struct callchain_root *root)
@@ -289,4 +301,8 @@ int callchain_branch_counts(struct callchain_root *root,
u64 *branch_count, u64 *predicted_count,
u64 *abort_count, u64 *cycles_count);

+struct callchain_streams *callchain_evsel_streams_create(struct evlist *evlist,
+ int nr_streams_max,
+ int *nr_evsel_streams);
+
#endif /* __PERF_CALLCHAIN_H */
--
2.17.1

2020-03-13 07:12:34

by Jin Yao

[permalink] [raw]
Subject: [PATCH v2 03/14] perf util: Return per-event callchain streams

In previous patch, we have created a 'struct callchain_streams'
array and each array entry contains per-event callchain streams.

This patch returns the pointer of per-event callchain streams
according to the evsel_idx.

Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/util/callchain.c | 12 ++++++++++++
tools/perf/util/callchain.h | 4 ++++
2 files changed, 16 insertions(+)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 0c028caaeb19..bf66f33debd4 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1721,3 +1721,15 @@ struct callchain_streams *callchain_evsel_streams_create(struct evlist *evlist,
*nr_evsel_streams = nr_evsel;
return callchain_streams;
}
+
+struct callchain_streams *callchain_evsel_streams_get(struct callchain_streams *cs,
+ int nr_streams_max,
+ int evsel_idx)
+{
+ for (int i = 0; i < nr_streams_max; i++) {
+ if (cs[i].evsel_idx == evsel_idx)
+ return &cs[i];
+ }
+
+ return NULL;
+}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 5852990cdf60..4e881361e154 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -305,4 +305,8 @@ struct callchain_streams *callchain_evsel_streams_create(struct evlist *evlist,
int nr_streams_max,
int *nr_evsel_streams);

+struct callchain_streams *callchain_evsel_streams_get(struct callchain_streams *cs,
+ int nr_streams_max,
+ int evsel_idx);
+
#endif /* __PERF_CALLCHAIN_H */
--
2.17.1

2020-03-13 07:12:49

by Jin Yao

[permalink] [raw]
Subject: [PATCH v2 06/14] perf util: Report hot streams

We show the streams separately. They are divided into different sections.

1. "Matched hot chains between old perf data and new perf data"

2. "Hot chains in old perf data but source line changed in new perf data"

3. "Hot chains in old perf data only"

4. "Hot chains in new perf data only".

For each stream, we report the cycles and hot percent (hits%).

For example,

cycles: 2, hits: 4.08%
--------------------------
main div.c:42
compute_flag div.c:28

Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/util/callchain.c | 155 ++++++++++++++++++++++++++++++++++++
tools/perf/util/callchain.h | 3 +
2 files changed, 158 insertions(+)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index b0c8757c2dcf..84fe8e418532 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1777,6 +1777,9 @@ static bool chain_srclist_match(struct srclist *src_list, const char *srcline_a,
struct src_node *node;
struct line_pair *lp;

+ if (!srcline_a || !srcline_b)
+ return false;
+
if (sscanf(srcline_a, "%[^:]:%d", file_a, &line_a) != 2)
return false;

@@ -1817,6 +1820,10 @@ static bool chain_match(struct callchain_list *base_chain,

*src_changed = false;

+ /*
+ * Check sourceline first. If not matched,
+ * fallback to symbol match and address match.
+ */
if (src_list && chain_srclist_match(src_list, pair_chain->srcline,
base_chain->srcline, &src_found,
src_changed)) {
@@ -1939,3 +1946,151 @@ void callchain_match_streams(struct callchain_streams *cs_base,
if (src_list)
srclist__dump(src_list);
}
+
+static s64 callchain_avg_cycles(struct callchain_node *cnode)
+{
+ struct callchain_list *chain;
+ s64 cycles = 0;
+
+ list_for_each_entry(chain, &cnode->val, list) {
+ if (chain->srcline && chain->branch_count)
+ cycles += chain->cycles_count / chain->branch_count;
+ }
+
+ return cycles;
+}
+
+static void print_callchain_pair(struct stream_node *base_node, int idx,
+ struct callchain_streams *cs_base,
+ struct callchain_streams *cs_pair)
+{
+ struct callchain_node *base_cnode = base_node->cnode;
+ struct callchain_node *pair_cnode = base_node->pair_cnode;
+ struct callchain_list *base_chain, *pair_chain;
+ char buf1[512], buf2[512], cbuf1[256], cbuf2[256];
+ char *s1, *s2;
+ double pct;
+
+ printf("\nhot chain pair %d:\n", idx);
+
+ pct = (double)base_cnode->hit / (double)cs_base->chain_hits;
+ scnprintf(buf1, sizeof(buf1), "cycles: %ld, hits: %.2f%%",
+ callchain_avg_cycles(base_cnode), pct * 100.0);
+
+ pct = (double)pair_cnode->hit / (double)cs_pair->chain_hits;
+ scnprintf(buf2, sizeof(buf2), "cycles: %ld, hits: %.2f%%",
+ callchain_avg_cycles(pair_cnode), pct * 100.0);
+
+ printf("%35s\t%35s\n", buf1, buf2);
+
+ printf("%35s\t%35s\n",
+ "---------------------------",
+ "--------------------------");
+
+ pair_chain = list_first_entry(&pair_cnode->val,
+ struct callchain_list,
+ list);
+
+ list_for_each_entry(base_chain, &base_cnode->val, list) {
+ if (&pair_chain->list == &pair_cnode->val)
+ return;
+
+ s1 = callchain_list__sym_name(base_chain, cbuf1, sizeof(cbuf1),
+ false);
+ s2 = callchain_list__sym_name(pair_chain, cbuf2, sizeof(cbuf2),
+ false);
+
+ if (!pair_chain->src_changed)
+ scnprintf(buf1, sizeof(buf1), "%35s\t%35s", s1, s2);
+ else
+ scnprintf(buf1, sizeof(buf1), "%35s\t%35s*", s1, s2);
+
+ printf("%s\n", buf1);
+ pair_chain = list_next_entry(pair_chain, list);
+ }
+}
+
+static void print_stream_callchain(struct stream_node *node, int idx,
+ struct callchain_streams *cs, bool pair)
+{
+ struct callchain_node *cnode = node->cnode;
+ struct callchain_list *chain;
+ char buf[512], cbuf[256], *s;
+ double pct;
+
+ printf("\nhot chain %d:\n", idx);
+
+ pct = (double)cnode->hit / (double)cs->chain_hits;
+ scnprintf(buf, sizeof(buf), "cycles: %ld, hits: %.2f%%",
+ callchain_avg_cycles(cnode), pct * 100.0);
+
+ if (pair) {
+ printf("%35s\t%35s\n", "", buf);
+ printf("%35s\t%35s\n",
+ "", "--------------------------");
+ } else {
+ printf("%35s\n", buf);
+ printf("%35s\n", "--------------------------");
+ }
+
+ list_for_each_entry(chain, &cnode->val, list) {
+ s = callchain_list__sym_name(chain, cbuf, sizeof(cbuf), false);
+
+ if (pair)
+ scnprintf(buf, sizeof(buf), "%35s\t%35s", "", s);
+ else
+ scnprintf(buf, sizeof(buf), "%35s", s);
+
+ printf("%s\n", buf);
+ }
+}
+
+void callchain_stream_report(struct callchain_streams *cs_base,
+ struct callchain_streams *cs_pair)
+{
+ struct stream_node *base_node;
+ int i, idx = 0;
+
+ printf("[ Matched hot chains between old perf data and new perf data ]\n");
+ for (i = 0; i < cs_base->nr_streams; i++) {
+ base_node = &cs_base->streams[i];
+ if (base_node->pair_cnode) {
+ if (!base_node->src_changed) {
+ print_callchain_pair(base_node, ++idx,
+ cs_base, cs_pair);
+ }
+ }
+ }
+
+ idx = 0;
+ printf("\n[ Hot chains in old perf data but source line changed (*) in new perf data ]\n");
+ for (i = 0; i < cs_base->nr_streams; i++) {
+ base_node = &cs_base->streams[i];
+ if (base_node->pair_cnode) {
+ if (base_node->src_changed) {
+ print_callchain_pair(base_node, ++idx,
+ cs_base, cs_pair);
+ }
+ }
+ }
+
+ idx = 0;
+ printf("\n[ Hot chains in old perf data only ]\n");
+ for (i = 0; i < cs_base->nr_streams; i++) {
+ base_node = &cs_base->streams[i];
+ if (!base_node->pair_cnode) {
+ print_stream_callchain(base_node, ++idx,
+ cs_base, false);
+ }
+ }
+
+ idx = 0;
+ printf("\n[ Hot chains in new perf data only ]\n");
+ for (i = 0; i < cs_pair->nr_streams; i++) {
+ base_node = &cs_pair->streams[i];
+ if (!base_node->pair_cnode) {
+ print_stream_callchain(base_node, ++idx,
+ cs_pair, true);
+ }
+ }
+}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 3c0e0b45656b..90826a280476 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -318,4 +318,7 @@ void callchain_match_streams(struct callchain_streams *cs_base,
struct callchain_streams *cs_pair,
struct srclist *src_list);

+void callchain_stream_report(struct callchain_streams *cs_base,
+ struct callchain_streams *cs_pair);
+
#endif /* __PERF_CALLCHAIN_H */
--
2.17.1

2020-03-13 07:12:57

by Jin Yao

[permalink] [raw]
Subject: [PATCH v2 07/14] perf diff: Support hot streams comparison

This patch enables perf-diff with "--stream", "--before" and "--after"
options.

"--stream": Enable hot streams comparison
"--before": Source code directory corresponding to perf.data.old
"--after" : Source code directory corresponding to perf.data

Let's see two examples

perf record -b ... Generate perf.data.old with branch data
perf record -b ... Generate perf.data with branch data
perf diff --stream

[ Matched hot chains between old perf data and new perf data ]

hot chain pair 1:
cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
--------------------------- --------------------------
main div.c:39 main div.c:39
main div.c:44 main div.c:44

hot chain pair 2:
cycles: 35, hits: 21.43% cycles: 33, hits: 19.37%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380
__random_r random_r.c:357 __random_r random_r.c:357
__random random.c:293 __random random.c:293
__random random.c:293 __random random.c:293
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:288 __random random.c:288
rand rand.c:27 rand rand.c:27
rand rand.c:26 rand rand.c:26
rand@plt rand@plt
rand@plt rand@plt
compute_flag div.c:25 compute_flag div.c:25
compute_flag div.c:22 compute_flag div.c:22
main div.c:40 main div.c:40
main div.c:40 main div.c:40
main div.c:39 main div.c:39

hot chain pair 3:
cycles: 18, hits: 6.10% cycles: 19, hits: 6.51%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380
__random_r random_r.c:357 __random_r random_r.c:357
__random random.c:293 __random random.c:293
__random random.c:293 __random random.c:293
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:288 __random random.c:288
rand rand.c:27 rand rand.c:27
rand rand.c:26 rand rand.c:26
rand@plt rand@plt
rand@plt rand@plt
compute_flag div.c:25 compute_flag div.c:25
compute_flag div.c:22 compute_flag div.c:22
main div.c:40 main div.c:40

hot chain pair 4:
cycles: 9, hits: 5.95% cycles: 8, hits: 5.03%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380

[ Hot chains in old perf data but source line changed (*) in new perf data ]

[ Hot chains in old perf data only ]

hot chain 1:
cycles: 2, hits: 4.08%
--------------------------
main div.c:42
compute_flag div.c:28

[ Hot chains in new perf data only ]

hot chain 1:
cycles: 36, hits: 3.36%
--------------------------
__random_r random_r.c:357
__random random.c:293
__random random.c:293
__random random.c:291
__random random.c:291
__random random.c:291
__random random.c:288
rand rand.c:27
rand rand.c:26
rand@plt
rand@plt
compute_flag div.c:25
compute_flag div.c:22
main div.c:40
main div.c:40

If we enable the source line comparison, the output might be different.

perf diff --stream --before ./before --after ./after

[ Matched hot chains between old perf data and new perf data) ]

hot chain pair 1:
cycles: 18, hits: 6.10% cycles: 19, hits: 6.51%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380
__random_r random_r.c:357 __random_r random_r.c:357
__random random.c:293 __random random.c:293
__random random.c:293 __random random.c:293
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:288 __random random.c:288
rand rand.c:27 rand rand.c:27
rand rand.c:26 rand rand.c:26
rand@plt rand@plt
rand@plt rand@plt
compute_flag div.c:25 compute_flag div.c:25
compute_flag div.c:22 compute_flag div.c:22
main div.c:40 main div.c:40

hot chain pair 2:
cycles: 9, hits: 5.95% cycles: 8, hits: 5.03%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380

[ Hot chains in old perf data but source line changed (*) in new perf data ]

hot chain pair 1:
cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
--------------------------- --------------------------
main div.c:39 main div.c:39*
main div.c:44 main div.c:44

hot chain pair 2:
cycles: 35, hits: 21.43% cycles: 33, hits: 19.37%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380
__random_r random_r.c:357 __random_r random_r.c:357
__random random.c:293 __random random.c:293
__random random.c:293 __random random.c:293
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:288 __random random.c:288
rand rand.c:27 rand rand.c:27
rand rand.c:26 rand rand.c:26
rand@plt rand@plt
rand@plt rand@plt
compute_flag div.c:25 compute_flag div.c:25
compute_flag div.c:22 compute_flag div.c:22
main div.c:40 main div.c:40
main div.c:40 main div.c:40
main div.c:39 main div.c:39*

[ Hot chains in old perf data only ]

hot chain 1:
cycles: 2, hits: 4.08%
--------------------------
main div.c:42
compute_flag div.c:28

[ Hot chains in new perf data only ]

hot chain 1:
cycles: 36, hits: 3.36%
--------------------------
__random_r random_r.c:357
__random random.c:293
__random random.c:293
__random random.c:291
__random random.c:291
__random random.c:291
__random random.c:288
rand rand.c:27
rand rand.c:26
rand@plt
rand@plt
compute_flag div.c:25
compute_flag div.c:22
main div.c:40
main div.c:40
main div.c:39

Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/Documentation/perf-diff.txt | 14 ++
tools/perf/builtin-diff.c | 170 +++++++++++++++++++++++--
2 files changed, 171 insertions(+), 13 deletions(-)

diff --git a/tools/perf/Documentation/perf-diff.txt b/tools/perf/Documentation/perf-diff.txt
index f50ca0fef0a4..296fea98ac07 100644
--- a/tools/perf/Documentation/perf-diff.txt
+++ b/tools/perf/Documentation/perf-diff.txt
@@ -182,6 +182,20 @@ OPTIONS
--tid=::
Only diff samples for given thread ID (comma separated list).

+--stream::
+ Enable hot streams comparison. Stream is a callchain which is
+ aggregated by the branch records from samples. The option can be
+ used with --before and --after if source line comparison is
+ required.
+
+--before::
+ Source code directory corresponding to perf.data.old. Should be
+ used with --stream and --after.
+
+--after::
+ Source code directory corresponding to perf.data. Should be
+ used with --stream and --before.
+
COMPARISON
----------
The comparison is governed by the baseline file. The baseline perf.data
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 5e697cd2224a..9c801b9bc5bb 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -25,6 +25,8 @@
#include "util/map.h"
#include "util/spark.h"
#include "util/block-info.h"
+#include "util/srclist.h"
+#include "util/callchain.h"
#include <linux/err.h>
#include <linux/zalloc.h>
#include <subcmd/pager.h>
@@ -38,10 +40,15 @@
struct perf_diff {
struct perf_tool tool;
const char *time_str;
+ const char *before_dir;
+ const char *after_dir;
struct perf_time_interval *ptime_range;
int range_size;
int range_num;
bool has_br_stack;
+ bool src_cmp;
+ bool stream;
+ struct srclist *src_list;
};

/* Diff command specific HPP columns. */
@@ -72,6 +79,8 @@ struct data__file {
struct perf_data data;
int idx;
struct hists *hists;
+ struct callchain_streams *evsel_streams;
+ int nr_evsel_streams;
struct diff_hpp_fmt fmt[PERF_HPP_DIFF__MAX_INDEX];
};

@@ -106,6 +115,7 @@ enum {
COMPUTE_DELTA_ABS,
COMPUTE_CYCLES,
COMPUTE_MAX,
+ COMPUTE_STREAM, /* After COMPUTE_MAX to avoid use current compute arrays */
};

const char *compute_names[COMPUTE_MAX] = {
@@ -393,6 +403,11 @@ static int diff__process_sample_event(struct perf_tool *tool,
struct perf_diff *pdiff = container_of(tool, struct perf_diff, tool);
struct addr_location al;
struct hists *hists = evsel__hists(evsel);
+ struct hist_entry_iter iter = {
+ .evsel = evsel,
+ .sample = sample,
+ .ops = &hist_iter_normal,
+ };
int ret = -1;

if (perf_time__ranges_skip_sample(pdiff->ptime_range, pdiff->range_num,
@@ -411,14 +426,8 @@ static int diff__process_sample_event(struct perf_tool *tool,
goto out_put;
}

- if (compute != COMPUTE_CYCLES) {
- if (!hists__add_entry(hists, &al, NULL, NULL, NULL, sample,
- true)) {
- pr_warning("problem incrementing symbol period, "
- "skipping event\n");
- goto out_put;
- }
- } else {
+ switch (compute) {
+ case COMPUTE_CYCLES:
if (!hists__add_entry_ops(hists, &block_hist_ops, &al, NULL,
NULL, NULL, sample, true)) {
pr_warning("problem incrementing symbol period, "
@@ -428,6 +437,23 @@ static int diff__process_sample_event(struct perf_tool *tool,

hist__account_cycles(sample->branch_stack, &al, sample, false,
NULL);
+ break;
+
+ case COMPUTE_STREAM:
+ if (hist_entry_iter__add(&iter, &al, PERF_MAX_STACK_DEPTH,
+ NULL)) {
+ pr_debug("problem adding hist entry, skipping event\n");
+ goto out_put;
+ }
+ break;
+
+ default:
+ if (!hists__add_entry(hists, &al, NULL, NULL, NULL, sample,
+ true)) {
+ pr_warning("problem incrementing symbol period, "
+ "skipping event\n");
+ goto out_put;
+ }
}

/*
@@ -995,6 +1021,53 @@ static void data_process(void)
}
}

+static int process_base_stream(struct data__file *data_base,
+ struct data__file *data_pair,
+ const char *title __maybe_unused,
+ const char *base_dir __maybe_unused,
+ const char *pair_dir __maybe_unused)
+{
+ struct evlist *evlist_base = data_base->session->evlist;
+ struct evlist *evlist_pair = data_pair->session->evlist;
+ struct evsel *evsel_base, *evsel_pair;
+ struct callchain_streams *es_base, *es_pair;
+
+ evlist__for_each_entry(evlist_base, evsel_base) {
+ evsel_pair = evsel_match(evsel_base, evlist_pair);
+ if (!evsel_pair)
+ continue;
+
+ es_base = callchain_evsel_streams_get(data_base->evsel_streams,
+ data_base->nr_evsel_streams,
+ evsel_base->idx);
+ if (!es_base)
+ return -1;
+
+ es_pair = callchain_evsel_streams_get(data_pair->evsel_streams,
+ data_pair->nr_evsel_streams,
+ evsel_pair->idx);
+ if (!es_pair)
+ return -1;
+
+ callchain_match_streams(es_base, es_pair, pdiff.src_list);
+ callchain_stream_report(es_base, es_pair);
+ }
+
+ return 0;
+}
+
+static void stream_process(void)
+{
+ /*
+ * Stream comparison only supports two data files.
+ * perf.data.old and perf.data. data__files[0] is perf.data.old,
+ * data__files[1] is perf.data.
+ */
+ process_base_stream(&data__files[0], &data__files[1],
+ "# Output based on old perf data:\n#\n",
+ pdiff.before_dir, pdiff.after_dir);
+}
+
static void data__free(struct data__file *d)
{
int col;
@@ -1108,6 +1181,19 @@ static int check_file_brstack(void)
return 0;
}

+static struct callchain_streams *create_evsel_streams(struct evlist *evlist,
+ int nr_streams_max,
+ int *nr_evsel_streams)
+{
+ struct callchain_streams *evsel_streams;
+
+ evsel_streams = callchain_evsel_streams_create(evlist,
+ nr_streams_max,
+ nr_evsel_streams);
+
+ return evsel_streams;
+}
+
static int __cmd_diff(void)
{
struct data__file *d;
@@ -1121,6 +1207,14 @@ static int __cmd_diff(void)
abstime_tmp = abstime_ostr;
ret = -EINVAL;

+ if (pdiff.src_cmp) {
+ pdiff.src_list = srclist__new(pdiff.before_dir,
+ pdiff.after_dir);
+
+ if (!pdiff.src_list)
+ goto out_free;
+ }
+
data__for_each_file(i, d) {
d->session = perf_session__new(&d->data, false, &pdiff.tool);
if (IS_ERR(d->session)) {
@@ -1152,9 +1246,21 @@ static int __cmd_diff(void)

if (pdiff.ptime_range)
zfree(&pdiff.ptime_range);
+
+ if (compute == COMPUTE_STREAM) {
+ d->evsel_streams = create_evsel_streams(
+ d->session->evlist,
+ 5,
+ &d->nr_evsel_streams);
+ if (!d->evsel_streams)
+ goto out_delete;
+ }
}

- data_process();
+ if (compute == COMPUTE_STREAM)
+ stream_process();
+ else
+ data_process();

out_delete:
data__for_each_file(i, d) {
@@ -1162,6 +1268,7 @@ static int __cmd_diff(void)
data__free(d);
}

+ out_free:
free(data__files);

if (pdiff.ptime_range)
@@ -1170,6 +1277,9 @@ static int __cmd_diff(void)
if (abstime_ostr)
free(abstime_ostr);

+ if (pdiff.src_list)
+ srclist__delete(pdiff.src_list);
+
return ret;
}

@@ -1227,6 +1337,15 @@ static const struct option options[] = {
"only consider symbols in these pids"),
OPT_STRING(0, "tid", &symbol_conf.tid_list_str, "tid[,tid...]",
"only consider symbols in these tids"),
+ OPT_BOOLEAN(0, "stream", &pdiff.stream,
+ "Enable hot streams comparison. "
+ "Can be used with --before and --after"),
+ OPT_STRING(0, "before", &pdiff.before_dir, "dir",
+ "Source code directory corresponding to perf.data.old. "
+ "WARNING: use with --after and --stream"),
+ OPT_STRING(0, "after", &pdiff.after_dir, "dir",
+ "Source code directory corresponding to perf.data. "
+ "WARNING: use with --before and --stream"),
OPT_END()
};

@@ -1886,6 +2005,19 @@ int cmd_diff(int argc, const char **argv)
if (cycles_hist && (compute != COMPUTE_CYCLES))
usage_with_options(diff_usage, options);

+ if (pdiff.stream) {
+ compute = COMPUTE_STREAM;
+ if ((pdiff.before_dir && !pdiff.after_dir) ||
+ (!pdiff.before_dir && pdiff.after_dir)) {
+ usage_with_options(diff_usage, options);
+ }
+
+ if (pdiff.before_dir && pdiff.after_dir)
+ pdiff.src_cmp = true;
+ } else if (pdiff.before_dir || pdiff.after_dir) {
+ usage_with_options(diff_usage, options);
+ }
+
symbol__annotation_init();

if (symbol__init(NULL) < 0)
@@ -1897,13 +2029,25 @@ int cmd_diff(int argc, const char **argv)
if (check_file_brstack() < 0)
return -1;

- if (compute == COMPUTE_CYCLES && !pdiff.has_br_stack)
+ if ((compute == COMPUTE_CYCLES || compute == COMPUTE_STREAM)
+ && !pdiff.has_br_stack) {
return -1;
+ }

- if (ui_init() < 0)
- return -1;
+ if (compute == COMPUTE_STREAM) {
+ symbol_conf.show_branchflag_count = true;
+ callchain_param.mode = CHAIN_FLAT;
+ callchain_param.key = CCKEY_SRCLINE;
+ callchain_param.branch_callstack = 1;
+ symbol_conf.use_callchain = true;
+ callchain_register_param(&callchain_param);
+ sort_order = "srcline,symbol,dso";
+ } else {
+ if (ui_init() < 0)
+ return -1;

- sort__mode = SORT_MODE__DIFF;
+ sort__mode = SORT_MODE__DIFF;
+ }

if (setup_sorting(NULL) < 0)
usage_with_options(diff_usage, options);
--
2.17.1

2020-03-13 07:12:58

by Jin Yao

[permalink] [raw]
Subject: [PATCH v2 08/14] perf util: Add new block info functions for top N hot blocks comparison

It's also useful to figure out the top N hottest blocks from old perf
data file and figure out the top N hottest blocks from new perf data file,
and then compare them for the cycles diff. It can let us easily know
how many cycles are moved from one block to another block.

This patch adds new helper functions and data structures for the block
comparison.

And it also updates the existing perf-diff to be compatible with
the new interface.

v2:
---
Use name block_line__zdelete to replace free_block_line

Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/builtin-diff.c | 45 +-----
tools/perf/util/block-info.c | 288 ++++++++++++++++++++++++++++++++++-
tools/perf/util/block-info.h | 32 +++-
tools/perf/util/srclist.h | 9 ++
4 files changed, 328 insertions(+), 46 deletions(-)

diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 9c801b9bc5bb..dcbc9bba4e61 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -598,27 +598,6 @@ static void init_block_hist(struct block_hist *bh)
bh->valid = true;
}

-static struct hist_entry *get_block_pair(struct hist_entry *he,
- struct hists *hists_pair)
-{
- struct rb_root_cached *root = hists_pair->entries_in;
- struct rb_node *next = rb_first_cached(root);
- int64_t cmp;
-
- while (next != NULL) {
- struct hist_entry *he_pair = rb_entry(next, struct hist_entry,
- rb_node_in);
-
- next = rb_next(&he_pair->rb_node_in);
-
- cmp = __block_info__cmp(he_pair, he);
- if (!cmp)
- return he_pair;
- }
-
- return NULL;
-}
-
static void init_spark_values(unsigned long *svals, int num)
{
for (int i = 0; i < num; i++)
@@ -665,26 +644,6 @@ static void compute_cycles_diff(struct hist_entry *he,
}
}

-static void block_hists_match(struct hists *hists_base,
- struct hists *hists_pair)
-{
- struct rb_root_cached *root = hists_base->entries_in;
- struct rb_node *next = rb_first_cached(root);
-
- while (next != NULL) {
- struct hist_entry *he = rb_entry(next, struct hist_entry,
- rb_node_in);
- struct hist_entry *pair = get_block_pair(he, hists_pair);
-
- next = rb_next(&he->rb_node_in);
-
- if (pair) {
- hist_entry__add_pair(pair, he);
- compute_cycles_diff(he, pair);
- }
- }
-}
-
static void hists__precompute(struct hists *hists)
{
struct rb_root_cached *root;
@@ -737,7 +696,9 @@ static void hists__precompute(struct hists *hists)

if (bh->valid && pair_bh->valid) {
block_hists_match(&bh->block_hists,
- &pair_bh->block_hists);
+ &pair_bh->block_hists,
+ NULL,
+ compute_cycles_diff);
hists__output_resort(&pair_bh->block_hists,
NULL);
}
diff --git a/tools/perf/util/block-info.c b/tools/perf/util/block-info.c
index 423ec69bda6c..431cd97bae39 100644
--- a/tools/perf/util/block-info.c
+++ b/tools/perf/util/block-info.c
@@ -12,6 +12,7 @@
#include "evlist.h"
#include "hist.h"
#include "ui/browsers/hists.h"
+#include "debug.h"

static struct block_header_column {
const char *name;
@@ -50,10 +51,24 @@ struct block_info *block_info__get(struct block_info *bi)
return bi;
}

+static void block_line__zdelete(struct block_line **bl)
+{
+ if ((*bl)->start_file)
+ free((*bl)->start_file);
+
+ if ((*bl)->end_file)
+ free((*bl)->end_file);
+
+ zfree(bl);
+}
+
void block_info__put(struct block_info *bi)
{
- if (bi && refcount_dec_and_test(&bi->refcnt))
+ if (bi && refcount_dec_and_test(&bi->refcnt)) {
+ if (bi->line)
+ block_line__zdelete(&bi->line);
free(bi);
+ }
}

struct block_info *block_info__new(void)
@@ -65,7 +80,8 @@ struct block_info *block_info__new(void)
return bi;
}

-int64_t __block_info__cmp(struct hist_entry *left, struct hist_entry *right)
+int64_t __block_info__cmp(struct hist_entry *left, struct hist_entry *right,
+ struct srclist *src_list __maybe_unused)
{
struct block_info *bi_l = left->block_info;
struct block_info *bi_r = right->block_info;
@@ -93,7 +109,7 @@ int64_t __block_info__cmp(struct hist_entry *left, struct hist_entry *right)
int64_t block_info__cmp(struct perf_hpp_fmt *fmt __maybe_unused,
struct hist_entry *left, struct hist_entry *right)
{
- return __block_info__cmp(left, right);
+ return __block_info__cmp(left, right, NULL);
}

static void init_block_info(struct block_info *bi, struct symbol *sym,
@@ -446,8 +462,10 @@ struct block_report *block_info__create_report(struct evlist *evlist,
evlist__for_each_entry(evlist, pos) {
struct hists *hists = evsel__hists(pos);

+ hists__output_resort(hists, NULL);
process_block_report(hists, &block_reports[i], total_cycles,
block_hpps, nr_hpps);
+ block_reports[i].evsel_idx = pos->idx;
i++;
}

@@ -496,3 +514,267 @@ float block_info__total_cycles_percent(struct hist_entry *he)

return 0.0;
}
+
+struct block_report *block_info__get_report(struct block_report *reps,
+ int nr_reps, int evsel_idx)
+{
+ for (int i = 0; i < nr_reps; i++) {
+ if (reps[i].evsel_idx == evsel_idx)
+ return &reps[i];
+ }
+
+ return NULL;
+}
+
+static char *move_to_relpath(char *path, const char *dir)
+{
+ const char *d = dir, *end = dir + strlen(dir);
+ char *s;
+
+ /* skip '.' and '/' */
+ while ((d != end) && ((*d == '.') || (*d == '/')))
+ d++;
+
+ if (d == end)
+ return NULL;
+
+ s = strstr(path, d);
+ if (!s)
+ return NULL;
+
+ s += strlen(d);
+
+ while (*s == '/' && *s != 0)
+ s++;
+
+ if (*s == 0)
+ return NULL;
+
+ return s;
+}
+
+static int block_addr2line(struct hist_entry *he, const char *dir)
+{
+ struct block_info *bi = he->block_info;
+ struct block_line *bl;
+
+ if (!he->ms.map || !he->ms.map->dso)
+ return -1;
+
+ bl = zalloc(sizeof(*bl));
+ if (!bl)
+ return -1;
+
+ symbol_conf.disable_add2line_warn = true;
+ bl->start_file = get_srcline_split(he->ms.map->dso,
+ map__rip_2objdump(he->ms.map,
+ bi->sym->start + bi->start),
+ &bl->start_nr);
+ if (!bl->start_file)
+ goto err;
+
+ if (dir)
+ bl->start_rel = move_to_relpath(bl->start_file, dir);
+
+ bl->end_file = get_srcline_split(he->ms.map->dso,
+ map__rip_2objdump(he->ms.map,
+ bi->sym->start + bi->end),
+ &bl->end_nr);
+ if (!bl->end_file)
+ goto err;
+
+ if (dir)
+ bl->end_rel = move_to_relpath(bl->end_file, dir);
+
+ bi->line = bl;
+ return 0;
+
+err:
+ block_line__zdelete(&bl);
+ return -1;
+}
+
+int block_hists_addr2line(struct hists *hists, const char *dir)
+{
+ struct rb_root_cached *root = hists->entries_in;
+ struct rb_node *next = rb_first_cached(root);
+
+ while (next != NULL) {
+ struct hist_entry *he = rb_entry(next, struct hist_entry,
+ rb_node_in);
+
+ if (!he)
+ break;
+
+ block_addr2line(he, dir);
+ next = rb_next(&he->rb_node_in);
+ }
+
+ return 0;
+}
+
+bool block_same_srcfiles(struct block_line *bl_a, struct block_line *bl_b)
+{
+ if (!bl_a->start_file || !bl_a->end_file ||
+ !bl_b->start_file || !bl_b->end_file) {
+ return false;
+ }
+
+ if (!strcmp(bl_a->start_file, bl_b->start_file) &&
+ !strcmp(bl_a->end_file, bl_b->end_file) &&
+ !strcmp(bl_a->start_file, bl_a->end_file)) {
+ return true;
+ }
+
+ if (!bl_a->start_rel || !bl_a->end_rel ||
+ !bl_b->start_rel || !bl_b->end_rel) {
+ return false;
+ }
+
+ if (!strcmp(bl_a->start_rel, bl_b->start_rel) &&
+ !strcmp(bl_a->end_rel, bl_b->end_rel) &&
+ !strcmp(bl_a->start_rel, bl_a->end_rel)) {
+ return true;
+ }
+
+ return false;
+}
+
+void block_line_dump(struct block_info *bi, const char *str)
+{
+ if (bi && bi->line) {
+ pr_debug("%s: %s:%d -> %s:%d\n",
+ str,
+ bi->line->start_file,
+ bi->line->start_nr,
+ bi->line->end_file,
+ bi->line->end_nr);
+ }
+}
+
+static bool block_line_matched(struct src_node *node,
+ int start_nr_a, int end_nr_a,
+ int start_nr_b, int end_nr_b,
+ bool *changed)
+{
+ int i, j, start_a, end_a, start_b, changed_nr = 0;
+ struct line_pair *lp;
+
+ *changed = false;
+
+ if (abs(end_nr_a - start_nr_a) !=
+ abs(end_nr_b - start_nr_b)) {
+ return false;
+ }
+
+ i = start_a = (start_nr_a < end_nr_a) ? start_nr_a : end_nr_a;
+ end_a = (end_nr_a > start_nr_a) ? end_nr_a : start_nr_a;
+
+ j = start_b = (start_nr_b < end_nr_b) ? start_nr_b : end_nr_b;
+
+ while (i <= end_a) {
+ lp = srclist__line_pair(node, i);
+ if (!lp)
+ return false;
+
+ if (lp->b_nr != j)
+ changed_nr++;
+
+ i++; j++;
+ }
+
+ if ((i == end_a + 1) && (changed_nr == 0))
+ return true;
+
+ /*
+ * At least one line is unchanged in this block,
+ * we think this block is changed (not a new block).
+ */
+ if (changed_nr < end_a - start_a + 1)
+ *changed = true;
+
+ return false;
+}
+
+bool block_srclist_matched(struct srclist *slist, char *rel_path,
+ int start_nr_a, int end_nr_a,
+ int start_nr_b, int end_nr_b,
+ bool *changed)
+{
+ struct src_node *node;
+ bool ret;
+
+ *changed = false;
+
+ node = srclist__find(slist, rel_path, true);
+ if (!node)
+ return false;
+
+ ret = block_line_matched(node, start_nr_a, end_nr_a,
+ start_nr_b, end_nr_b, changed);
+
+ return ret;
+}
+
+static struct hist_entry *get_block_pair(struct hist_entry *he,
+ struct hists *hists_pair,
+ struct srclist *src_list)
+{
+ struct rb_root_cached *root = hists_pair->entries_in;
+ struct rb_node *next = rb_first_cached(root);
+ int64_t cmp;
+
+ while (next != NULL) {
+ struct hist_entry *he_pair = rb_entry(next, struct hist_entry,
+ rb_node_in);
+
+ next = rb_next(&he_pair->rb_node_in);
+
+ cmp = __block_info__cmp(he_pair, he, src_list);
+ if (!cmp)
+ return he_pair;
+ }
+
+ return NULL;
+}
+
+void block_hists_match(struct hists *hists_base,
+ struct hists *hists_pair,
+ struct srclist *src_list,
+ void (*func)(struct hist_entry *,
+ struct hist_entry *))
+{
+ struct rb_root_cached *root = hists_base->entries_in;
+ struct rb_node *next = rb_first_cached(root);
+
+ while (next != NULL) {
+ struct hist_entry *he = rb_entry(next, struct hist_entry,
+ rb_node_in);
+ struct hist_entry *pair = get_block_pair(he, hists_pair,
+ src_list);
+
+ next = rb_next(&he->rb_node_in);
+
+ if (pair) {
+ hist_entry__add_pair(pair, he);
+
+ if (func)
+ (*func)(he, pair);
+ }
+ }
+}
+
+int block_info__match_report(struct block_report *rep_base,
+ struct block_report *rep_pair,
+ struct srclist *src_list,
+ void (*func)(struct hist_entry *,
+ struct hist_entry *))
+{
+ struct block_hist *bh_base = &rep_base->hist;
+ struct block_hist *bh_pair = &rep_pair->hist;
+
+ block_hists_match(&bh_base->block_hists, &bh_pair->block_hists,
+ src_list, func);
+
+ return 0;
+}
diff --git a/tools/perf/util/block-info.h b/tools/perf/util/block-info.h
index 42e9dcc4cf0a..458bd998089d 100644
--- a/tools/perf/util/block-info.h
+++ b/tools/perf/util/block-info.h
@@ -20,6 +20,8 @@ struct block_info {
int num;
int num_aggr;
refcount_t refcnt;
+ struct block_line *line;
+ bool srcline_matched;
};

struct block_fmt {
@@ -46,6 +48,7 @@ struct block_report {
u64 cycles;
struct block_fmt fmts[PERF_HPP_REPORT__BLOCK_MAX_INDEX];
int nr_fmts;
+ int evsel_idx;
};

struct block_hist;
@@ -62,7 +65,8 @@ static inline void __block_info__zput(struct block_info **bi)

#define block_info__zput(bi) __block_info__zput(&bi)

-int64_t __block_info__cmp(struct hist_entry *left, struct hist_entry *right);
+int64_t __block_info__cmp(struct hist_entry *left, struct hist_entry *right,
+ struct srclist *src_list __maybe_unused);

int64_t block_info__cmp(struct perf_hpp_fmt *fmt __maybe_unused,
struct hist_entry *left, struct hist_entry *right);
@@ -83,4 +87,30 @@ int report__browse_block_hists(struct block_hist *bh, float min_percent,

float block_info__total_cycles_percent(struct hist_entry *he);

+struct block_report *block_info__get_report(struct block_report *reps,
+ int nr_reps, int evsel_idx);
+
+int block_hists_addr2line(struct hists *hists, const char *dir);
+
+void block_line_dump(struct block_info *bi, const char *str);
+
+bool block_same_srcfiles(struct block_line *bl_a, struct block_line *bl_b);
+
+bool block_srclist_matched(struct srclist *slist, char *rel_path,
+ int start_nr_a, int end_nr_a,
+ int start_nr_b, int end_nr_b,
+ bool *changed);
+
+void block_hists_match(struct hists *hists_base,
+ struct hists *hists_pair,
+ struct srclist *src_list,
+ void (*func)(struct hist_entry *,
+ struct hist_entry *));
+
+int block_info__match_report(struct block_report *rep_base,
+ struct block_report *rep_pair,
+ struct srclist *src_list,
+ void (*func)(struct hist_entry *,
+ struct hist_entry *));
+
#endif /* __PERF_BLOCK_H */
diff --git a/tools/perf/util/srclist.h b/tools/perf/util/srclist.h
index f25b0de91a13..46866d5c51d7 100644
--- a/tools/perf/util/srclist.h
+++ b/tools/perf/util/srclist.h
@@ -33,6 +33,15 @@ struct srclist {
const char *after_dir;
};

+struct block_line {
+ char *start_file;
+ char *end_file;
+ char *start_rel;
+ char *end_rel;
+ unsigned int start_nr;
+ unsigned int end_nr;
+};
+
struct srclist *srclist__new(const char *before_dir, const char *after_dir);
void srclist__delete(struct srclist *slist);

--
2.17.1

2020-03-13 07:13:02

by Jin Yao

[permalink] [raw]
Subject: [PATCH v2 10/14] perf util: Enable block source line comparison

Previously we only supported address comparison, but it was not good
for the case that address might be changed if source code was updated.

This patch enables for the source line comparison.

1. If all of the source lines in a block are not changed (or only
moved some offsets as a whole), we think the block is not changed.

2. If we can aware any source line in this block is changed,
we think this block is changed.

3. If 1 and 2 are both not matched, fallback to address comparison.

Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/util/block-info.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/block-info.c b/tools/perf/util/block-info.c
index 96b7b81cabd5..4d0275fbd0df 100644
--- a/tools/perf/util/block-info.c
+++ b/tools/perf/util/block-info.c
@@ -93,11 +93,12 @@ struct block_info *block_info__new(void)
}

int64_t __block_info__cmp(struct hist_entry *left, struct hist_entry *right,
- struct srclist *src_list __maybe_unused)
+ struct srclist *src_list)
{
struct block_info *bi_l = left->block_info;
struct block_info *bi_r = right->block_info;
int cmp;
+ bool changed;

if (!bi_l->sym || !bi_r->sym) {
if (!bi_l->sym && !bi_r->sym)
@@ -112,6 +113,27 @@ int64_t __block_info__cmp(struct hist_entry *left, struct hist_entry *right,
if (cmp)
return cmp;

+ if (src_list && bi_l->line && bi_r->line) {
+ if (block_same_srcfiles(bi_l->line, bi_r->line) &&
+ bi_l->line->start_rel) {
+
+ if (block_srclist_matched(src_list,
+ bi_l->line->start_rel,
+ bi_l->line->start_nr,
+ bi_l->line->end_nr,
+ bi_r->line->start_nr,
+ bi_r->line->end_nr,
+ &changed)) {
+ bi_l->srcline_matched = true;
+ return 0;
+ } else if (changed) {
+ bi_l->block_changed = true;
+ return 0;
+ } else
+ return -1;
+ }
+ }
+
if (bi_l->start != bi_r->start)
return (int64_t)(bi_r->start - bi_l->start);

--
2.17.1

2020-03-13 07:13:35

by Jin Yao

[permalink] [raw]
Subject: [PATCH v2 11/14] perf diff: support hot blocks comparison

This patch reports the results of hot blocks comparison.

For example, following outputs are added to the end of
hot streams reports.

# Output based on old stream (perf.data.old):
#
# Sampled Cycles% Avg Cycles New Stream Diff(cycles%,cycles) New Stream Sampled Cycles% New Stream Avg Cycles
# ............... .......... ............................... .......................... .....................
#
24.76% 17 -4.54%, -7 - -
14.72% 7 0.13%, -1 - -
5.28% 2 [block not in new stream] - -
5.09% 2 0.52%, 0 - -
5.05% 2 -0.15%, -1 - -
4.18% 1 0.12%, 0 - -
3.16% 1 0.28%, 0 - -

If we enable the source line comparison, the output might be different.

# Output based on old stream (perf.data.old):
#
# Sampled Cycles% Avg Cycles New Stream Diff(cycles%,cycles) New Stream Sampled Cycles% New Stream Avg Cycles
# ............... .......... ............................... .......................... .....................
#
24.76% 17 [block changed in new stream] 20.22% 10
14.72% 7 0.13%, -1 - -
5.28% 2 [block not in new stream] - -
5.09% 2 0.52%, 0 - -
5.05% 2 -0.15%, -1 - -
4.18% 1 [block changed in new stream] 4.31% 1
3.16% 1 0.28%, 0 - -

Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/builtin-diff.c | 85 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 84 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index dcbc9bba4e61..566e811054b1 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -49,6 +49,8 @@ struct perf_diff {
bool src_cmp;
bool stream;
struct srclist *src_list;
+ u64 total_cycles;
+ float min_percent;
};

/* Diff command specific HPP columns. */
@@ -76,8 +78,10 @@ struct diff_hpp_fmt {

struct data__file {
struct perf_session *session;
+ struct block_report *block_reports;
struct perf_data data;
int idx;
+ int nr_block_reports;
struct hists *hists;
struct callchain_streams *evsel_streams;
int nr_evsel_streams;
@@ -445,8 +449,10 @@ static int diff__process_sample_event(struct perf_tool *tool,
pr_debug("problem adding hist entry, skipping event\n");
goto out_put;
}
- break;

+ hist__account_cycles(sample->branch_stack, &al, sample, false,
+ &pdiff->total_cycles);
+ break;
default:
if (!hists__add_entry(hists, &al, NULL, NULL, NULL, sample,
true)) {
@@ -992,6 +998,7 @@ static int process_base_stream(struct data__file *data_base,
struct evlist *evlist_pair = data_pair->session->evlist;
struct evsel *evsel_base, *evsel_pair;
struct callchain_streams *es_base, *es_pair;
+ struct block_report *rep_base, *rep_pair;

evlist__for_each_entry(evlist_base, evsel_base) {
evsel_pair = evsel_match(evsel_base, evlist_pair);
@@ -1014,6 +1021,37 @@ static int process_base_stream(struct data__file *data_base,
callchain_stream_report(es_base, es_pair);
}

+ evlist__for_each_entry(evlist_base, evsel_base) {
+ rep_base = block_info__get_report(data_base->block_reports,
+ data_base->nr_block_reports,
+ evsel_base->idx);
+
+ if (!rep_base)
+ return -1;
+
+ block_hists_addr2line(&rep_base->hist.block_hists, base_dir);
+
+ evlist_pair = data_pair->session->evlist;
+ evsel_pair = evsel_match(evsel_base, evlist_pair);
+ if (!evsel_pair)
+ continue;
+
+ rep_pair = block_info__get_report(data_pair->block_reports,
+ data_pair->nr_block_reports,
+ evsel_pair->idx);
+
+ block_hists_addr2line(&rep_pair->hist.block_hists, pair_dir);
+
+ block_info__match_report(rep_base, rep_pair,
+ pdiff.src_list, NULL);
+
+ fprintf(stdout, "%s", title);
+
+ use_browser = 0;
+ report__browse_block_hists(&rep_base->hist, pdiff.min_percent,
+ evsel_base, NULL, NULL);
+ }
+
return 0;
}

@@ -1142,6 +1180,26 @@ static int check_file_brstack(void)
return 0;
}

+static struct block_report *create_block_reports(struct evlist *evlist,
+ u64 total_cycles,
+ int *nr_block_reports)
+{
+ struct block_report *reps;
+ int block_hpps[7] = {
+ PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_PCT,
+ PERF_HPP_REPORT__BLOCK_AVG_CYCLES,
+ PERF_HPP__BLOCK_NEW_STREAM_DIFF,
+ PERF_HPP__BLOCK_NEW_STREAM_TOTAL_CYCLES_PCT,
+ PERF_HPP__BLOCK_NEW_STREAM_AVG_CYCLES,
+ PERF_HPP_REPORT__BLOCK_RANGE,
+ PERF_HPP_REPORT__BLOCK_DSO,
+ };
+
+ reps = block_info__create_report(evlist, total_cycles, block_hpps, 7,
+ nr_block_reports);
+ return reps;
+}
+
static struct callchain_streams *create_evsel_streams(struct evlist *evlist,
int nr_streams_max,
int *nr_evsel_streams)
@@ -1197,6 +1255,8 @@ static int __cmd_diff(void)
goto out_delete;
}

+ pdiff.total_cycles = 0;
+
ret = perf_session__process_events(d->session);
if (ret) {
pr_err("Failed to process %s\n", d->data.path);
@@ -1215,6 +1275,12 @@ static int __cmd_diff(void)
&d->nr_evsel_streams);
if (!d->evsel_streams)
goto out_delete;
+
+ d->block_reports = create_block_reports(d->session->evlist,
+ pdiff.total_cycles,
+ &d->nr_block_reports);
+ if (!d->block_reports)
+ goto out_delete;
}
}

@@ -1225,6 +1291,11 @@ static int __cmd_diff(void)

out_delete:
data__for_each_file(i, d) {
+ if (d->block_reports) {
+ block_info__free_report(d->block_reports,
+ d->nr_block_reports);
+ }
+
perf_session__delete(d->session);
data__free(d);
}
@@ -1244,6 +1315,15 @@ static int __cmd_diff(void)
return ret;
}

+static int parse_percent_limit(const struct option *opt, const char *arg,
+ int unset __maybe_unused)
+{
+ struct perf_diff *d = opt->value;
+
+ d->min_percent = strtof(arg, NULL);
+ return 0;
+}
+
static const char * const diff_usage[] = {
"perf diff [<options>] [old_file] [new_file]",
NULL,
@@ -1307,6 +1387,9 @@ static const struct option options[] = {
OPT_STRING(0, "after", &pdiff.after_dir, "dir",
"Source code directory corresponding to perf.data. "
"WARNING: use with --before and --stream"),
+ OPT_CALLBACK(0, "percent-limit", &pdiff, "percent",
+ "Don't show entries under that percent",
+ parse_percent_limit),
OPT_END()
};

--
2.17.1

2020-03-13 07:13:47

by Jin Yao

[permalink] [raw]
Subject: [PATCH v2 05/14] perf util: Calculate the sum of all streams hits

We have used callchain_node->hit to measure the hot level of one
stream. This patch calculates the sum of hits of all streams.

Then in next patch, we can use following formula to report hot
percent for one stream.

hot percent = callchain_node->hit / sum of all hits

v2:
---
Combine the variable decl line with its inital assignment
in total_callchain_hits().

Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/util/callchain.c | 34 ++++++++++++++++++++++++++++++++++
tools/perf/util/callchain.h | 1 +
2 files changed, 35 insertions(+)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 1619f7fa4076..b0c8757c2dcf 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1683,6 +1683,38 @@ static void update_hot_streams(struct hist_entry *he,
}
}

+static u64 count_callchain_hits(struct hist_entry *he)
+{
+ struct rb_root *root = &he->sorted_chain;
+ struct rb_node *rb_node = rb_first(root);
+ struct callchain_node *node;
+ u64 chain_hits = 0;
+
+ while (rb_node) {
+ node = rb_entry(rb_node, struct callchain_node, rb_node);
+ chain_hits += node->hit;
+ rb_node = rb_next(rb_node);
+ }
+
+ return chain_hits;
+}
+
+static u64 total_callchain_hits(struct hists *hists)
+{
+ struct rb_node *next = rb_first_cached(&hists->entries);
+ u64 chain_hits = 0;
+
+ while (next) {
+ struct hist_entry *he = rb_entry(next, struct hist_entry,
+ rb_node);
+
+ chain_hits += count_callchain_hits(he);
+ next = rb_next(&he->rb_node);
+ }
+
+ return chain_hits;
+}
+
static void get_hot_streams(struct hists *hists,
struct callchain_streams *s)
{
@@ -1695,6 +1727,8 @@ static void get_hot_streams(struct hists *hists,
update_hot_streams(he, s);
next = rb_next(&he->rb_node);
}
+
+ s->chain_hits = total_callchain_hits(hists);
}

struct callchain_streams *callchain_evsel_streams_create(struct evlist *evlist,
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index c996ab4fb108..3c0e0b45656b 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -173,6 +173,7 @@ struct callchain_streams {
int nr_streams_max;
int nr_streams;
int evsel_idx;
+ u64 chain_hits;
};

extern __thread struct callchain_cursor callchain_cursor;
--
2.17.1

2020-03-13 07:13:51

by Jin Yao

[permalink] [raw]
Subject: [PATCH v2 04/14] perf util: Compare two streams

If all callchain entries of one stream are fully matched with
all entries of another stream, we think these two streams are
matched.

For example,

cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
----------------------- -----------------------
main div.c:39 main div.c:39
main div.c:44 main div.c:44

Above two streams are matched if div.c is not changed.

But the streams are not matched if div.c is source line changed
(e.g. div.c:39 is changed). If the source line is changed, they
are different streams.

So the challenge is how to identify the changed source lines.
For this purpose, we use the source line mapping table (see patch
"perf util: Create source line mapping table"). The source line
mapping information is saved in src_list.

So the matching logic is, if we can find the source lines, the
callchain entry comparison is based on source lines. Otherwise,
fallback to dso address comparison.

Once two streams are fully matched, they will be linked as
a pair. From the pair, we can know which streams are matched.
The pair information will be used in next patches.

Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/util/callchain.c | 172 ++++++++++++++++++++++++++++++++++++
tools/perf/util/callchain.h | 8 ++
2 files changed, 180 insertions(+)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index bf66f33debd4..1619f7fa4076 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1733,3 +1733,175 @@ struct callchain_streams *callchain_evsel_streams_get(struct callchain_streams *

return NULL;
}
+
+static bool chain_srclist_match(struct srclist *src_list, const char *srcline_a,
+ const char *srcline_b, bool *src_found,
+ bool *src_changed)
+{
+ char file_a[PATH_MAX], file_b[PATH_MAX];
+ int line_a, line_b;
+ struct src_node *node;
+ struct line_pair *lp;
+
+ if (sscanf(srcline_a, "%[^:]:%d", file_a, &line_a) != 2)
+ return false;
+
+ if (sscanf(srcline_b, "%[^:]:%d", file_b, &line_b) != 2)
+ return false;
+
+ if (strcmp(file_a, file_b))
+ return false;
+
+ node = srclist__find(src_list, file_a, true);
+ if (!node)
+ return false;
+
+ *src_found = true;
+
+ lp = srclist__line_pair(node, line_a);
+ if (!lp)
+ return false;
+
+ if (line_a == line_b && lp->b_nr == -1) {
+ *src_changed = true;
+ return true;
+ }
+
+ if (lp->b_nr == line_b)
+ return true;
+
+ return false;
+}
+
+static bool chain_match(struct callchain_list *base_chain,
+ struct callchain_list *pair_chain,
+ struct srclist *src_list,
+ bool *src_changed)
+{
+ enum match_result match;
+ bool src_found = false;
+
+ *src_changed = false;
+
+ if (src_list && chain_srclist_match(src_list, pair_chain->srcline,
+ base_chain->srcline, &src_found,
+ src_changed)) {
+ return true;
+ }
+
+ if (!src_found) {
+ match = match_chain_strings(base_chain->srcline,
+ pair_chain->srcline);
+ if (match != MATCH_ERROR)
+ return match == MATCH_EQ;
+
+ match = match_chain_dso_addresses(base_chain->ms.map,
+ base_chain->ip,
+ pair_chain->ms.map,
+ pair_chain->ip);
+
+ return match == MATCH_EQ;
+ }
+
+ return false;
+}
+
+static bool callchain_node_matched(struct callchain_node *base_cnode,
+ struct callchain_node *pair_cnode,
+ struct srclist *src_list,
+ int *nr_changed)
+{
+ struct callchain_list *base_chain, *pair_chain;
+ bool match = false;
+
+ pair_chain = list_first_entry(&pair_cnode->val,
+ struct callchain_list,
+ list);
+
+ list_for_each_entry(base_chain, &base_cnode->val, list) {
+ bool src_changed;
+
+ if (&pair_chain->list == &pair_cnode->val)
+ return false;
+
+ if (!base_chain->srcline || !pair_chain->srcline) {
+ pair_chain = list_next_entry(pair_chain, list);
+ continue;
+ }
+
+ match = chain_match(base_chain, pair_chain, src_list,
+ &src_changed);
+
+ if (src_changed) {
+ pair_chain->src_changed = true;
+ *nr_changed += 1;
+ }
+
+ if (!match)
+ return false;
+
+ pair_chain = list_next_entry(pair_chain, list);
+ }
+
+ /*
+ * Say chain1 is ABC, chain2 is ABCD, we consider they are
+ * not fully matched.
+ */
+ if (pair_chain && (&pair_chain->list != &pair_cnode->val))
+ return false;
+
+ return match;
+}
+
+static struct stream_node *stream_node_match(struct stream_node *base_node,
+ struct callchain_streams *cs_pair,
+ struct srclist *src_list,
+ bool *src_changed)
+{
+ *src_changed = false;
+
+ for (int i = 0; i < cs_pair->nr_streams; i++) {
+ struct stream_node *pair_node = &cs_pair->streams[i];
+ int nr_changed = 0;
+
+ if (callchain_node_matched(base_node->cnode, pair_node->cnode,
+ src_list, &nr_changed)) {
+ if (nr_changed)
+ *src_changed = true;
+
+ return pair_node;
+ }
+ }
+
+ return NULL;
+}
+
+static void stream_nodes_link(struct stream_node *base_node,
+ struct stream_node *pair_node,
+ bool src_changed)
+{
+ base_node->pair_cnode = pair_node->cnode;
+ base_node->src_changed = src_changed;
+ pair_node->pair_cnode = base_node->cnode;
+}
+
+void callchain_match_streams(struct callchain_streams *cs_base,
+ struct callchain_streams *cs_pair,
+ struct srclist *src_list)
+{
+ for (int i = 0; i < cs_base->nr_streams; i++) {
+ struct stream_node *base_node = &cs_base->streams[i];
+ struct stream_node *pair_node;
+ bool src_changed;
+
+ pair_node = stream_node_match(base_node, cs_pair, src_list,
+ &src_changed);
+ if (pair_node)
+ stream_nodes_link(base_node, pair_node, src_changed);
+ else
+ base_node->src_changed = src_changed;
+ }
+
+ if (src_list)
+ srclist__dump(src_list);
+}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 4e881361e154..c996ab4fb108 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -6,6 +6,7 @@
#include <linux/rbtree.h>
#include "map_symbol.h"
#include "branch.h"
+#include "srclist.h"

struct addr_location;
struct evsel;
@@ -131,6 +132,7 @@ struct callchain_list {
u64 iter_cycles;
struct branch_type_stat brtype_stat;
const char *srcline;
+ bool src_changed;
struct list_head list;
};

@@ -162,6 +164,8 @@ struct callchain_cursor {

struct stream_node {
struct callchain_node *cnode;
+ struct callchain_node *pair_cnode;
+ bool src_changed;
};

struct callchain_streams {
@@ -309,4 +313,8 @@ struct callchain_streams *callchain_evsel_streams_get(struct callchain_streams *
int nr_streams_max,
int evsel_idx);

+void callchain_match_streams(struct callchain_streams *cs_base,
+ struct callchain_streams *cs_pair,
+ struct srclist *src_list);
+
#endif /* __PERF_CALLCHAIN_H */
--
2.17.1

2020-03-13 07:14:03

by Jin Yao

[permalink] [raw]
Subject: [PATCH v2 12/14] perf util: Filter out streams by name of changed functions

Sometime some changes are not reflected in the source code,
e.g. changing the compiler option. So for this, we can't get
the changes by diffing the source code lines.

This idea is to let user provide a list of changed functions,
then perf-diff can know these functions are not matched between
old perf data and new perf data.

In this patch, the names of changed function names are stored
in strlist, and we will check if the symbol name is in strlist.
If yes, that means this function is changed.

Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/builtin-diff.c | 2 +-
tools/perf/util/callchain.c | 30 ++++++++++++++++++++++++++----
tools/perf/util/callchain.h | 4 +++-
3 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 566e811054b1..2f891e8a5122 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -1017,7 +1017,7 @@ static int process_base_stream(struct data__file *data_base,
if (!es_pair)
return -1;

- callchain_match_streams(es_base, es_pair, pdiff.src_list);
+ callchain_match_streams(es_base, es_pair, pdiff.src_list, NULL);
callchain_stream_report(es_base, es_pair);
}

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 84fe8e418532..3894514e116c 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1810,16 +1810,35 @@ static bool chain_srclist_match(struct srclist *src_list, const char *srcline_a,
return false;
}

+static bool sym_name_changed(struct callchain_list *base_chain,
+ struct callchain_list *pair_chain,
+ struct strlist *func_list)
+{
+ if (!func_list || !base_chain->ms.sym || !pair_chain->ms.sym)
+ return false;
+
+ if (strlist__has_entry(func_list, base_chain->ms.sym->name))
+ return true;
+
+ return false;
+}
+
static bool chain_match(struct callchain_list *base_chain,
struct callchain_list *pair_chain,
struct srclist *src_list,
+ struct strlist *func_list,
bool *src_changed)
{
enum match_result match;
bool src_found = false;
+ bool func_changed;

*src_changed = false;

+ func_changed = sym_name_changed(base_chain, pair_chain, func_list);
+ if (func_changed)
+ return false;
+
/*
* Check sourceline first. If not matched,
* fallback to symbol match and address match.
@@ -1850,6 +1869,7 @@ static bool chain_match(struct callchain_list *base_chain,
static bool callchain_node_matched(struct callchain_node *base_cnode,
struct callchain_node *pair_cnode,
struct srclist *src_list,
+ struct strlist *func_list,
int *nr_changed)
{
struct callchain_list *base_chain, *pair_chain;
@@ -1871,7 +1891,7 @@ static bool callchain_node_matched(struct callchain_node *base_cnode,
}

match = chain_match(base_chain, pair_chain, src_list,
- &src_changed);
+ func_list, &src_changed);

if (src_changed) {
pair_chain->src_changed = true;
@@ -1897,6 +1917,7 @@ static bool callchain_node_matched(struct callchain_node *base_cnode,
static struct stream_node *stream_node_match(struct stream_node *base_node,
struct callchain_streams *cs_pair,
struct srclist *src_list,
+ struct strlist *func_list,
bool *src_changed)
{
*src_changed = false;
@@ -1906,7 +1927,7 @@ static struct stream_node *stream_node_match(struct stream_node *base_node,
int nr_changed = 0;

if (callchain_node_matched(base_node->cnode, pair_node->cnode,
- src_list, &nr_changed)) {
+ src_list, func_list, &nr_changed)) {
if (nr_changed)
*src_changed = true;

@@ -1928,7 +1949,8 @@ static void stream_nodes_link(struct stream_node *base_node,

void callchain_match_streams(struct callchain_streams *cs_base,
struct callchain_streams *cs_pair,
- struct srclist *src_list)
+ struct srclist *src_list,
+ struct strlist *func_list)
{
for (int i = 0; i < cs_base->nr_streams; i++) {
struct stream_node *base_node = &cs_base->streams[i];
@@ -1936,7 +1958,7 @@ void callchain_match_streams(struct callchain_streams *cs_base,
bool src_changed;

pair_node = stream_node_match(base_node, cs_pair, src_list,
- &src_changed);
+ func_list, &src_changed);
if (pair_node)
stream_nodes_link(base_node, pair_node, src_changed);
else
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 90826a280476..a6153af9f776 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -7,6 +7,7 @@
#include "map_symbol.h"
#include "branch.h"
#include "srclist.h"
+#include "strlist.h"

struct addr_location;
struct evsel;
@@ -316,7 +317,8 @@ struct callchain_streams *callchain_evsel_streams_get(struct callchain_streams *

void callchain_match_streams(struct callchain_streams *cs_base,
struct callchain_streams *cs_pair,
- struct srclist *src_list);
+ struct srclist *src_list,
+ struct strlist *func_list);

void callchain_stream_report(struct callchain_streams *cs_base,
struct callchain_streams *cs_pair);
--
2.17.1

2020-03-13 07:14:06

by Jin Yao

[permalink] [raw]
Subject: [PATCH v2 09/14] perf util: Add new block info fmts for showing hot blocks comparison

We support three new columns in output

New Stream Diff(cycles%,cycles):
The diff of 'Sampled Cycles%' and 'Avg Cycles' between new stream
and old stream for the same block.

If block is not matched, it reports "[block not in new stream]" or
"[block changed in new stream]".

New Stream Sampled Cycles%:
The 'Sampled Cycles%' of the block in new stream. It's only reported
when the block is changed in new stream.

New Stream Avg Cycles:
The 'Average Cycles' of the block in new stream. It's only reported
when the block is changed in new stream.

Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/util/block-info.c | 117 ++++++++++++++++++++++++++++++++++-
tools/perf/util/block-info.h | 4 ++
2 files changed, 120 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/block-info.c b/tools/perf/util/block-info.c
index 431cd97bae39..96b7b81cabd5 100644
--- a/tools/perf/util/block-info.c
+++ b/tools/perf/util/block-info.c
@@ -41,7 +41,19 @@ static struct block_header_column {
[PERF_HPP_REPORT__BLOCK_DSO] = {
.name = "Shared Object",
.width = 20,
- }
+ },
+ [PERF_HPP__BLOCK_NEW_STREAM_DIFF] = {
+ .name = "New Stream Diff(cycles%,cycles)",
+ .width = 31,
+ },
+ [PERF_HPP__BLOCK_NEW_STREAM_TOTAL_CYCLES_PCT] = {
+ .name = "New Stream Sampled Cycles%",
+ .width = 26,
+ },
+ [PERF_HPP__BLOCK_NEW_STREAM_AVG_CYCLES] = {
+ .name = "New Stream Avg Cycles",
+ .width = 21,
+ },
};

struct block_info *block_info__get(struct block_info *bi)
@@ -342,6 +354,100 @@ static int block_dso_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
"[unknown]");
}

+static int block_new_stream_diff_entry(struct perf_hpp_fmt *fmt,
+ struct perf_hpp *hpp,
+ struct hist_entry *he)
+{
+ struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
+ struct hist_entry *pair = hist_entry__next_pair(he);
+ struct block_info *bi_h, *bi_p;
+ double ratio_h = 0.0, ratio_p = 0.0;
+ s64 cycles_diff;
+ char buf[32];
+
+ if (!pair) {
+ return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width,
+ "[block not in new stream]");
+ }
+
+ bi_h = he->block_info;
+ bi_p = pair->block_info;
+
+ if (bi_p->block_changed) {
+ return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width,
+ "[block changed in new stream]");
+ }
+
+ if (bi_h->total_cycles)
+ ratio_h = (double)bi_h->cycles / (double)bi_h->total_cycles;
+
+ if (bi_p->total_cycles)
+ ratio_p = (double)bi_p->cycles / (double)bi_p->total_cycles;
+
+ cycles_diff = (s64)(bi_p->cycles_aggr / bi_p->num_aggr) -
+ (s64)(bi_h->cycles_aggr / bi_h->num_aggr);
+
+ snprintf(buf, sizeof(buf), "%4.2f%%, %4ld",
+ 100.0 * (ratio_p - ratio_h), cycles_diff);
+
+ return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width, buf);
+}
+
+static int block_new_stream_total_cycles_pct_entry(struct perf_hpp_fmt *fmt,
+ struct perf_hpp *hpp,
+ struct hist_entry *he)
+{
+ struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
+ struct hist_entry *pair = hist_entry__next_pair(he);
+ struct block_info *bi_p;
+ double ratio = 0.0;
+
+ if (!pair) {
+ return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width,
+ "-");
+ }
+
+ bi_p = pair->block_info;
+
+ if (!bi_p->block_changed) {
+ return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width,
+ "-");
+ }
+
+ if (bi_p->total_cycles)
+ ratio = (double)bi_p->cycles / (double)bi_p->total_cycles;
+
+ return color_pct(hpp, block_fmt->width, 100.0 * ratio);
+}
+
+static int block_new_stream_avg_cycles_entry(struct perf_hpp_fmt *fmt,
+ struct perf_hpp *hpp,
+ struct hist_entry *he)
+{
+ struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
+ struct hist_entry *pair = hist_entry__next_pair(he);
+ struct block_info *bi_p;
+ char cycles_buf[16];
+
+ if (!pair) {
+ return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width,
+ "-");
+ }
+
+ bi_p = pair->block_info;
+
+ if (!bi_p->block_changed) {
+ return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width,
+ "-");
+ }
+
+ cycles_string(bi_p->cycles_aggr / bi_p->num_aggr, cycles_buf,
+ sizeof(cycles_buf));
+
+ return scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width,
+ cycles_buf);
+}
+
static void init_block_header(struct block_fmt *block_fmt)
{
struct perf_hpp_fmt *fmt = &block_fmt->fmt;
@@ -385,6 +491,15 @@ static void hpp_register(struct block_fmt *block_fmt, int idx,
case PERF_HPP_REPORT__BLOCK_DSO:
fmt->entry = block_dso_entry;
break;
+ case PERF_HPP__BLOCK_NEW_STREAM_DIFF:
+ fmt->entry = block_new_stream_diff_entry;
+ break;
+ case PERF_HPP__BLOCK_NEW_STREAM_TOTAL_CYCLES_PCT:
+ fmt->entry = block_new_stream_total_cycles_pct_entry;
+ break;
+ case PERF_HPP__BLOCK_NEW_STREAM_AVG_CYCLES:
+ fmt->entry = block_new_stream_avg_cycles_entry;
+ break;
default:
return;
}
diff --git a/tools/perf/util/block-info.h b/tools/perf/util/block-info.h
index 458bd998089d..4e22bce81731 100644
--- a/tools/perf/util/block-info.h
+++ b/tools/perf/util/block-info.h
@@ -22,6 +22,7 @@ struct block_info {
refcount_t refcnt;
struct block_line *line;
bool srcline_matched;
+ bool block_changed;
};

struct block_fmt {
@@ -40,6 +41,9 @@ enum {
PERF_HPP_REPORT__BLOCK_AVG_CYCLES,
PERF_HPP_REPORT__BLOCK_RANGE,
PERF_HPP_REPORT__BLOCK_DSO,
+ PERF_HPP__BLOCK_NEW_STREAM_DIFF,
+ PERF_HPP__BLOCK_NEW_STREAM_TOTAL_CYCLES_PCT,
+ PERF_HPP__BLOCK_NEW_STREAM_AVG_CYCLES,
PERF_HPP_REPORT__BLOCK_MAX_INDEX
};

--
2.17.1

2020-03-13 07:14:30

by Jin Yao

[permalink] [raw]
Subject: [PATCH v2 14/14] perf diff: Filter out streams by changed functions

Sometime some changes are not reflected in the source code,
e.g. changing the compiler option. So for this, we can't get
the changes by diffing the source code lines.

This patch introduces a new perf-diff option "--changed-func".
It passes the names of changed functions then perf-diff can
know what functions are changed.

For example,
perf diff --stream --changed-func main --changed-func rand

It passes the function list {"main", "rand"} to perf-diff.
Now perf-diff knows the functions "main" and "rand" in new perf
data file are changed.

Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/Documentation/perf-diff.txt | 5 +++++
tools/perf/builtin-diff.c | 30 ++++++++++++++++++++++++--
2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-diff.txt b/tools/perf/Documentation/perf-diff.txt
index 296fea98ac07..784598c12e26 100644
--- a/tools/perf/Documentation/perf-diff.txt
+++ b/tools/perf/Documentation/perf-diff.txt
@@ -196,6 +196,11 @@ OPTIONS
Source code directory corresponding to perf.data. Should be
used with --stream and --before.

+--changed-func=::
+ The given function is changed in new perf data file. This option
+ needs to be used with --stream option. Multiple functions can be given
+ by using this option more than once.
+
COMPARISON
----------
The comparison is governed by the baseline file. The baseline perf.data
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 98e9ab8c69ce..5e5f29105fe1 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -27,6 +27,7 @@
#include "util/block-info.h"
#include "util/srclist.h"
#include "util/callchain.h"
+#include "util/strlist.h"
#include <linux/err.h>
#include <linux/zalloc.h>
#include <subcmd/pager.h>
@@ -49,6 +50,7 @@ struct perf_diff {
bool src_cmp;
bool stream;
struct srclist *src_list;
+ struct strlist *func_list;
u64 total_cycles;
float min_percent;
};
@@ -1017,7 +1019,8 @@ static int process_base_stream(struct data__file *data_base,
if (!es_pair)
return -1;

- callchain_match_streams(es_base, es_pair, pdiff.src_list, NULL);
+ callchain_match_streams(es_base, es_pair, pdiff.src_list,
+ pdiff.func_list);
callchain_stream_report(es_base, es_pair);
}

@@ -1043,7 +1046,7 @@ static int process_base_stream(struct data__file *data_base,
block_hists_addr2line(&rep_pair->hist.block_hists, pair_dir);

block_info__match_report(rep_base, rep_pair,
- pdiff.src_list, NULL, NULL);
+ pdiff.src_list, pdiff.func_list, NULL);

fprintf(stdout, "%s", title);

@@ -1213,6 +1216,24 @@ static struct callchain_streams *create_evsel_streams(struct evlist *evlist,
return evsel_streams;
}

+static int parse_func(const struct option *opt __maybe_unused,
+ const char *str, int unset __maybe_unused)
+{
+ int ret;
+
+ if (!pdiff.func_list) {
+ pdiff.func_list = strlist__new(NULL, NULL);
+ if (!pdiff.func_list)
+ return -ENOMEM;
+ }
+
+ ret = strlist__add(pdiff.func_list, str);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
static int __cmd_diff(void)
{
struct data__file *d;
@@ -1312,6 +1333,9 @@ static int __cmd_diff(void)
if (pdiff.src_list)
srclist__delete(pdiff.src_list);

+ if (pdiff.func_list)
+ strlist__delete(pdiff.func_list);
+
return ret;
}

@@ -1390,6 +1414,8 @@ static const struct option options[] = {
OPT_CALLBACK(0, "percent-limit", &pdiff, "percent",
"Don't show entries under that percent",
parse_percent_limit),
+ OPT_CALLBACK(0, "changed-func", NULL, "func",
+ "Given function is changed", parse_func),
OPT_END()
};

--
2.17.1

2020-03-13 07:14:32

by Jin Yao

[permalink] [raw]
Subject: [PATCH v2 13/14] perf util: Filter out blocks by name of changed functions

Similar as previous patch "perf util: Filter out streams
by name of changed functions", user provides a list of changed
functions, then perf-diff can know these functions are not
matched between old perf data and new perf data.

In this patch, the names of changed function names have been
stored in strlist, and we will check if the symbol name is in
strlist. If yes, that means this function is changed and the
related blocks are not matched.

Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/builtin-diff.c | 4 ++--
tools/perf/util/block-info.c | 18 ++++++++++++------
tools/perf/util/block-info.h | 4 +++-
3 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 2f891e8a5122..98e9ab8c69ce 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -703,7 +703,7 @@ static void hists__precompute(struct hists *hists)
if (bh->valid && pair_bh->valid) {
block_hists_match(&bh->block_hists,
&pair_bh->block_hists,
- NULL,
+ NULL, NULL,
compute_cycles_diff);
hists__output_resort(&pair_bh->block_hists,
NULL);
@@ -1043,7 +1043,7 @@ static int process_base_stream(struct data__file *data_base,
block_hists_addr2line(&rep_pair->hist.block_hists, pair_dir);

block_info__match_report(rep_base, rep_pair,
- pdiff.src_list, NULL);
+ pdiff.src_list, NULL, NULL);

fprintf(stdout, "%s", title);

diff --git a/tools/perf/util/block-info.c b/tools/perf/util/block-info.c
index 4d0275fbd0df..255b54ce2962 100644
--- a/tools/perf/util/block-info.c
+++ b/tools/perf/util/block-info.c
@@ -93,7 +93,7 @@ struct block_info *block_info__new(void)
}

int64_t __block_info__cmp(struct hist_entry *left, struct hist_entry *right,
- struct srclist *src_list)
+ struct srclist *src_list, struct strlist *func_list)
{
struct block_info *bi_l = left->block_info;
struct block_info *bi_r = right->block_info;
@@ -113,6 +113,9 @@ int64_t __block_info__cmp(struct hist_entry *left, struct hist_entry *right,
if (cmp)
return cmp;

+ if (func_list && strlist__has_entry(func_list, bi_l->sym->name))
+ return -1;
+
if (src_list && bi_l->line && bi_r->line) {
if (block_same_srcfiles(bi_l->line, bi_r->line) &&
bi_l->line->start_rel) {
@@ -143,7 +146,7 @@ int64_t __block_info__cmp(struct hist_entry *left, struct hist_entry *right,
int64_t block_info__cmp(struct perf_hpp_fmt *fmt __maybe_unused,
struct hist_entry *left, struct hist_entry *right)
{
- return __block_info__cmp(left, right, NULL);
+ return __block_info__cmp(left, right, NULL, NULL);
}

static void init_block_info(struct block_info *bi, struct symbol *sym,
@@ -855,7 +858,8 @@ bool block_srclist_matched(struct srclist *slist, char *rel_path,

static struct hist_entry *get_block_pair(struct hist_entry *he,
struct hists *hists_pair,
- struct srclist *src_list)
+ struct srclist *src_list,
+ struct strlist *func_list)
{
struct rb_root_cached *root = hists_pair->entries_in;
struct rb_node *next = rb_first_cached(root);
@@ -867,7 +871,7 @@ static struct hist_entry *get_block_pair(struct hist_entry *he,

next = rb_next(&he_pair->rb_node_in);

- cmp = __block_info__cmp(he_pair, he, src_list);
+ cmp = __block_info__cmp(he_pair, he, src_list, func_list);
if (!cmp)
return he_pair;
}
@@ -878,6 +882,7 @@ static struct hist_entry *get_block_pair(struct hist_entry *he,
void block_hists_match(struct hists *hists_base,
struct hists *hists_pair,
struct srclist *src_list,
+ struct strlist *func_list,
void (*func)(struct hist_entry *,
struct hist_entry *))
{
@@ -888,7 +893,7 @@ void block_hists_match(struct hists *hists_base,
struct hist_entry *he = rb_entry(next, struct hist_entry,
rb_node_in);
struct hist_entry *pair = get_block_pair(he, hists_pair,
- src_list);
+ src_list, func_list);

next = rb_next(&he->rb_node_in);

@@ -904,6 +909,7 @@ void block_hists_match(struct hists *hists_base,
int block_info__match_report(struct block_report *rep_base,
struct block_report *rep_pair,
struct srclist *src_list,
+ struct strlist *func_list,
void (*func)(struct hist_entry *,
struct hist_entry *))
{
@@ -911,7 +917,7 @@ int block_info__match_report(struct block_report *rep_base,
struct block_hist *bh_pair = &rep_pair->hist;

block_hists_match(&bh_base->block_hists, &bh_pair->block_hists,
- src_list, func);
+ src_list, func_list, func);

return 0;
}
diff --git a/tools/perf/util/block-info.h b/tools/perf/util/block-info.h
index 4e22bce81731..2b440cef4ef7 100644
--- a/tools/perf/util/block-info.h
+++ b/tools/perf/util/block-info.h
@@ -70,7 +70,7 @@ static inline void __block_info__zput(struct block_info **bi)
#define block_info__zput(bi) __block_info__zput(&bi)

int64_t __block_info__cmp(struct hist_entry *left, struct hist_entry *right,
- struct srclist *src_list __maybe_unused);
+ struct srclist *src_list, struct strlist *func_list);

int64_t block_info__cmp(struct perf_hpp_fmt *fmt __maybe_unused,
struct hist_entry *left, struct hist_entry *right);
@@ -108,12 +108,14 @@ bool block_srclist_matched(struct srclist *slist, char *rel_path,
void block_hists_match(struct hists *hists_base,
struct hists *hists_pair,
struct srclist *src_list,
+ struct strlist *func_list,
void (*func)(struct hist_entry *,
struct hist_entry *));

int block_info__match_report(struct block_report *rep_base,
struct block_report *rep_pair,
struct srclist *src_list,
+ struct strlist *func_list,
void (*func)(struct hist_entry *,
struct hist_entry *));

--
2.17.1

2020-03-13 07:14:41

by Jin Yao

[permalink] [raw]
Subject: [PATCH v2 01/14] perf util: Create source line mapping table

Sometimes, a small change in a hot function reducing the cycles of
this function, but the overall workload doesn't get faster. It is
interesting where the cycles are moved to.

What it would like is to diff before/after streams. A stream is a
callchain which is aggregated by the branch records from samples.

By browsing the hot streams, we can understand the hot code flow.
By comparing the cycles variation of same streams between old perf
data and new perf data, we can understand if the cycles are moved to
the unchanged code.

The before stream is the stream before source line changed
(e.g. streams in perf.data.old). The after stream is the stream
after source line changed (e.g. streams in perf.data).

Diffing before/after streams compares all streams (or compares top
N hot streams) between two perf data files.

If all entries of one stream in perf.data.old are fully matched with
all entries of another stream in perf.data, we think these two streams
are matched, otherwise the streams are not matched.

For example,

cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
-------------------------- --------------------------
main div.c:39 main div.c:39
main div.c:44 main div.c:44

It looks that two streams are matched and we can see for the same
streams the cycles are equal and the stream hit percents are
slightly changed. That's expected in the normal range.

But that's not always true if source lines are changed in perf.data
(e.g. div.c:39 is changed). If the source line is changed, they
are different streams, we can't compare them. We just think the
stream in perf.data is a new one.

The challenge is how to identify the changed source lines. The basic
idea is to use linux command 'diff' to compare the source file A and
source file A* line by line (assume A is used in perf.data.old
and A* is updated in perf.data). Create a source line mapping table.

For example,

Execute 'diff ./before/div.c ./after/div.c'

25c25
< i = rand() % 2;
---
> i = rand() % 4;
39c39
< for (i = 0; i < 2000000000; i++) {
---
> for (i = 0; i < 20000000001; i++) {

div.c (after -> before) lines mapping:
0 -> 0
1 -> 1
2 -> 2
3 -> 3
4 -> 4
5 -> 5
6 -> 6
7 -> 7
8 -> 8
9 -> 9
...
24 -> 24
25 -> -1
26 -> 26
27 -> 27
28 -> 28
29 -> 29
30 -> 30
31 -> 31
32 -> 32
33 -> 33
34 -> 34
35 -> 35
36 -> 36
37 -> 37
38 -> 38
39 -> -1
40 -> 40
...

From the table, we can easily know div.c:39 is source line changed.
(it's mapped to -1). So these two streams are not matched.

This patch can also handle the cases of new source lines insertion
and old source lines deletion. This source line mapping table is a
base for next processing for stream comparison.

This patch creates a new rblist 'srclist' to manage the source line
mapping tables. Each node contains the source line mapping table of
a before/after source file pair. The node is created on demand as
we process the samples. So it's likely we don't need to create so many
nodes.

v2:
---
Refine the code
1. calloc -> zalloc
2. Functions operating on a 'struct line_pair' have the
line_pair__ prefix.
3. Check get_range() return value.
4. Use fgets to replace fgetc for getting the number of lines.
5. Use path__join to generate the full path.
6. Keep the buffer after calling getline.

Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/util/Build | 1 +
tools/perf/util/srclist.c | 555 ++++++++++++++++++++++++++++++++++++++
tools/perf/util/srclist.h | 65 +++++
3 files changed, 621 insertions(+)
create mode 100644 tools/perf/util/srclist.c
create mode 100644 tools/perf/util/srclist.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index c0cf8dff694e..b9bf620b60f0 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -99,6 +99,7 @@ perf-y += call-path.o
perf-y += rwsem.o
perf-y += thread-stack.o
perf-y += spark.o
+perf-y += srclist.o
perf-$(CONFIG_AUXTRACE) += auxtrace.o
perf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
perf-$(CONFIG_AUXTRACE) += intel-pt.o
diff --git a/tools/perf/util/srclist.c b/tools/perf/util/srclist.c
new file mode 100644
index 000000000000..8060e4855d11
--- /dev/null
+++ b/tools/perf/util/srclist.c
@@ -0,0 +1,555 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Manage difference of source lines
+ * Copyright (c) 2020, Intel Corporation.
+ * Author: Jin Yao
+ */
+#include <stdlib.h>
+#include <stdio.h>
+#include <sys/types.h>
+#include <inttypes.h>
+#include <string.h>
+#include <strings.h>
+#include <unistd.h>
+#include <err.h>
+#include <errno.h>
+#include <limits.h>
+#include <stdbool.h>
+#include <linux/zalloc.h>
+#include "strlist.h"
+#include "srclist.h"
+#include "debug.h"
+#include "path.h"
+
+enum {
+ DIFF_LINE_ADD,
+ DIFF_LINE_DEL,
+ DIFF_LINE_CHANGE
+};
+
+static void get_full_path(const char *dir, const char *rel_path,
+ char *full_path, int size)
+{
+ if (!dir) {
+ snprintf(full_path, size, "%s", rel_path);
+ return;
+ }
+
+ path__join(full_path, size, dir, rel_path);
+}
+
+static int path__number_of_lines(char *path)
+{
+ FILE *fp;
+ char buf[4096], *p, last = 0;
+ int num = 0;
+
+ fp = fopen(path, "r");
+ if (!fp) {
+ pr_debug("Failed to open file %s\n", path);
+ return -1;
+ }
+
+ while (!feof(fp)) {
+ p = fgets(buf, sizeof(buf), fp);
+ if (!p)
+ break;
+
+ last = p[strlen(p) - 1];
+ if (last == '\n')
+ num++;
+ }
+
+ fclose(fp);
+
+ if (last != '\n' && last != 0)
+ num++;
+
+ return num;
+}
+
+static int get_digit(char *str, char **end_str, int *val)
+{
+ int v;
+ char *end;
+
+ errno = 0;
+ v = strtol(str, &end, 10);
+ if ((errno == ERANGE && (v == INT_MAX || v == INT_MIN))
+ || (errno != 0 && v == 0)) {
+ return -1;
+ }
+
+ if (end == str)
+ return -1;
+
+ *val = v;
+ *end_str = end;
+ return 0;
+}
+
+static int get_range(char *str, int *start_val, int *end_val)
+{
+ int val, ret;
+ char *end, *next;
+
+ *start_val = -1;
+ *end_val = -1;
+
+ ret = get_digit(str, &end, &val);
+ if (ret)
+ return ret;
+
+ *start_val = val;
+
+ if (*end != '\0') {
+ next = end + 1;
+ ret = get_digit(next, &end, &val);
+ if (ret)
+ return ret;
+
+ *end_val = val;
+ }
+
+ return 0;
+}
+
+static int opr_str_idx(char *str, int len, char ch)
+{
+ int i;
+
+ for (i = 0; i < len; i++) {
+ if (str[i] == ch)
+ break;
+ }
+
+ if (i == len || i == 0 || i == len - 1)
+ return -1;
+
+ if (str[i - 1] >= '0' && str[i - 1] <= '9' &&
+ str[i + 1] >= '0' && str[i + 1] <= '9') {
+ return i;
+ }
+
+ return -1;
+}
+
+static bool get_diff_operation(char *str, int *opr, int *b_start, int *b_end,
+ int *a_start, int *a_end)
+{
+ int idx, len;
+
+ if (str[0] == '<' || str[0] == '>' || str[0] == '-')
+ return false;
+
+ len = strlen(str);
+
+ /*
+ * For example,
+ * 5,6d4
+ * 8a7
+ * 20,21c19
+ */
+ idx = opr_str_idx(str, len, 'd');
+ if (idx != -1) {
+ *opr = DIFF_LINE_DEL;
+ goto exit;
+ }
+
+ idx = opr_str_idx(str, len, 'a');
+ if (idx != -1) {
+ *opr = DIFF_LINE_ADD;
+ goto exit;
+ }
+
+ idx = opr_str_idx(str, len, 'c');
+ if (idx != -1) {
+ *opr = DIFF_LINE_CHANGE;
+ goto exit;
+ }
+
+exit:
+ if (idx != -1) {
+ str[idx] = 0;
+ if (get_range(str, b_start, b_end) != 0)
+ return false;
+
+ if (get_range(&str[idx + 1], a_start, a_end) != 0)
+ return false;
+
+ return true;
+ }
+
+ return false;
+}
+
+static int line_pair__del_lines(struct line_pair *lines, int b_start, int b_end,
+ int a_start, int a_end __maybe_unused,
+ int *nr_lines, int *b_adjust)
+{
+ int i = *nr_lines;
+ bool set = false;
+
+ /*
+ * For example, 5,6d4
+ * It means line 5 ~ line 6 of file1 are not same as line 4 of file2
+ * because line 5 ~ line 6 are deleted.
+ */
+
+ if (a_start == i) {
+ lines[i].a_nr = i;
+ lines[i].b_nr = i + *b_adjust;
+ set = true;
+ }
+
+ if (b_end != -1)
+ *b_adjust += b_end - b_start + 1;
+ else
+ *b_adjust += 1;
+
+ if (!set) {
+ lines[i].a_nr = i;
+ lines[i].b_nr = i + *b_adjust;
+ }
+
+ *nr_lines = i + 1;
+ return 0;
+}
+
+static int line_pair__add_lines(struct line_pair *lines,
+ int b_start __maybe_unused,
+ int b_end __maybe_unused,
+ int a_start, int a_end, int *nr_lines,
+ int *b_adjust)
+{
+ int i = *nr_lines;
+
+ /*
+ * For example, 8a7
+ * It means line 8 at file1 is not same as line 7 at file2
+ * because a new line (line 7) is inserted to file2.
+ */
+ if (a_end != -1) {
+ for (int j = 0; j < a_end - a_start + 1; j++) {
+ lines[i].a_nr = i;
+ lines[i].b_nr = -1;
+ i++;
+ }
+ *b_adjust -= a_end - a_start + 1;
+ } else {
+ lines[i].a_nr = i;
+ lines[i].b_nr = -1;
+ i++;
+ *b_adjust -= 1;
+ }
+
+ *nr_lines = i;
+ return 0;
+}
+
+static int line_pair__change_lines(struct line_pair *lines, int b_start,
+ int b_end, int a_start, int a_end,
+ int *nr_lines, int *b_adjust)
+{
+ int i = *nr_lines;
+
+ /*
+ * For example, 20,21c19
+ * It means line20~line21 of file1 are not same as line19 of file2
+ * since they are changed.
+ */
+
+ if (a_end != -1) {
+ for (int j = 0; j < a_end - a_start + 1; j++) {
+ lines[i].a_nr = i;
+ lines[i].b_nr = -1;
+ i++;
+ }
+ } else {
+ lines[i].a_nr = i;
+ lines[i].b_nr = -1;
+ i++;
+ }
+
+ if (b_end != -1)
+ *b_adjust += b_end - b_start;
+
+ if (a_end != -1)
+ *b_adjust -= a_end - a_start;
+
+ *nr_lines = i;
+ return 0;
+}
+
+static int line_pair__update_lines(struct line_pair *lines, int opr,
+ int b_start, int b_end, int a_start,
+ int a_end, int *nr_lines, int *b_adjust)
+{
+ int i = *nr_lines;
+
+ while (i < a_start) {
+ lines[i].a_nr = i;
+ lines[i].b_nr = i + *b_adjust;
+ i++;
+ }
+
+ *nr_lines = i;
+
+ switch (opr) {
+ case DIFF_LINE_DEL:
+ line_pair__del_lines(lines, b_start, b_end, a_start, a_end,
+ nr_lines, b_adjust);
+ break;
+
+ case DIFF_LINE_ADD:
+ line_pair__add_lines(lines, b_start, b_end, a_start, a_end,
+ nr_lines, b_adjust);
+ break;
+
+ case DIFF_LINE_CHANGE:
+ line_pair__change_lines(lines, b_start, b_end, a_start,
+ a_end, nr_lines, b_adjust);
+ break;
+
+ default:
+ break;
+ }
+
+ return 0;
+}
+
+static void line_pair__update_remaining(struct line_pair *lines, int a_num,
+ int *nr_lines, int b_adjust)
+{
+ int i;
+
+ for (i = *nr_lines; i <= a_num; i++) {
+ lines[i].a_nr = i;
+ lines[i].b_nr = i + b_adjust;
+ }
+
+ *nr_lines = i;
+}
+
+static int build_mapping_table(FILE *fp, struct line_pair *lines,
+ int *nr_lines, int a_num)
+{
+ char *str = NULL, *p;
+ size_t len = 0;
+ int opr, b_start, b_end, a_start, a_end, b_adjust = 0;
+
+ *nr_lines = 1; /* First line is reserved */
+
+ while (getline(&str, &len, fp) > 0) {
+ p = strchr(str, '\n');
+ if (p)
+ *p = '\0';
+
+ pr_debug("%s\n", str);
+
+ if (!get_diff_operation(str, &opr, &b_start, &b_end,
+ &a_start, &a_end)) {
+ continue;
+ }
+
+ line_pair__update_lines(lines, opr, b_start, b_end, a_start,
+ a_end, nr_lines, &b_adjust);
+ }
+
+ line_pair__update_remaining(lines, a_num, nr_lines, b_adjust);
+
+ free(str);
+ return 0;
+}
+
+static int src_info__create_line_mapping(struct src_info *info, char *b_path,
+ char *a_path)
+{
+ char cmd[PATH_MAX];
+ FILE *fp = NULL;
+ int ret = -1;
+
+ info->nr_lines_before = path__number_of_lines(b_path);
+ if (info->nr_lines_before == -1)
+ goto out;
+
+ info->nr_lines_after = path__number_of_lines(a_path);
+ if (info->nr_lines_after == -1)
+ goto out;
+
+ /* Reserve first line */
+ info->nr_max = info->nr_lines_before + info->nr_lines_after + 1;
+ info->lines = calloc(info->nr_max, sizeof(struct line_pair));
+ if (!info->lines) {
+ pr_err("Failed to alloc memory for lines\n");
+ goto out;
+ }
+
+ snprintf(cmd, sizeof(cmd), "diff %s %s",
+ b_path, a_path);
+
+ pr_debug("Execute '%s'\n", cmd);
+
+ fp = popen(cmd, "r");
+ if (!fp) {
+ pr_err("Failed to execute '%s'\n", cmd);
+ goto out;
+ }
+
+ ret = build_mapping_table(fp, info->lines, &info->nr_lines,
+ info->nr_lines_after);
+
+out:
+ if (fp)
+ fclose(fp);
+
+ return ret;
+}
+
+static void free_src_info(struct src_info *info)
+{
+ zfree(&info->rel_path);
+ zfree(&info->lines);
+}
+
+static int init_src_info(char *b_path, char *a_path, const char *rel_path,
+ struct src_info *info)
+{
+ int ret;
+
+ info->rel_path = strdup(rel_path);
+ if (!info->rel_path)
+ return -1;
+
+ ret = src_info__create_line_mapping(info, b_path, a_path);
+ if (ret)
+ free_src_info(info);
+
+ return ret;
+}
+
+static void free_src_node(struct src_node *node)
+{
+ if (!node)
+ return;
+
+ free_src_info(&node->info);
+ free(node);
+}
+
+static struct rb_node *srclist__node_new(struct rblist *rblist,
+ const void *entry)
+{
+ struct srclist *slist = container_of(rblist, struct srclist, rblist);
+ const char *rel_path = entry;
+ char b_path[PATH_MAX], a_path[PATH_MAX];
+ struct src_node *node;
+ int ret;
+
+ get_full_path(slist->before_dir, rel_path, b_path, PATH_MAX);
+ get_full_path(slist->after_dir, rel_path, a_path, PATH_MAX);
+
+ node = zalloc(sizeof(*node));
+ if (!node)
+ return NULL;
+
+ ret = init_src_info(b_path, a_path, rel_path, &node->info);
+ if (ret)
+ goto err;
+
+ return &node->rb_node;
+
+err:
+ free_src_node(node);
+ return NULL;
+}
+
+static void srclist__node_delete(struct rblist *rblist __maybe_unused,
+ struct rb_node *rb_node)
+{
+ struct src_node *node = container_of(rb_node, struct src_node, rb_node);
+
+ free_src_info(&node->info);
+ free(node);
+}
+
+static int srclist__node_cmp(struct rb_node *rb_node, const void *entry)
+{
+ struct src_node *node = container_of(rb_node, struct src_node, rb_node);
+ const char *str = entry;
+
+ return strcmp(node->info.rel_path, str);
+}
+
+struct srclist *srclist__new(const char *before_dir, const char *after_dir)
+{
+ struct srclist *slist = zalloc(sizeof(*slist));
+
+ if (!slist)
+ return NULL;
+
+ rblist__init(&slist->rblist);
+ slist->rblist.node_cmp = srclist__node_cmp;
+ slist->rblist.node_new = srclist__node_new;
+ slist->rblist.node_delete = srclist__node_delete;
+
+ slist->before_dir = before_dir;
+ slist->after_dir = after_dir;
+ return slist;
+}
+
+void srclist__delete(struct srclist *slist)
+{
+ if (slist)
+ rblist__delete(&slist->rblist);
+}
+
+struct src_node *srclist__find(struct srclist *slist, char *rel_path,
+ bool create)
+{
+ struct src_node *node = NULL;
+ struct rb_node *rb_node;
+
+ if (create)
+ rb_node = rblist__findnew(&slist->rblist, (void *)rel_path);
+ else
+ rb_node = rblist__find(&slist->rblist, (void *)rel_path);
+
+ if (rb_node)
+ node = container_of(rb_node, struct src_node, rb_node);
+
+ return node;
+}
+
+struct line_pair *srclist__line_pair(struct src_node *node, int a_nr)
+{
+ struct src_info *info = &node->info;
+
+ if (a_nr < info->nr_lines && a_nr > 0)
+ return &info->lines[a_nr];
+
+ pr_debug("Exceeds line range: nr_lines = %d, a_nr = %d\n",
+ info->nr_lines, a_nr);
+
+ return NULL;
+}
+
+void srclist__dump(struct srclist *slist)
+{
+ struct src_node *node;
+ struct src_info *info;
+ int i;
+
+ srclist__for_each_entry(node, slist) {
+ info = &node->info;
+
+ pr_debug("%s (after -> before) lines mapping:\n",
+ info->rel_path);
+
+ for (i = 0; i < info->nr_lines; i++) {
+ pr_debug("%d -> %d\n",
+ info->lines[i].a_nr,
+ info->lines[i].b_nr);
+ }
+ }
+}
diff --git a/tools/perf/util/srclist.h b/tools/perf/util/srclist.h
new file mode 100644
index 000000000000..f25b0de91a13
--- /dev/null
+++ b/tools/perf/util/srclist.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __PERF_SRCLIST_H
+#define __PERF_SRCLIST_H
+
+#include <linux/types.h>
+#include <linux/rbtree.h>
+#include "rblist.h"
+
+struct line_pair {
+ int a_nr; /* line nr in after.c */
+ int b_nr; /* line nr in before.c */
+};
+
+struct src_node;
+
+struct src_info {
+ char *rel_path; /* relative path */
+ struct line_pair *lines;
+ int nr_max;
+ int nr_lines;
+ int nr_lines_before;
+ int nr_lines_after;
+};
+
+struct src_node {
+ struct rb_node rb_node;
+ struct src_info info;
+};
+
+struct srclist {
+ struct rblist rblist;
+ const char *before_dir;
+ const char *after_dir;
+};
+
+struct srclist *srclist__new(const char *before_dir, const char *after_dir);
+void srclist__delete(struct srclist *slist);
+
+struct src_node *srclist__find(struct srclist *slist, char *rel_path,
+ bool create);
+
+struct line_pair *srclist__line_pair(struct src_node *node, int a_nr);
+void srclist__dump(struct srclist *slist);
+
+static inline struct src_node *srclist__first(struct srclist *slist)
+{
+ struct rb_node *rn = rb_first_cached(&slist->rblist.entries);
+
+ return rn ? rb_entry(rn, struct src_node, rb_node) : NULL;
+}
+
+static inline struct src_node *srclist__next(struct src_node *sn)
+{
+ struct rb_node *rn;
+
+ if (!sn)
+ return NULL;
+ rn = rb_next(&sn->rb_node);
+ return rn ? rb_entry(rn, struct src_node, rb_node) : NULL;
+}
+
+#define srclist__for_each_entry(pos, slist) \
+ for (pos = srclist__first(slist); pos; pos = srclist__next(pos))
+
+#endif /* __PERF_SRCLIST_H */
--
2.17.1

2020-03-18 10:20:18

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v2 00/14] perf: Stream comparison

On Fri, Mar 13, 2020 at 03:11:04PM +0800, Jin Yao wrote:

SNIP

>
> v2:
> ---
> Refine the codes for following patches:
> perf util: Create source line mapping table
> perf util: Create streams for managing top N hottest callchains
> perf util: Calculate the sum of all streams hits
> perf util: Add new block info functions for top N hot blocks comparison
>
> Jin Yao (14):
> perf util: Create source line mapping table
> perf util: Create streams for managing top N hottest callchains
> perf util: Return per-event callchain streams
> perf util: Compare two streams
> perf util: Calculate the sum of all streams hits
> perf util: Report hot streams
> perf diff: Support hot streams comparison
> perf util: Add new block info functions for top N hot blocks
> comparison
> perf util: Add new block info fmts for showing hot blocks comparison
> perf util: Enable block source line comparison
> perf diff: support hot blocks comparison
> perf util: Filter out streams by name of changed functions
> perf util: Filter out blocks by name of changed functions
> perf diff: Filter out streams by changed functions

I can't apply this on latest perf/core, do you have it in git tree branch?

thanks,
jirka

>
> tools/perf/Documentation/perf-diff.txt | 19 +
> tools/perf/builtin-diff.c | 324 ++++++++++++---
> tools/perf/util/Build | 1 +
> tools/perf/util/block-info.c | 433 ++++++++++++++++++-
> tools/perf/util/block-info.h | 38 +-
> tools/perf/util/callchain.c | 517 +++++++++++++++++++++++
> tools/perf/util/callchain.h | 34 ++
> tools/perf/util/srclist.c | 555 +++++++++++++++++++++++++
> tools/perf/util/srclist.h | 74 ++++
> 9 files changed, 1935 insertions(+), 60 deletions(-)
> create mode 100644 tools/perf/util/srclist.c
> create mode 100644 tools/perf/util/srclist.h
>
> --
> 2.17.1
>

2020-03-19 01:50:33

by Jin Yao

[permalink] [raw]
Subject: Re: [PATCH v2 00/14] perf: Stream comparison



On 3/18/2020 6:19 PM, Jiri Olsa wrote:
> On Fri, Mar 13, 2020 at 03:11:04PM +0800, Jin Yao wrote:
>
> SNIP
>
>>
>> v2:
>> ---
>> Refine the codes for following patches:
>> perf util: Create source line mapping table
>> perf util: Create streams for managing top N hottest callchains
>> perf util: Calculate the sum of all streams hits
>> perf util: Add new block info functions for top N hot blocks comparison
>>
>> Jin Yao (14):
>> perf util: Create source line mapping table
>> perf util: Create streams for managing top N hottest callchains
>> perf util: Return per-event callchain streams
>> perf util: Compare two streams
>> perf util: Calculate the sum of all streams hits
>> perf util: Report hot streams
>> perf diff: Support hot streams comparison
>> perf util: Add new block info functions for top N hot blocks
>> comparison
>> perf util: Add new block info fmts for showing hot blocks comparison
>> perf util: Enable block source line comparison
>> perf diff: support hot blocks comparison
>> perf util: Filter out streams by name of changed functions
>> perf util: Filter out blocks by name of changed functions
>> perf diff: Filter out streams by changed functions
>
> I can't apply this on latest perf/core, do you have it in git tree branch?
>
> thanks,
> jirka
>

Not having a git tree branch. :(

Anyway let me rebase to perf/core

Thanks
Jin Yao

>>
>> tools/perf/Documentation/perf-diff.txt | 19 +
>> tools/perf/builtin-diff.c | 324 ++++++++++++---
>> tools/perf/util/Build | 1 +
>> tools/perf/util/block-info.c | 433 ++++++++++++++++++-
>> tools/perf/util/block-info.h | 38 +-
>> tools/perf/util/callchain.c | 517 +++++++++++++++++++++++
>> tools/perf/util/callchain.h | 34 ++
>> tools/perf/util/srclist.c | 555 +++++++++++++++++++++++++
>> tools/perf/util/srclist.h | 74 ++++
>> 9 files changed, 1935 insertions(+), 60 deletions(-)
>> create mode 100644 tools/perf/util/srclist.c
>> create mode 100644 tools/perf/util/srclist.h
>>
>> --
>> 2.17.1
>>
>

2020-03-23 11:06:36

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v2 00/14] perf: Stream comparison

On Fri, Mar 13, 2020 at 03:11:04PM +0800, Jin Yao wrote:
> Sometimes, a small change in a hot function reducing the cycles of
> this function, but the overall workload doesn't get faster. It is
> interesting where the cycles are moved to.

I'm getting compilation fail:

BUILD: Doing 'make -j1' parallel build
CC util/srclist.o
util/srclist.c: In function ‘srclist__node_new’:
util/srclist.c:388:35: error: ‘%s’ directive output may be truncated writing up to 4095 bytes into a region of size 4091 [-Werror=format-truncation=]
388 | snprintf(cmd, sizeof(cmd), "diff %s %s",
| ^~
......
456 | ret = init_src_info(b_path, a_path, rel_path, &node->info);
| ~~~~~~
In file included from /usr/include/stdio.h:867,
from util/srclist.c:8:
/usr/include/bits/stdio2.h:67:10: note: ‘__builtin___snprintf_chk’ output between 7 and 8197 bytes into a destination of size 4096
67 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
68 | __bos (__s), __fmt, __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
mv: cannot stat 'util/.srclist.o.tmp': No such file or directory
make[4]: *** [/home/jolsa/kernel/linux-perf/tools/build/Makefile.build:97: util/srclist.o] Error 1
make[3]: *** [/home/jolsa/kernel/linux-perf/tools/build/Makefile.build:139: util] Error 2
make[2]: *** [Makefile.perf:617: perf-in.o] Error 2
make[1]: *** [Makefile.perf:225: sub-make] Error 2
make: *** [Makefile:70: all] Error 2


[jolsa@krava ~]$ gcc --version
gcc (GCC) 9.3.1 20200317 (Red Hat 9.3.1-1)

jirka

2020-03-23 14:00:09

by Jin Yao

[permalink] [raw]
Subject: Re: [PATCH v2 00/14] perf: Stream comparison

Hi Jiri,

On 3/23/2020 7:05 PM, Jiri Olsa wrote:
> On Fri, Mar 13, 2020 at 03:11:04PM +0800, Jin Yao wrote:
>> Sometimes, a small change in a hot function reducing the cycles of
>> this function, but the overall workload doesn't get faster. It is
>> interesting where the cycles are moved to.
>
> I'm getting compilation fail:
>
> BUILD: Doing 'make -j1' parallel build
> CC util/srclist.o
> util/srclist.c: In function ‘srclist__node_new’:
> util/srclist.c:388:35: error: ‘%s’ directive output may be truncated writing up to 4095 bytes into a region of size 4091 [-Werror=format-truncation=]
> 388 | snprintf(cmd, sizeof(cmd), "diff %s %s",
> | ^~
> ......
> 456 | ret = init_src_info(b_path, a_path, rel_path, &node->info);
> | ~~~~~~
> In file included from /usr/include/stdio.h:867,
> from util/srclist.c:8:
> /usr/include/bits/stdio2.h:67:10: note: ‘__builtin___snprintf_chk’ output between 7 and 8197 bytes into a destination of size 4096
> 67 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 68 | __bos (__s), __fmt, __va_arg_pack ());
> | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> cc1: all warnings being treated as errors
> mv: cannot stat 'util/.srclist.o.tmp': No such file or directory
> make[4]: *** [/home/jolsa/kernel/linux-perf/tools/build/Makefile.build:97: util/srclist.o] Error 1
> make[3]: *** [/home/jolsa/kernel/linux-perf/tools/build/Makefile.build:139: util] Error 2
> make[2]: *** [Makefile.perf:617: perf-in.o] Error 2
> make[1]: *** [Makefile.perf:225: sub-make] Error 2
> make: *** [Makefile:70: all] Error 2
>
>
> [jolsa@krava ~]$ gcc --version
> gcc (GCC) 9.3.1 20200317 (Red Hat 9.3.1-1)
>
> jirka
>

Can you help to add following patch on top of the patch-set? Looks we
need to check the return value of snprintf for truncation checking.

jinyao@kbl:~/kbl-ws/perf-dev/lck-7589/acme/tools/perf$ git diff
diff --git a/tools/perf/util/srclist.c b/tools/perf/util/srclist.c
index 8060e4855d11..51ca69eaa9fd 100644
--- a/tools/perf/util/srclist.c
+++ b/tools/perf/util/srclist.c
@@ -385,8 +385,12 @@ static int src_info__create_line_mapping(struct
src_info *info, char *b_path,
goto out;
}

- snprintf(cmd, sizeof(cmd), "diff %s %s",
- b_path, a_path);
+ ret = snprintf(cmd, PATH_MAX, "diff %s %s",
+ b_path, a_path);
+ if (ret == PATH_MAX) {
+ ret = -1;
+ goto out;
+ }

pr_debug("Execute '%s'\n", cmd);

Thanks
Jin Yao

2020-03-23 14:38:38

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v2 00/14] perf: Stream comparison

On Mon, Mar 23, 2020 at 09:59:13PM +0800, Jin, Yao wrote:
> Hi Jiri,
>
> On 3/23/2020 7:05 PM, Jiri Olsa wrote:
> > On Fri, Mar 13, 2020 at 03:11:04PM +0800, Jin Yao wrote:
> > > Sometimes, a small change in a hot function reducing the cycles of
> > > this function, but the overall workload doesn't get faster. It is
> > > interesting where the cycles are moved to.
> >
> > I'm getting compilation fail:
> >
> > BUILD: Doing 'make -j1' parallel build
> > CC util/srclist.o
> > util/srclist.c: In function ‘srclist__node_new’:
> > util/srclist.c:388:35: error: ‘%s’ directive output may be truncated writing up to 4095 bytes into a region of size 4091 [-Werror=format-truncation=]
> > 388 | snprintf(cmd, sizeof(cmd), "diff %s %s",
> > | ^~
> > ......
> > 456 | ret = init_src_info(b_path, a_path, rel_path, &node->info);
> > | ~~~~~~
> > In file included from /usr/include/stdio.h:867,
> > from util/srclist.c:8:
> > /usr/include/bits/stdio2.h:67:10: note: ‘__builtin___snprintf_chk’ output between 7 and 8197 bytes into a destination of size 4096
> > 67 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
> > | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > 68 | __bos (__s), __fmt, __va_arg_pack ());
> > | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > cc1: all warnings being treated as errors
> > mv: cannot stat 'util/.srclist.o.tmp': No such file or directory
> > make[4]: *** [/home/jolsa/kernel/linux-perf/tools/build/Makefile.build:97: util/srclist.o] Error 1
> > make[3]: *** [/home/jolsa/kernel/linux-perf/tools/build/Makefile.build:139: util] Error 2
> > make[2]: *** [Makefile.perf:617: perf-in.o] Error 2
> > make[1]: *** [Makefile.perf:225: sub-make] Error 2
> > make: *** [Makefile:70: all] Error 2
> >
> >
> > [jolsa@krava ~]$ gcc --version
> > gcc (GCC) 9.3.1 20200317 (Red Hat 9.3.1-1)
> >
> > jirka
> >
>
> Can you help to add following patch on top of the patch-set? Looks we need
> to check the return value of snprintf for truncation checking.

yep, that helped.. I'll apply it on top for now
and try to do the review soon ;-)

thanks,
jirka

>
> jinyao@kbl:~/kbl-ws/perf-dev/lck-7589/acme/tools/perf$ git diff
> diff --git a/tools/perf/util/srclist.c b/tools/perf/util/srclist.c
> index 8060e4855d11..51ca69eaa9fd 100644
> --- a/tools/perf/util/srclist.c
> +++ b/tools/perf/util/srclist.c
> @@ -385,8 +385,12 @@ static int src_info__create_line_mapping(struct
> src_info *info, char *b_path,
> goto out;
> }
>
> - snprintf(cmd, sizeof(cmd), "diff %s %s",
> - b_path, a_path);
> + ret = snprintf(cmd, PATH_MAX, "diff %s %s",
> + b_path, a_path);
> + if (ret == PATH_MAX) {
> + ret = -1;
> + goto out;
> + }
>
> pr_debug("Execute '%s'\n", cmd);
>
> Thanks
> Jin Yao
>