Sometimes, a small change in a hot function reducing the cycles of
this function, but the overall workload doesn't get faster. It is
interesting where the cycles are moved to.
What it would like is to diff before/after streams. A stream we think
is a callchain which is aggregated by the branch records from perf
samples.
By browsing the hot streams, we can understand the hot code path.
By comparing the cycles variation of same streams between old perf
data and new perf data, we can understand if the cycles are moved
to other codes.
The before stream is the stream in perf.data.old. The after stream
is the stream in perf.data.
Diffing before/after streams compares top N hottest streams between
two perf data files.
If all entries of one stream in perf.data.old are fully matched with
all entries of another stream in perf.data, we think two streams
are matched, otherwise the streams are not matched.
For example,
cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
-------------------------- --------------------------
main div.c:39 main div.c:39
main div.c:44 main div.c:44
The above streams are matched and we can see for the same streams the
cycles (1) are equal and the callchain hit percents are slightly changed
(26.80% vs. 27.30%). That's expected.
But that's not always true if source code is changed in perf.data
(e.g. div.c:39 is changed). If div.c:39 is changed, they are different
streams, we can't compare them. We will think the stream in perf.data
is a new stream.
The challenge is how to identify the changed source lines. The basic
idea is to use linux command "diff" to compare the source file A and
source file A* line by line (assume file A is used in perf.data.old
and file A* is used in perf.data). According to "diff" output,
we can generate a source line mapping table.
For example,
Execute 'diff ./before/div.c ./after/div.c'
25c25
< i = rand() % 2;
---
> i = rand() % 4;
39c39
< for (i = 0; i < 2000000000; i++) {
---
> for (i = 0; i < 20000000001; i++) {
div.c (after -> before) lines mapping:
0 -> 0
1 -> 1
2 -> 2
3 -> 3
4 -> 4
5 -> 5
6 -> 6
7 -> 7
8 -> 8
9 -> 9
...
24 -> 24
25 -> -1
26 -> 26
27 -> 27
28 -> 28
29 -> 29
30 -> 30
31 -> 31
32 -> 32
33 -> 33
34 -> 34
35 -> 35
36 -> 36
37 -> 37
38 -> 38
39 -> -1
40 -> 40
...
From the table, we can easily know div.c:39 is source line changed.
(it's mapped to -1). So following two streams are not matched.
cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
-------------------------- --------------------------
main div.c:39 main div.c:39
main div.c:44 main div.c:44
Now let's see examples.
perf record -b ... Generate perf.data.old with branch data
perf record -b ... Generate perf.data with branch data
perf diff --stream
[ Matched hot chains between old perf data and new perf data) ]
hot chain pair 1:
cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
--------------------------- --------------------------
main div.c:39 main div.c:39
main div.c:44 main div.c:44
hot chain pair 2:
cycles: 35, hits: 21.43% cycles: 33, hits: 19.37%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380
__random_r random_r.c:357 __random_r random_r.c:357
__random random.c:293 __random random.c:293
__random random.c:293 __random random.c:293
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:288 __random random.c:288
rand rand.c:27 rand rand.c:27
rand rand.c:26 rand rand.c:26
rand@plt rand@plt
rand@plt rand@plt
compute_flag div.c:25 compute_flag div.c:25
compute_flag div.c:22 compute_flag div.c:22
main div.c:40 main div.c:40
main div.c:40 main div.c:40
main div.c:39 main div.c:39
hot chain pair 3:
cycles: 18, hits: 6.10% cycles: 19, hits: 6.51%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380
__random_r random_r.c:357 __random_r random_r.c:357
__random random.c:293 __random random.c:293
__random random.c:293 __random random.c:293
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:288 __random random.c:288
rand rand.c:27 rand rand.c:27
rand rand.c:26 rand rand.c:26
rand@plt rand@plt
rand@plt rand@plt
compute_flag div.c:25 compute_flag div.c:25
compute_flag div.c:22 compute_flag div.c:22
main div.c:40 main div.c:40
hot chain pair 4:
cycles: 9, hits: 5.95% cycles: 8, hits: 5.03%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380
[ Hot chains in old perf data but source line changed (*) in new perf data ]
[ Hot chains in old perf data only ]
hot chain 1:
cycles: 2, hits: 4.08%
--------------------------
main div.c:42
compute_flag div.c:28
[ Hot chains in new perf data only ]
hot chain 1:
cycles: 36, hits: 3.36%
--------------------------
__random_r random_r.c:357
__random random.c:293
__random random.c:293
__random random.c:291
__random random.c:291
__random random.c:291
__random random.c:288
rand rand.c:27
rand rand.c:26
rand@plt
rand@plt
compute_flag div.c:25
compute_flag div.c:22
main div.c:40
main div.c:40
If we enable the source line comparison option, the output may be different.
perf record -b ... Generate perf.data.old with branch data
perf record -b ... Generate perf.data with branch data
perf diff --stream --before ./before --after ./after
[ Matched hot chains between old perf data and new perf data) ]
hot chain pair 1:
cycles: 18, hits: 6.10% cycles: 19, hits: 6.51%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380
__random_r random_r.c:357 __random_r random_r.c:357
__random random.c:293 __random random.c:293
__random random.c:293 __random random.c:293
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:288 __random random.c:288
rand rand.c:27 rand rand.c:27
rand rand.c:26 rand rand.c:26
rand@plt rand@plt
rand@plt rand@plt
compute_flag div.c:25 compute_flag div.c:25
compute_flag div.c:22 compute_flag div.c:22
main div.c:40 main div.c:40
hot chain pair 2:
cycles: 9, hits: 5.95% cycles: 8, hits: 5.03%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380
[ Hot chains in old perf data but source line changed (*) in new perf data ]
hot chain pair 1:
cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
--------------------------- --------------------------
main div.c:39 main div.c:39*
main div.c:44 main div.c:44
hot chain pair 2:
cycles: 35, hits: 21.43% cycles: 33, hits: 19.37%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380
__random_r random_r.c:357 __random_r random_r.c:357
__random random.c:293 __random random.c:293
__random random.c:293 __random random.c:293
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:288 __random random.c:288
rand rand.c:27 rand rand.c:27
rand rand.c:26 rand rand.c:26
rand@plt rand@plt
rand@plt rand@plt
compute_flag div.c:25 compute_flag div.c:25
compute_flag div.c:22 compute_flag div.c:22
main div.c:40 main div.c:40
main div.c:40 main div.c:40
main div.c:39 main div.c:39*
[ Hot chains in old perf data only ]
hot chain 1:
cycles: 2, hits: 4.08%
--------------------------
main div.c:42
compute_flag div.c:28
[ Hot chains in new perf data only ]
hot chain 1:
cycles: 36, hits: 3.36%
--------------------------
__random_r random_r.c:357
__random random.c:293
__random random.c:293
__random random.c:291
__random random.c:291
__random random.c:291
__random random.c:288
rand rand.c:27
rand rand.c:26
rand@plt
rand@plt
compute_flag div.c:25
compute_flag div.c:22
main div.c:40
main div.c:40
Now we can see, following streams pair is moved to another section
"[ Hot chains in old perf data but source line changed (*) in new perf data ]"
cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
--------------------------- --------------------------
main div.c:39 main div.c:39*
main div.c:44 main div.c:44
v3:
---
v2 has 14 patches, it's hard to review.
v3 is only 7 patches for basic stream comparison.
Jin Yao (7):
perf util: Create source line mapping table
perf util: Create streams for managing top N hottest callchains
perf util: Return per-event callchain streams
perf util: Compare two streams
perf util: Calculate the sum of all streams hits
perf util: Report hot streams
perf diff: Support hot streams comparison
tools/perf/Documentation/perf-diff.txt | 14 +
tools/perf/builtin-diff.c | 170 +++++++-
tools/perf/util/Build | 1 +
tools/perf/util/callchain.c | 495 ++++++++++++++++++++++
tools/perf/util/callchain.h | 32 ++
tools/perf/util/srclist.c | 555 +++++++++++++++++++++++++
tools/perf/util/srclist.h | 65 +++
7 files changed, 1319 insertions(+), 13 deletions(-)
create mode 100644 tools/perf/util/srclist.c
create mode 100644 tools/perf/util/srclist.h
--
2.17.1
We think the stream is a callchain which is aggregated by the LBR
records from samples. By browsing the stream, we can understand
the code flow.
The struct callchain_node represents one callchain and we use the
callchain_node->hit to measure the hot level of this callchain.
Higher is hotter.
Since in perf data file, there may be many callchains so we just
need to focus on the top N hottest callchains. N is a user defined
parameter or just a predefined default value.
This patch saves the top N hottest callchains in 'struct stream_node'
type array, which is defined in a per event 'struct callchain_streams'.
So now we can get the per-event top N hottest callchains.
v2:
---
Use zfree in free_evsel_streams().
Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/util/callchain.c | 122 ++++++++++++++++++++++++++++++++++++
tools/perf/util/callchain.h | 16 +++++
2 files changed, 138 insertions(+)
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 818aa4efd386..0c028caaeb19 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -31,6 +31,7 @@
#include "callchain.h"
#include "branch.h"
#include "symbol.h"
+#include "evlist.h"
#include "../perf.h"
#define CALLCHAIN_PARAM_DEFAULT \
@@ -1599,3 +1600,124 @@ void callchain_cursor_reset(struct callchain_cursor *cursor)
for (node = cursor->first; node != NULL; node = node->next)
map__zput(node->ms.map);
}
+
+static void free_evsel_streams(struct callchain_streams *callchain_streams,
+ int nr_evsel)
+{
+ for (int i = 0; i < nr_evsel; i++)
+ zfree(&callchain_streams[i].streams);
+
+ free(callchain_streams);
+}
+
+static struct callchain_streams *create_evsel_streams(int nr_evsel,
+ int nr_streams_max)
+{
+ struct callchain_streams *callchain_streams;
+
+ callchain_streams = calloc(nr_evsel, sizeof(struct callchain_streams));
+ if (!callchain_streams)
+ return NULL;
+
+ for (int i = 0; i < nr_evsel; i++) {
+ struct callchain_streams *s = &callchain_streams[i];
+
+ s->streams = calloc(nr_streams_max, sizeof(struct stream_node));
+ if (!s->streams)
+ goto err;
+
+ s->nr_streams_max = nr_streams_max;
+ s->evsel_idx = -1;
+ }
+
+ return callchain_streams;
+
+err:
+ free_evsel_streams(callchain_streams, nr_evsel);
+ return NULL;
+}
+
+/*
+ * The cnodes with high hit number are hot callchains.
+ */
+static void set_hot_cnode(struct callchain_streams *s,
+ struct callchain_node *cnode)
+{
+ int i, idx = 0;
+ u64 hit;
+
+ if (s->nr_streams < s->nr_streams_max) {
+ i = s->nr_streams;
+ s->streams[i].cnode = cnode;
+ s->nr_streams++;
+ return;
+ }
+
+ /*
+ * Since only a few number of hot streams, so only use simple
+ * way to find the cnode with smallest hit number and replace.
+ */
+ hit = (s->streams[0].cnode)->hit;
+ for (i = 1; i < s->nr_streams; i++) {
+ if ((s->streams[i].cnode)->hit < hit) {
+ hit = (s->streams[i].cnode)->hit;
+ idx = i;
+ }
+ }
+
+ if (cnode->hit > hit)
+ s->streams[idx].cnode = cnode;
+}
+
+static void update_hot_streams(struct hist_entry *he,
+ struct callchain_streams *s)
+{
+ struct rb_root *root = &he->sorted_chain;
+ struct rb_node *rb_node = rb_first(root);
+ struct callchain_node *node;
+
+ while (rb_node) {
+ node = rb_entry(rb_node, struct callchain_node, rb_node);
+ set_hot_cnode(s, node);
+ rb_node = rb_next(rb_node);
+ }
+}
+
+static void get_hot_streams(struct hists *hists,
+ struct callchain_streams *s)
+{
+ struct rb_node *next = rb_first_cached(&hists->entries);
+
+ while (next) {
+ struct hist_entry *he;
+
+ he = rb_entry(next, struct hist_entry, rb_node);
+ update_hot_streams(he, s);
+ next = rb_next(&he->rb_node);
+ }
+}
+
+struct callchain_streams *callchain_evsel_streams_create(struct evlist *evlist,
+ int nr_streams_max,
+ int *nr_evsel_streams)
+{
+ int nr_evsel = evlist->core.nr_entries, i = 0;
+ struct callchain_streams *callchain_streams;
+ struct evsel *pos;
+
+ callchain_streams = create_evsel_streams(nr_evsel, nr_streams_max);
+ if (!callchain_streams)
+ return NULL;
+
+ evlist__for_each_entry(evlist, pos) {
+ struct hists *hists = evsel__hists(pos);
+
+ hists__output_resort(hists, NULL);
+ get_hot_streams(hists, &callchain_streams[i]);
+ callchain_streams[i].evsel_idx = pos->idx;
+ i++;
+ }
+
+ *nr_evsel_streams = nr_evsel;
+ return callchain_streams;
+}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 8f668ee29f25..6a93ad84d395 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -13,6 +13,7 @@ struct ip_callchain;
struct map;
struct perf_sample;
struct thread;
+struct evlist;
#define HELP_PAD "\t\t\t\t"
@@ -167,6 +168,17 @@ struct callchain_cursor {
struct callchain_cursor_node *curr;
};
+struct stream_node {
+ struct callchain_node *cnode;
+};
+
+struct callchain_streams {
+ struct stream_node *streams;
+ int nr_streams_max;
+ int nr_streams;
+ int evsel_idx;
+};
+
extern __thread struct callchain_cursor callchain_cursor;
static inline void callchain_init(struct callchain_root *root)
@@ -297,4 +309,8 @@ int callchain_branch_counts(struct callchain_root *root,
u64 *branch_count, u64 *predicted_count,
u64 *abort_count, u64 *cycles_count);
+struct callchain_streams *callchain_evsel_streams_create(struct evlist *evlist,
+ int nr_streams_max,
+ int *nr_evsel_streams);
+
#endif /* __PERF_CALLCHAIN_H */
--
2.17.1
In previous patch, we have created a 'struct callchain_streams'
array and each array entry contains per-event callchain streams.
This patch returns the pointer of per-event callchain streams
according to the evsel_idx.
Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/util/callchain.c | 12 ++++++++++++
tools/perf/util/callchain.h | 4 ++++
2 files changed, 16 insertions(+)
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 0c028caaeb19..bf66f33debd4 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1721,3 +1721,15 @@ struct callchain_streams *callchain_evsel_streams_create(struct evlist *evlist,
*nr_evsel_streams = nr_evsel;
return callchain_streams;
}
+
+struct callchain_streams *callchain_evsel_streams_get(struct callchain_streams *cs,
+ int nr_streams_max,
+ int evsel_idx)
+{
+ for (int i = 0; i < nr_streams_max; i++) {
+ if (cs[i].evsel_idx == evsel_idx)
+ return &cs[i];
+ }
+
+ return NULL;
+}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 6a93ad84d395..6ff9d86d74d3 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -313,4 +313,8 @@ struct callchain_streams *callchain_evsel_streams_create(struct evlist *evlist,
int nr_streams_max,
int *nr_evsel_streams);
+struct callchain_streams *callchain_evsel_streams_get(struct callchain_streams *cs,
+ int nr_streams_max,
+ int evsel_idx);
+
#endif /* __PERF_CALLCHAIN_H */
--
2.17.1
Sometimes, a small change in a hot function reducing the cycles of
this function, but the overall workload doesn't get faster. It is
interesting where the cycles are moved to.
What it would like is to diff before/after streams. A stream is a
callchain which is aggregated by the branch records from samples.
By browsing the hot streams, we can understand the hot code flow.
By comparing the cycles variation of same streams between old perf
data and new perf data, we can understand if the cycles are moved to
the unchanged code.
The before stream is the stream before source line changed
(e.g. streams in perf.data.old). The after stream is the stream
after source line changed (e.g. streams in perf.data).
Diffing before/after streams compares all streams (or compares top
N hot streams) between two perf data files.
If all entries of one stream in perf.data.old are fully matched with
all entries of another stream in perf.data, we think these two streams
are matched, otherwise the streams are not matched.
For example,
cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
-------------------------- --------------------------
main div.c:39 main div.c:39
main div.c:44 main div.c:44
It looks that two streams are matched and we can see for the same
streams the cycles are equal and the stream hit percents are
slightly changed. That's expected in the normal range.
But that's not always true if source lines are changed in perf.data
(e.g. div.c:39 is changed). If the source line is changed, they
are different streams, we can't compare them. We just think the
stream in perf.data is a new one.
The challenge is how to identify the changed source lines. The basic
idea is to use linux command 'diff' to compare the source file A and
source file A* line by line (assume A is used in perf.data.old
and A* is updated in perf.data). Create a source line mapping table.
For example,
Execute 'diff ./before/div.c ./after/div.c'
25c25
< i = rand() % 2;
---
> i = rand() % 4;
39c39
< for (i = 0; i < 2000000000; i++) {
---
> for (i = 0; i < 20000000001; i++) {
div.c (after -> before) lines mapping:
0 -> 0
1 -> 1
2 -> 2
3 -> 3
4 -> 4
5 -> 5
6 -> 6
7 -> 7
8 -> 8
9 -> 9
...
24 -> 24
25 -> -1
26 -> 26
27 -> 27
28 -> 28
29 -> 29
30 -> 30
31 -> 31
32 -> 32
33 -> 33
34 -> 34
35 -> 35
36 -> 36
37 -> 37
38 -> 38
39 -> -1
40 -> 40
...
From the table, we can easily know div.c:39 is source line changed.
(it's mapped to -1). So these two streams are not matched.
This patch can also handle the cases of new source lines insertion
and old source lines deletion. This source line mapping table is a
base for next processing for stream comparison.
This patch creates a new rblist 'srclist' to manage the source line
mapping tables. Each node contains the source line mapping table of
a before/after source file pair. The node is created on demand as
we process the samples. So it's likely we don't need to create so many
nodes.
v2:
---
Refine the code
1. calloc -> zalloc
2. Functions operating on a 'struct line_pair' have the
line_pair__ prefix.
3. Check get_range() return value.
4. Use fgets to replace fgetc for getting the number of lines.
5. Use path__join to generate the full path.
6. Keep the buffer after calling getline.
Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/util/Build | 1 +
tools/perf/util/srclist.c | 555 ++++++++++++++++++++++++++++++++++++++
tools/perf/util/srclist.h | 65 +++++
3 files changed, 621 insertions(+)
create mode 100644 tools/perf/util/srclist.c
create mode 100644 tools/perf/util/srclist.h
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index c0cf8dff694e..b9bf620b60f0 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -99,6 +99,7 @@ perf-y += call-path.o
perf-y += rwsem.o
perf-y += thread-stack.o
perf-y += spark.o
+perf-y += srclist.o
perf-$(CONFIG_AUXTRACE) += auxtrace.o
perf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
perf-$(CONFIG_AUXTRACE) += intel-pt.o
diff --git a/tools/perf/util/srclist.c b/tools/perf/util/srclist.c
new file mode 100644
index 000000000000..8060e4855d11
--- /dev/null
+++ b/tools/perf/util/srclist.c
@@ -0,0 +1,555 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Manage difference of source lines
+ * Copyright (c) 2020, Intel Corporation.
+ * Author: Jin Yao
+ */
+#include <stdlib.h>
+#include <stdio.h>
+#include <sys/types.h>
+#include <inttypes.h>
+#include <string.h>
+#include <strings.h>
+#include <unistd.h>
+#include <err.h>
+#include <errno.h>
+#include <limits.h>
+#include <stdbool.h>
+#include <linux/zalloc.h>
+#include "strlist.h"
+#include "srclist.h"
+#include "debug.h"
+#include "path.h"
+
+enum {
+ DIFF_LINE_ADD,
+ DIFF_LINE_DEL,
+ DIFF_LINE_CHANGE
+};
+
+static void get_full_path(const char *dir, const char *rel_path,
+ char *full_path, int size)
+{
+ if (!dir) {
+ snprintf(full_path, size, "%s", rel_path);
+ return;
+ }
+
+ path__join(full_path, size, dir, rel_path);
+}
+
+static int path__number_of_lines(char *path)
+{
+ FILE *fp;
+ char buf[4096], *p, last = 0;
+ int num = 0;
+
+ fp = fopen(path, "r");
+ if (!fp) {
+ pr_debug("Failed to open file %s\n", path);
+ return -1;
+ }
+
+ while (!feof(fp)) {
+ p = fgets(buf, sizeof(buf), fp);
+ if (!p)
+ break;
+
+ last = p[strlen(p) - 1];
+ if (last == '\n')
+ num++;
+ }
+
+ fclose(fp);
+
+ if (last != '\n' && last != 0)
+ num++;
+
+ return num;
+}
+
+static int get_digit(char *str, char **end_str, int *val)
+{
+ int v;
+ char *end;
+
+ errno = 0;
+ v = strtol(str, &end, 10);
+ if ((errno == ERANGE && (v == INT_MAX || v == INT_MIN))
+ || (errno != 0 && v == 0)) {
+ return -1;
+ }
+
+ if (end == str)
+ return -1;
+
+ *val = v;
+ *end_str = end;
+ return 0;
+}
+
+static int get_range(char *str, int *start_val, int *end_val)
+{
+ int val, ret;
+ char *end, *next;
+
+ *start_val = -1;
+ *end_val = -1;
+
+ ret = get_digit(str, &end, &val);
+ if (ret)
+ return ret;
+
+ *start_val = val;
+
+ if (*end != '\0') {
+ next = end + 1;
+ ret = get_digit(next, &end, &val);
+ if (ret)
+ return ret;
+
+ *end_val = val;
+ }
+
+ return 0;
+}
+
+static int opr_str_idx(char *str, int len, char ch)
+{
+ int i;
+
+ for (i = 0; i < len; i++) {
+ if (str[i] == ch)
+ break;
+ }
+
+ if (i == len || i == 0 || i == len - 1)
+ return -1;
+
+ if (str[i - 1] >= '0' && str[i - 1] <= '9' &&
+ str[i + 1] >= '0' && str[i + 1] <= '9') {
+ return i;
+ }
+
+ return -1;
+}
+
+static bool get_diff_operation(char *str, int *opr, int *b_start, int *b_end,
+ int *a_start, int *a_end)
+{
+ int idx, len;
+
+ if (str[0] == '<' || str[0] == '>' || str[0] == '-')
+ return false;
+
+ len = strlen(str);
+
+ /*
+ * For example,
+ * 5,6d4
+ * 8a7
+ * 20,21c19
+ */
+ idx = opr_str_idx(str, len, 'd');
+ if (idx != -1) {
+ *opr = DIFF_LINE_DEL;
+ goto exit;
+ }
+
+ idx = opr_str_idx(str, len, 'a');
+ if (idx != -1) {
+ *opr = DIFF_LINE_ADD;
+ goto exit;
+ }
+
+ idx = opr_str_idx(str, len, 'c');
+ if (idx != -1) {
+ *opr = DIFF_LINE_CHANGE;
+ goto exit;
+ }
+
+exit:
+ if (idx != -1) {
+ str[idx] = 0;
+ if (get_range(str, b_start, b_end) != 0)
+ return false;
+
+ if (get_range(&str[idx + 1], a_start, a_end) != 0)
+ return false;
+
+ return true;
+ }
+
+ return false;
+}
+
+static int line_pair__del_lines(struct line_pair *lines, int b_start, int b_end,
+ int a_start, int a_end __maybe_unused,
+ int *nr_lines, int *b_adjust)
+{
+ int i = *nr_lines;
+ bool set = false;
+
+ /*
+ * For example, 5,6d4
+ * It means line 5 ~ line 6 of file1 are not same as line 4 of file2
+ * because line 5 ~ line 6 are deleted.
+ */
+
+ if (a_start == i) {
+ lines[i].a_nr = i;
+ lines[i].b_nr = i + *b_adjust;
+ set = true;
+ }
+
+ if (b_end != -1)
+ *b_adjust += b_end - b_start + 1;
+ else
+ *b_adjust += 1;
+
+ if (!set) {
+ lines[i].a_nr = i;
+ lines[i].b_nr = i + *b_adjust;
+ }
+
+ *nr_lines = i + 1;
+ return 0;
+}
+
+static int line_pair__add_lines(struct line_pair *lines,
+ int b_start __maybe_unused,
+ int b_end __maybe_unused,
+ int a_start, int a_end, int *nr_lines,
+ int *b_adjust)
+{
+ int i = *nr_lines;
+
+ /*
+ * For example, 8a7
+ * It means line 8 at file1 is not same as line 7 at file2
+ * because a new line (line 7) is inserted to file2.
+ */
+ if (a_end != -1) {
+ for (int j = 0; j < a_end - a_start + 1; j++) {
+ lines[i].a_nr = i;
+ lines[i].b_nr = -1;
+ i++;
+ }
+ *b_adjust -= a_end - a_start + 1;
+ } else {
+ lines[i].a_nr = i;
+ lines[i].b_nr = -1;
+ i++;
+ *b_adjust -= 1;
+ }
+
+ *nr_lines = i;
+ return 0;
+}
+
+static int line_pair__change_lines(struct line_pair *lines, int b_start,
+ int b_end, int a_start, int a_end,
+ int *nr_lines, int *b_adjust)
+{
+ int i = *nr_lines;
+
+ /*
+ * For example, 20,21c19
+ * It means line20~line21 of file1 are not same as line19 of file2
+ * since they are changed.
+ */
+
+ if (a_end != -1) {
+ for (int j = 0; j < a_end - a_start + 1; j++) {
+ lines[i].a_nr = i;
+ lines[i].b_nr = -1;
+ i++;
+ }
+ } else {
+ lines[i].a_nr = i;
+ lines[i].b_nr = -1;
+ i++;
+ }
+
+ if (b_end != -1)
+ *b_adjust += b_end - b_start;
+
+ if (a_end != -1)
+ *b_adjust -= a_end - a_start;
+
+ *nr_lines = i;
+ return 0;
+}
+
+static int line_pair__update_lines(struct line_pair *lines, int opr,
+ int b_start, int b_end, int a_start,
+ int a_end, int *nr_lines, int *b_adjust)
+{
+ int i = *nr_lines;
+
+ while (i < a_start) {
+ lines[i].a_nr = i;
+ lines[i].b_nr = i + *b_adjust;
+ i++;
+ }
+
+ *nr_lines = i;
+
+ switch (opr) {
+ case DIFF_LINE_DEL:
+ line_pair__del_lines(lines, b_start, b_end, a_start, a_end,
+ nr_lines, b_adjust);
+ break;
+
+ case DIFF_LINE_ADD:
+ line_pair__add_lines(lines, b_start, b_end, a_start, a_end,
+ nr_lines, b_adjust);
+ break;
+
+ case DIFF_LINE_CHANGE:
+ line_pair__change_lines(lines, b_start, b_end, a_start,
+ a_end, nr_lines, b_adjust);
+ break;
+
+ default:
+ break;
+ }
+
+ return 0;
+}
+
+static void line_pair__update_remaining(struct line_pair *lines, int a_num,
+ int *nr_lines, int b_adjust)
+{
+ int i;
+
+ for (i = *nr_lines; i <= a_num; i++) {
+ lines[i].a_nr = i;
+ lines[i].b_nr = i + b_adjust;
+ }
+
+ *nr_lines = i;
+}
+
+static int build_mapping_table(FILE *fp, struct line_pair *lines,
+ int *nr_lines, int a_num)
+{
+ char *str = NULL, *p;
+ size_t len = 0;
+ int opr, b_start, b_end, a_start, a_end, b_adjust = 0;
+
+ *nr_lines = 1; /* First line is reserved */
+
+ while (getline(&str, &len, fp) > 0) {
+ p = strchr(str, '\n');
+ if (p)
+ *p = '\0';
+
+ pr_debug("%s\n", str);
+
+ if (!get_diff_operation(str, &opr, &b_start, &b_end,
+ &a_start, &a_end)) {
+ continue;
+ }
+
+ line_pair__update_lines(lines, opr, b_start, b_end, a_start,
+ a_end, nr_lines, &b_adjust);
+ }
+
+ line_pair__update_remaining(lines, a_num, nr_lines, b_adjust);
+
+ free(str);
+ return 0;
+}
+
+static int src_info__create_line_mapping(struct src_info *info, char *b_path,
+ char *a_path)
+{
+ char cmd[PATH_MAX];
+ FILE *fp = NULL;
+ int ret = -1;
+
+ info->nr_lines_before = path__number_of_lines(b_path);
+ if (info->nr_lines_before == -1)
+ goto out;
+
+ info->nr_lines_after = path__number_of_lines(a_path);
+ if (info->nr_lines_after == -1)
+ goto out;
+
+ /* Reserve first line */
+ info->nr_max = info->nr_lines_before + info->nr_lines_after + 1;
+ info->lines = calloc(info->nr_max, sizeof(struct line_pair));
+ if (!info->lines) {
+ pr_err("Failed to alloc memory for lines\n");
+ goto out;
+ }
+
+ snprintf(cmd, sizeof(cmd), "diff %s %s",
+ b_path, a_path);
+
+ pr_debug("Execute '%s'\n", cmd);
+
+ fp = popen(cmd, "r");
+ if (!fp) {
+ pr_err("Failed to execute '%s'\n", cmd);
+ goto out;
+ }
+
+ ret = build_mapping_table(fp, info->lines, &info->nr_lines,
+ info->nr_lines_after);
+
+out:
+ if (fp)
+ fclose(fp);
+
+ return ret;
+}
+
+static void free_src_info(struct src_info *info)
+{
+ zfree(&info->rel_path);
+ zfree(&info->lines);
+}
+
+static int init_src_info(char *b_path, char *a_path, const char *rel_path,
+ struct src_info *info)
+{
+ int ret;
+
+ info->rel_path = strdup(rel_path);
+ if (!info->rel_path)
+ return -1;
+
+ ret = src_info__create_line_mapping(info, b_path, a_path);
+ if (ret)
+ free_src_info(info);
+
+ return ret;
+}
+
+static void free_src_node(struct src_node *node)
+{
+ if (!node)
+ return;
+
+ free_src_info(&node->info);
+ free(node);
+}
+
+static struct rb_node *srclist__node_new(struct rblist *rblist,
+ const void *entry)
+{
+ struct srclist *slist = container_of(rblist, struct srclist, rblist);
+ const char *rel_path = entry;
+ char b_path[PATH_MAX], a_path[PATH_MAX];
+ struct src_node *node;
+ int ret;
+
+ get_full_path(slist->before_dir, rel_path, b_path, PATH_MAX);
+ get_full_path(slist->after_dir, rel_path, a_path, PATH_MAX);
+
+ node = zalloc(sizeof(*node));
+ if (!node)
+ return NULL;
+
+ ret = init_src_info(b_path, a_path, rel_path, &node->info);
+ if (ret)
+ goto err;
+
+ return &node->rb_node;
+
+err:
+ free_src_node(node);
+ return NULL;
+}
+
+static void srclist__node_delete(struct rblist *rblist __maybe_unused,
+ struct rb_node *rb_node)
+{
+ struct src_node *node = container_of(rb_node, struct src_node, rb_node);
+
+ free_src_info(&node->info);
+ free(node);
+}
+
+static int srclist__node_cmp(struct rb_node *rb_node, const void *entry)
+{
+ struct src_node *node = container_of(rb_node, struct src_node, rb_node);
+ const char *str = entry;
+
+ return strcmp(node->info.rel_path, str);
+}
+
+struct srclist *srclist__new(const char *before_dir, const char *after_dir)
+{
+ struct srclist *slist = zalloc(sizeof(*slist));
+
+ if (!slist)
+ return NULL;
+
+ rblist__init(&slist->rblist);
+ slist->rblist.node_cmp = srclist__node_cmp;
+ slist->rblist.node_new = srclist__node_new;
+ slist->rblist.node_delete = srclist__node_delete;
+
+ slist->before_dir = before_dir;
+ slist->after_dir = after_dir;
+ return slist;
+}
+
+void srclist__delete(struct srclist *slist)
+{
+ if (slist)
+ rblist__delete(&slist->rblist);
+}
+
+struct src_node *srclist__find(struct srclist *slist, char *rel_path,
+ bool create)
+{
+ struct src_node *node = NULL;
+ struct rb_node *rb_node;
+
+ if (create)
+ rb_node = rblist__findnew(&slist->rblist, (void *)rel_path);
+ else
+ rb_node = rblist__find(&slist->rblist, (void *)rel_path);
+
+ if (rb_node)
+ node = container_of(rb_node, struct src_node, rb_node);
+
+ return node;
+}
+
+struct line_pair *srclist__line_pair(struct src_node *node, int a_nr)
+{
+ struct src_info *info = &node->info;
+
+ if (a_nr < info->nr_lines && a_nr > 0)
+ return &info->lines[a_nr];
+
+ pr_debug("Exceeds line range: nr_lines = %d, a_nr = %d\n",
+ info->nr_lines, a_nr);
+
+ return NULL;
+}
+
+void srclist__dump(struct srclist *slist)
+{
+ struct src_node *node;
+ struct src_info *info;
+ int i;
+
+ srclist__for_each_entry(node, slist) {
+ info = &node->info;
+
+ pr_debug("%s (after -> before) lines mapping:\n",
+ info->rel_path);
+
+ for (i = 0; i < info->nr_lines; i++) {
+ pr_debug("%d -> %d\n",
+ info->lines[i].a_nr,
+ info->lines[i].b_nr);
+ }
+ }
+}
diff --git a/tools/perf/util/srclist.h b/tools/perf/util/srclist.h
new file mode 100644
index 000000000000..f25b0de91a13
--- /dev/null
+++ b/tools/perf/util/srclist.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __PERF_SRCLIST_H
+#define __PERF_SRCLIST_H
+
+#include <linux/types.h>
+#include <linux/rbtree.h>
+#include "rblist.h"
+
+struct line_pair {
+ int a_nr; /* line nr in after.c */
+ int b_nr; /* line nr in before.c */
+};
+
+struct src_node;
+
+struct src_info {
+ char *rel_path; /* relative path */
+ struct line_pair *lines;
+ int nr_max;
+ int nr_lines;
+ int nr_lines_before;
+ int nr_lines_after;
+};
+
+struct src_node {
+ struct rb_node rb_node;
+ struct src_info info;
+};
+
+struct srclist {
+ struct rblist rblist;
+ const char *before_dir;
+ const char *after_dir;
+};
+
+struct srclist *srclist__new(const char *before_dir, const char *after_dir);
+void srclist__delete(struct srclist *slist);
+
+struct src_node *srclist__find(struct srclist *slist, char *rel_path,
+ bool create);
+
+struct line_pair *srclist__line_pair(struct src_node *node, int a_nr);
+void srclist__dump(struct srclist *slist);
+
+static inline struct src_node *srclist__first(struct srclist *slist)
+{
+ struct rb_node *rn = rb_first_cached(&slist->rblist.entries);
+
+ return rn ? rb_entry(rn, struct src_node, rb_node) : NULL;
+}
+
+static inline struct src_node *srclist__next(struct src_node *sn)
+{
+ struct rb_node *rn;
+
+ if (!sn)
+ return NULL;
+ rn = rb_next(&sn->rb_node);
+ return rn ? rb_entry(rn, struct src_node, rb_node) : NULL;
+}
+
+#define srclist__for_each_entry(pos, slist) \
+ for (pos = srclist__first(slist); pos; pos = srclist__next(pos))
+
+#endif /* __PERF_SRCLIST_H */
--
2.17.1
We have used callchain_node->hit to measure the hot level of one
stream. This patch calculates the sum of hits of all streams.
Then in next patch, we can use following formula to report hot
percent for one stream.
hot percent = callchain_node->hit / sum of all hits
v2:
---
Combine the variable decl line with its initial assignment
in total_callchain_hits().
Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/util/callchain.c | 34 ++++++++++++++++++++++++++++++++++
tools/perf/util/callchain.h | 1 +
2 files changed, 35 insertions(+)
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 1619f7fa4076..b0c8757c2dcf 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1683,6 +1683,38 @@ static void update_hot_streams(struct hist_entry *he,
}
}
+static u64 count_callchain_hits(struct hist_entry *he)
+{
+ struct rb_root *root = &he->sorted_chain;
+ struct rb_node *rb_node = rb_first(root);
+ struct callchain_node *node;
+ u64 chain_hits = 0;
+
+ while (rb_node) {
+ node = rb_entry(rb_node, struct callchain_node, rb_node);
+ chain_hits += node->hit;
+ rb_node = rb_next(rb_node);
+ }
+
+ return chain_hits;
+}
+
+static u64 total_callchain_hits(struct hists *hists)
+{
+ struct rb_node *next = rb_first_cached(&hists->entries);
+ u64 chain_hits = 0;
+
+ while (next) {
+ struct hist_entry *he = rb_entry(next, struct hist_entry,
+ rb_node);
+
+ chain_hits += count_callchain_hits(he);
+ next = rb_next(&he->rb_node);
+ }
+
+ return chain_hits;
+}
+
static void get_hot_streams(struct hists *hists,
struct callchain_streams *s)
{
@@ -1695,6 +1727,8 @@ static void get_hot_streams(struct hists *hists,
update_hot_streams(he, s);
next = rb_next(&he->rb_node);
}
+
+ s->chain_hits = total_callchain_hits(hists);
}
struct callchain_streams *callchain_evsel_streams_create(struct evlist *evlist,
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 52aa01b9eedf..a9b20b785dc7 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -181,6 +181,7 @@ struct callchain_streams {
int nr_streams_max;
int nr_streams;
int evsel_idx;
+ u64 chain_hits;
};
extern __thread struct callchain_cursor callchain_cursor;
--
2.17.1
We show the streams separately. They are divided into different sections.
1. "Matched hot chains between old perf data and new perf data"
2. "Hot chains in old perf data but source line changed in new perf data"
3. "Hot chains in old perf data only"
4. "Hot chains in new perf data only".
For each stream, we report the cycles and hot percent (hits%).
For example,
cycles: 2, hits: 4.08%
--------------------------
main div.c:42
compute_flag div.c:28
Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/util/callchain.c | 155 ++++++++++++++++++++++++++++++++++++
tools/perf/util/callchain.h | 3 +
2 files changed, 158 insertions(+)
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index b0c8757c2dcf..84fe8e418532 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1777,6 +1777,9 @@ static bool chain_srclist_match(struct srclist *src_list, const char *srcline_a,
struct src_node *node;
struct line_pair *lp;
+ if (!srcline_a || !srcline_b)
+ return false;
+
if (sscanf(srcline_a, "%[^:]:%d", file_a, &line_a) != 2)
return false;
@@ -1817,6 +1820,10 @@ static bool chain_match(struct callchain_list *base_chain,
*src_changed = false;
+ /*
+ * Check sourceline first. If not matched,
+ * fallback to symbol match and address match.
+ */
if (src_list && chain_srclist_match(src_list, pair_chain->srcline,
base_chain->srcline, &src_found,
src_changed)) {
@@ -1939,3 +1946,151 @@ void callchain_match_streams(struct callchain_streams *cs_base,
if (src_list)
srclist__dump(src_list);
}
+
+static s64 callchain_avg_cycles(struct callchain_node *cnode)
+{
+ struct callchain_list *chain;
+ s64 cycles = 0;
+
+ list_for_each_entry(chain, &cnode->val, list) {
+ if (chain->srcline && chain->branch_count)
+ cycles += chain->cycles_count / chain->branch_count;
+ }
+
+ return cycles;
+}
+
+static void print_callchain_pair(struct stream_node *base_node, int idx,
+ struct callchain_streams *cs_base,
+ struct callchain_streams *cs_pair)
+{
+ struct callchain_node *base_cnode = base_node->cnode;
+ struct callchain_node *pair_cnode = base_node->pair_cnode;
+ struct callchain_list *base_chain, *pair_chain;
+ char buf1[512], buf2[512], cbuf1[256], cbuf2[256];
+ char *s1, *s2;
+ double pct;
+
+ printf("\nhot chain pair %d:\n", idx);
+
+ pct = (double)base_cnode->hit / (double)cs_base->chain_hits;
+ scnprintf(buf1, sizeof(buf1), "cycles: %ld, hits: %.2f%%",
+ callchain_avg_cycles(base_cnode), pct * 100.0);
+
+ pct = (double)pair_cnode->hit / (double)cs_pair->chain_hits;
+ scnprintf(buf2, sizeof(buf2), "cycles: %ld, hits: %.2f%%",
+ callchain_avg_cycles(pair_cnode), pct * 100.0);
+
+ printf("%35s\t%35s\n", buf1, buf2);
+
+ printf("%35s\t%35s\n",
+ "---------------------------",
+ "--------------------------");
+
+ pair_chain = list_first_entry(&pair_cnode->val,
+ struct callchain_list,
+ list);
+
+ list_for_each_entry(base_chain, &base_cnode->val, list) {
+ if (&pair_chain->list == &pair_cnode->val)
+ return;
+
+ s1 = callchain_list__sym_name(base_chain, cbuf1, sizeof(cbuf1),
+ false);
+ s2 = callchain_list__sym_name(pair_chain, cbuf2, sizeof(cbuf2),
+ false);
+
+ if (!pair_chain->src_changed)
+ scnprintf(buf1, sizeof(buf1), "%35s\t%35s", s1, s2);
+ else
+ scnprintf(buf1, sizeof(buf1), "%35s\t%35s*", s1, s2);
+
+ printf("%s\n", buf1);
+ pair_chain = list_next_entry(pair_chain, list);
+ }
+}
+
+static void print_stream_callchain(struct stream_node *node, int idx,
+ struct callchain_streams *cs, bool pair)
+{
+ struct callchain_node *cnode = node->cnode;
+ struct callchain_list *chain;
+ char buf[512], cbuf[256], *s;
+ double pct;
+
+ printf("\nhot chain %d:\n", idx);
+
+ pct = (double)cnode->hit / (double)cs->chain_hits;
+ scnprintf(buf, sizeof(buf), "cycles: %ld, hits: %.2f%%",
+ callchain_avg_cycles(cnode), pct * 100.0);
+
+ if (pair) {
+ printf("%35s\t%35s\n", "", buf);
+ printf("%35s\t%35s\n",
+ "", "--------------------------");
+ } else {
+ printf("%35s\n", buf);
+ printf("%35s\n", "--------------------------");
+ }
+
+ list_for_each_entry(chain, &cnode->val, list) {
+ s = callchain_list__sym_name(chain, cbuf, sizeof(cbuf), false);
+
+ if (pair)
+ scnprintf(buf, sizeof(buf), "%35s\t%35s", "", s);
+ else
+ scnprintf(buf, sizeof(buf), "%35s", s);
+
+ printf("%s\n", buf);
+ }
+}
+
+void callchain_stream_report(struct callchain_streams *cs_base,
+ struct callchain_streams *cs_pair)
+{
+ struct stream_node *base_node;
+ int i, idx = 0;
+
+ printf("[ Matched hot chains between old perf data and new perf data ]\n");
+ for (i = 0; i < cs_base->nr_streams; i++) {
+ base_node = &cs_base->streams[i];
+ if (base_node->pair_cnode) {
+ if (!base_node->src_changed) {
+ print_callchain_pair(base_node, ++idx,
+ cs_base, cs_pair);
+ }
+ }
+ }
+
+ idx = 0;
+ printf("\n[ Hot chains in old perf data but source line changed (*) in new perf data ]\n");
+ for (i = 0; i < cs_base->nr_streams; i++) {
+ base_node = &cs_base->streams[i];
+ if (base_node->pair_cnode) {
+ if (base_node->src_changed) {
+ print_callchain_pair(base_node, ++idx,
+ cs_base, cs_pair);
+ }
+ }
+ }
+
+ idx = 0;
+ printf("\n[ Hot chains in old perf data only ]\n");
+ for (i = 0; i < cs_base->nr_streams; i++) {
+ base_node = &cs_base->streams[i];
+ if (!base_node->pair_cnode) {
+ print_stream_callchain(base_node, ++idx,
+ cs_base, false);
+ }
+ }
+
+ idx = 0;
+ printf("\n[ Hot chains in new perf data only ]\n");
+ for (i = 0; i < cs_pair->nr_streams; i++) {
+ base_node = &cs_pair->streams[i];
+ if (!base_node->pair_cnode) {
+ print_stream_callchain(base_node, ++idx,
+ cs_pair, true);
+ }
+ }
+}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index a9b20b785dc7..8835d63a212f 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -326,4 +326,7 @@ void callchain_match_streams(struct callchain_streams *cs_base,
struct callchain_streams *cs_pair,
struct srclist *src_list);
+void callchain_stream_report(struct callchain_streams *cs_base,
+ struct callchain_streams *cs_pair);
+
#endif /* __PERF_CALLCHAIN_H */
--
2.17.1
If all callchain entries of one stream are fully matched with
all entries of another stream, we think these two streams are
matched.
For example,
cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
----------------------- -----------------------
main div.c:39 main div.c:39
main div.c:44 main div.c:44
Above two streams are matched if div.c is not changed.
But the streams are not matched if div.c is source line changed
(e.g. div.c:39 is changed). If the source line is changed, they
are different streams.
So the challenge is how to identify the changed source lines.
For this purpose, we use the source line mapping table (see patch
"perf util: Create source line mapping table"). The source line
mapping information is saved in src_list.
So the matching logic is, if we can find the source lines, the
callchain entry comparison is based on source lines. Otherwise,
fallback to dso address comparison.
Once two streams are fully matched, they will be linked as
a pair. From the pair, we can know which streams are matched.
The pair information will be used in next patches.
Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/util/callchain.c | 172 ++++++++++++++++++++++++++++++++++++
tools/perf/util/callchain.h | 8 ++
2 files changed, 180 insertions(+)
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index bf66f33debd4..1619f7fa4076 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1733,3 +1733,175 @@ struct callchain_streams *callchain_evsel_streams_get(struct callchain_streams *
return NULL;
}
+
+static bool chain_srclist_match(struct srclist *src_list, const char *srcline_a,
+ const char *srcline_b, bool *src_found,
+ bool *src_changed)
+{
+ char file_a[PATH_MAX], file_b[PATH_MAX];
+ int line_a, line_b;
+ struct src_node *node;
+ struct line_pair *lp;
+
+ if (sscanf(srcline_a, "%[^:]:%d", file_a, &line_a) != 2)
+ return false;
+
+ if (sscanf(srcline_b, "%[^:]:%d", file_b, &line_b) != 2)
+ return false;
+
+ if (strcmp(file_a, file_b))
+ return false;
+
+ node = srclist__find(src_list, file_a, true);
+ if (!node)
+ return false;
+
+ *src_found = true;
+
+ lp = srclist__line_pair(node, line_a);
+ if (!lp)
+ return false;
+
+ if (line_a == line_b && lp->b_nr == -1) {
+ *src_changed = true;
+ return true;
+ }
+
+ if (lp->b_nr == line_b)
+ return true;
+
+ return false;
+}
+
+static bool chain_match(struct callchain_list *base_chain,
+ struct callchain_list *pair_chain,
+ struct srclist *src_list,
+ bool *src_changed)
+{
+ enum match_result match;
+ bool src_found = false;
+
+ *src_changed = false;
+
+ if (src_list && chain_srclist_match(src_list, pair_chain->srcline,
+ base_chain->srcline, &src_found,
+ src_changed)) {
+ return true;
+ }
+
+ if (!src_found) {
+ match = match_chain_strings(base_chain->srcline,
+ pair_chain->srcline);
+ if (match != MATCH_ERROR)
+ return match == MATCH_EQ;
+
+ match = match_chain_dso_addresses(base_chain->ms.map,
+ base_chain->ip,
+ pair_chain->ms.map,
+ pair_chain->ip);
+
+ return match == MATCH_EQ;
+ }
+
+ return false;
+}
+
+static bool callchain_node_matched(struct callchain_node *base_cnode,
+ struct callchain_node *pair_cnode,
+ struct srclist *src_list,
+ int *nr_changed)
+{
+ struct callchain_list *base_chain, *pair_chain;
+ bool match = false;
+
+ pair_chain = list_first_entry(&pair_cnode->val,
+ struct callchain_list,
+ list);
+
+ list_for_each_entry(base_chain, &base_cnode->val, list) {
+ bool src_changed;
+
+ if (&pair_chain->list == &pair_cnode->val)
+ return false;
+
+ if (!base_chain->srcline || !pair_chain->srcline) {
+ pair_chain = list_next_entry(pair_chain, list);
+ continue;
+ }
+
+ match = chain_match(base_chain, pair_chain, src_list,
+ &src_changed);
+
+ if (src_changed) {
+ pair_chain->src_changed = true;
+ *nr_changed += 1;
+ }
+
+ if (!match)
+ return false;
+
+ pair_chain = list_next_entry(pair_chain, list);
+ }
+
+ /*
+ * Say chain1 is ABC, chain2 is ABCD, we consider they are
+ * not fully matched.
+ */
+ if (pair_chain && (&pair_chain->list != &pair_cnode->val))
+ return false;
+
+ return match;
+}
+
+static struct stream_node *stream_node_match(struct stream_node *base_node,
+ struct callchain_streams *cs_pair,
+ struct srclist *src_list,
+ bool *src_changed)
+{
+ *src_changed = false;
+
+ for (int i = 0; i < cs_pair->nr_streams; i++) {
+ struct stream_node *pair_node = &cs_pair->streams[i];
+ int nr_changed = 0;
+
+ if (callchain_node_matched(base_node->cnode, pair_node->cnode,
+ src_list, &nr_changed)) {
+ if (nr_changed)
+ *src_changed = true;
+
+ return pair_node;
+ }
+ }
+
+ return NULL;
+}
+
+static void stream_nodes_link(struct stream_node *base_node,
+ struct stream_node *pair_node,
+ bool src_changed)
+{
+ base_node->pair_cnode = pair_node->cnode;
+ base_node->src_changed = src_changed;
+ pair_node->pair_cnode = base_node->cnode;
+}
+
+void callchain_match_streams(struct callchain_streams *cs_base,
+ struct callchain_streams *cs_pair,
+ struct srclist *src_list)
+{
+ for (int i = 0; i < cs_base->nr_streams; i++) {
+ struct stream_node *base_node = &cs_base->streams[i];
+ struct stream_node *pair_node;
+ bool src_changed;
+
+ pair_node = stream_node_match(base_node, cs_pair, src_list,
+ &src_changed);
+ if (pair_node)
+ stream_nodes_link(base_node, pair_node, src_changed);
+ else
+ base_node->src_changed = src_changed;
+ }
+
+ if (src_list)
+ srclist__dump(src_list);
+}
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 6ff9d86d74d3..52aa01b9eedf 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -6,6 +6,7 @@
#include <linux/rbtree.h>
#include "map_symbol.h"
#include "branch.h"
+#include "srclist.h"
struct addr_location;
struct evsel;
@@ -131,6 +132,7 @@ struct callchain_list {
u64 iter_cycles;
struct branch_type_stat brtype_stat;
const char *srcline;
+ bool src_changed;
struct list_head list;
};
@@ -170,6 +172,8 @@ struct callchain_cursor {
struct stream_node {
struct callchain_node *cnode;
+ struct callchain_node *pair_cnode;
+ bool src_changed;
};
struct callchain_streams {
@@ -317,4 +321,8 @@ struct callchain_streams *callchain_evsel_streams_get(struct callchain_streams *
int nr_streams_max,
int evsel_idx);
+void callchain_match_streams(struct callchain_streams *cs_base,
+ struct callchain_streams *cs_pair,
+ struct srclist *src_list);
+
#endif /* __PERF_CALLCHAIN_H */
--
2.17.1
This patch enables perf-diff with "--stream", "--before" and "--after"
options.
"--stream": Enable hot streams comparison
"--before": Source code directory corresponding to perf.data.old
"--after" : Source code directory corresponding to perf.data
Now let's see examples.
perf record -b ... Generate perf.data.old with branch data
perf record -b ... Generate perf.data with branch data
perf diff --stream
[ Matched hot chains between old perf data and new perf data) ]
hot chain pair 1:
cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
--------------------------- --------------------------
main div.c:39 main div.c:39
main div.c:44 main div.c:44
hot chain pair 2:
cycles: 35, hits: 21.43% cycles: 33, hits: 19.37%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380
__random_r random_r.c:357 __random_r random_r.c:357
__random random.c:293 __random random.c:293
__random random.c:293 __random random.c:293
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:288 __random random.c:288
rand rand.c:27 rand rand.c:27
rand rand.c:26 rand rand.c:26
rand@plt rand@plt
rand@plt rand@plt
compute_flag div.c:25 compute_flag div.c:25
compute_flag div.c:22 compute_flag div.c:22
main div.c:40 main div.c:40
main div.c:40 main div.c:40
main div.c:39 main div.c:39
hot chain pair 3:
cycles: 18, hits: 6.10% cycles: 19, hits: 6.51%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380
__random_r random_r.c:357 __random_r random_r.c:357
__random random.c:293 __random random.c:293
__random random.c:293 __random random.c:293
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:288 __random random.c:288
rand rand.c:27 rand rand.c:27
rand rand.c:26 rand rand.c:26
rand@plt rand@plt
rand@plt rand@plt
compute_flag div.c:25 compute_flag div.c:25
compute_flag div.c:22 compute_flag div.c:22
main div.c:40 main div.c:40
hot chain pair 4:
cycles: 9, hits: 5.95% cycles: 8, hits: 5.03%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380
[ Hot chains in old perf data but source line changed (*) in new perf data ]
[ Hot chains in old perf data only ]
hot chain 1:
cycles: 2, hits: 4.08%
--------------------------
main div.c:42
compute_flag div.c:28
[ Hot chains in new perf data only ]
hot chain 1:
cycles: 36, hits: 3.36%
--------------------------
__random_r random_r.c:357
__random random.c:293
__random random.c:293
__random random.c:291
__random random.c:291
__random random.c:291
__random random.c:288
rand rand.c:27
rand rand.c:26
rand@plt
rand@plt
compute_flag div.c:25
compute_flag div.c:22
main div.c:40
main div.c:40
If we enable the source line comparison option, the output may be different.
perf record -b ... Generate perf.data.old with branch data
perf record -b ... Generate perf.data with branch data
perf diff --stream --before ./before --after ./after
[ Matched hot chains between old perf data and new perf data) ]
hot chain pair 1:
cycles: 18, hits: 6.10% cycles: 19, hits: 6.51%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380
__random_r random_r.c:357 __random_r random_r.c:357
__random random.c:293 __random random.c:293
__random random.c:293 __random random.c:293
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:288 __random random.c:288
rand rand.c:27 rand rand.c:27
rand rand.c:26 rand rand.c:26
rand@plt rand@plt
rand@plt rand@plt
compute_flag div.c:25 compute_flag div.c:25
compute_flag div.c:22 compute_flag div.c:22
main div.c:40 main div.c:40
hot chain pair 2:
cycles: 9, hits: 5.95% cycles: 8, hits: 5.03%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380
[ Hot chains in old perf data but source line changed (*) in new perf data ]
hot chain pair 1:
cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
--------------------------- --------------------------
main div.c:39 main div.c:39*
main div.c:44 main div.c:44
hot chain pair 2:
cycles: 35, hits: 21.43% cycles: 33, hits: 19.37%
--------------------------- --------------------------
__random_r random_r.c:360 __random_r random_r.c:360
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:388 __random_r random_r.c:388
__random_r random_r.c:380 __random_r random_r.c:380
__random_r random_r.c:357 __random_r random_r.c:357
__random random.c:293 __random random.c:293
__random random.c:293 __random random.c:293
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:291 __random random.c:291
__random random.c:288 __random random.c:288
rand rand.c:27 rand rand.c:27
rand rand.c:26 rand rand.c:26
rand@plt rand@plt
rand@plt rand@plt
compute_flag div.c:25 compute_flag div.c:25
compute_flag div.c:22 compute_flag div.c:22
main div.c:40 main div.c:40
main div.c:40 main div.c:40
main div.c:39 main div.c:39*
[ Hot chains in old perf data only ]
hot chain 1:
cycles: 2, hits: 4.08%
--------------------------
main div.c:42
compute_flag div.c:28
[ Hot chains in new perf data only ]
hot chain 1:
cycles: 36, hits: 3.36%
--------------------------
__random_r random_r.c:357
__random random.c:293
__random random.c:293
__random random.c:291
__random random.c:291
__random random.c:291
__random random.c:288
rand rand.c:27
rand rand.c:26
rand@plt
rand@plt
compute_flag div.c:25
compute_flag div.c:22
main div.c:40
main div.c:40
Now we can see, following streams pair is moved to another section
"[ Hot chains in old perf data but source line changed (*) in new perf data ]"
cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
--------------------------- --------------------------
main div.c:39 main div.c:39*
main div.c:44 main div.c:44
Signed-off-by: Jin Yao <[email protected]>
---
tools/perf/Documentation/perf-diff.txt | 14 ++
tools/perf/builtin-diff.c | 170 +++++++++++++++++++++++--
2 files changed, 171 insertions(+), 13 deletions(-)
diff --git a/tools/perf/Documentation/perf-diff.txt b/tools/perf/Documentation/perf-diff.txt
index f50ca0fef0a4..296fea98ac07 100644
--- a/tools/perf/Documentation/perf-diff.txt
+++ b/tools/perf/Documentation/perf-diff.txt
@@ -182,6 +182,20 @@ OPTIONS
--tid=::
Only diff samples for given thread ID (comma separated list).
+--stream::
+ Enable hot streams comparison. Stream is a callchain which is
+ aggregated by the branch records from samples. The option can be
+ used with --before and --after if source line comparison is
+ required.
+
+--before::
+ Source code directory corresponding to perf.data.old. Should be
+ used with --stream and --after.
+
+--after::
+ Source code directory corresponding to perf.data. Should be
+ used with --stream and --before.
+
COMPARISON
----------
The comparison is governed by the baseline file. The baseline perf.data
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index c94a002f295e..6abd3c795a5e 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -25,6 +25,8 @@
#include "util/map.h"
#include "util/spark.h"
#include "util/block-info.h"
+#include "util/srclist.h"
+#include "util/callchain.h"
#include <linux/err.h>
#include <linux/zalloc.h>
#include <subcmd/pager.h>
@@ -38,10 +40,15 @@
struct perf_diff {
struct perf_tool tool;
const char *time_str;
+ const char *before_dir;
+ const char *after_dir;
struct perf_time_interval *ptime_range;
int range_size;
int range_num;
bool has_br_stack;
+ bool src_cmp;
+ bool stream;
+ struct srclist *src_list;
};
/* Diff command specific HPP columns. */
@@ -72,6 +79,8 @@ struct data__file {
struct perf_data data;
int idx;
struct hists *hists;
+ struct callchain_streams *evsel_streams;
+ int nr_evsel_streams;
struct diff_hpp_fmt fmt[PERF_HPP_DIFF__MAX_INDEX];
};
@@ -106,6 +115,7 @@ enum {
COMPUTE_DELTA_ABS,
COMPUTE_CYCLES,
COMPUTE_MAX,
+ COMPUTE_STREAM, /* After COMPUTE_MAX to avoid use current compute arrays */
};
const char *compute_names[COMPUTE_MAX] = {
@@ -393,6 +403,11 @@ static int diff__process_sample_event(struct perf_tool *tool,
struct perf_diff *pdiff = container_of(tool, struct perf_diff, tool);
struct addr_location al;
struct hists *hists = evsel__hists(evsel);
+ struct hist_entry_iter iter = {
+ .evsel = evsel,
+ .sample = sample,
+ .ops = &hist_iter_normal,
+ };
int ret = -1;
if (perf_time__ranges_skip_sample(pdiff->ptime_range, pdiff->range_num,
@@ -411,14 +426,8 @@ static int diff__process_sample_event(struct perf_tool *tool,
goto out_put;
}
- if (compute != COMPUTE_CYCLES) {
- if (!hists__add_entry(hists, &al, NULL, NULL, NULL, sample,
- true)) {
- pr_warning("problem incrementing symbol period, "
- "skipping event\n");
- goto out_put;
- }
- } else {
+ switch (compute) {
+ case COMPUTE_CYCLES:
if (!hists__add_entry_ops(hists, &block_hist_ops, &al, NULL,
NULL, NULL, sample, true)) {
pr_warning("problem incrementing symbol period, "
@@ -428,6 +437,23 @@ static int diff__process_sample_event(struct perf_tool *tool,
hist__account_cycles(sample->branch_stack, &al, sample, false,
NULL);
+ break;
+
+ case COMPUTE_STREAM:
+ if (hist_entry_iter__add(&iter, &al, PERF_MAX_STACK_DEPTH,
+ NULL)) {
+ pr_debug("problem adding hist entry, skipping event\n");
+ goto out_put;
+ }
+ break;
+
+ default:
+ if (!hists__add_entry(hists, &al, NULL, NULL, NULL, sample,
+ true)) {
+ pr_warning("problem incrementing symbol period, "
+ "skipping event\n");
+ goto out_put;
+ }
}
/*
@@ -996,6 +1022,53 @@ static void data_process(void)
}
}
+static int process_base_stream(struct data__file *data_base,
+ struct data__file *data_pair,
+ const char *title __maybe_unused,
+ const char *base_dir __maybe_unused,
+ const char *pair_dir __maybe_unused)
+{
+ struct evlist *evlist_base = data_base->session->evlist;
+ struct evlist *evlist_pair = data_pair->session->evlist;
+ struct evsel *evsel_base, *evsel_pair;
+ struct callchain_streams *es_base, *es_pair;
+
+ evlist__for_each_entry(evlist_base, evsel_base) {
+ evsel_pair = evsel_match(evsel_base, evlist_pair);
+ if (!evsel_pair)
+ continue;
+
+ es_base = callchain_evsel_streams_get(data_base->evsel_streams,
+ data_base->nr_evsel_streams,
+ evsel_base->idx);
+ if (!es_base)
+ return -1;
+
+ es_pair = callchain_evsel_streams_get(data_pair->evsel_streams,
+ data_pair->nr_evsel_streams,
+ evsel_pair->idx);
+ if (!es_pair)
+ return -1;
+
+ callchain_match_streams(es_base, es_pair, pdiff.src_list);
+ callchain_stream_report(es_base, es_pair);
+ }
+
+ return 0;
+}
+
+static void stream_process(void)
+{
+ /*
+ * Stream comparison only supports two data files.
+ * perf.data.old and perf.data. data__files[0] is perf.data.old,
+ * data__files[1] is perf.data.
+ */
+ process_base_stream(&data__files[0], &data__files[1],
+ "# Output based on old perf data:\n#\n",
+ pdiff.before_dir, pdiff.after_dir);
+}
+
static void data__free(struct data__file *d)
{
int col;
@@ -1109,6 +1182,19 @@ static int check_file_brstack(void)
return 0;
}
+static struct callchain_streams *create_evsel_streams(struct evlist *evlist,
+ int nr_streams_max,
+ int *nr_evsel_streams)
+{
+ struct callchain_streams *evsel_streams;
+
+ evsel_streams = callchain_evsel_streams_create(evlist,
+ nr_streams_max,
+ nr_evsel_streams);
+
+ return evsel_streams;
+}
+
static int __cmd_diff(void)
{
struct data__file *d;
@@ -1122,6 +1208,14 @@ static int __cmd_diff(void)
abstime_tmp = abstime_ostr;
ret = -EINVAL;
+ if (pdiff.src_cmp) {
+ pdiff.src_list = srclist__new(pdiff.before_dir,
+ pdiff.after_dir);
+
+ if (!pdiff.src_list)
+ goto out_free;
+ }
+
data__for_each_file(i, d) {
d->session = perf_session__new(&d->data, false, &pdiff.tool);
if (IS_ERR(d->session)) {
@@ -1153,9 +1247,21 @@ static int __cmd_diff(void)
if (pdiff.ptime_range)
zfree(&pdiff.ptime_range);
+
+ if (compute == COMPUTE_STREAM) {
+ d->evsel_streams = create_evsel_streams(
+ d->session->evlist,
+ 5,
+ &d->nr_evsel_streams);
+ if (!d->evsel_streams)
+ goto out_delete;
+ }
}
- data_process();
+ if (compute == COMPUTE_STREAM)
+ stream_process();
+ else
+ data_process();
out_delete:
data__for_each_file(i, d) {
@@ -1163,6 +1269,7 @@ static int __cmd_diff(void)
data__free(d);
}
+ out_free:
free(data__files);
if (pdiff.ptime_range)
@@ -1171,6 +1278,9 @@ static int __cmd_diff(void)
if (abstime_ostr)
free(abstime_ostr);
+ if (pdiff.src_list)
+ srclist__delete(pdiff.src_list);
+
return ret;
}
@@ -1228,6 +1338,15 @@ static const struct option options[] = {
"only consider symbols in these pids"),
OPT_STRING(0, "tid", &symbol_conf.tid_list_str, "tid[,tid...]",
"only consider symbols in these tids"),
+ OPT_BOOLEAN(0, "stream", &pdiff.stream,
+ "Enable hot streams comparison. "
+ "Can be used with --before and --after"),
+ OPT_STRING(0, "before", &pdiff.before_dir, "dir",
+ "Source code directory corresponding to perf.data.old. "
+ "WARNING: use with --after and --stream"),
+ OPT_STRING(0, "after", &pdiff.after_dir, "dir",
+ "Source code directory corresponding to perf.data. "
+ "WARNING: use with --before and --stream"),
OPT_END()
};
@@ -1887,6 +2006,19 @@ int cmd_diff(int argc, const char **argv)
if (cycles_hist && (compute != COMPUTE_CYCLES))
usage_with_options(diff_usage, options);
+ if (pdiff.stream) {
+ compute = COMPUTE_STREAM;
+ if ((pdiff.before_dir && !pdiff.after_dir) ||
+ (!pdiff.before_dir && pdiff.after_dir)) {
+ usage_with_options(diff_usage, options);
+ }
+
+ if (pdiff.before_dir && pdiff.after_dir)
+ pdiff.src_cmp = true;
+ } else if (pdiff.before_dir || pdiff.after_dir) {
+ usage_with_options(diff_usage, options);
+ }
+
symbol__annotation_init();
if (symbol__init(NULL) < 0)
@@ -1898,13 +2030,25 @@ int cmd_diff(int argc, const char **argv)
if (check_file_brstack() < 0)
return -1;
- if (compute == COMPUTE_CYCLES && !pdiff.has_br_stack)
+ if ((compute == COMPUTE_CYCLES || compute == COMPUTE_STREAM)
+ && !pdiff.has_br_stack) {
return -1;
+ }
- if (ui_init() < 0)
- return -1;
+ if (compute == COMPUTE_STREAM) {
+ symbol_conf.show_branchflag_count = true;
+ callchain_param.mode = CHAIN_FLAT;
+ callchain_param.key = CCKEY_SRCLINE;
+ callchain_param.branch_callstack = 1;
+ symbol_conf.use_callchain = true;
+ callchain_register_param(&callchain_param);
+ sort_order = "srcline,symbol,dso";
+ } else {
+ if (ui_init() < 0)
+ return -1;
- sort__mode = SORT_MODE__DIFF;
+ sort__mode = SORT_MODE__DIFF;
+ }
if (setup_sorting(NULL) < 0)
usage_with_options(diff_usage, options);
--
2.17.1
On Mon, Apr 20, 2020 at 09:04:44AM +0800, Jin Yao wrote:
SNIP
> compute_flag div.c:25 compute_flag div.c:25
> compute_flag div.c:22 compute_flag div.c:22
> main div.c:40 main div.c:40
> main div.c:40 main div.c:40
> main div.c:39 main div.c:39*
>
> [ Hot chains in old perf data only ]
>
> hot chain 1:
> cycles: 2, hits: 4.08%
> --------------------------
> main div.c:42
> compute_flag div.c:28
>
> [ Hot chains in new perf data only ]
>
> hot chain 1:
> cycles: 36, hits: 3.36%
> --------------------------
> __random_r random_r.c:357
> __random random.c:293
> __random random.c:293
> __random random.c:291
> __random random.c:291
> __random random.c:291
> __random random.c:288
> rand rand.c:27
> rand rand.c:26
> rand@plt
> rand@plt
> compute_flag div.c:25
> compute_flag div.c:22
> main div.c:40
> main div.c:40
>
> Now we can see, following streams pair is moved to another section
> "[ Hot chains in old perf data but source line changed (*) in new perf data ]"
>
> cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
> --------------------------- --------------------------
> main div.c:39 main div.c:39*
> main div.c:44 main div.c:44
>
so I tried following:
# ./perf record -e cycles:u -b ./perf bench sched pipe
# ./perf record -e cycles:u -b ./perf bench sched pipe
# ./perf diff -f --stream --before $PWD --after $PWD >out 2>&1
and the out file looks like this:
[ Matched hot chains between old perf data and new perf data ]
[ Hot chains in old perf data but source line changed (*) in new perf data ]
[ Hot chains in old perf data only ]
hot chain 1:
cycles: 0, hits: 4.20%
--------------------------
0xffffffff89c00163
hot chain 2:
cycles: 0, hits: 4.11%
--------------------------
0xffffffff89c00163
hot chain 3:
cycles: 0, hits: 8.22%
--------------------------
0xffffffff89c00163
hot chain 4:
cycles: 0, hits: 5.54%
--------------------------
0xffffffff89c00163
hot chain 5:
cycles: 0, hits: 6.10%
--------------------------
0xffffffff89c00163
[ Hot chains in new perf data only ]
hot chain 1:
cycles: 0, hits: 5.21%
--------------------------
0xffffffff89c00163
hot chain 2:
cycles: 0, hits: 4.79%
--------------------------
0xffffffff89c00163
hot chain 3:
cycles: 0, hits: 5.44%
--------------------------
0xffffffff89c00163
hot chain 4:
cycles: 0, hits: 5.50%
--------------------------
0xffffffff89c00163
hot chain 5:
cycles: 0, hits: 7.14%
--------------------------
0xffffffff89c00163
I'd expected more common paths, from what I can see from 'perf report --branch-history'
on bpth perf.data and perf.data.old
jirka
On Mon, Apr 20, 2020 at 09:04:45AM +0800, Jin Yao wrote:
> Sometimes, a small change in a hot function reducing the cycles of
> this function, but the overall workload doesn't get faster. It is
> interesting where the cycles are moved to.
>
> What it would like is to diff before/after streams. A stream is a
> callchain which is aggregated by the branch records from samples.
>
> By browsing the hot streams, we can understand the hot code flow.
> By comparing the cycles variation of same streams between old perf
> data and new perf data, we can understand if the cycles are moved to
> the unchanged code.
>
> The before stream is the stream before source line changed
> (e.g. streams in perf.data.old). The after stream is the stream
> after source line changed (e.g. streams in perf.data).
>
> Diffing before/after streams compares all streams (or compares top
> N hot streams) between two perf data files.
>
> If all entries of one stream in perf.data.old are fully matched with
> all entries of another stream in perf.data, we think these two streams
> are matched, otherwise the streams are not matched.
>
> For example,
>
> cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
> -------------------------- --------------------------
> main div.c:39 main div.c:39
> main div.c:44 main div.c:44
hi,
I'm getting compile error with this patch:
[jolsa@krava perf]$ make JOBS=1
BUILD: Doing 'make -j1' parallel build
CC util/srclist.o
util/srclist.c: In function ‘srclist__node_new’:
util/srclist.c:388:35: error: ‘%s’ directive output may be truncated writing up to 4095 bytes into a region of size 4091 [-Werror=format-truncation=]
388 | snprintf(cmd, sizeof(cmd), "diff %s %s",
| ^~
......
456 | ret = init_src_info(b_path, a_path, rel_path, &node->info);
| ~~~~~~
In file included from /usr/include/stdio.h:867,
from util/srclist.c:8:
/usr/include/bits/stdio2.h:67:10: note: ‘__builtin___snprintf_chk’ output between 7 and 8197 bytes into a destination of size 4096
67 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
68 | __bos (__s), __fmt, __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
mv: cannot stat 'util/.srclist.o.tmp': No such file or directory
make[4]: *** [/home/jolsa/kernel/linux-perf/tools/build/Makefile.build:97: util/srclist.o] Error 1
make[3]: *** [/home/jolsa/kernel/linux-perf/tools/build/Makefile.build:139: util] Error 2
make[2]: *** [Makefile.perf:617: perf-in.o] Error 2
make[1]: *** [Makefile.perf:225: sub-make] Error 2
make: *** [Makefile:70: all] Error 2
thanks,
jirka
>
> It looks that two streams are matched and we can see for the same
> streams the cycles are equal and the stream hit percents are
> slightly changed. That's expected in the normal range.
>
> But that's not always true if source lines are changed in perf.data
> (e.g. div.c:39 is changed). If the source line is changed, they
> are different streams, we can't compare them. We just think the
> stream in perf.data is a new one.
>
> The challenge is how to identify the changed source lines. The basic
> idea is to use linux command 'diff' to compare the source file A and
> source file A* line by line (assume A is used in perf.data.old
> and A* is updated in perf.data). Create a source line mapping table.
>
> For example,
>
> Execute 'diff ./before/div.c ./after/div.c'
>
> 25c25
> < i = rand() % 2;
> ---
> > i = rand() % 4;
> 39c39
> < for (i = 0; i < 2000000000; i++) {
> ---
> > for (i = 0; i < 20000000001; i++) {
>
> div.c (after -> before) lines mapping:
> 0 -> 0
> 1 -> 1
> 2 -> 2
> 3 -> 3
> 4 -> 4
> 5 -> 5
> 6 -> 6
> 7 -> 7
> 8 -> 8
> 9 -> 9
> ...
> 24 -> 24
> 25 -> -1
> 26 -> 26
> 27 -> 27
> 28 -> 28
> 29 -> 29
> 30 -> 30
> 31 -> 31
> 32 -> 32
> 33 -> 33
> 34 -> 34
> 35 -> 35
> 36 -> 36
> 37 -> 37
> 38 -> 38
> 39 -> -1
> 40 -> 40
> ...
>
> From the table, we can easily know div.c:39 is source line changed.
> (it's mapped to -1). So these two streams are not matched.
>
> This patch can also handle the cases of new source lines insertion
> and old source lines deletion. This source line mapping table is a
> base for next processing for stream comparison.
>
> This patch creates a new rblist 'srclist' to manage the source line
> mapping tables. Each node contains the source line mapping table of
> a before/after source file pair. The node is created on demand as
> we process the samples. So it's likely we don't need to create so many
> nodes.
>
> v2:
> ---
> Refine the code
> 1. calloc -> zalloc
> 2. Functions operating on a 'struct line_pair' have the
> line_pair__ prefix.
> 3. Check get_range() return value.
> 4. Use fgets to replace fgetc for getting the number of lines.
> 5. Use path__join to generate the full path.
> 6. Keep the buffer after calling getline.
>
> Signed-off-by: Jin Yao <[email protected]>
> ---
> tools/perf/util/Build | 1 +
> tools/perf/util/srclist.c | 555 ++++++++++++++++++++++++++++++++++++++
> tools/perf/util/srclist.h | 65 +++++
> 3 files changed, 621 insertions(+)
> create mode 100644 tools/perf/util/srclist.c
> create mode 100644 tools/perf/util/srclist.h
>
> diff --git a/tools/perf/util/Build b/tools/perf/util/Build
> index c0cf8dff694e..b9bf620b60f0 100644
> --- a/tools/perf/util/Build
> +++ b/tools/perf/util/Build
> @@ -99,6 +99,7 @@ perf-y += call-path.o
> perf-y += rwsem.o
> perf-y += thread-stack.o
> perf-y += spark.o
> +perf-y += srclist.o
> perf-$(CONFIG_AUXTRACE) += auxtrace.o
> perf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
> perf-$(CONFIG_AUXTRACE) += intel-pt.o
> diff --git a/tools/perf/util/srclist.c b/tools/perf/util/srclist.c
> new file mode 100644
> index 000000000000..8060e4855d11
> --- /dev/null
> +++ b/tools/perf/util/srclist.c
> @@ -0,0 +1,555 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Manage difference of source lines
> + * Copyright (c) 2020, Intel Corporation.
> + * Author: Jin Yao
> + */
> +#include <stdlib.h>
> +#include <stdio.h>
> +#include <sys/types.h>
> +#include <inttypes.h>
> +#include <string.h>
> +#include <strings.h>
> +#include <unistd.h>
> +#include <err.h>
> +#include <errno.h>
> +#include <limits.h>
> +#include <stdbool.h>
> +#include <linux/zalloc.h>
> +#include "strlist.h"
> +#include "srclist.h"
> +#include "debug.h"
> +#include "path.h"
> +
> +enum {
> + DIFF_LINE_ADD,
> + DIFF_LINE_DEL,
> + DIFF_LINE_CHANGE
> +};
> +
> +static void get_full_path(const char *dir, const char *rel_path,
> + char *full_path, int size)
> +{
> + if (!dir) {
> + snprintf(full_path, size, "%s", rel_path);
> + return;
> + }
> +
> + path__join(full_path, size, dir, rel_path);
> +}
> +
> +static int path__number_of_lines(char *path)
> +{
> + FILE *fp;
> + char buf[4096], *p, last = 0;
> + int num = 0;
> +
> + fp = fopen(path, "r");
> + if (!fp) {
> + pr_debug("Failed to open file %s\n", path);
> + return -1;
> + }
> +
> + while (!feof(fp)) {
> + p = fgets(buf, sizeof(buf), fp);
> + if (!p)
> + break;
> +
> + last = p[strlen(p) - 1];
> + if (last == '\n')
> + num++;
> + }
> +
> + fclose(fp);
> +
> + if (last != '\n' && last != 0)
> + num++;
> +
> + return num;
> +}
> +
> +static int get_digit(char *str, char **end_str, int *val)
> +{
> + int v;
> + char *end;
> +
> + errno = 0;
> + v = strtol(str, &end, 10);
> + if ((errno == ERANGE && (v == INT_MAX || v == INT_MIN))
> + || (errno != 0 && v == 0)) {
> + return -1;
> + }
> +
> + if (end == str)
> + return -1;
> +
> + *val = v;
> + *end_str = end;
> + return 0;
> +}
> +
> +static int get_range(char *str, int *start_val, int *end_val)
> +{
> + int val, ret;
> + char *end, *next;
> +
> + *start_val = -1;
> + *end_val = -1;
> +
> + ret = get_digit(str, &end, &val);
> + if (ret)
> + return ret;
> +
> + *start_val = val;
> +
> + if (*end != '\0') {
> + next = end + 1;
> + ret = get_digit(next, &end, &val);
> + if (ret)
> + return ret;
> +
> + *end_val = val;
> + }
> +
> + return 0;
> +}
> +
> +static int opr_str_idx(char *str, int len, char ch)
> +{
> + int i;
> +
> + for (i = 0; i < len; i++) {
> + if (str[i] == ch)
> + break;
> + }
> +
> + if (i == len || i == 0 || i == len - 1)
> + return -1;
> +
> + if (str[i - 1] >= '0' && str[i - 1] <= '9' &&
> + str[i + 1] >= '0' && str[i + 1] <= '9') {
> + return i;
> + }
> +
> + return -1;
> +}
> +
> +static bool get_diff_operation(char *str, int *opr, int *b_start, int *b_end,
> + int *a_start, int *a_end)
> +{
> + int idx, len;
> +
> + if (str[0] == '<' || str[0] == '>' || str[0] == '-')
> + return false;
> +
> + len = strlen(str);
> +
> + /*
> + * For example,
> + * 5,6d4
> + * 8a7
> + * 20,21c19
> + */
> + idx = opr_str_idx(str, len, 'd');
> + if (idx != -1) {
> + *opr = DIFF_LINE_DEL;
> + goto exit;
> + }
> +
> + idx = opr_str_idx(str, len, 'a');
> + if (idx != -1) {
> + *opr = DIFF_LINE_ADD;
> + goto exit;
> + }
> +
> + idx = opr_str_idx(str, len, 'c');
> + if (idx != -1) {
> + *opr = DIFF_LINE_CHANGE;
> + goto exit;
> + }
> +
> +exit:
> + if (idx != -1) {
> + str[idx] = 0;
> + if (get_range(str, b_start, b_end) != 0)
> + return false;
> +
> + if (get_range(&str[idx + 1], a_start, a_end) != 0)
> + return false;
> +
> + return true;
> + }
> +
> + return false;
> +}
> +
> +static int line_pair__del_lines(struct line_pair *lines, int b_start, int b_end,
> + int a_start, int a_end __maybe_unused,
> + int *nr_lines, int *b_adjust)
> +{
> + int i = *nr_lines;
> + bool set = false;
> +
> + /*
> + * For example, 5,6d4
> + * It means line 5 ~ line 6 of file1 are not same as line 4 of file2
> + * because line 5 ~ line 6 are deleted.
> + */
> +
> + if (a_start == i) {
> + lines[i].a_nr = i;
> + lines[i].b_nr = i + *b_adjust;
> + set = true;
> + }
> +
> + if (b_end != -1)
> + *b_adjust += b_end - b_start + 1;
> + else
> + *b_adjust += 1;
> +
> + if (!set) {
> + lines[i].a_nr = i;
> + lines[i].b_nr = i + *b_adjust;
> + }
> +
> + *nr_lines = i + 1;
> + return 0;
> +}
> +
> +static int line_pair__add_lines(struct line_pair *lines,
> + int b_start __maybe_unused,
> + int b_end __maybe_unused,
> + int a_start, int a_end, int *nr_lines,
> + int *b_adjust)
> +{
> + int i = *nr_lines;
> +
> + /*
> + * For example, 8a7
> + * It means line 8 at file1 is not same as line 7 at file2
> + * because a new line (line 7) is inserted to file2.
> + */
> + if (a_end != -1) {
> + for (int j = 0; j < a_end - a_start + 1; j++) {
> + lines[i].a_nr = i;
> + lines[i].b_nr = -1;
> + i++;
> + }
> + *b_adjust -= a_end - a_start + 1;
> + } else {
> + lines[i].a_nr = i;
> + lines[i].b_nr = -1;
> + i++;
> + *b_adjust -= 1;
> + }
> +
> + *nr_lines = i;
> + return 0;
> +}
> +
> +static int line_pair__change_lines(struct line_pair *lines, int b_start,
> + int b_end, int a_start, int a_end,
> + int *nr_lines, int *b_adjust)
> +{
> + int i = *nr_lines;
> +
> + /*
> + * For example, 20,21c19
> + * It means line20~line21 of file1 are not same as line19 of file2
> + * since they are changed.
> + */
> +
> + if (a_end != -1) {
> + for (int j = 0; j < a_end - a_start + 1; j++) {
> + lines[i].a_nr = i;
> + lines[i].b_nr = -1;
> + i++;
> + }
> + } else {
> + lines[i].a_nr = i;
> + lines[i].b_nr = -1;
> + i++;
> + }
> +
> + if (b_end != -1)
> + *b_adjust += b_end - b_start;
> +
> + if (a_end != -1)
> + *b_adjust -= a_end - a_start;
> +
> + *nr_lines = i;
> + return 0;
> +}
> +
> +static int line_pair__update_lines(struct line_pair *lines, int opr,
> + int b_start, int b_end, int a_start,
> + int a_end, int *nr_lines, int *b_adjust)
> +{
> + int i = *nr_lines;
> +
> + while (i < a_start) {
> + lines[i].a_nr = i;
> + lines[i].b_nr = i + *b_adjust;
> + i++;
> + }
> +
> + *nr_lines = i;
> +
> + switch (opr) {
> + case DIFF_LINE_DEL:
> + line_pair__del_lines(lines, b_start, b_end, a_start, a_end,
> + nr_lines, b_adjust);
> + break;
> +
> + case DIFF_LINE_ADD:
> + line_pair__add_lines(lines, b_start, b_end, a_start, a_end,
> + nr_lines, b_adjust);
> + break;
> +
> + case DIFF_LINE_CHANGE:
> + line_pair__change_lines(lines, b_start, b_end, a_start,
> + a_end, nr_lines, b_adjust);
> + break;
> +
> + default:
> + break;
> + }
> +
> + return 0;
> +}
> +
> +static void line_pair__update_remaining(struct line_pair *lines, int a_num,
> + int *nr_lines, int b_adjust)
> +{
> + int i;
> +
> + for (i = *nr_lines; i <= a_num; i++) {
> + lines[i].a_nr = i;
> + lines[i].b_nr = i + b_adjust;
> + }
> +
> + *nr_lines = i;
> +}
> +
> +static int build_mapping_table(FILE *fp, struct line_pair *lines,
> + int *nr_lines, int a_num)
> +{
> + char *str = NULL, *p;
> + size_t len = 0;
> + int opr, b_start, b_end, a_start, a_end, b_adjust = 0;
> +
> + *nr_lines = 1; /* First line is reserved */
> +
> + while (getline(&str, &len, fp) > 0) {
> + p = strchr(str, '\n');
> + if (p)
> + *p = '\0';
> +
> + pr_debug("%s\n", str);
> +
> + if (!get_diff_operation(str, &opr, &b_start, &b_end,
> + &a_start, &a_end)) {
> + continue;
> + }
> +
> + line_pair__update_lines(lines, opr, b_start, b_end, a_start,
> + a_end, nr_lines, &b_adjust);
> + }
> +
> + line_pair__update_remaining(lines, a_num, nr_lines, b_adjust);
> +
> + free(str);
> + return 0;
> +}
> +
> +static int src_info__create_line_mapping(struct src_info *info, char *b_path,
> + char *a_path)
> +{
> + char cmd[PATH_MAX];
> + FILE *fp = NULL;
> + int ret = -1;
> +
> + info->nr_lines_before = path__number_of_lines(b_path);
> + if (info->nr_lines_before == -1)
> + goto out;
> +
> + info->nr_lines_after = path__number_of_lines(a_path);
> + if (info->nr_lines_after == -1)
> + goto out;
> +
> + /* Reserve first line */
> + info->nr_max = info->nr_lines_before + info->nr_lines_after + 1;
> + info->lines = calloc(info->nr_max, sizeof(struct line_pair));
> + if (!info->lines) {
> + pr_err("Failed to alloc memory for lines\n");
> + goto out;
> + }
> +
> + snprintf(cmd, sizeof(cmd), "diff %s %s",
> + b_path, a_path);
> +
> + pr_debug("Execute '%s'\n", cmd);
> +
> + fp = popen(cmd, "r");
> + if (!fp) {
> + pr_err("Failed to execute '%s'\n", cmd);
> + goto out;
> + }
> +
> + ret = build_mapping_table(fp, info->lines, &info->nr_lines,
> + info->nr_lines_after);
> +
> +out:
> + if (fp)
> + fclose(fp);
> +
> + return ret;
> +}
> +
> +static void free_src_info(struct src_info *info)
> +{
> + zfree(&info->rel_path);
> + zfree(&info->lines);
> +}
> +
> +static int init_src_info(char *b_path, char *a_path, const char *rel_path,
> + struct src_info *info)
> +{
> + int ret;
> +
> + info->rel_path = strdup(rel_path);
> + if (!info->rel_path)
> + return -1;
> +
> + ret = src_info__create_line_mapping(info, b_path, a_path);
> + if (ret)
> + free_src_info(info);
> +
> + return ret;
> +}
> +
> +static void free_src_node(struct src_node *node)
> +{
> + if (!node)
> + return;
> +
> + free_src_info(&node->info);
> + free(node);
> +}
> +
> +static struct rb_node *srclist__node_new(struct rblist *rblist,
> + const void *entry)
> +{
> + struct srclist *slist = container_of(rblist, struct srclist, rblist);
> + const char *rel_path = entry;
> + char b_path[PATH_MAX], a_path[PATH_MAX];
> + struct src_node *node;
> + int ret;
> +
> + get_full_path(slist->before_dir, rel_path, b_path, PATH_MAX);
> + get_full_path(slist->after_dir, rel_path, a_path, PATH_MAX);
> +
> + node = zalloc(sizeof(*node));
> + if (!node)
> + return NULL;
> +
> + ret = init_src_info(b_path, a_path, rel_path, &node->info);
> + if (ret)
> + goto err;
> +
> + return &node->rb_node;
> +
> +err:
> + free_src_node(node);
> + return NULL;
> +}
> +
> +static void srclist__node_delete(struct rblist *rblist __maybe_unused,
> + struct rb_node *rb_node)
> +{
> + struct src_node *node = container_of(rb_node, struct src_node, rb_node);
> +
> + free_src_info(&node->info);
> + free(node);
> +}
> +
> +static int srclist__node_cmp(struct rb_node *rb_node, const void *entry)
> +{
> + struct src_node *node = container_of(rb_node, struct src_node, rb_node);
> + const char *str = entry;
> +
> + return strcmp(node->info.rel_path, str);
> +}
> +
> +struct srclist *srclist__new(const char *before_dir, const char *after_dir)
> +{
> + struct srclist *slist = zalloc(sizeof(*slist));
> +
> + if (!slist)
> + return NULL;
> +
> + rblist__init(&slist->rblist);
> + slist->rblist.node_cmp = srclist__node_cmp;
> + slist->rblist.node_new = srclist__node_new;
> + slist->rblist.node_delete = srclist__node_delete;
> +
> + slist->before_dir = before_dir;
> + slist->after_dir = after_dir;
> + return slist;
> +}
> +
> +void srclist__delete(struct srclist *slist)
> +{
> + if (slist)
> + rblist__delete(&slist->rblist);
> +}
> +
> +struct src_node *srclist__find(struct srclist *slist, char *rel_path,
> + bool create)
> +{
> + struct src_node *node = NULL;
> + struct rb_node *rb_node;
> +
> + if (create)
> + rb_node = rblist__findnew(&slist->rblist, (void *)rel_path);
> + else
> + rb_node = rblist__find(&slist->rblist, (void *)rel_path);
> +
> + if (rb_node)
> + node = container_of(rb_node, struct src_node, rb_node);
> +
> + return node;
> +}
> +
> +struct line_pair *srclist__line_pair(struct src_node *node, int a_nr)
> +{
> + struct src_info *info = &node->info;
> +
> + if (a_nr < info->nr_lines && a_nr > 0)
> + return &info->lines[a_nr];
> +
> + pr_debug("Exceeds line range: nr_lines = %d, a_nr = %d\n",
> + info->nr_lines, a_nr);
> +
> + return NULL;
> +}
> +
> +void srclist__dump(struct srclist *slist)
> +{
> + struct src_node *node;
> + struct src_info *info;
> + int i;
> +
> + srclist__for_each_entry(node, slist) {
> + info = &node->info;
> +
> + pr_debug("%s (after -> before) lines mapping:\n",
> + info->rel_path);
> +
> + for (i = 0; i < info->nr_lines; i++) {
> + pr_debug("%d -> %d\n",
> + info->lines[i].a_nr,
> + info->lines[i].b_nr);
> + }
> + }
> +}
> diff --git a/tools/perf/util/srclist.h b/tools/perf/util/srclist.h
> new file mode 100644
> index 000000000000..f25b0de91a13
> --- /dev/null
> +++ b/tools/perf/util/srclist.h
> @@ -0,0 +1,65 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __PERF_SRCLIST_H
> +#define __PERF_SRCLIST_H
> +
> +#include <linux/types.h>
> +#include <linux/rbtree.h>
> +#include "rblist.h"
> +
> +struct line_pair {
> + int a_nr; /* line nr in after.c */
> + int b_nr; /* line nr in before.c */
> +};
> +
> +struct src_node;
> +
> +struct src_info {
> + char *rel_path; /* relative path */
> + struct line_pair *lines;
> + int nr_max;
> + int nr_lines;
> + int nr_lines_before;
> + int nr_lines_after;
> +};
> +
> +struct src_node {
> + struct rb_node rb_node;
> + struct src_info info;
> +};
> +
> +struct srclist {
> + struct rblist rblist;
> + const char *before_dir;
> + const char *after_dir;
> +};
> +
> +struct srclist *srclist__new(const char *before_dir, const char *after_dir);
> +void srclist__delete(struct srclist *slist);
> +
> +struct src_node *srclist__find(struct srclist *slist, char *rel_path,
> + bool create);
> +
> +struct line_pair *srclist__line_pair(struct src_node *node, int a_nr);
> +void srclist__dump(struct srclist *slist);
> +
> +static inline struct src_node *srclist__first(struct srclist *slist)
> +{
> + struct rb_node *rn = rb_first_cached(&slist->rblist.entries);
> +
> + return rn ? rb_entry(rn, struct src_node, rb_node) : NULL;
> +}
> +
> +static inline struct src_node *srclist__next(struct src_node *sn)
> +{
> + struct rb_node *rn;
> +
> + if (!sn)
> + return NULL;
> + rn = rb_next(&sn->rb_node);
> + return rn ? rb_entry(rn, struct src_node, rb_node) : NULL;
> +}
> +
> +#define srclist__for_each_entry(pos, slist) \
> + for (pos = srclist__first(slist); pos; pos = srclist__next(pos))
> +
> +#endif /* __PERF_SRCLIST_H */
> --
> 2.17.1
>
On Mon, Apr 20, 2020 at 09:04:46AM +0800, Jin Yao wrote:
> We think the stream is a callchain which is aggregated by the LBR
> records from samples. By browsing the stream, we can understand
> the code flow.
>
> The struct callchain_node represents one callchain and we use the
> callchain_node->hit to measure the hot level of this callchain.
> Higher is hotter.
>
> Since in perf data file, there may be many callchains so we just
> need to focus on the top N hottest callchains. N is a user defined
> parameter or just a predefined default value.
>
> This patch saves the top N hottest callchains in 'struct stream_node'
> type array, which is defined in a per event 'struct callchain_streams'.
>
> So now we can get the per-event top N hottest callchains.
>
> v2:
> ---
> Use zfree in free_evsel_streams().
>
> Signed-off-by: Jin Yao <[email protected]>
> ---
> tools/perf/util/callchain.c | 122 ++++++++++++++++++++++++++++++++++++
> tools/perf/util/callchain.h | 16 +++++
> 2 files changed, 138 insertions(+)
SNIP
could this and all the other related code moved to separated object
like streams.c or such.. I think also the stuff from patch 1 could
go there, as it's specific only to this streams code
jirka
On Mon, Apr 20, 2020 at 09:04:44AM +0800, Jin Yao wrote:
> Sometimes, a small change in a hot function reducing the cycles of
> this function, but the overall workload doesn't get faster. It is
> interesting where the cycles are moved to.
>
> What it would like is to diff before/after streams. A stream we think
> is a callchain which is aggregated by the branch records from perf
> samples.
I wonder we could use this on intel_pt trace.. like compare streams
for given function call.. not sure that would be feasible, but might
be good idea to write this in a generic way and not callchain specific
jirka
>
> By browsing the hot streams, we can understand the hot code path.
> By comparing the cycles variation of same streams between old perf
> data and new perf data, we can understand if the cycles are moved
> to other codes.
>
> The before stream is the stream in perf.data.old. The after stream
> is the stream in perf.data.
>
> Diffing before/after streams compares top N hottest streams between
> two perf data files.
>
> If all entries of one stream in perf.data.old are fully matched with
> all entries of another stream in perf.data, we think two streams
> are matched, otherwise the streams are not matched.
>
> For example,
>
> cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
> -------------------------- --------------------------
> main div.c:39 main div.c:39
> main div.c:44 main div.c:44
>
> The above streams are matched and we can see for the same streams the
> cycles (1) are equal and the callchain hit percents are slightly changed
> (26.80% vs. 27.30%). That's expected.
>
> But that's not always true if source code is changed in perf.data
> (e.g. div.c:39 is changed). If div.c:39 is changed, they are different
> streams, we can't compare them. We will think the stream in perf.data
> is a new stream.
>
> The challenge is how to identify the changed source lines. The basic
> idea is to use linux command "diff" to compare the source file A and
> source file A* line by line (assume file A is used in perf.data.old
> and file A* is used in perf.data). According to "diff" output,
> we can generate a source line mapping table.
>
> For example,
>
> Execute 'diff ./before/div.c ./after/div.c'
>
> 25c25
> < i = rand() % 2;
> ---
> > i = rand() % 4;
> 39c39
> < for (i = 0; i < 2000000000; i++) {
> ---
> > for (i = 0; i < 20000000001; i++) {
>
> div.c (after -> before) lines mapping:
> 0 -> 0
> 1 -> 1
> 2 -> 2
> 3 -> 3
> 4 -> 4
> 5 -> 5
> 6 -> 6
> 7 -> 7
> 8 -> 8
> 9 -> 9
> ...
> 24 -> 24
> 25 -> -1
> 26 -> 26
> 27 -> 27
> 28 -> 28
> 29 -> 29
> 30 -> 30
> 31 -> 31
> 32 -> 32
> 33 -> 33
> 34 -> 34
> 35 -> 35
> 36 -> 36
> 37 -> 37
> 38 -> 38
> 39 -> -1
> 40 -> 40
> ...
>
> From the table, we can easily know div.c:39 is source line changed.
> (it's mapped to -1). So following two streams are not matched.
>
> cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
> -------------------------- --------------------------
> main div.c:39 main div.c:39
> main div.c:44 main div.c:44
>
> Now let's see examples.
>
> perf record -b ... Generate perf.data.old with branch data
> perf record -b ... Generate perf.data with branch data
> perf diff --stream
>
> [ Matched hot chains between old perf data and new perf data) ]
>
> hot chain pair 1:
> cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
> --------------------------- --------------------------
> main div.c:39 main div.c:39
> main div.c:44 main div.c:44
>
> hot chain pair 2:
> cycles: 35, hits: 21.43% cycles: 33, hits: 19.37%
> --------------------------- --------------------------
> __random_r random_r.c:360 __random_r random_r.c:360
> __random_r random_r.c:388 __random_r random_r.c:388
> __random_r random_r.c:388 __random_r random_r.c:388
> __random_r random_r.c:380 __random_r random_r.c:380
> __random_r random_r.c:357 __random_r random_r.c:357
> __random random.c:293 __random random.c:293
> __random random.c:293 __random random.c:293
> __random random.c:291 __random random.c:291
> __random random.c:291 __random random.c:291
> __random random.c:291 __random random.c:291
> __random random.c:288 __random random.c:288
> rand rand.c:27 rand rand.c:27
> rand rand.c:26 rand rand.c:26
> rand@plt rand@plt
> rand@plt rand@plt
> compute_flag div.c:25 compute_flag div.c:25
> compute_flag div.c:22 compute_flag div.c:22
> main div.c:40 main div.c:40
> main div.c:40 main div.c:40
> main div.c:39 main div.c:39
>
> hot chain pair 3:
> cycles: 18, hits: 6.10% cycles: 19, hits: 6.51%
> --------------------------- --------------------------
> __random_r random_r.c:360 __random_r random_r.c:360
> __random_r random_r.c:388 __random_r random_r.c:388
> __random_r random_r.c:388 __random_r random_r.c:388
> __random_r random_r.c:380 __random_r random_r.c:380
> __random_r random_r.c:357 __random_r random_r.c:357
> __random random.c:293 __random random.c:293
> __random random.c:293 __random random.c:293
> __random random.c:291 __random random.c:291
> __random random.c:291 __random random.c:291
> __random random.c:291 __random random.c:291
> __random random.c:288 __random random.c:288
> rand rand.c:27 rand rand.c:27
> rand rand.c:26 rand rand.c:26
> rand@plt rand@plt
> rand@plt rand@plt
> compute_flag div.c:25 compute_flag div.c:25
> compute_flag div.c:22 compute_flag div.c:22
> main div.c:40 main div.c:40
>
> hot chain pair 4:
> cycles: 9, hits: 5.95% cycles: 8, hits: 5.03%
> --------------------------- --------------------------
> __random_r random_r.c:360 __random_r random_r.c:360
> __random_r random_r.c:388 __random_r random_r.c:388
> __random_r random_r.c:388 __random_r random_r.c:388
> __random_r random_r.c:380 __random_r random_r.c:380
>
> [ Hot chains in old perf data but source line changed (*) in new perf data ]
>
> [ Hot chains in old perf data only ]
>
> hot chain 1:
> cycles: 2, hits: 4.08%
> --------------------------
> main div.c:42
> compute_flag div.c:28
>
> [ Hot chains in new perf data only ]
>
> hot chain 1:
> cycles: 36, hits: 3.36%
> --------------------------
> __random_r random_r.c:357
> __random random.c:293
> __random random.c:293
> __random random.c:291
> __random random.c:291
> __random random.c:291
> __random random.c:288
> rand rand.c:27
> rand rand.c:26
> rand@plt
> rand@plt
> compute_flag div.c:25
> compute_flag div.c:22
> main div.c:40
> main div.c:40
>
> If we enable the source line comparison option, the output may be different.
>
> perf record -b ... Generate perf.data.old with branch data
> perf record -b ... Generate perf.data with branch data
> perf diff --stream --before ./before --after ./after
>
> [ Matched hot chains between old perf data and new perf data) ]
>
> hot chain pair 1:
> cycles: 18, hits: 6.10% cycles: 19, hits: 6.51%
> --------------------------- --------------------------
> __random_r random_r.c:360 __random_r random_r.c:360
> __random_r random_r.c:388 __random_r random_r.c:388
> __random_r random_r.c:388 __random_r random_r.c:388
> __random_r random_r.c:380 __random_r random_r.c:380
> __random_r random_r.c:357 __random_r random_r.c:357
> __random random.c:293 __random random.c:293
> __random random.c:293 __random random.c:293
> __random random.c:291 __random random.c:291
> __random random.c:291 __random random.c:291
> __random random.c:291 __random random.c:291
> __random random.c:288 __random random.c:288
> rand rand.c:27 rand rand.c:27
> rand rand.c:26 rand rand.c:26
> rand@plt rand@plt
> rand@plt rand@plt
> compute_flag div.c:25 compute_flag div.c:25
> compute_flag div.c:22 compute_flag div.c:22
> main div.c:40 main div.c:40
>
> hot chain pair 2:
> cycles: 9, hits: 5.95% cycles: 8, hits: 5.03%
> --------------------------- --------------------------
> __random_r random_r.c:360 __random_r random_r.c:360
> __random_r random_r.c:388 __random_r random_r.c:388
> __random_r random_r.c:388 __random_r random_r.c:388
> __random_r random_r.c:380 __random_r random_r.c:380
>
> [ Hot chains in old perf data but source line changed (*) in new perf data ]
>
> hot chain pair 1:
> cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
> --------------------------- --------------------------
> main div.c:39 main div.c:39*
> main div.c:44 main div.c:44
>
> hot chain pair 2:
> cycles: 35, hits: 21.43% cycles: 33, hits: 19.37%
> --------------------------- --------------------------
> __random_r random_r.c:360 __random_r random_r.c:360
> __random_r random_r.c:388 __random_r random_r.c:388
> __random_r random_r.c:388 __random_r random_r.c:388
> __random_r random_r.c:380 __random_r random_r.c:380
> __random_r random_r.c:357 __random_r random_r.c:357
> __random random.c:293 __random random.c:293
> __random random.c:293 __random random.c:293
> __random random.c:291 __random random.c:291
> __random random.c:291 __random random.c:291
> __random random.c:291 __random random.c:291
> __random random.c:288 __random random.c:288
> rand rand.c:27 rand rand.c:27
> rand rand.c:26 rand rand.c:26
> rand@plt rand@plt
> rand@plt rand@plt
> compute_flag div.c:25 compute_flag div.c:25
> compute_flag div.c:22 compute_flag div.c:22
> main div.c:40 main div.c:40
> main div.c:40 main div.c:40
> main div.c:39 main div.c:39*
>
> [ Hot chains in old perf data only ]
>
> hot chain 1:
> cycles: 2, hits: 4.08%
> --------------------------
> main div.c:42
> compute_flag div.c:28
>
> [ Hot chains in new perf data only ]
>
> hot chain 1:
> cycles: 36, hits: 3.36%
> --------------------------
> __random_r random_r.c:357
> __random random.c:293
> __random random.c:293
> __random random.c:291
> __random random.c:291
> __random random.c:291
> __random random.c:288
> rand rand.c:27
> rand rand.c:26
> rand@plt
> rand@plt
> compute_flag div.c:25
> compute_flag div.c:22
> main div.c:40
> main div.c:40
>
> Now we can see, following streams pair is moved to another section
> "[ Hot chains in old perf data but source line changed (*) in new perf data ]"
>
> cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
> --------------------------- --------------------------
> main div.c:39 main div.c:39*
> main div.c:44 main div.c:44
>
> v3:
> ---
> v2 has 14 patches, it's hard to review.
> v3 is only 7 patches for basic stream comparison.
>
> Jin Yao (7):
> perf util: Create source line mapping table
> perf util: Create streams for managing top N hottest callchains
> perf util: Return per-event callchain streams
> perf util: Compare two streams
> perf util: Calculate the sum of all streams hits
> perf util: Report hot streams
> perf diff: Support hot streams comparison
>
> tools/perf/Documentation/perf-diff.txt | 14 +
> tools/perf/builtin-diff.c | 170 +++++++-
> tools/perf/util/Build | 1 +
> tools/perf/util/callchain.c | 495 ++++++++++++++++++++++
> tools/perf/util/callchain.h | 32 ++
> tools/perf/util/srclist.c | 555 +++++++++++++++++++++++++
> tools/perf/util/srclist.h | 65 +++
> 7 files changed, 1319 insertions(+), 13 deletions(-)
> create mode 100644 tools/perf/util/srclist.c
> create mode 100644 tools/perf/util/srclist.h
>
> --
> 2.17.1
>
Hi Jiri,
On 4/27/2020 6:10 PM, Jiri Olsa wrote:
> On Mon, Apr 20, 2020 at 09:04:46AM +0800, Jin Yao wrote:
>> We think the stream is a callchain which is aggregated by the LBR
>> records from samples. By browsing the stream, we can understand
>> the code flow.
>>
>> The struct callchain_node represents one callchain and we use the
>> callchain_node->hit to measure the hot level of this callchain.
>> Higher is hotter.
>>
>> Since in perf data file, there may be many callchains so we just
>> need to focus on the top N hottest callchains. N is a user defined
>> parameter or just a predefined default value.
>>
>> This patch saves the top N hottest callchains in 'struct stream_node'
>> type array, which is defined in a per event 'struct callchain_streams'.
>>
>> So now we can get the per-event top N hottest callchains.
>>
>> v2:
>> ---
>> Use zfree in free_evsel_streams().
>>
>> Signed-off-by: Jin Yao <[email protected]>
>> ---
>> tools/perf/util/callchain.c | 122 ++++++++++++++++++++++++++++++++++++
>> tools/perf/util/callchain.h | 16 +++++
>> 2 files changed, 138 insertions(+)
>
> SNIP
>
> could this and all the other related code moved to separated object
> like streams.c or such.. I think also the stuff from patch 1 could
> go there, as it's specific only to this streams code
>
> jirka
>
That's fine. I will move the related codes to a new streams.c/streams.h.
Thanks
Jin Yao
Hi Jiri,
On 4/27/2020 6:10 PM, Jiri Olsa wrote:
> On Mon, Apr 20, 2020 at 09:04:44AM +0800, Jin Yao wrote:
>
> SNIP
>
>> compute_flag div.c:25 compute_flag div.c:25
>> compute_flag div.c:22 compute_flag div.c:22
>> main div.c:40 main div.c:40
>> main div.c:40 main div.c:40
>> main div.c:39 main div.c:39*
>>
>> [ Hot chains in old perf data only ]
>>
>> hot chain 1:
>> cycles: 2, hits: 4.08%
>> --------------------------
>> main div.c:42
>> compute_flag div.c:28
>>
>> [ Hot chains in new perf data only ]
>>
>> hot chain 1:
>> cycles: 36, hits: 3.36%
>> --------------------------
>> __random_r random_r.c:357
>> __random random.c:293
>> __random random.c:293
>> __random random.c:291
>> __random random.c:291
>> __random random.c:291
>> __random random.c:288
>> rand rand.c:27
>> rand rand.c:26
>> rand@plt
>> rand@plt
>> compute_flag div.c:25
>> compute_flag div.c:22
>> main div.c:40
>> main div.c:40
>>
>> Now we can see, following streams pair is moved to another section
>> "[ Hot chains in old perf data but source line changed (*) in new perf data ]"
>>
>> cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
>> --------------------------- --------------------------
>> main div.c:39 main div.c:39*
>> main div.c:44 main div.c:44
>>
>
>
> so I tried following:
>
> # ./perf record -e cycles:u -b ./perf bench sched pipe
> # ./perf record -e cycles:u -b ./perf bench sched pipe
> # ./perf diff -f --stream --before $PWD --after $PWD >out 2>&1
>
> and the out file looks like this:
>
> [ Matched hot chains between old perf data and new perf data ]
>
> [ Hot chains in old perf data but source line changed (*) in new perf data ]
>
> [ Hot chains in old perf data only ]
>
> hot chain 1:
> cycles: 0, hits: 4.20%
> --------------------------
> 0xffffffff89c00163
>
> hot chain 2:
> cycles: 0, hits: 4.11%
> --------------------------
> 0xffffffff89c00163
>
> hot chain 3:
> cycles: 0, hits: 8.22%
> --------------------------
> 0xffffffff89c00163
>
> hot chain 4:
> cycles: 0, hits: 5.54%
> --------------------------
> 0xffffffff89c00163
>
> hot chain 5:
> cycles: 0, hits: 6.10%
> --------------------------
> 0xffffffff89c00163
>
> [ Hot chains in new perf data only ]
>
> hot chain 1:
> cycles: 0, hits: 5.21%
> --------------------------
> 0xffffffff89c00163
>
> hot chain 2:
> cycles: 0, hits: 4.79%
> --------------------------
> 0xffffffff89c00163
>
> hot chain 3:
> cycles: 0, hits: 5.44%
> --------------------------
> 0xffffffff89c00163
>
> hot chain 4:
> cycles: 0, hits: 5.50%
> --------------------------
> 0xffffffff89c00163
>
> hot chain 5:
> cycles: 0, hits: 7.14%
> --------------------------
> 0xffffffff89c00163
>
>
> I'd expected more common paths, from what I can see from 'perf report --branch-history'
> on bpth perf.data and perf.data.old
>
> jirka
>
I used the same command line but I can see more callchain entries.
perf record -e cycles:u -b perf bench sched pipe
perf record -e cycles:u -b perf bench sched pipe
perf diff --stream
[ Matched hot chains between old perf data and new perf data ]
hot chain pair 1:
cycles: 0, hits: 7.95% cycles: 0, hits: 6.61%
--------------------------- --------------------------
__libc_read read.c:27 __libc_read read.c:27
0xffffffffa9800163 0xffffffffa9800163
[ Hot chains in old perf data but source line changed (*) in new perf data ]
[ Hot chains in old perf data only ]
hot chain 1:
cycles: 49, hits: 4.98%
--------------------------
worker_thread sched-pipe.c:64
worker_thread sched-pipe.c:63
__libc_read read.c:28
__libc_read read.c:27
0xffffffffa9800163
hot chain 2:
cycles: 0, hits: 6.68%
--------------------------
0xffffffffa9800163
hot chain 3:
cycles: 0, hits: 6.57%
--------------------------
0xffffffffa9800163
hot chain 4:
cycles: 60, hits: 5.20%
--------------------------
worker_thread sched-pipe.c:67
worker_thread sched-pipe.c:60
worker_thread sched-pipe.c:70
worker_thread sched-pipe.c:70
__libc_read read.c:28
__libc_read read.c:27
0xffffffffa9800163
[ Hot chains in new perf data only ]
hot chain 1:
cycles: 68, hits: 7.83%
--------------------------
worker_thread sched-pipe.c:68
__libc_write write.c:28
__libc_write write.c:27
0xffffffffa9800163
__libc_write write.c:27
write@plt
write@plt
worker_thread sched-pipe.c:67
worker_thread sched-pipe.c:60
worker_thread sched-pipe.c:70
worker_thread sched-pipe.c:70
__libc_read read.c:28
hot chain 2:
cycles: 70, hits: 4.34%
--------------------------
worker_thread unistd.h:44
worker_thread sched-pipe.c:61
worker_thread sched-pipe.c:65
__libc_write write.c:28
__libc_write write.c:27
0xffffffffa9800163
__libc_write write.c:27
write@plt
write@plt
worker_thread sched-pipe.c:64
worker_thread sched-pipe.c:63
__libc_read read.c:28
hot chain 3:
cycles: 0, hits: 5.67%
--------------------------
0xffffffffa9800163
hot chain 4:
cycles: 0, hits: 5.47%
--------------------------
0xffffffffa9800163
It's interesting that some leaked kernel address are displayed in callchains
even we use the -e cycles:u. Should be the skid issue. I have a patch for
processing the kernel leaked samples but have not posted it.
But I'm no idea why only the leaked kernel address displayed in your example. :(
Thanks
Jin Yao
Hi Jiri,
On 4/27/2020 6:11 PM, Jiri Olsa wrote:
> On Mon, Apr 20, 2020 at 09:04:45AM +0800, Jin Yao wrote:
>> Sometimes, a small change in a hot function reducing the cycles of
>> this function, but the overall workload doesn't get faster. It is
>> interesting where the cycles are moved to.
>>
>> What it would like is to diff before/after streams. A stream is a
>> callchain which is aggregated by the branch records from samples.
>>
>> By browsing the hot streams, we can understand the hot code flow.
>> By comparing the cycles variation of same streams between old perf
>> data and new perf data, we can understand if the cycles are moved to
>> the unchanged code.
>>
>> The before stream is the stream before source line changed
>> (e.g. streams in perf.data.old). The after stream is the stream
>> after source line changed (e.g. streams in perf.data).
>>
>> Diffing before/after streams compares all streams (or compares top
>> N hot streams) between two perf data files.
>>
>> If all entries of one stream in perf.data.old are fully matched with
>> all entries of another stream in perf.data, we think these two streams
>> are matched, otherwise the streams are not matched.
>>
>> For example,
>>
>> cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
>> -------------------------- --------------------------
>> main div.c:39 main div.c:39
>> main div.c:44 main div.c:44
>
> hi,
> I'm getting compile error with this patch:
>
> [jolsa@krava perf]$ make JOBS=1
> BUILD: Doing 'make -j1' parallel build
> CC util/srclist.o
> util/srclist.c: In function ‘srclist__node_new’:
> util/srclist.c:388:35: error: ‘%s’ directive output may be truncated writing up to 4095 bytes into a region of size 4091 [-Werror=format-truncation=]
> 388 | snprintf(cmd, sizeof(cmd), "diff %s %s",
> | ^~
> ......
There is no error in my compilation.
But anyway, is it ok to use snprintf(cmd, PATH_MAX, "diff %s %s", b_path,
a_path) instead?
Thanks
Jin Yao
> 456 | ret = init_src_info(b_path, a_path, rel_path, &node->info);
> | ~~~~~~
> In file included from /usr/include/stdio.h:867,
> from util/srclist.c:8:
> /usr/include/bits/stdio2.h:67:10: note: ‘__builtin___snprintf_chk’ output between 7 and 8197 bytes into a destination of size 4096
> 67 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 68 | __bos (__s), __fmt, __va_arg_pack ());
> | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> cc1: all warnings being treated as errors
> mv: cannot stat 'util/.srclist.o.tmp': No such file or directory
> make[4]: *** [/home/jolsa/kernel/linux-perf/tools/build/Makefile.build:97: util/srclist.o] Error 1
> make[3]: *** [/home/jolsa/kernel/linux-perf/tools/build/Makefile.build:139: util] Error 2
> make[2]: *** [Makefile.perf:617: perf-in.o] Error 2
> make[1]: *** [Makefile.perf:225: sub-make] Error 2
> make: *** [Makefile:70: all] Error 2
>
> thanks,
> jirka
>
>
>>
>> It looks that two streams are matched and we can see for the same
>> streams the cycles are equal and the stream hit percents are
>> slightly changed. That's expected in the normal range.
>>
>> But that's not always true if source lines are changed in perf.data
>> (e.g. div.c:39 is changed). If the source line is changed, they
>> are different streams, we can't compare them. We just think the
>> stream in perf.data is a new one.
>>
>> The challenge is how to identify the changed source lines. The basic
>> idea is to use linux command 'diff' to compare the source file A and
>> source file A* line by line (assume A is used in perf.data.old
>> and A* is updated in perf.data). Create a source line mapping table.
>>
>> For example,
>>
>> Execute 'diff ./before/div.c ./after/div.c'
>>
>> 25c25
>> < i = rand() % 2;
>> ---
>> > i = rand() % 4;
>> 39c39
>> < for (i = 0; i < 2000000000; i++) {
>> ---
>> > for (i = 0; i < 20000000001; i++) {
>>
>> div.c (after -> before) lines mapping:
>> 0 -> 0
>> 1 -> 1
>> 2 -> 2
>> 3 -> 3
>> 4 -> 4
>> 5 -> 5
>> 6 -> 6
>> 7 -> 7
>> 8 -> 8
>> 9 -> 9
>> ...
>> 24 -> 24
>> 25 -> -1
>> 26 -> 26
>> 27 -> 27
>> 28 -> 28
>> 29 -> 29
>> 30 -> 30
>> 31 -> 31
>> 32 -> 32
>> 33 -> 33
>> 34 -> 34
>> 35 -> 35
>> 36 -> 36
>> 37 -> 37
>> 38 -> 38
>> 39 -> -1
>> 40 -> 40
>> ...
>>
>> From the table, we can easily know div.c:39 is source line changed.
>> (it's mapped to -1). So these two streams are not matched.
>>
>> This patch can also handle the cases of new source lines insertion
>> and old source lines deletion. This source line mapping table is a
>> base for next processing for stream comparison.
>>
>> This patch creates a new rblist 'srclist' to manage the source line
>> mapping tables. Each node contains the source line mapping table of
>> a before/after source file pair. The node is created on demand as
>> we process the samples. So it's likely we don't need to create so many
>> nodes.
>>
>> v2:
>> ---
>> Refine the code
>> 1. calloc -> zalloc
>> 2. Functions operating on a 'struct line_pair' have the
>> line_pair__ prefix.
>> 3. Check get_range() return value.
>> 4. Use fgets to replace fgetc for getting the number of lines.
>> 5. Use path__join to generate the full path.
>> 6. Keep the buffer after calling getline.
>>
>> Signed-off-by: Jin Yao <[email protected]>
>> ---
>> tools/perf/util/Build | 1 +
>> tools/perf/util/srclist.c | 555 ++++++++++++++++++++++++++++++++++++++
>> tools/perf/util/srclist.h | 65 +++++
>> 3 files changed, 621 insertions(+)
>> create mode 100644 tools/perf/util/srclist.c
>> create mode 100644 tools/perf/util/srclist.h
>>
>> diff --git a/tools/perf/util/Build b/tools/perf/util/Build
>> index c0cf8dff694e..b9bf620b60f0 100644
>> --- a/tools/perf/util/Build
>> +++ b/tools/perf/util/Build
>> @@ -99,6 +99,7 @@ perf-y += call-path.o
>> perf-y += rwsem.o
>> perf-y += thread-stack.o
>> perf-y += spark.o
>> +perf-y += srclist.o
>> perf-$(CONFIG_AUXTRACE) += auxtrace.o
>> perf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
>> perf-$(CONFIG_AUXTRACE) += intel-pt.o
>> diff --git a/tools/perf/util/srclist.c b/tools/perf/util/srclist.c
>> new file mode 100644
>> index 000000000000..8060e4855d11
>> --- /dev/null
>> +++ b/tools/perf/util/srclist.c
>> @@ -0,0 +1,555 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Manage difference of source lines
>> + * Copyright (c) 2020, Intel Corporation.
>> + * Author: Jin Yao
>> + */
>> +#include <stdlib.h>
>> +#include <stdio.h>
>> +#include <sys/types.h>
>> +#include <inttypes.h>
>> +#include <string.h>
>> +#include <strings.h>
>> +#include <unistd.h>
>> +#include <err.h>
>> +#include <errno.h>
>> +#include <limits.h>
>> +#include <stdbool.h>
>> +#include <linux/zalloc.h>
>> +#include "strlist.h"
>> +#include "srclist.h"
>> +#include "debug.h"
>> +#include "path.h"
>> +
>> +enum {
>> + DIFF_LINE_ADD,
>> + DIFF_LINE_DEL,
>> + DIFF_LINE_CHANGE
>> +};
>> +
>> +static void get_full_path(const char *dir, const char *rel_path,
>> + char *full_path, int size)
>> +{
>> + if (!dir) {
>> + snprintf(full_path, size, "%s", rel_path);
>> + return;
>> + }
>> +
>> + path__join(full_path, size, dir, rel_path);
>> +}
>> +
>> +static int path__number_of_lines(char *path)
>> +{
>> + FILE *fp;
>> + char buf[4096], *p, last = 0;
>> + int num = 0;
>> +
>> + fp = fopen(path, "r");
>> + if (!fp) {
>> + pr_debug("Failed to open file %s\n", path);
>> + return -1;
>> + }
>> +
>> + while (!feof(fp)) {
>> + p = fgets(buf, sizeof(buf), fp);
>> + if (!p)
>> + break;
>> +
>> + last = p[strlen(p) - 1];
>> + if (last == '\n')
>> + num++;
>> + }
>> +
>> + fclose(fp);
>> +
>> + if (last != '\n' && last != 0)
>> + num++;
>> +
>> + return num;
>> +}
>> +
>> +static int get_digit(char *str, char **end_str, int *val)
>> +{
>> + int v;
>> + char *end;
>> +
>> + errno = 0;
>> + v = strtol(str, &end, 10);
>> + if ((errno == ERANGE && (v == INT_MAX || v == INT_MIN))
>> + || (errno != 0 && v == 0)) {
>> + return -1;
>> + }
>> +
>> + if (end == str)
>> + return -1;
>> +
>> + *val = v;
>> + *end_str = end;
>> + return 0;
>> +}
>> +
>> +static int get_range(char *str, int *start_val, int *end_val)
>> +{
>> + int val, ret;
>> + char *end, *next;
>> +
>> + *start_val = -1;
>> + *end_val = -1;
>> +
>> + ret = get_digit(str, &end, &val);
>> + if (ret)
>> + return ret;
>> +
>> + *start_val = val;
>> +
>> + if (*end != '\0') {
>> + next = end + 1;
>> + ret = get_digit(next, &end, &val);
>> + if (ret)
>> + return ret;
>> +
>> + *end_val = val;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int opr_str_idx(char *str, int len, char ch)
>> +{
>> + int i;
>> +
>> + for (i = 0; i < len; i++) {
>> + if (str[i] == ch)
>> + break;
>> + }
>> +
>> + if (i == len || i == 0 || i == len - 1)
>> + return -1;
>> +
>> + if (str[i - 1] >= '0' && str[i - 1] <= '9' &&
>> + str[i + 1] >= '0' && str[i + 1] <= '9') {
>> + return i;
>> + }
>> +
>> + return -1;
>> +}
>> +
>> +static bool get_diff_operation(char *str, int *opr, int *b_start, int *b_end,
>> + int *a_start, int *a_end)
>> +{
>> + int idx, len;
>> +
>> + if (str[0] == '<' || str[0] == '>' || str[0] == '-')
>> + return false;
>> +
>> + len = strlen(str);
>> +
>> + /*
>> + * For example,
>> + * 5,6d4
>> + * 8a7
>> + * 20,21c19
>> + */
>> + idx = opr_str_idx(str, len, 'd');
>> + if (idx != -1) {
>> + *opr = DIFF_LINE_DEL;
>> + goto exit;
>> + }
>> +
>> + idx = opr_str_idx(str, len, 'a');
>> + if (idx != -1) {
>> + *opr = DIFF_LINE_ADD;
>> + goto exit;
>> + }
>> +
>> + idx = opr_str_idx(str, len, 'c');
>> + if (idx != -1) {
>> + *opr = DIFF_LINE_CHANGE;
>> + goto exit;
>> + }
>> +
>> +exit:
>> + if (idx != -1) {
>> + str[idx] = 0;
>> + if (get_range(str, b_start, b_end) != 0)
>> + return false;
>> +
>> + if (get_range(&str[idx + 1], a_start, a_end) != 0)
>> + return false;
>> +
>> + return true;
>> + }
>> +
>> + return false;
>> +}
>> +
>> +static int line_pair__del_lines(struct line_pair *lines, int b_start, int b_end,
>> + int a_start, int a_end __maybe_unused,
>> + int *nr_lines, int *b_adjust)
>> +{
>> + int i = *nr_lines;
>> + bool set = false;
>> +
>> + /*
>> + * For example, 5,6d4
>> + * It means line 5 ~ line 6 of file1 are not same as line 4 of file2
>> + * because line 5 ~ line 6 are deleted.
>> + */
>> +
>> + if (a_start == i) {
>> + lines[i].a_nr = i;
>> + lines[i].b_nr = i + *b_adjust;
>> + set = true;
>> + }
>> +
>> + if (b_end != -1)
>> + *b_adjust += b_end - b_start + 1;
>> + else
>> + *b_adjust += 1;
>> +
>> + if (!set) {
>> + lines[i].a_nr = i;
>> + lines[i].b_nr = i + *b_adjust;
>> + }
>> +
>> + *nr_lines = i + 1;
>> + return 0;
>> +}
>> +
>> +static int line_pair__add_lines(struct line_pair *lines,
>> + int b_start __maybe_unused,
>> + int b_end __maybe_unused,
>> + int a_start, int a_end, int *nr_lines,
>> + int *b_adjust)
>> +{
>> + int i = *nr_lines;
>> +
>> + /*
>> + * For example, 8a7
>> + * It means line 8 at file1 is not same as line 7 at file2
>> + * because a new line (line 7) is inserted to file2.
>> + */
>> + if (a_end != -1) {
>> + for (int j = 0; j < a_end - a_start + 1; j++) {
>> + lines[i].a_nr = i;
>> + lines[i].b_nr = -1;
>> + i++;
>> + }
>> + *b_adjust -= a_end - a_start + 1;
>> + } else {
>> + lines[i].a_nr = i;
>> + lines[i].b_nr = -1;
>> + i++;
>> + *b_adjust -= 1;
>> + }
>> +
>> + *nr_lines = i;
>> + return 0;
>> +}
>> +
>> +static int line_pair__change_lines(struct line_pair *lines, int b_start,
>> + int b_end, int a_start, int a_end,
>> + int *nr_lines, int *b_adjust)
>> +{
>> + int i = *nr_lines;
>> +
>> + /*
>> + * For example, 20,21c19
>> + * It means line20~line21 of file1 are not same as line19 of file2
>> + * since they are changed.
>> + */
>> +
>> + if (a_end != -1) {
>> + for (int j = 0; j < a_end - a_start + 1; j++) {
>> + lines[i].a_nr = i;
>> + lines[i].b_nr = -1;
>> + i++;
>> + }
>> + } else {
>> + lines[i].a_nr = i;
>> + lines[i].b_nr = -1;
>> + i++;
>> + }
>> +
>> + if (b_end != -1)
>> + *b_adjust += b_end - b_start;
>> +
>> + if (a_end != -1)
>> + *b_adjust -= a_end - a_start;
>> +
>> + *nr_lines = i;
>> + return 0;
>> +}
>> +
>> +static int line_pair__update_lines(struct line_pair *lines, int opr,
>> + int b_start, int b_end, int a_start,
>> + int a_end, int *nr_lines, int *b_adjust)
>> +{
>> + int i = *nr_lines;
>> +
>> + while (i < a_start) {
>> + lines[i].a_nr = i;
>> + lines[i].b_nr = i + *b_adjust;
>> + i++;
>> + }
>> +
>> + *nr_lines = i;
>> +
>> + switch (opr) {
>> + case DIFF_LINE_DEL:
>> + line_pair__del_lines(lines, b_start, b_end, a_start, a_end,
>> + nr_lines, b_adjust);
>> + break;
>> +
>> + case DIFF_LINE_ADD:
>> + line_pair__add_lines(lines, b_start, b_end, a_start, a_end,
>> + nr_lines, b_adjust);
>> + break;
>> +
>> + case DIFF_LINE_CHANGE:
>> + line_pair__change_lines(lines, b_start, b_end, a_start,
>> + a_end, nr_lines, b_adjust);
>> + break;
>> +
>> + default:
>> + break;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static void line_pair__update_remaining(struct line_pair *lines, int a_num,
>> + int *nr_lines, int b_adjust)
>> +{
>> + int i;
>> +
>> + for (i = *nr_lines; i <= a_num; i++) {
>> + lines[i].a_nr = i;
>> + lines[i].b_nr = i + b_adjust;
>> + }
>> +
>> + *nr_lines = i;
>> +}
>> +
>> +static int build_mapping_table(FILE *fp, struct line_pair *lines,
>> + int *nr_lines, int a_num)
>> +{
>> + char *str = NULL, *p;
>> + size_t len = 0;
>> + int opr, b_start, b_end, a_start, a_end, b_adjust = 0;
>> +
>> + *nr_lines = 1; /* First line is reserved */
>> +
>> + while (getline(&str, &len, fp) > 0) {
>> + p = strchr(str, '\n');
>> + if (p)
>> + *p = '\0';
>> +
>> + pr_debug("%s\n", str);
>> +
>> + if (!get_diff_operation(str, &opr, &b_start, &b_end,
>> + &a_start, &a_end)) {
>> + continue;
>> + }
>> +
>> + line_pair__update_lines(lines, opr, b_start, b_end, a_start,
>> + a_end, nr_lines, &b_adjust);
>> + }
>> +
>> + line_pair__update_remaining(lines, a_num, nr_lines, b_adjust);
>> +
>> + free(str);
>> + return 0;
>> +}
>> +
>> +static int src_info__create_line_mapping(struct src_info *info, char *b_path,
>> + char *a_path)
>> +{
>> + char cmd[PATH_MAX];
>> + FILE *fp = NULL;
>> + int ret = -1;
>> +
>> + info->nr_lines_before = path__number_of_lines(b_path);
>> + if (info->nr_lines_before == -1)
>> + goto out;
>> +
>> + info->nr_lines_after = path__number_of_lines(a_path);
>> + if (info->nr_lines_after == -1)
>> + goto out;
>> +
>> + /* Reserve first line */
>> + info->nr_max = info->nr_lines_before + info->nr_lines_after + 1;
>> + info->lines = calloc(info->nr_max, sizeof(struct line_pair));
>> + if (!info->lines) {
>> + pr_err("Failed to alloc memory for lines\n");
>> + goto out;
>> + }
>> +
>> + snprintf(cmd, sizeof(cmd), "diff %s %s",
>> + b_path, a_path);
>> +
>> + pr_debug("Execute '%s'\n", cmd);
>> +
>> + fp = popen(cmd, "r");
>> + if (!fp) {
>> + pr_err("Failed to execute '%s'\n", cmd);
>> + goto out;
>> + }
>> +
>> + ret = build_mapping_table(fp, info->lines, &info->nr_lines,
>> + info->nr_lines_after);
>> +
>> +out:
>> + if (fp)
>> + fclose(fp);
>> +
>> + return ret;
>> +}
>> +
>> +static void free_src_info(struct src_info *info)
>> +{
>> + zfree(&info->rel_path);
>> + zfree(&info->lines);
>> +}
>> +
>> +static int init_src_info(char *b_path, char *a_path, const char *rel_path,
>> + struct src_info *info)
>> +{
>> + int ret;
>> +
>> + info->rel_path = strdup(rel_path);
>> + if (!info->rel_path)
>> + return -1;
>> +
>> + ret = src_info__create_line_mapping(info, b_path, a_path);
>> + if (ret)
>> + free_src_info(info);
>> +
>> + return ret;
>> +}
>> +
>> +static void free_src_node(struct src_node *node)
>> +{
>> + if (!node)
>> + return;
>> +
>> + free_src_info(&node->info);
>> + free(node);
>> +}
>> +
>> +static struct rb_node *srclist__node_new(struct rblist *rblist,
>> + const void *entry)
>> +{
>> + struct srclist *slist = container_of(rblist, struct srclist, rblist);
>> + const char *rel_path = entry;
>> + char b_path[PATH_MAX], a_path[PATH_MAX];
>> + struct src_node *node;
>> + int ret;
>> +
>> + get_full_path(slist->before_dir, rel_path, b_path, PATH_MAX);
>> + get_full_path(slist->after_dir, rel_path, a_path, PATH_MAX);
>> +
>> + node = zalloc(sizeof(*node));
>> + if (!node)
>> + return NULL;
>> +
>> + ret = init_src_info(b_path, a_path, rel_path, &node->info);
>> + if (ret)
>> + goto err;
>> +
>> + return &node->rb_node;
>> +
>> +err:
>> + free_src_node(node);
>> + return NULL;
>> +}
>> +
>> +static void srclist__node_delete(struct rblist *rblist __maybe_unused,
>> + struct rb_node *rb_node)
>> +{
>> + struct src_node *node = container_of(rb_node, struct src_node, rb_node);
>> +
>> + free_src_info(&node->info);
>> + free(node);
>> +}
>> +
>> +static int srclist__node_cmp(struct rb_node *rb_node, const void *entry)
>> +{
>> + struct src_node *node = container_of(rb_node, struct src_node, rb_node);
>> + const char *str = entry;
>> +
>> + return strcmp(node->info.rel_path, str);
>> +}
>> +
>> +struct srclist *srclist__new(const char *before_dir, const char *after_dir)
>> +{
>> + struct srclist *slist = zalloc(sizeof(*slist));
>> +
>> + if (!slist)
>> + return NULL;
>> +
>> + rblist__init(&slist->rblist);
>> + slist->rblist.node_cmp = srclist__node_cmp;
>> + slist->rblist.node_new = srclist__node_new;
>> + slist->rblist.node_delete = srclist__node_delete;
>> +
>> + slist->before_dir = before_dir;
>> + slist->after_dir = after_dir;
>> + return slist;
>> +}
>> +
>> +void srclist__delete(struct srclist *slist)
>> +{
>> + if (slist)
>> + rblist__delete(&slist->rblist);
>> +}
>> +
>> +struct src_node *srclist__find(struct srclist *slist, char *rel_path,
>> + bool create)
>> +{
>> + struct src_node *node = NULL;
>> + struct rb_node *rb_node;
>> +
>> + if (create)
>> + rb_node = rblist__findnew(&slist->rblist, (void *)rel_path);
>> + else
>> + rb_node = rblist__find(&slist->rblist, (void *)rel_path);
>> +
>> + if (rb_node)
>> + node = container_of(rb_node, struct src_node, rb_node);
>> +
>> + return node;
>> +}
>> +
>> +struct line_pair *srclist__line_pair(struct src_node *node, int a_nr)
>> +{
>> + struct src_info *info = &node->info;
>> +
>> + if (a_nr < info->nr_lines && a_nr > 0)
>> + return &info->lines[a_nr];
>> +
>> + pr_debug("Exceeds line range: nr_lines = %d, a_nr = %d\n",
>> + info->nr_lines, a_nr);
>> +
>> + return NULL;
>> +}
>> +
>> +void srclist__dump(struct srclist *slist)
>> +{
>> + struct src_node *node;
>> + struct src_info *info;
>> + int i;
>> +
>> + srclist__for_each_entry(node, slist) {
>> + info = &node->info;
>> +
>> + pr_debug("%s (after -> before) lines mapping:\n",
>> + info->rel_path);
>> +
>> + for (i = 0; i < info->nr_lines; i++) {
>> + pr_debug("%d -> %d\n",
>> + info->lines[i].a_nr,
>> + info->lines[i].b_nr);
>> + }
>> + }
>> +}
>> diff --git a/tools/perf/util/srclist.h b/tools/perf/util/srclist.h
>> new file mode 100644
>> index 000000000000..f25b0de91a13
>> --- /dev/null
>> +++ b/tools/perf/util/srclist.h
>> @@ -0,0 +1,65 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef __PERF_SRCLIST_H
>> +#define __PERF_SRCLIST_H
>> +
>> +#include <linux/types.h>
>> +#include <linux/rbtree.h>
>> +#include "rblist.h"
>> +
>> +struct line_pair {
>> + int a_nr; /* line nr in after.c */
>> + int b_nr; /* line nr in before.c */
>> +};
>> +
>> +struct src_node;
>> +
>> +struct src_info {
>> + char *rel_path; /* relative path */
>> + struct line_pair *lines;
>> + int nr_max;
>> + int nr_lines;
>> + int nr_lines_before;
>> + int nr_lines_after;
>> +};
>> +
>> +struct src_node {
>> + struct rb_node rb_node;
>> + struct src_info info;
>> +};
>> +
>> +struct srclist {
>> + struct rblist rblist;
>> + const char *before_dir;
>> + const char *after_dir;
>> +};
>> +
>> +struct srclist *srclist__new(const char *before_dir, const char *after_dir);
>> +void srclist__delete(struct srclist *slist);
>> +
>> +struct src_node *srclist__find(struct srclist *slist, char *rel_path,
>> + bool create);
>> +
>> +struct line_pair *srclist__line_pair(struct src_node *node, int a_nr);
>> +void srclist__dump(struct srclist *slist);
>> +
>> +static inline struct src_node *srclist__first(struct srclist *slist)
>> +{
>> + struct rb_node *rn = rb_first_cached(&slist->rblist.entries);
>> +
>> + return rn ? rb_entry(rn, struct src_node, rb_node) : NULL;
>> +}
>> +
>> +static inline struct src_node *srclist__next(struct src_node *sn)
>> +{
>> + struct rb_node *rn;
>> +
>> + if (!sn)
>> + return NULL;
>> + rn = rb_next(&sn->rb_node);
>> + return rn ? rb_entry(rn, struct src_node, rb_node) : NULL;
>> +}
>> +
>> +#define srclist__for_each_entry(pos, slist) \
>> + for (pos = srclist__first(slist); pos; pos = srclist__next(pos))
>> +
>> +#endif /* __PERF_SRCLIST_H */
>> --
>> 2.17.1
>>
>
Hi Jiri,
On 4/27/2020 6:29 PM, Jiri Olsa wrote:
> On Mon, Apr 20, 2020 at 09:04:44AM +0800, Jin Yao wrote:
>> Sometimes, a small change in a hot function reducing the cycles of
>> this function, but the overall workload doesn't get faster. It is
>> interesting where the cycles are moved to.
>>
>> What it would like is to diff before/after streams. A stream we think
>> is a callchain which is aggregated by the branch records from perf
>> samples.
>
> I wonder we could use this on intel_pt trace.. like compare streams
> for given function call.. not sure that would be feasible, but might
> be good idea to write this in a generic way and not callchain specific
>
> jirka
>
Yes, that's a good idea. We should try to write the code in a generic way.
Thanks
Jin Yao
>>
>> By browsing the hot streams, we can understand the hot code path.
>> By comparing the cycles variation of same streams between old perf
>> data and new perf data, we can understand if the cycles are moved
>> to other codes.
>>
>> The before stream is the stream in perf.data.old. The after stream
>> is the stream in perf.data.
>>
>> Diffing before/after streams compares top N hottest streams between
>> two perf data files.
>>
>> If all entries of one stream in perf.data.old are fully matched with
>> all entries of another stream in perf.data, we think two streams
>> are matched, otherwise the streams are not matched.
>>
>> For example,
>>
>> cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
>> -------------------------- --------------------------
>> main div.c:39 main div.c:39
>> main div.c:44 main div.c:44
>>
>> The above streams are matched and we can see for the same streams the
>> cycles (1) are equal and the callchain hit percents are slightly changed
>> (26.80% vs. 27.30%). That's expected.
>>
>> But that's not always true if source code is changed in perf.data
>> (e.g. div.c:39 is changed). If div.c:39 is changed, they are different
>> streams, we can't compare them. We will think the stream in perf.data
>> is a new stream.
>>
>> The challenge is how to identify the changed source lines. The basic
>> idea is to use linux command "diff" to compare the source file A and
>> source file A* line by line (assume file A is used in perf.data.old
>> and file A* is used in perf.data). According to "diff" output,
>> we can generate a source line mapping table.
>>
>> For example,
>>
>> Execute 'diff ./before/div.c ./after/div.c'
>>
>> 25c25
>> < i = rand() % 2;
>> ---
>> > i = rand() % 4;
>> 39c39
>> < for (i = 0; i < 2000000000; i++) {
>> ---
>> > for (i = 0; i < 20000000001; i++) {
>>
>> div.c (after -> before) lines mapping:
>> 0 -> 0
>> 1 -> 1
>> 2 -> 2
>> 3 -> 3
>> 4 -> 4
>> 5 -> 5
>> 6 -> 6
>> 7 -> 7
>> 8 -> 8
>> 9 -> 9
>> ...
>> 24 -> 24
>> 25 -> -1
>> 26 -> 26
>> 27 -> 27
>> 28 -> 28
>> 29 -> 29
>> 30 -> 30
>> 31 -> 31
>> 32 -> 32
>> 33 -> 33
>> 34 -> 34
>> 35 -> 35
>> 36 -> 36
>> 37 -> 37
>> 38 -> 38
>> 39 -> -1
>> 40 -> 40
>> ...
>>
>> From the table, we can easily know div.c:39 is source line changed.
>> (it's mapped to -1). So following two streams are not matched.
>>
>> cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
>> -------------------------- --------------------------
>> main div.c:39 main div.c:39
>> main div.c:44 main div.c:44
>>
>> Now let's see examples.
>>
>> perf record -b ... Generate perf.data.old with branch data
>> perf record -b ... Generate perf.data with branch data
>> perf diff --stream
>>
>> [ Matched hot chains between old perf data and new perf data) ]
>>
>> hot chain pair 1:
>> cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
>> --------------------------- --------------------------
>> main div.c:39 main div.c:39
>> main div.c:44 main div.c:44
>>
>> hot chain pair 2:
>> cycles: 35, hits: 21.43% cycles: 33, hits: 19.37%
>> --------------------------- --------------------------
>> __random_r random_r.c:360 __random_r random_r.c:360
>> __random_r random_r.c:388 __random_r random_r.c:388
>> __random_r random_r.c:388 __random_r random_r.c:388
>> __random_r random_r.c:380 __random_r random_r.c:380
>> __random_r random_r.c:357 __random_r random_r.c:357
>> __random random.c:293 __random random.c:293
>> __random random.c:293 __random random.c:293
>> __random random.c:291 __random random.c:291
>> __random random.c:291 __random random.c:291
>> __random random.c:291 __random random.c:291
>> __random random.c:288 __random random.c:288
>> rand rand.c:27 rand rand.c:27
>> rand rand.c:26 rand rand.c:26
>> rand@plt rand@plt
>> rand@plt rand@plt
>> compute_flag div.c:25 compute_flag div.c:25
>> compute_flag div.c:22 compute_flag div.c:22
>> main div.c:40 main div.c:40
>> main div.c:40 main div.c:40
>> main div.c:39 main div.c:39
>>
>> hot chain pair 3:
>> cycles: 18, hits: 6.10% cycles: 19, hits: 6.51%
>> --------------------------- --------------------------
>> __random_r random_r.c:360 __random_r random_r.c:360
>> __random_r random_r.c:388 __random_r random_r.c:388
>> __random_r random_r.c:388 __random_r random_r.c:388
>> __random_r random_r.c:380 __random_r random_r.c:380
>> __random_r random_r.c:357 __random_r random_r.c:357
>> __random random.c:293 __random random.c:293
>> __random random.c:293 __random random.c:293
>> __random random.c:291 __random random.c:291
>> __random random.c:291 __random random.c:291
>> __random random.c:291 __random random.c:291
>> __random random.c:288 __random random.c:288
>> rand rand.c:27 rand rand.c:27
>> rand rand.c:26 rand rand.c:26
>> rand@plt rand@plt
>> rand@plt rand@plt
>> compute_flag div.c:25 compute_flag div.c:25
>> compute_flag div.c:22 compute_flag div.c:22
>> main div.c:40 main div.c:40
>>
>> hot chain pair 4:
>> cycles: 9, hits: 5.95% cycles: 8, hits: 5.03%
>> --------------------------- --------------------------
>> __random_r random_r.c:360 __random_r random_r.c:360
>> __random_r random_r.c:388 __random_r random_r.c:388
>> __random_r random_r.c:388 __random_r random_r.c:388
>> __random_r random_r.c:380 __random_r random_r.c:380
>>
>> [ Hot chains in old perf data but source line changed (*) in new perf data ]
>>
>> [ Hot chains in old perf data only ]
>>
>> hot chain 1:
>> cycles: 2, hits: 4.08%
>> --------------------------
>> main div.c:42
>> compute_flag div.c:28
>>
>> [ Hot chains in new perf data only ]
>>
>> hot chain 1:
>> cycles: 36, hits: 3.36%
>> --------------------------
>> __random_r random_r.c:357
>> __random random.c:293
>> __random random.c:293
>> __random random.c:291
>> __random random.c:291
>> __random random.c:291
>> __random random.c:288
>> rand rand.c:27
>> rand rand.c:26
>> rand@plt
>> rand@plt
>> compute_flag div.c:25
>> compute_flag div.c:22
>> main div.c:40
>> main div.c:40
>>
>> If we enable the source line comparison option, the output may be different.
>>
>> perf record -b ... Generate perf.data.old with branch data
>> perf record -b ... Generate perf.data with branch data
>> perf diff --stream --before ./before --after ./after
>>
>> [ Matched hot chains between old perf data and new perf data) ]
>>
>> hot chain pair 1:
>> cycles: 18, hits: 6.10% cycles: 19, hits: 6.51%
>> --------------------------- --------------------------
>> __random_r random_r.c:360 __random_r random_r.c:360
>> __random_r random_r.c:388 __random_r random_r.c:388
>> __random_r random_r.c:388 __random_r random_r.c:388
>> __random_r random_r.c:380 __random_r random_r.c:380
>> __random_r random_r.c:357 __random_r random_r.c:357
>> __random random.c:293 __random random.c:293
>> __random random.c:293 __random random.c:293
>> __random random.c:291 __random random.c:291
>> __random random.c:291 __random random.c:291
>> __random random.c:291 __random random.c:291
>> __random random.c:288 __random random.c:288
>> rand rand.c:27 rand rand.c:27
>> rand rand.c:26 rand rand.c:26
>> rand@plt rand@plt
>> rand@plt rand@plt
>> compute_flag div.c:25 compute_flag div.c:25
>> compute_flag div.c:22 compute_flag div.c:22
>> main div.c:40 main div.c:40
>>
>> hot chain pair 2:
>> cycles: 9, hits: 5.95% cycles: 8, hits: 5.03%
>> --------------------------- --------------------------
>> __random_r random_r.c:360 __random_r random_r.c:360
>> __random_r random_r.c:388 __random_r random_r.c:388
>> __random_r random_r.c:388 __random_r random_r.c:388
>> __random_r random_r.c:380 __random_r random_r.c:380
>>
>> [ Hot chains in old perf data but source line changed (*) in new perf data ]
>>
>> hot chain pair 1:
>> cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
>> --------------------------- --------------------------
>> main div.c:39 main div.c:39*
>> main div.c:44 main div.c:44
>>
>> hot chain pair 2:
>> cycles: 35, hits: 21.43% cycles: 33, hits: 19.37%
>> --------------------------- --------------------------
>> __random_r random_r.c:360 __random_r random_r.c:360
>> __random_r random_r.c:388 __random_r random_r.c:388
>> __random_r random_r.c:388 __random_r random_r.c:388
>> __random_r random_r.c:380 __random_r random_r.c:380
>> __random_r random_r.c:357 __random_r random_r.c:357
>> __random random.c:293 __random random.c:293
>> __random random.c:293 __random random.c:293
>> __random random.c:291 __random random.c:291
>> __random random.c:291 __random random.c:291
>> __random random.c:291 __random random.c:291
>> __random random.c:288 __random random.c:288
>> rand rand.c:27 rand rand.c:27
>> rand rand.c:26 rand rand.c:26
>> rand@plt rand@plt
>> rand@plt rand@plt
>> compute_flag div.c:25 compute_flag div.c:25
>> compute_flag div.c:22 compute_flag div.c:22
>> main div.c:40 main div.c:40
>> main div.c:40 main div.c:40
>> main div.c:39 main div.c:39*
>>
>> [ Hot chains in old perf data only ]
>>
>> hot chain 1:
>> cycles: 2, hits: 4.08%
>> --------------------------
>> main div.c:42
>> compute_flag div.c:28
>>
>> [ Hot chains in new perf data only ]
>>
>> hot chain 1:
>> cycles: 36, hits: 3.36%
>> --------------------------
>> __random_r random_r.c:357
>> __random random.c:293
>> __random random.c:293
>> __random random.c:291
>> __random random.c:291
>> __random random.c:291
>> __random random.c:288
>> rand rand.c:27
>> rand rand.c:26
>> rand@plt
>> rand@plt
>> compute_flag div.c:25
>> compute_flag div.c:22
>> main div.c:40
>> main div.c:40
>>
>> Now we can see, following streams pair is moved to another section
>> "[ Hot chains in old perf data but source line changed (*) in new perf data ]"
>>
>> cycles: 1, hits: 26.80% cycles: 1, hits: 27.30%
>> --------------------------- --------------------------
>> main div.c:39 main div.c:39*
>> main div.c:44 main div.c:44
>>
>> v3:
>> ---
>> v2 has 14 patches, it's hard to review.
>> v3 is only 7 patches for basic stream comparison.
>>
>> Jin Yao (7):
>> perf util: Create source line mapping table
>> perf util: Create streams for managing top N hottest callchains
>> perf util: Return per-event callchain streams
>> perf util: Compare two streams
>> perf util: Calculate the sum of all streams hits
>> perf util: Report hot streams
>> perf diff: Support hot streams comparison
>>
>> tools/perf/Documentation/perf-diff.txt | 14 +
>> tools/perf/builtin-diff.c | 170 +++++++-
>> tools/perf/util/Build | 1 +
>> tools/perf/util/callchain.c | 495 ++++++++++++++++++++++
>> tools/perf/util/callchain.h | 32 ++
>> tools/perf/util/srclist.c | 555 +++++++++++++++++++++++++
>> tools/perf/util/srclist.h | 65 +++
>> 7 files changed, 1319 insertions(+), 13 deletions(-)
>> create mode 100644 tools/perf/util/srclist.c
>> create mode 100644 tools/perf/util/srclist.h
>>
>> --
>> 2.17.1
>>
>