2017-08-06 21:24:56

by Milian Wolff

[permalink] [raw]
Subject: [PATCH v2 00/14] generate full callchain cursor entries for inlined frames

This series of patches completely reworks the way inline frames are handled.
Instead of querying for the inline nodes on-demand in the individual tools,
we now create proper callchain nodes for inlined frames. The advantages this
approach brings are numerous:

- less duplicated code in the individual browser
- aggregated cost for inlined frames for the --children top-down list
- various bug fixes that arose from querying for a srcline/symbol based on
the IP of a sample, which will always point to the last inlined frame
instead of the corresponding non-inlined frame
- overall much better support for visualizing cost for heavily-inlined C++
code, which simply was confusing and unreliably before
- srcline honors the global setting as to whether full paths or basenames
should be shown
- caches for inlined frames and srcline information, which allow us to
enable inline frame handling by default

For comparison, below lists the output before and after for `perf script`
and `perf report`. The example file I used to generate the perf data is:

~~~~~
$ cat inlining.cpp
#include <complex>
#include <cmath>
#include <random>
#include <iostream>

using namespace std;

int main()
{
uniform_real_distribution<double> uniform(-1E5, 1E5);
default_random_engine engine;
double s = 0;
for (int i = 0; i < 10000000; ++i) {
s += norm(complex<double>(uniform(engine), uniform(engine)));
}
cout << s << '\n';
return 0;
}
$ g++ -O2 -g -o inlining inlining.cpp
$ perf record --call-graph dwarf ./inlining
~~~~~

Now, the (broken) status-quo looks like this. Look for "NOTE:" to see some
of my comments that outline the various issues I'm trying to solve by this
patch series.

~~~~~
$ perf script --inline
...
inlining 11083 97459.356656: 33680 cycles:
214f7 __hypot_finite (/usr/lib/libm-2.25.so)
ace3 hypot (/usr/lib/libm-2.25.so)
a4a main (/home/milian/projects/src/perf-tests/inlining)
std::__complex_abs
std::abs<double>
std::_Norm_helper<true>::_S_do_it<double>
std::norm<double>
main
20510 __libc_start_main (/usr/lib/libc-2.25.so)
bd9 _start (/home/milian/projects/src/perf-tests/inlining)
# NOTE: the above inlined stack is confusing: the a4a is an address into main,
# which is the non-inlined symbol. the entry with the address should be
# at the end of the stack, where it's actually duplicated once more but
# there it's missing the address
...
$ perf report -s sym -g srcline -i perf.inlining.data --inline --stdio
...
--38.86%--_start
__libc_start_main
|
|--15.68%--main random.tcc:3326
| /home/milian/projects/src/perf-tests/inlining.cpp:14 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
| /usr/include/c++/6.3.1/bits/random.h:185 (inline)
| /usr/include/c++/6.3.1/bits/random.tcc:3326 (inline)
|
|--10.36%--main random.h:143
| /home/milian/projects/src/perf-tests/inlining.cpp:14 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
| /usr/include/c++/6.3.1/bits/random.h:185 (inline)
| /usr/include/c++/6.3.1/bits/random.tcc:3332 (inline)
| /usr/include/c++/6.3.1/bits/random.h:332 (inline)
| /usr/include/c++/6.3.1/bits/random.h:151 (inline)
| /usr/include/c++/6.3.1/bits/random.h:143 (inline)
|
|--5.66%--main random.tcc:3332
| /home/milian/projects/src/perf-tests/inlining.cpp:14 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
| /usr/include/c++/6.3.1/bits/random.h:185 (inline)
| /usr/include/c++/6.3.1/bits/random.tcc:3332 (inline)
...
# NOTE: the grouping is totally off because the first and last frame of the
inline nodes is completely bogus, since the IP is used to find the sym/srcline
which is different from the actual inlined sym/srcline.
also, the code currently displays either the inlined function name or
the corresponding filename (but in full length, instead of just the basename).

$ perf report -s sym -g srcline -i perf.inlining.data --inline --stdio --no-children
...
38.86% [.] main
|
|--15.68%--main random.tcc:3326
| /usr/include/c++/6.3.1/bits/random.tcc:3326 (inline)
| /usr/include/c++/6.3.1/bits/random.h:185 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
| /home/milian/projects/src/perf-tests/inlining.cpp:14 (inline)
| __libc_start_main
| _start
...
# NOTE: the srcline for main is wrong, it should be inlining.cpp:14,
i.e. what is displayed in the line below (see also perf script issue above)
~~~~~

Afterwards, all of the above issues are resolved (and inlined frames are
displayed by default):

~~~~~
$ perf script
...
inlining 11083 97459.356656: 33680 cycles:
214f7 __hypot_finite (/usr/lib/libm-2.25.so)
ace3 hypot (/usr/lib/libm-2.25.so)
a4a std::__complex_abs (inlined)
a4a std::abs<double> (inlined)
a4a std::_Norm_helper<true>::_S_do_it<double> (inlined)
a4a std::norm<double> (inlined)
a4a main (/home/milian/projects/src/perf-tests/inlining)
20510 __libc_start_main (/usr/lib/libc-2.25.so)
bd9 _start (/home/milian/projects/src/perf-tests/inlining)
...
# NOTE: only one main entry, at the correct position.
we do display the (repeated) instruction pointer as that ensures
interoperability with e.g. the stackcollapse-perf.pl script

$ perf report -s sym -g srcline -i perf.inlining.data --stdio
...
100.00% 38.86% [.] main
|
|--61.14%--main inlining.cpp:14
| std::norm<double> complex:664 (inlined)
| std::_Norm_helper<true>::_S_do_it<double> complex:654 (inlined)
| std::abs<double> complex:597 (inlined)
| std::__complex_abs complex:589 (inlined)
| |
| |--60.29%--hypot
| | |
| | --56.03%--__hypot_finite
| |
| --0.85%--cabs
|
--38.86%--_start
__libc_start_main
|
|--38.19%--main inlining.cpp:14
| |
| |--35.59%--std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1809 (inlined)
| | std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1818 (inlined)
| | |
| | --34.37%--std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185 (inlined)
| | |
| | |--17.91%--std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3332 (inlined)
| | | |
| | | --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() random.h:332 (inlined)
| | | std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> random.h:151 (inlined)
| | | |
| | | |--10.36%--std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc random.h:143 (inlined)
| | | |
| | | --1.88%--std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc random.h:141 (inlined)
| | |
| | |--15.68%--std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3326 (inlined)
| | |
| | --0.79%--std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3335 (inlined)
| |
| --1.99%--std::norm<double> complex:664 (inlined)
| std::_Norm_helper<true>::_S_do_it<double> complex:654 (inlined)
| std::abs<double> complex:597 (inlined)
| std::__complex_abs complex:589 (inlined)
|
--0.67%--main inlining.cpp:13
...

# NOTE: still somewhat confusing due to the _start and __libc_start_main frames
that actually are *above* the main frame. But at least the stuff below
properly splits up and shows that mutiple functions got inlined into
inlining.cpp:14, not just one as before.

$ perf report -s sym -g srcline -i perf.inlining.data --stdio --no-children
...
38.86% [.] main
|
|--15.68%--std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3326 (inlined)
| std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() random.h:185 (inlined)
| std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1818 (inlined)
| std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:1809 (inlined)
| main inlining.cpp:14
| __libc_start_main
| _start
...
# NOTE: the first and last entry of the inline stack have the correct symbol and srcline now
both function and srcline is shown, as well as the (inlined) suffix
only the basename of the srcline is shown

v2 fixes some issues reported by Namhyung or found by me in further
testing, adds caching and enables inline frames by default.

Milian Wolff (14):
perf report: remove code to handle inline frames from browsers
perf util: take elf_name as const string in dso__demangle_sym
perf report: create real callchain entries for inlined frames
perf report: fall-back to function name comparison for -g srcline
perf report: mark inlined frames in output by " (inlined)" suffix
perf script: mark inlined frames and do not print DSO for them
perf report: compare symbol name for inlined frames when matching
perf report: compare symbol name for inlined frames when sorting
perf report: properly handle branch count in match_chain
perf report: cache failed lookups of inlined frames
perf report: cache srclines for callchain nodes
perf report: use srcline from callchain for hist entries
perf util: do not consider empty files as valid srclines
perf util: enable handling of inlined frames by default

tools/perf/Documentation/perf-report.txt | 3 +-
tools/perf/Documentation/perf-script.txt | 3 +-
tools/perf/ui/browsers/hists.c | 180 ++------------------
tools/perf/ui/stdio/hist.c | 77 +--------
tools/perf/util/callchain.c | 151 ++++++++---------
tools/perf/util/callchain.h | 6 +-
tools/perf/util/dso.c | 3 +
tools/perf/util/dso.h | 2 +
tools/perf/util/event.c | 1 +
tools/perf/util/evsel_fprintf.c | 37 +----
tools/perf/util/hist.c | 7 +-
tools/perf/util/machine.c | 66 +++++++-
tools/perf/util/sort.c | 6 +
tools/perf/util/sort.h | 1 -
tools/perf/util/srcline.c | 271 +++++++++++++++++++++++++------
tools/perf/util/srcline.h | 26 ++-
tools/perf/util/symbol-elf.c | 2 +-
tools/perf/util/symbol-minimal.c | 2 +-
tools/perf/util/symbol.c | 1 +
tools/perf/util/symbol.h | 4 +-
20 files changed, 426 insertions(+), 423 deletions(-)

--
2.13.3


2017-08-06 21:24:58

by Milian Wolff

[permalink] [raw]
Subject: [PATCH v2 01/14] perf report: remove code to handle inline frames from browsers

The follow-up commits will make inline frames first-class citizens
in the callchain, thereby obsoleting all of this special code.

Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Yao Jin <[email protected]>
Signed-off-by: Milian Wolff <[email protected]>
---
tools/perf/ui/browsers/hists.c | 180 +++-------------------------------------
tools/perf/ui/stdio/hist.c | 77 +----------------
tools/perf/util/evsel_fprintf.c | 32 -------
tools/perf/util/hist.c | 5 --
tools/perf/util/sort.h | 1 -
5 files changed, 13 insertions(+), 282 deletions(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index f4bc2462bc2c..48d056bb3747 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -154,57 +154,9 @@ static void callchain_list__set_folding(struct callchain_list *cl, bool unfold)
cl->unfolded = unfold ? cl->has_children : false;
}

-static struct inline_node *inline_node__create(struct map *map, u64 ip)
-{
- struct dso *dso;
- struct inline_node *node;
-
- if (map == NULL)
- return NULL;
-
- dso = map->dso;
- if (dso == NULL)
- return NULL;
-
- node = dso__parse_addr_inlines(dso,
- map__rip_2objdump(map, ip));
-
- return node;
-}
-
-static int inline__count_rows(struct inline_node *node)
-{
- struct inline_list *ilist;
- int i = 0;
-
- if (node == NULL)
- return 0;
-
- list_for_each_entry(ilist, &node->val, list) {
- if ((ilist->filename != NULL) || (ilist->funcname != NULL))
- i++;
- }
-
- return i;
-}
-
-static int callchain_list__inline_rows(struct callchain_list *chain)
-{
- struct inline_node *node;
- int rows;
-
- node = inline_node__create(chain->ms.map, chain->ip);
- if (node == NULL)
- return 0;
-
- rows = inline__count_rows(node);
- inline_node__delete(node);
- return rows;
-}
-
static int callchain_node__count_rows_rb_tree(struct callchain_node *node)
{
- int n = 0, inline_rows;
+ int n = 0;
struct rb_node *nd;

for (nd = rb_first(&node->rb_root); nd; nd = rb_next(nd)) {
@@ -215,12 +167,6 @@ static int callchain_node__count_rows_rb_tree(struct callchain_node *node)
list_for_each_entry(chain, &child->val, list) {
++n;

- if (symbol_conf.inline_name) {
- inline_rows =
- callchain_list__inline_rows(chain);
- n += inline_rows;
- }
-
/* We need this because we may not have children */
folded_sign = callchain_list__folded(chain);
if (folded_sign == '+')
@@ -272,7 +218,7 @@ static int callchain_node__count_rows(struct callchain_node *node)
{
struct callchain_list *chain;
bool unfolded = false;
- int n = 0, inline_rows;
+ int n = 0;

if (callchain_param.mode == CHAIN_FLAT)
return callchain_node__count_flat_rows(node);
@@ -281,10 +227,6 @@ static int callchain_node__count_rows(struct callchain_node *node)

list_for_each_entry(chain, &node->val, list) {
++n;
- if (symbol_conf.inline_name) {
- inline_rows = callchain_list__inline_rows(chain);
- n += inline_rows;
- }

unfolded = chain->unfolded;
}
@@ -432,19 +374,6 @@ static void hist_entry__init_have_children(struct hist_entry *he)
he->init_have_children = true;
}

-static void hist_entry_init_inline_node(struct hist_entry *he)
-{
- if (he->inline_node)
- return;
-
- he->inline_node = inline_node__create(he->ms.map, he->ip);
-
- if (he->inline_node == NULL)
- return;
-
- he->has_children = true;
-}
-
static bool hist_browser__toggle_fold(struct hist_browser *browser)
{
struct hist_entry *he = browser->he_selection;
@@ -476,12 +405,8 @@ static bool hist_browser__toggle_fold(struct hist_browser *browser)

if (he->unfolded) {
if (he->leaf)
- if (he->inline_node)
- he->nr_rows = inline__count_rows(
- he->inline_node);
- else
- he->nr_rows = callchain__count_rows(
- &he->sorted_chain);
+ he->nr_rows = callchain__count_rows(
+ &he->sorted_chain);
else
he->nr_rows = hierarchy_count_rows(browser, he, false);

@@ -841,71 +766,6 @@ static bool hist_browser__check_dump_full(struct hist_browser *browser __maybe_u

#define LEVEL_OFFSET_STEP 3

-static int hist_browser__show_inline(struct hist_browser *browser,
- struct inline_node *node,
- unsigned short row,
- int offset)
-{
- struct inline_list *ilist;
- char buf[1024];
- int color, width, first_row;
-
- first_row = row;
- width = browser->b.width - (LEVEL_OFFSET_STEP + 2);
- list_for_each_entry(ilist, &node->val, list) {
- if ((ilist->filename != NULL) || (ilist->funcname != NULL)) {
- color = HE_COLORSET_NORMAL;
- if (ui_browser__is_current_entry(&browser->b, row))
- color = HE_COLORSET_SELECTED;
-
- if (callchain_param.key == CCKEY_ADDRESS ||
- callchain_param.key == CCKEY_SRCLINE) {
- if (ilist->filename != NULL)
- scnprintf(buf, sizeof(buf),
- "%s:%d (inline)",
- ilist->filename,
- ilist->line_nr);
- else
- scnprintf(buf, sizeof(buf), "??");
- } else if (ilist->funcname != NULL)
- scnprintf(buf, sizeof(buf), "%s (inline)",
- ilist->funcname);
- else if (ilist->filename != NULL)
- scnprintf(buf, sizeof(buf),
- "%s:%d (inline)",
- ilist->filename,
- ilist->line_nr);
- else
- scnprintf(buf, sizeof(buf), "??");
-
- ui_browser__set_color(&browser->b, color);
- hist_browser__gotorc(browser, row, 0);
- ui_browser__write_nstring(&browser->b, " ",
- LEVEL_OFFSET_STEP + offset);
- ui_browser__write_nstring(&browser->b, buf, width);
- row++;
- }
- }
-
- return row - first_row;
-}
-
-static size_t show_inline_list(struct hist_browser *browser, struct map *map,
- u64 ip, int row, int offset)
-{
- struct inline_node *node;
- int ret;
-
- node = inline_node__create(map, ip);
- if (node == NULL)
- return 0;
-
- ret = hist_browser__show_inline(browser, node, row, offset);
-
- inline_node__delete(node);
- return ret;
-}
-
static int hist_browser__show_callchain_list(struct hist_browser *browser,
struct callchain_node *node,
struct callchain_list *chain,
@@ -917,7 +777,7 @@ static int hist_browser__show_callchain_list(struct hist_browser *browser,
char bf[1024], *alloc_str;
char buf[64], *alloc_str2;
const char *str;
- int inline_rows = 0, ret = 1;
+ int ret = 1;

if (arg->row_offset != 0) {
arg->row_offset--;
@@ -958,12 +818,7 @@ static int hist_browser__show_callchain_list(struct hist_browser *browser,
free(alloc_str);
free(alloc_str2);

- if (symbol_conf.inline_name) {
- inline_rows = show_inline_list(browser, chain->ms.map,
- chain->ip, row + 1, offset);
- }
-
- return ret + inline_rows;
+ return ret;
}

static bool check_percent_display(struct rb_node *node, u64 parent_total)
@@ -1387,12 +1242,6 @@ static int hist_browser__show_entry(struct hist_browser *browser,
folded_sign = hist_entry__folded(entry);
}

- if (symbol_conf.inline_name &&
- (!entry->has_children)) {
- hist_entry_init_inline_node(entry);
- folded_sign = hist_entry__folded(entry);
- }
-
if (row_offset == 0) {
struct hpp_arg arg = {
.b = &browser->b,
@@ -1424,8 +1273,7 @@ static int hist_browser__show_entry(struct hist_browser *browser,
}

if (first) {
- if (symbol_conf.use_callchain ||
- symbol_conf.inline_name) {
+ if (symbol_conf.use_callchain) {
ui_browser__printf(&browser->b, "%c ", folded_sign);
width -= 2;
}
@@ -1467,15 +1315,11 @@ static int hist_browser__show_entry(struct hist_browser *browser,
.is_current_entry = current_entry,
};

- if (entry->inline_node)
- printed += hist_browser__show_inline(browser,
- entry->inline_node, row, 0);
- else
- printed += hist_browser__show_callchain(browser,
- entry, 1, row,
- hist_browser__show_callchain_entry,
- &arg,
- hist_browser__check_output_full);
+ printed += hist_browser__show_callchain(browser,
+ entry, 1, row,
+ hist_browser__show_callchain_entry,
+ &arg,
+ hist_browser__check_output_full);
}

return printed;
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 5c95b8301c67..8585bc628e0c 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -21,64 +21,6 @@ static size_t callchain__fprintf_left_margin(FILE *fp, int left_margin)
return ret;
}

-static size_t inline__fprintf(struct map *map, u64 ip, int left_margin,
- int depth, int depth_mask, FILE *fp)
-{
- struct dso *dso;
- struct inline_node *node;
- struct inline_list *ilist;
- int ret = 0, i;
-
- if (map == NULL)
- return 0;
-
- dso = map->dso;
- if (dso == NULL)
- return 0;
-
- node = dso__parse_addr_inlines(dso,
- map__rip_2objdump(map, ip));
- if (node == NULL)
- return 0;
-
- list_for_each_entry(ilist, &node->val, list) {
- if ((ilist->filename != NULL) || (ilist->funcname != NULL)) {
- ret += callchain__fprintf_left_margin(fp, left_margin);
-
- for (i = 0; i < depth; i++) {
- if (depth_mask & (1 << i))
- ret += fprintf(fp, "|");
- else
- ret += fprintf(fp, " ");
- ret += fprintf(fp, " ");
- }
-
- if (callchain_param.key == CCKEY_ADDRESS ||
- callchain_param.key == CCKEY_SRCLINE) {
- if (ilist->filename != NULL)
- ret += fprintf(fp, "%s:%d (inline)",
- ilist->filename,
- ilist->line_nr);
- else
- ret += fprintf(fp, "??");
- } else if (ilist->funcname != NULL)
- ret += fprintf(fp, "%s (inline)",
- ilist->funcname);
- else if (ilist->filename != NULL)
- ret += fprintf(fp, "%s:%d (inline)",
- ilist->filename,
- ilist->line_nr);
- else
- ret += fprintf(fp, "??");
-
- ret += fprintf(fp, "\n");
- }
- }
-
- inline_node__delete(node);
- return ret;
-}
-
static size_t ipchain__fprintf_graph_line(FILE *fp, int depth, int depth_mask,
int left_margin)
{
@@ -141,9 +83,6 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_node *node,
fputc('\n', fp);
free(alloc_str);

- if (symbol_conf.inline_name)
- ret += inline__fprintf(chain->ms.map, chain->ip,
- left_margin, depth, depth_mask, fp);
return ret;
}

@@ -318,13 +257,6 @@ static size_t callchain__fprintf_graph(FILE *fp, struct rb_root *root,

if (++entries_printed == callchain_param.print_limit)
break;
-
- if (symbol_conf.inline_name)
- ret += inline__fprintf(chain->ms.map,
- chain->ip,
- left_margin,
- 0, 0,
- fp);
}
root = &cnode->rb_root;
}
@@ -604,7 +536,6 @@ static int hist_entry__fprintf(struct hist_entry *he, size_t size,
{
int ret;
int callchain_ret = 0;
- int inline_ret = 0;
struct perf_hpp hpp = {
.buf = bf,
.size = size,
@@ -626,13 +557,7 @@ static int hist_entry__fprintf(struct hist_entry *he, size_t size,
callchain_ret = hist_entry_callchain__fprintf(he, total_period,
0, fp);

- if (callchain_ret == 0 && symbol_conf.inline_name) {
- inline_ret = inline__fprintf(he->ms.map, he->ip, 0, 0, 0, fp);
- ret += inline_ret;
- if (inline_ret > 0)
- ret += fprintf(fp, "\n");
- } else
- ret += callchain_ret;
+ ret += callchain_ret;

return ret;
}
diff --git a/tools/perf/util/evsel_fprintf.c b/tools/perf/util/evsel_fprintf.c
index 583f3a602506..f2c6c5ee11e8 100644
--- a/tools/perf/util/evsel_fprintf.c
+++ b/tools/perf/util/evsel_fprintf.c
@@ -169,38 +169,6 @@ int sample__fprintf_callchain(struct perf_sample *sample, int left_alignment,
if (!print_oneline)
printed += fprintf(fp, "\n");

- if (symbol_conf.inline_name && node->map) {
- struct inline_node *inode;
-
- addr = map__rip_2objdump(node->map, node->ip),
- inode = dso__parse_addr_inlines(node->map->dso, addr);
-
- if (inode) {
- struct inline_list *ilist;
-
- list_for_each_entry(ilist, &inode->val, list) {
- if (print_arrow)
- printed += fprintf(fp, " <-");
-
- /* IP is same, just skip it */
- if (print_ip)
- printed += fprintf(fp, "%c%16s",
- s, "");
- if (print_sym)
- printed += fprintf(fp, " %s",
- ilist->funcname);
- if (print_srcline)
- printed += fprintf(fp, "\n %s:%d",
- ilist->filename,
- ilist->line_nr);
- if (!print_oneline)
- printed += fprintf(fp, "\n");
- }
-
- inline_node__delete(inode);
- }
- }
-
if (symbol_conf.bt_stop_list &&
node->sym &&
strlist__has_entry(symbol_conf.bt_stop_list,
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 9453b2e27015..043a54871276 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -1137,11 +1137,6 @@ void hist_entry__delete(struct hist_entry *he)
zfree(&he->mem_info);
}

- if (he->inline_node) {
- inline_node__delete(he->inline_node);
- he->inline_node = NULL;
- }
-
zfree(&he->stat_acc);
free_srcline(he->srcline);
if (he->srcfile && he->srcfile[0])
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index b7c75597e18f..2f41416f01fa 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -129,7 +129,6 @@ struct hist_entry {
};
char *srcline;
char *srcfile;
- struct inline_node *inline_node;
struct symbol *parent;
struct branch_info *branch_info;
struct hists *hists;
--
2.13.3

2017-08-06 21:25:11

by Milian Wolff

[permalink] [raw]
Subject: [PATCH v2 03/14] perf report: create real callchain entries for inlined frames

The inlined frames use a fake symbol that is tracked in a special
map inside the dso, which is always sorted by name. All other
entries of the symbol beside the function name are unused for
inline frames. The advantage of this approach is that all existing
users of the callchain API can now transparently display inlined
frames without having to patch their code.

Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Yao Jin <[email protected]>
Signed-off-by: Milian Wolff <[email protected]>
---
tools/perf/util/callchain.c | 31 +++-----
tools/perf/util/callchain.h | 6 +-
tools/perf/util/dso.c | 2 +
tools/perf/util/dso.h | 1 +
tools/perf/util/machine.c | 56 +++++++++++++-
tools/perf/util/srcline.c | 183 ++++++++++++++++++++++++++++++++++----------
tools/perf/util/srcline.h | 19 ++++-
tools/perf/util/symbol.h | 1 +
8 files changed, 230 insertions(+), 69 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index f320b0777e0d..9854adb06ac1 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -559,6 +559,7 @@ fill_node(struct callchain_node *node, struct callchain_cursor *cursor)
call->ip = cursor_node->ip;
call->ms.sym = cursor_node->sym;
call->ms.map = map__get(cursor_node->map);
+ call->srcline = cursor_node->srcline;

if (cursor_node->branch) {
call->branch_count = 1;
@@ -640,20 +641,11 @@ enum match_result {
static enum match_result match_chain_srcline(struct callchain_cursor_node *node,
struct callchain_list *cnode)
{
- char *left = NULL;
- char *right = NULL;
+ const char *left = cnode->srcline;
+ const char *right = node->srcline;
enum match_result ret = MATCH_EQ;
int cmp;

- if (cnode->ms.map)
- left = get_srcline(cnode->ms.map->dso,
- map__rip_2objdump(cnode->ms.map, cnode->ip),
- cnode->ms.sym, true, false);
- if (node->map)
- right = get_srcline(node->map->dso,
- map__rip_2objdump(node->map, node->ip),
- node->sym, true, false);
-
if (left && right)
cmp = strcmp(left, right);
else if (!left && right)
@@ -668,8 +660,6 @@ static enum match_result match_chain_srcline(struct callchain_cursor_node *node,
if (cmp != 0)
ret = cmp < 0 ? MATCH_LT : MATCH_GT;

- free_srcline(left);
- free_srcline(right);
return ret;
}

@@ -958,7 +948,7 @@ merge_chain_branch(struct callchain_cursor *cursor,
list_for_each_entry_safe(list, next_list, &src->val, list) {
callchain_cursor_append(cursor, list->ip,
list->ms.map, list->ms.sym,
- false, NULL, 0, 0, 0);
+ false, NULL, 0, 0, 0, list->srcline);
list_del(&list->list);
map__zput(list->ms.map);
free(list);
@@ -998,7 +988,8 @@ int callchain_merge(struct callchain_cursor *cursor,
int callchain_cursor_append(struct callchain_cursor *cursor,
u64 ip, struct map *map, struct symbol *sym,
bool branch, struct branch_flags *flags,
- int nr_loop_iter, int samples, u64 branch_from)
+ int nr_loop_iter, int samples, u64 branch_from,
+ const char *srcline)
{
struct callchain_cursor_node *node = *cursor->last;

@@ -1017,6 +1008,7 @@ int callchain_cursor_append(struct callchain_cursor *cursor,
node->branch = branch;
node->nr_loop_iter = nr_loop_iter;
node->samples = samples;
+ node->srcline = srcline;

if (flags)
memcpy(&node->branch_flags, flags,
@@ -1104,12 +1096,7 @@ char *callchain_list__sym_name(struct callchain_list *cl,
int printed;

if (cl->ms.sym) {
- if (show_srcline && cl->ms.map && !cl->srcline)
- cl->srcline = get_srcline(cl->ms.map->dso,
- map__rip_2objdump(cl->ms.map,
- cl->ip),
- cl->ms.sym, false, show_addr);
- if (cl->srcline)
+ if (show_srcline && cl->srcline)
printed = scnprintf(bf, bfsize, "%s %s",
cl->ms.sym->name, cl->srcline);
else
@@ -1524,7 +1511,7 @@ int callchain_cursor__copy(struct callchain_cursor *dst,
rc = callchain_cursor_append(dst, node->ip, node->map, node->sym,
node->branch, &node->branch_flags,
node->nr_loop_iter, node->samples,
- node->branch_from);
+ node->branch_from, node->srcline);
if (rc)
break;

diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 97738201464a..bf81b56f34c3 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -121,7 +121,7 @@ struct callchain_list {
u64 iter_count;
u64 samples_count;
struct branch_type_stat brtype_stat;
- char *srcline;
+ const char *srcline;
struct list_head list;
};

@@ -135,6 +135,7 @@ struct callchain_cursor_node {
u64 ip;
struct map *map;
struct symbol *sym;
+ const char *srcline;
bool branch;
struct branch_flags branch_flags;
u64 branch_from;
@@ -201,7 +202,8 @@ static inline void callchain_cursor_reset(struct callchain_cursor *cursor)
int callchain_cursor_append(struct callchain_cursor *cursor, u64 ip,
struct map *map, struct symbol *sym,
bool branch, struct branch_flags *flags,
- int nr_loop_iter, int samples, u64 branch_from);
+ int nr_loop_iter, int samples, u64 branch_from,
+ const char *srcline);

/* Close a cursor writing session. Initialize for the reader */
static inline void callchain_cursor_commit(struct callchain_cursor *cursor)
diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index b9e087fb8247..72e6e390fd26 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -9,6 +9,7 @@
#include "compress.h"
#include "path.h"
#include "symbol.h"
+#include "srcline.h"
#include "dso.h"
#include "machine.h"
#include "auxtrace.h"
@@ -1233,6 +1234,7 @@ void dso__delete(struct dso *dso)
dso->long_name);
for (i = 0; i < MAP__NR_TYPES; ++i)
symbols__delete(&dso->symbols[i]);
+ inlines__tree_delete(&dso->inlined_nodes);

if (dso->short_name_allocated) {
zfree((char **)&dso->short_name);
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index f886141678eb..7d1e2b3c1f10 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -141,6 +141,7 @@ struct dso {
struct rb_root *root; /* root of rbtree that rb_node is in */
struct rb_root symbols[MAP__NR_TYPES];
struct rb_root symbol_names[MAP__NR_TYPES];
+ struct rb_root inlined_nodes;
struct {
u64 addr;
struct symbol *symbol;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index d4df353051af..a7f8499c8756 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1673,6 +1673,15 @@ struct mem_info *sample__resolve_mem(struct perf_sample *sample,
return mi;
}

+static char *callchain_srcline(struct map *map, struct symbol *sym, u64 ip)
+{
+ if (!map || callchain_param.key == CCKEY_FUNCTION)
+ return NULL;
+
+ return get_srcline(map->dso, map__rip_2objdump(map, ip),
+ sym, false, callchain_param.key == CCKEY_ADDRESS);
+}
+
static int add_callchain_ip(struct thread *thread,
struct callchain_cursor *cursor,
struct symbol **parent,
@@ -1686,6 +1695,7 @@ static int add_callchain_ip(struct thread *thread,
u64 branch_from)
{
struct addr_location al;
+ const char *srcline = NULL;

al.filtered = 0;
al.sym = NULL;
@@ -1735,9 +1745,11 @@ static int add_callchain_ip(struct thread *thread,

if (symbol_conf.hide_unresolved && al.sym == NULL)
return 0;
+
+ srcline = callchain_srcline(al.map, al.sym, al.addr);
return callchain_cursor_append(cursor, al.addr, al.map, al.sym,
branch, flags, nr_loop_iter, samples,
- branch_from);
+ branch_from, srcline);
}

struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
@@ -2044,15 +2056,55 @@ static int thread__resolve_callchain_sample(struct thread *thread,
return 0;
}

+static int append_inlines(struct callchain_cursor *cursor,
+ struct map *map, struct symbol *sym, u64 ip)
+{
+ struct inline_node *inline_node;
+ struct inline_list *ilist;
+ u64 addr;
+
+ if (!symbol_conf.inline_name || !map || !sym)
+ return 1;
+
+ addr = map__rip_2objdump(map, ip);
+
+ inline_node = inlines__tree_find(&map->dso->inlined_nodes, addr);
+ if (!inline_node) {
+ inline_node = dso__parse_addr_inlines(map->dso, addr, sym);
+ if (!inline_node)
+ return 1;
+
+ inlines__tree_insert(&map->dso->inlined_nodes, inline_node);
+ }
+
+ list_for_each_entry(ilist, &inline_node->val, list) {
+ int ret = callchain_cursor_append(cursor, ip, map,
+ ilist->symbol, false,
+ NULL, 0, 0, 0,
+ ilist->srcline);
+
+ if (ret != 0)
+ return ret;
+ }
+
+ return 0;
+}
+
static int unwind_entry(struct unwind_entry *entry, void *arg)
{
struct callchain_cursor *cursor = arg;
+ const char *srcline = NULL;

if (symbol_conf.hide_unresolved && entry->sym == NULL)
return 0;
+
+ if (append_inlines(cursor, entry->map, entry->sym, entry->ip) == 0)
+ return 0;
+
+ srcline = callchain_srcline(entry->map, entry->sym, entry->ip);
return callchain_cursor_append(cursor, entry->ip,
entry->map, entry->sym,
- false, NULL, 0, 0, 0);
+ false, NULL, 0, 0, 0, srcline);
}

static int thread__resolve_callchain_unwind(struct thread *thread,
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index ebc88a74e67b..a1fdf035d1dd 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -33,28 +33,17 @@ static const char *dso__name(struct dso *dso)
return dso_name;
}

-static int inline_list__append(char *filename, char *funcname, int line_nr,
- struct inline_node *node, struct dso *dso)
+static int inline_list__append(struct symbol *symbol, char *srcline,
+ struct inline_node *node)
{
struct inline_list *ilist;
- char *demangled;

ilist = zalloc(sizeof(*ilist));
if (ilist == NULL)
return -1;

- ilist->filename = filename;
- ilist->line_nr = line_nr;
-
- if (dso != NULL) {
- demangled = dso__demangle_sym(dso, 0, funcname);
- if (demangled == NULL) {
- ilist->funcname = funcname;
- } else {
- ilist->funcname = demangled;
- free(funcname);
- }
- }
+ ilist->symbol = symbol;
+ ilist->srcline = srcline;

if (callchain_param.order == ORDER_CALLEE)
list_add_tail(&ilist->list, &node->val);
@@ -64,6 +53,30 @@ static int inline_list__append(char *filename, char *funcname, int line_nr,
return 0;
}

+// basename version that takes a const input string
+static const char *gnu_basename(const char *path)
+{
+ const char *base = strrchr(path, '/');
+
+ return base ? base + 1 : path;
+}
+
+static char *srcline_from_fileline(const char *file, unsigned int line)
+{
+ char *srcline;
+
+ if (!file)
+ return NULL;
+
+ if (!srcline_full_filename)
+ file = gnu_basename(file);
+
+ if (asprintf(&srcline, "%s:%u", file, line) < 0)
+ return NULL;
+
+ return srcline;
+}
+
#ifdef HAVE_LIBBFD_SUPPORT

/*
@@ -203,19 +216,55 @@ static void addr2line_cleanup(struct a2l_data *a2l)

#define MAX_INLINE_NEST 1024

+static struct symbol *new_inline_sym(struct dso *dso,
+ struct symbol *base_sym,
+ const char *funcname)
+{
+ struct symbol *inline_sym;
+ char *demangled = NULL;
+
+ if (dso) {
+ demangled = dso__demangle_sym(dso, 0, funcname);
+ if (demangled)
+ funcname = demangled;
+ }
+
+ if (strcmp(funcname, base_sym->name) == 0) {
+ // reuse the real, existing symbol
+ inline_sym = base_sym;
+ } else {
+ // create a fake symbol for the inline frame
+ inline_sym = symbol__new(base_sym ? base_sym->start : 0,
+ base_sym ? base_sym->end : 0,
+ base_sym ? base_sym->binding : 0,
+ funcname);
+ if (inline_sym)
+ inline_sym->inlined = 1;
+ }
+
+ free(demangled);
+
+ return inline_sym;
+}
+
static int inline_list__append_dso_a2l(struct dso *dso,
- struct inline_node *node)
+ struct inline_node *node,
+ struct symbol *sym)
{
struct a2l_data *a2l = dso->a2l;
- char *funcname = a2l->funcname ? strdup(a2l->funcname) : NULL;
- char *filename = a2l->filename ? strdup(a2l->filename) : NULL;
+ struct symbol *inline_sym = new_inline_sym(dso, sym, a2l->funcname);
+ char *srcline = NULL;
+
+ if (a2l->filename)
+ srcline = srcline_from_fileline(a2l->filename, a2l->line);

- return inline_list__append(filename, funcname, a2l->line, node, dso);
+ return inline_list__append(inline_sym, srcline, node);
}

static int addr2line(const char *dso_name, u64 addr,
char **file, unsigned int *line, struct dso *dso,
- bool unwind_inlines, struct inline_node *node)
+ bool unwind_inlines, struct inline_node *node,
+ struct symbol *sym)
{
int ret = 0;
struct a2l_data *a2l = dso->a2l;
@@ -241,7 +290,7 @@ static int addr2line(const char *dso_name, u64 addr,
if (unwind_inlines) {
int cnt = 0;

- if (node && inline_list__append_dso_a2l(dso, node))
+ if (node && inline_list__append_dso_a2l(dso, node, sym))
return 0;

while (bfd_find_inliner_info(a2l->abfd, &a2l->filename,
@@ -249,7 +298,7 @@ static int addr2line(const char *dso_name, u64 addr,
cnt++ < MAX_INLINE_NEST) {

if (node != NULL) {
- if (inline_list__append_dso_a2l(dso, node))
+ if (inline_list__append_dso_a2l(dso, node, sym))
return 0;
// found at least one inline frame
ret = 1;
@@ -281,7 +330,7 @@ void dso__free_a2l(struct dso *dso)
}

static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
- struct dso *dso)
+ struct dso *dso, struct symbol *sym)
{
struct inline_node *node;

@@ -294,7 +343,7 @@ static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
INIT_LIST_HEAD(&node->val);
node->addr = addr;

- if (!addr2line(dso_name, addr, NULL, NULL, dso, TRUE, node))
+ if (!addr2line(dso_name, addr, NULL, NULL, dso, TRUE, node, sym))
goto out_free_inline_node;

if (list_empty(&node->val))
@@ -334,7 +383,8 @@ static int addr2line(const char *dso_name, u64 addr,
char **file, unsigned int *line_nr,
struct dso *dso __maybe_unused,
bool unwind_inlines __maybe_unused,
- struct inline_node *node __maybe_unused)
+ struct inline_node *node __maybe_unused,
+ struct symbol *sym __maybe_unused)
{
FILE *fp;
char cmd[PATH_MAX];
@@ -374,7 +424,7 @@ void dso__free_a2l(struct dso *dso __maybe_unused)
}

static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
- struct dso *dso __maybe_unused)
+ struct dso *dso __maybe_unused, struct symbol *sym)
{
FILE *fp;
char cmd[PATH_MAX];
@@ -402,13 +452,15 @@ static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
node->addr = addr;

while (getline(&filename, &len, fp) != -1) {
+ char *srcline;
+
if (filename_split(filename, &line_nr) != 1) {
free(filename);
goto out;
}

- if (inline_list__append(filename, NULL, line_nr, node,
- NULL) != 0)
+ srcline = srcline_from_fileline(filename, line_nr);
+ if (inline_list__append(sym, srcline, node) != 0)
goto out;

filename = NULL;
@@ -448,19 +500,18 @@ char *__get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
if (dso_name == NULL)
goto out;

- if (!addr2line(dso_name, addr, &file, &line, dso, unwind_inlines, NULL))
+ if (!addr2line(dso_name, addr, &file, &line, dso,
+ unwind_inlines, NULL, sym))
goto out;

- if (asprintf(&srcline, "%s:%u",
- srcline_full_filename ? file : basename(file),
- line) < 0) {
- free(file);
+ srcline = srcline_from_fileline(file, line);
+ free(file);
+
+ if (!srcline)
goto out;
- }

dso->a2l_fails = 0;

- free(file);
return srcline;

out:
@@ -494,7 +545,8 @@ char *get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
return __get_srcline(dso, addr, sym, show_sym, show_addr, false);
}

-struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr)
+struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr,
+ struct symbol *sym)
{
const char *dso_name;

@@ -502,7 +554,7 @@ struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr)
if (dso_name == NULL)
return NULL;

- return addr2inlines(dso_name, addr, dso);
+ return addr2inlines(dso_name, addr, dso, sym);
}

void inline_node__delete(struct inline_node *node)
@@ -511,10 +563,63 @@ void inline_node__delete(struct inline_node *node)

list_for_each_entry_safe(ilist, tmp, &node->val, list) {
list_del_init(&ilist->list);
- zfree(&ilist->filename);
- zfree(&ilist->funcname);
+ zfree(&ilist->srcline);
+ // only the inlined symbols are owned by the list
+ if (ilist->symbol && ilist->symbol->inlined)
+ symbol__delete(ilist->symbol);
free(ilist);
}

free(node);
}
+
+void inlines__tree_insert(struct rb_root *tree, struct inline_node *inlines)
+{
+ struct rb_node **p = &tree->rb_node;
+ struct rb_node *parent = NULL;
+ const u64 addr = inlines->addr;
+ struct inline_node *i;
+
+ while (*p != NULL) {
+ parent = *p;
+ i = rb_entry(parent, struct inline_node, rb_node);
+ if (addr < i->addr)
+ p = &(*p)->rb_left;
+ else
+ p = &(*p)->rb_right;
+ }
+ rb_link_node(&inlines->rb_node, parent, p);
+ rb_insert_color(&inlines->rb_node, tree);
+}
+
+struct inline_node *inlines__tree_find(struct rb_root *tree, u64 addr)
+{
+ struct rb_node *n = tree->rb_node;
+
+ while (n) {
+ struct inline_node *i = rb_entry(n, struct inline_node,
+ rb_node);
+
+ if (addr < i->addr)
+ n = n->rb_left;
+ else if (addr > i->addr)
+ n = n->rb_right;
+ else
+ return i;
+ }
+
+ return NULL;
+}
+
+void inlines__tree_delete(struct rb_root *tree)
+{
+ struct inline_node *pos;
+ struct rb_node *next = rb_first(tree);
+
+ while (next) {
+ pos = rb_entry(next, struct inline_node, rb_node);
+ next = rb_next(&pos->rb_node);
+ rb_erase(&pos->rb_node, tree);
+ inline_node__delete(pos);
+ }
+}
diff --git a/tools/perf/util/srcline.h b/tools/perf/util/srcline.h
index 7b52ba88676e..0d2aca92e8c7 100644
--- a/tools/perf/util/srcline.h
+++ b/tools/perf/util/srcline.h
@@ -2,6 +2,7 @@
#define PERF_SRCLINE_H

#include <linux/list.h>
+#include <linux/rbtree.h>
#include <linux/types.h>

struct dso;
@@ -17,18 +18,28 @@ void free_srcline(char *srcline);
#define SRCLINE_UNKNOWN ((char *) "??:0")

struct inline_list {
- char *filename;
- char *funcname;
- unsigned int line_nr;
+ struct symbol *symbol;
+ char *srcline;
struct list_head list;
};

struct inline_node {
u64 addr;
struct list_head val;
+ struct rb_node rb_node;
};

-struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr);
+// parse inlined frames for the given address
+struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr,
+ struct symbol *sym);
+// free resources associated to the inline node list
void inline_node__delete(struct inline_node *node);

+// insert the inline node list into the DSO, which will take ownership
+void inlines__tree_insert(struct rb_root *tree, struct inline_node *inlines);
+// find previously inserted inline node list
+struct inline_node *inlines__tree_find(struct rb_root *tree, u64 addr);
+// delete all nodes within the tree of inline_node s
+void inlines__tree_delete(struct rb_root *tree);
+
#endif /* PERF_SRCLINE_H */
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index f0b08810d7fa..b358570ce615 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -59,6 +59,7 @@ struct symbol {
u8 binding;
u8 idle:1;
u8 ignore:1;
+ u8 inlined:1;
u8 arch_sym;
char name[0];
};
--
2.13.3

2017-08-06 21:25:14

by Milian Wolff

[permalink] [raw]
Subject: [PATCH v2 10/14] perf report: cache failed lookups of inlined frames

When no inlined frames could be found for a given address,
we did not store this information anywhere. That means we
potentially do the costly inliner lookup repeatedly for
cases where we know it can never succeed.

This patch makes dso__parse_addr_inlines always return a
valid inline_node. It will be empty when no inliners are
found. This enables us to cache the empty list in the DSO,
thereby improving the performance when many addresses
fail to find the inliners.

For my trivial example, the performance impact is already
quite significant:

Before:

~~~~~
Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):

594.804032 task-clock (msec) # 0.998 CPUs utilized ( +- 0.07% )
53 context-switches # 0.089 K/sec ( +- 4.09% )
0 cpu-migrations # 0.000 K/sec ( +-100.00% )
5,687 page-faults # 0.010 M/sec ( +- 0.02% )
2,300,918,213 cycles # 3.868 GHz ( +- 0.09% )
4,395,839,080 instructions # 1.91 insn per cycle ( +- 0.00% )
939,177,205 branches # 1578.969 M/sec ( +- 0.00% )
11,824,633 branch-misses # 1.26% of all branches ( +- 0.10% )

0.596246531 seconds time elapsed ( +- 0.07% )
~~~~~

After:

~~~~~
Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):

113.111405 task-clock (msec) # 0.990 CPUs utilized ( +- 0.89% )
29 context-switches # 0.255 K/sec ( +- 54.25% )
0 cpu-migrations # 0.000 K/sec
5,380 page-faults # 0.048 M/sec ( +- 0.01% )
432,378,779 cycles # 3.823 GHz ( +- 0.75% )
670,057,633 instructions # 1.55 insn per cycle ( +- 0.01% )
141,001,247 branches # 1246.570 M/sec ( +- 0.01% )
2,346,845 branch-misses # 1.66% of all branches ( +- 0.19% )

0.114222393 seconds time elapsed ( +- 1.19% )
~~~~~

Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Yao Jin <[email protected]>
Signed-off-by: Milian Wolff <[email protected]>
---
tools/perf/util/machine.c | 15 +++++++--------
tools/perf/util/srcline.c | 18 ++----------------
2 files changed, 9 insertions(+), 24 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index a7f8499c8756..479c42450d6a 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2062,9 +2062,10 @@ static int append_inlines(struct callchain_cursor *cursor,
struct inline_node *inline_node;
struct inline_list *ilist;
u64 addr;
+ int ret = 1;

if (!symbol_conf.inline_name || !map || !sym)
- return 1;
+ return ret;

addr = map__rip_2objdump(map, ip);

@@ -2072,22 +2073,20 @@ static int append_inlines(struct callchain_cursor *cursor,
if (!inline_node) {
inline_node = dso__parse_addr_inlines(map->dso, addr, sym);
if (!inline_node)
- return 1;
-
+ return ret;
inlines__tree_insert(&map->dso->inlined_nodes, inline_node);
}

list_for_each_entry(ilist, &inline_node->val, list) {
- int ret = callchain_cursor_append(cursor, ip, map,
- ilist->symbol, false,
- NULL, 0, 0, 0,
- ilist->srcline);
+ ret = callchain_cursor_append(cursor, ip, map,
+ ilist->symbol, false,
+ NULL, 0, 0, 0, ilist->srcline);

if (ret != 0)
return ret;
}

- return 0;
+ return ret;
}

static int unwind_entry(struct unwind_entry *entry, void *arg)
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index a1fdf035d1dd..01b4d5ee51fd 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -343,17 +343,8 @@ static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
INIT_LIST_HEAD(&node->val);
node->addr = addr;

- if (!addr2line(dso_name, addr, NULL, NULL, dso, TRUE, node, sym))
- goto out_free_inline_node;
-
- if (list_empty(&node->val))
- goto out_free_inline_node;
-
- return node;
-
-out_free_inline_node:
- inline_node__delete(node);
- return NULL;
+ addr2line(dso_name, addr, NULL, NULL, dso, TRUE, node, sym);
+ return node;
}

#else /* HAVE_LIBBFD_SUPPORT */
@@ -469,11 +460,6 @@ static struct inline_node *addr2inlines(const char *dso_name, u64 addr,
out:
pclose(fp);

- if (list_empty(&node->val)) {
- inline_node__delete(node);
- return NULL;
- }
-
return node;
}

--
2.13.3

2017-08-06 21:25:21

by Milian Wolff

[permalink] [raw]
Subject: [PATCH v2 12/14] perf report: use srcline from callchain for hist entries

This also removes the symbol name from the srcline column,
more on this below.

This ensures we use the correct srcline, which could originate
from a potentially inlined function. The hist entries used to
query for the srcline based purely on the IP, which leads to
wrong results for inlined entries.

Before:

~~~~~
perf report --inline -s srcline -g none --stdio
...
# Children Self Source:Line
# ........ ........ ..................................................................................................................................
#
94.23% 0.00% __libc_start_main+18446603487898210537
94.23% 0.00% _start+41
44.58% 0.00% main+100
44.58% 0.00% std::_Norm_helper<true>::_S_do_it<double>+100
44.58% 0.00% std::__complex_abs+100
44.58% 0.00% std::abs<double>+100
44.58% 0.00% std::norm<double>+100
36.01% 0.00% hypot+18446603487892193300
25.81% 0.00% main+41
25.81% 0.00% std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41
25.81% 0.00% std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41
25.75% 25.75% random.h:143
18.39% 0.00% main+57
18.39% 0.00% std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57
18.39% 0.00% std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57
13.80% 13.80% random.tcc:3330
5.64% 0.00% ??:0
4.13% 4.13% __hypot_finite+163
4.13% 0.00% __hypot_finite+18446603487892193443
...
~~~~~

After:

~~~~~
perf report --inline -s srcline -g none --stdio
...
# Children Self Source:Line
# ........ ........ ...........................................
#
94.30% 1.19% main.cpp:39
94.23% 0.00% __libc_start_main+18446603487898210537
94.23% 0.00% _start+41
48.44% 1.70% random.h:1823
48.44% 0.00% random.h:1814
46.74% 2.53% random.h:185
44.68% 0.10% complex:589
44.68% 0.00% complex:597
44.68% 0.00% complex:654
44.68% 0.00% complex:664
40.61% 13.80% random.tcc:3330
36.01% 0.00% hypot+18446603487892193300
26.81% 0.00% random.h:151
26.81% 0.00% random.h:332
25.75% 25.75% random.h:143
5.64% 0.00% ??:0
4.13% 4.13% __hypot_finite+163
4.13% 0.00% __hypot_finite+18446603487892193443
...
~~~~~

Note that this change removes the symbol from the source:line
hist column. If this information is desired, users should
explicitly query for it if needed. I.e. run this command
instead:

~~~~~
perf report --inline -s sym,srcline -g none --stdio
...
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1K of event 'cycles:uppp'
# Event count (approx.): 1381229476
#
# Children Self Symbol Source:Line
# ........ ........ ................................................................................................................................... ...........................................
#
94.30% 1.19% [.] main main.cpp:39
94.23% 0.00% [.] __libc_start_main __libc_start_main+18446603487898210537
94.23% 0.00% [.] _start _start+41
48.44% 0.00% [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) random.h:1814
48.44% 0.00% [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) random.h:1823
46.74% 0.00% [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined) random.h:185
44.68% 0.00% [.] std::_Norm_helper<true>::_S_do_it<double> (inlined) complex:654
44.68% 0.00% [.] std::__complex_abs (inlined) complex:589
44.68% 0.00% [.] std::abs<double> (inlined) complex:597
44.68% 0.00% [.] std::norm<double> (inlined) complex:664
39.80% 13.59% [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3330
36.01% 0.00% [.] hypot hypot+18446603487892193300
26.81% 0.00% [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined) random.h:151
26.81% 0.00% [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined) random.h:332
25.75% 0.00% [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined) random.h:143
25.19% 25.19% [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:143
4.13% 4.13% [.] __hypot_finite __hypot_finite+163
4.13% 0.00% [.] __hypot_finite __hypot_finite+18446603487892193443
...
~~~~~

Compared to the old behavior, this reduces duplication in the output.
Before we used to print the symbol name in the srcline column even
when the sym column was explicitly requested. I.e. the output was:

~~~~~
perf report --inline -s sym,srcline -g none --stdio
...
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1K of event 'cycles:uppp'
# Event count (approx.): 1381229476
#
# Children Self Symbol Source:Line
# ........ ........ ................................................................................................................................... ..................................................................................................................................
#
94.23% 0.00% [.] __libc_start_main __libc_start_main+18446603487898210537
94.23% 0.00% [.] _start _start+41
44.58% 0.00% [.] main main+100
44.58% 0.00% [.] std::_Norm_helper<true>::_S_do_it<double> (inlined) std::_Norm_helper<true>::_S_do_it<double>+100
44.58% 0.00% [.] std::__complex_abs (inlined) std::__complex_abs+100
44.58% 0.00% [.] std::abs<double> (inlined) std::abs<double>+100
44.58% 0.00% [.] std::norm<double> (inlined) std::norm<double>+100
36.01% 0.00% [.] hypot hypot+18446603487892193300
25.81% 0.00% [.] main main+41
25.81% 0.00% [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined) std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41
25.81% 0.00% [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41
25.69% 25.69% [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:143
18.39% 0.00% [.] main main+57
18.39% 0.00% [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined) std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57
18.39% 0.00% [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57
13.80% 13.80% [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3330
4.13% 4.13% [.] __hypot_finite __hypot_finite+163
4.13% 0.00% [.] __hypot_finite __hypot_finite+18446603487892193443
...
~~~~~

Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Yao Jin <[email protected]>
Signed-off-by: Milian Wolff <[email protected]>
---
tools/perf/util/callchain.c | 1 +
tools/perf/util/event.c | 1 +
tools/perf/util/hist.c | 2 ++
tools/perf/util/symbol.h | 1 +
4 files changed, 5 insertions(+)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 2c85adc1a3e2..20e85be27f13 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1064,6 +1064,7 @@ int fill_callchain_info(struct addr_location *al, struct callchain_cursor_node *
{
al->map = node->map;
al->sym = node->sym;
+ al->srcline = node->srcline;
if (node->map)
al->addr = node->map->map_ip(node->map, node->ip);
else
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 1c905ba3641b..9518ad2ef351 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -1497,6 +1497,7 @@ int machine__resolve(struct machine *machine, struct addr_location *al,
al->sym = NULL;
al->cpu = sample->cpu;
al->socket = -1;
+ al->srcline = NULL;

if (al->cpu >= 0) {
struct perf_env *env = machine->env;
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 043a54871276..f166ebcadc69 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -592,6 +592,7 @@ __hists__add_entry(struct hists *hists,
.map = al->map,
.sym = al->sym,
},
+ .srcline = al->srcline ? strdup(al->srcline) : NULL,
.socket = al->socket,
.cpu = al->cpu,
.cpumode = al->cpumode,
@@ -946,6 +947,7 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
.map = al->map,
.sym = al->sym,
},
+ .srcline = al->srcline ? strdup(al->srcline) : NULL,
.parent = iter->parent,
.raw_data = sample->raw_data,
.raw_size = sample->raw_size,
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index b358570ce615..1a5f61a18dea 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -208,6 +208,7 @@ struct addr_location {
struct thread *thread;
struct map *map;
struct symbol *sym;
+ const char *srcline;
u64 addr;
char level;
u8 filtered;
--
2.13.3

2017-08-06 21:25:19

by Milian Wolff

[permalink] [raw]
Subject: [PATCH v2 13/14] perf util: do not consider empty files as valid srclines

Sometimes we get a non-null, but empty, string for the filename from
bfd. This then results in srclines of the form ":0", which is
different from the canonical SRCLINE_UNKNOWN in the form "??:0".
Set the file to NULL if it is empty to fix this.

Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Yao Jin <[email protected]>
Signed-off-by: Milian Wolff <[email protected]>
---
tools/perf/util/srcline.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index 0c5ee741c515..7bf38b78a1ba 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -168,6 +168,9 @@ static void find_address_in_section(bfd *abfd, asection *section, void *data)
a2l->found = bfd_find_nearest_line(abfd, section, a2l->syms, pc - vma,
&a2l->filename, &a2l->funcname,
&a2l->line);
+
+ if (a2l->filename && !strlen(a2l->filename))
+ a2l->filename = NULL;
}

static struct a2l_data *addr2line_init(const char *path)
@@ -297,6 +300,9 @@ static int addr2line(const char *dso_name, u64 addr,
&a2l->funcname, &a2l->line) &&
cnt++ < MAX_INLINE_NEST) {

+ if (a2l->filename && !strlen(a2l->filename))
+ a2l->filename = NULL;
+
if (node != NULL) {
if (inline_list__append_dso_a2l(dso, node, sym))
return 0;
--
2.13.3

2017-08-06 21:25:45

by Milian Wolff

[permalink] [raw]
Subject: [PATCH v2 14/14] perf util: enable handling of inlined frames by default

Now that we have caches in place to speed up the process of finding
inlined frames and srcline information repeatedly, we can enable
this useful option by default.

Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Yao Jin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Suggested-by: Ingo Molnar <[email protected]>
Signed-off-by: Milian Wolff <[email protected]>
---
tools/perf/Documentation/perf-report.txt | 3 ++-
tools/perf/Documentation/perf-script.txt | 3 ++-
tools/perf/util/symbol.c | 1 +
3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 9fa84617181e..8d5834a837d8 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -433,7 +433,8 @@ include::itrace.txt[]

--inline::
If a callgraph address belongs to an inlined function, the inline stack
- will be printed. Each entry is function name or file/line.
+ will be printed. Each entry is function name or file/line. Enabled by
+ default, disable with --no-inline.

include::callchain-overhead-calculation.txt[]

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 5ee8796be96e..683242ad15a2 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -327,7 +327,8 @@ include::itrace.txt[]

--inline::
If a callgraph address belongs to an inlined function, the inline stack
- will be printed. Each entry has function name and file/line.
+ will be printed. Each entry has function name and file/line. Enabled by
+ default, disable with --no-inline.

SEE ALSO
--------
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 971b990557b4..7b8d9981588e 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -45,6 +45,7 @@ struct symbol_conf symbol_conf = {
.show_hist_headers = true,
.symfs = "",
.event_group = true,
+ .inline_name = true,
};

static enum dso_binary_type binary_type_symtab[] = {
--
2.13.3

2017-08-06 21:26:12

by Milian Wolff

[permalink] [raw]
Subject: [PATCH v2 11/14] perf report: cache srclines for callchain nodes

On one hand this ensures that the memory is properly freed when
the DSO gets freed. On the other hand this significantly speeds up
the processing of the callchain nodes when lots of srclines are
requested. For one of my data files e.g.:

Before:

Performance counter stats for 'perf report -s srcline -g srcline --stdio':

52496.495043 task-clock (msec) # 0.999 CPUs utilized
634 context-switches # 0.012 K/sec
2 cpu-migrations # 0.000 K/sec
191,561 page-faults # 0.004 M/sec
165,074,498,235 cycles # 3.144 GHz
334,170,832,408 instructions # 2.02 insn per cycle
90,220,029,745 branches # 1718.591 M/sec
654,525,177 branch-misses # 0.73% of all branches

52.533273822 seconds time elapsedProcessed 236605 events and lost 40 chunks!

After:

Performance counter stats for 'perf report -s srcline -g srcline --stdio':

22606.323706 task-clock (msec) # 1.000 CPUs utilized
31 context-switches # 0.001 K/sec
0 cpu-migrations # 0.000 K/sec
185,471 page-faults # 0.008 M/sec
71,188,113,681 cycles # 3.149 GHz
133,204,943,083 instructions # 1.87 insn per cycle
34,886,384,979 branches # 1543.214 M/sec
278,214,495 branch-misses # 0.80% of all branches

22.609857253 seconds time elapsed

Note that the difference is only this large when `--inline` is not
passed. In such situations, we would use the inliner cache and
thus do not run this code path that often.

I think that this cache should actually be used in other places, too.
When looking at the valgrind leak report for perf report, we see tons
of srclines being leaked, most notably from calls to
hist_entry__get_srcline. The problem is that get_srcline has many
different formatting options (show_sym, show_addr, potentially even
unwind_inlines when calling __get_srcline directly). As such, the
srcline cannot easily be cached for all calls, or we'd have to add
caches for all formatting combinations (6 so far). An alternative
would be to remove the formatting options and handle that on a
different level - i.e. print the sym/addr on demand wherever we
actually output something. And the unwind_inlines could be moved into
a separate function that does not return the srcline.

Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Yao Jin <[email protected]>
Signed-off-by: Milian Wolff <[email protected]>
---
tools/perf/util/dso.c | 1 +
tools/perf/util/dso.h | 1 +
tools/perf/util/machine.c | 17 +++++++++---
tools/perf/util/srcline.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/srcline.h | 7 +++++
5 files changed, 89 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 72e6e390fd26..8c7f2862cff2 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -1235,6 +1235,7 @@ void dso__delete(struct dso *dso)
for (i = 0; i < MAP__NR_TYPES; ++i)
symbols__delete(&dso->symbols[i]);
inlines__tree_delete(&dso->inlined_nodes);
+ srcline__tree_delete(&dso->srclines);

if (dso->short_name_allocated) {
zfree((char **)&dso->short_name);
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 7d1e2b3c1f10..ac3a65a30ff2 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -142,6 +142,7 @@ struct dso {
struct rb_root symbols[MAP__NR_TYPES];
struct rb_root symbol_names[MAP__NR_TYPES];
struct rb_root inlined_nodes;
+ struct rb_root srclines;
struct {
u64 addr;
struct symbol *symbol;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 479c42450d6a..c5ee6ba2b9ae 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1675,11 +1675,22 @@ struct mem_info *sample__resolve_mem(struct perf_sample *sample,

static char *callchain_srcline(struct map *map, struct symbol *sym, u64 ip)
{
+ char *srcline = NULL;
+
if (!map || callchain_param.key == CCKEY_FUNCTION)
- return NULL;
+ return srcline;
+
+ srcline = srcline__tree_find(&map->dso->srclines, ip);
+ if (!srcline) {
+ bool show_sym = false;
+ bool show_addr = callchain_param.key == CCKEY_ADDRESS;
+
+ srcline = get_srcline(map->dso, map__rip_2objdump(map, ip),
+ sym, show_sym, show_addr);
+ srcline__tree_insert(&map->dso->srclines, ip, srcline);
+ }

- return get_srcline(map->dso, map__rip_2objdump(map, ip),
- sym, false, callchain_param.key == CCKEY_ADDRESS);
+ return srcline;
}

static int add_callchain_ip(struct thread *thread,
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index 01b4d5ee51fd..0c5ee741c515 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -531,6 +531,72 @@ char *get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
return __get_srcline(dso, addr, sym, show_sym, show_addr, false);
}

+struct srcline_node {
+ u64 addr;
+ char *srcline;
+ struct rb_node rb_node;
+};
+
+void srcline__tree_insert(struct rb_root *tree, u64 addr, char *srcline)
+{
+ struct rb_node **p = &tree->rb_node;
+ struct rb_node *parent = NULL;
+ struct srcline_node *i, *node;
+
+ node = zalloc(sizeof(struct srcline_node));
+ if (!node) {
+ perror("not enough memory for the srcline node");
+ return;
+ }
+
+ node->addr = addr;
+ node->srcline = srcline;
+
+ while (*p != NULL) {
+ parent = *p;
+ i = rb_entry(parent, struct srcline_node, rb_node);
+ if (addr < i->addr)
+ p = &(*p)->rb_left;
+ else
+ p = &(*p)->rb_right;
+ }
+ rb_link_node(&node->rb_node, parent, p);
+ rb_insert_color(&node->rb_node, tree);
+}
+
+char *srcline__tree_find(struct rb_root *tree, u64 addr)
+{
+ struct rb_node *n = tree->rb_node;
+
+ while (n) {
+ struct srcline_node *i = rb_entry(n, struct srcline_node,
+ rb_node);
+
+ if (addr < i->addr)
+ n = n->rb_left;
+ else if (addr > i->addr)
+ n = n->rb_right;
+ else
+ return i->srcline;
+ }
+
+ return NULL;
+}
+
+void srcline__tree_delete(struct rb_root *tree)
+{
+ struct srcline_node *pos;
+ struct rb_node *next = rb_first(tree);
+
+ while (next) {
+ pos = rb_entry(next, struct srcline_node, rb_node);
+ next = rb_next(&pos->rb_node);
+ rb_erase(&pos->rb_node, tree);
+ free_srcline(pos->srcline);
+ zfree(&pos);
+ }
+}
+
struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr,
struct symbol *sym)
{
diff --git a/tools/perf/util/srcline.h b/tools/perf/util/srcline.h
index 0d2aca92e8c7..187a71082cb8 100644
--- a/tools/perf/util/srcline.h
+++ b/tools/perf/util/srcline.h
@@ -15,6 +15,13 @@ char *__get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
bool show_sym, bool show_addr, bool unwind_inlines);
void free_srcline(char *srcline);

+// insert the srcline into the DSO, which will take ownership
+void srcline__tree_insert(struct rb_root *tree, u64 addr, char *srcline);
+// find previously inserted srcline
+char *srcline__tree_find(struct rb_root *tree, u64 addr);
+// delete all srclines within the tree
+void srcline__tree_delete(struct rb_root *tree);
+
#define SRCLINE_UNKNOWN ((char *) "??:0")

struct inline_list {
--
2.13.3

2017-08-06 21:26:33

by Milian Wolff

[permalink] [raw]
Subject: [PATCH v2 09/14] perf report: properly handle branch count in match_chain

Some of the code paths I introduced before returned too early
without running the code to handle a node's branch count.
By refactoring match_chain to only have one exit point, this
can be remedied.

Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Yao Jin <[email protected]>
Signed-off-by: Milian Wolff <[email protected]>
---
tools/perf/util/callchain.c | 115 +++++++++++++++++++++++---------------------
1 file changed, 59 insertions(+), 56 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 45a39260a821..2c85adc1a3e2 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -659,77 +659,80 @@ static enum match_result match_chain_strings(const char *left,
return ret;
}

+static enum match_result match_address(u64 left, u64 right)
+{
+ if (left == right)
+ return MATCH_EQ;
+ else if (left < right)
+ return MATCH_LT;
+ else
+ return MATCH_GT;
+}
+
static enum match_result match_chain(struct callchain_cursor_node *node,
struct callchain_list *cnode)
{
- struct symbol *sym = node->sym;
- enum match_result match;
- u64 left, right;
+ enum match_result match = MATCH_ERROR;

- if (callchain_param.key == CCKEY_SRCLINE) {
+ switch (callchain_param.key) {
+ case CCKEY_SRCLINE:
match = match_chain_strings(cnode->srcline, node->srcline);
-
- // if no srcline is available, fallback to symbol name
- if (match == MATCH_ERROR && cnode->ms.sym && node->sym)
- match = match_chain_strings(cnode->ms.sym->name,
- node->sym->name);
-
if (match != MATCH_ERROR)
- return match;
-
- // otherwise fall-back to IP-based comparison below
- }
-
- if (cnode->ms.sym && sym && callchain_param.key == CCKEY_FUNCTION) {
- // compare inlined frames based on their symbol name because
- // different inlined frames will have the same symbol start
- if (cnode->ms.sym->inlined || node->sym->inlined)
- return match_chain_strings(cnode->ms.sym->name,
- node->sym->name);
-
- left = cnode->ms.sym->start;
- right = sym->start;
- } else {
- left = cnode->ip;
- right = node->ip;
+ break;
+ // else fall-through
+ case CCKEY_FUNCTION:
+ if (node->sym && cnode->ms.sym) {
+ // compare inlined frames based on their symbol name
+ // because different inlined frames will have the same
+ // symbol start. otherwise do a faster comparison based
+ // on the symbol start address
+ if (cnode->ms.sym->inlined || node->sym->inlined)
+ match = match_chain_strings(cnode->ms.sym->name,
+ node->sym->name);
+ else
+ match = match_address(cnode->ms.sym->start,
+ node->sym->start);
+ if (match != MATCH_ERROR)
+ break;
+ }
+ // else fall-through
+ case CCKEY_ADDRESS:
+ default:
+ match = match_address(cnode->ip, node->ip);
+ break;
}

- if (left == right) {
- if (node->branch) {
- cnode->branch_count++;
+ if (match == MATCH_EQ && node->branch) {
+ cnode->branch_count++;

- if (node->branch_from) {
- /*
- * It's "to" of a branch
- */
- cnode->brtype_stat.branch_to = true;
+ if (node->branch_from) {
+ /*
+ * It's "to" of a branch
+ */
+ cnode->brtype_stat.branch_to = true;

- if (node->branch_flags.predicted)
- cnode->predicted_count++;
+ if (node->branch_flags.predicted)
+ cnode->predicted_count++;

- if (node->branch_flags.abort)
- cnode->abort_count++;
+ if (node->branch_flags.abort)
+ cnode->abort_count++;

- branch_type_count(&cnode->brtype_stat,
- &node->branch_flags,
- node->branch_from,
- node->ip);
- } else {
- /*
- * It's "from" of a branch
- */
- cnode->brtype_stat.branch_to = false;
- cnode->cycles_count +=
- node->branch_flags.cycles;
- cnode->iter_count += node->nr_loop_iter;
- cnode->samples_count += node->samples;
- }
+ branch_type_count(&cnode->brtype_stat,
+ &node->branch_flags,
+ node->branch_from,
+ node->ip);
+ } else {
+ /*
+ * It's "from" of a branch
+ */
+ cnode->brtype_stat.branch_to = false;
+ cnode->cycles_count += node->branch_flags.cycles;
+ cnode->iter_count += node->nr_loop_iter;
+ cnode->samples_count += node->samples;
}
-
- return MATCH_EQ;
}

- return left > right ? MATCH_GT : MATCH_LT;
+ return match;
}

/*
--
2.13.3

2017-08-06 21:27:01

by Milian Wolff

[permalink] [raw]
Subject: [PATCH v2 08/14] perf report: compare symbol name for inlined frames when sorting

Similar to the callstack frame matching, we also have to compare
the symbol name when sorting hist entries. The reason is twofold:
On one hand, multiple inlined functions will use the same symbol
start/end values of the parent, non-inlined symbol. As such, all
of these symbols often end up missing from top-level report, as
they get merged with the non-inlined frame. On the other hand,
multiple different functions may end up inlining the same function,
and we need to aggregate these values properly.

Before:

~~~~~
perf report --stdio --inline -g none
# Children Self Command Shared Object Symbol
# ........ ........ ............ ................ ...................................
#
100.00% 39.69% cpp-inlining cpp-inlining [.] main
100.00% 0.00% cpp-inlining cpp-inlining [.] _start
100.00% 0.00% cpp-inlining libc-2.25.so [.] __libc_start_main
97.03% 0.00% cpp-inlining cpp-inlining [.] std::norm<double> (inlined)
59.53% 4.26% cpp-inlining libm-2.25.so [.] hypot
55.21% 55.08% cpp-inlining libm-2.25.so [.] __hypot_finite
0.52% 0.52% cpp-inlining libm-2.25.so [.] cabs
~~~~~

After:

~~~~~
perf report --stdio --inline -g none
# Children Self Command Shared Object Symbol
# ........ ........ ............ ................ ...................................................................................................................................
#
100.00% 39.69% cpp-inlining cpp-inlining [.] main
100.00% 0.00% cpp-inlining cpp-inlining [.] _start
100.00% 0.00% cpp-inlining libc-2.25.so [.] __libc_start_main
62.57% 0.00% cpp-inlining cpp-inlining [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)
62.57% 0.00% cpp-inlining cpp-inlining [.] std::__complex_abs (inlined)
62.57% 0.00% cpp-inlining cpp-inlining [.] std::abs<double> (inlined)
62.57% 0.00% cpp-inlining cpp-inlining [.] std::norm<double> (inlined)
59.53% 4.26% cpp-inlining libm-2.25.so [.] hypot
55.21% 55.08% cpp-inlining libm-2.25.so [.] __hypot_finite
34.46% 0.00% cpp-inlining cpp-inlining [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
32.39% 0.00% cpp-inlining cpp-inlining [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)
32.39% 0.00% cpp-inlining cpp-inlining [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
12.29% 0.00% cpp-inlining cpp-inlining [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
12.29% 0.00% cpp-inlining cpp-inlining [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
12.29% 0.00% cpp-inlining cpp-inlining [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
0.52% 0.52% cpp-inlining libm-2.25.so [.] cabs
~~~~~

Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Yao Jin <[email protected]>
Signed-off-by: Milian Wolff <[email protected]>
---
tools/perf/util/sort.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 66fa841717fb..73752998beef 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -225,6 +225,9 @@ static int64_t _sort__sym_cmp(struct symbol *sym_l, struct symbol *sym_r)
if (sym_l == sym_r)
return 0;

+ if (sym_l->inlined || sym_r->inlined)
+ return strcmp(sym_l->name, sym_r->name);
+
if (sym_l->start != sym_r->start)
return (int64_t)(sym_r->start - sym_l->start);

--
2.13.3

2017-08-06 21:25:08

by Milian Wolff

[permalink] [raw]
Subject: [PATCH v2 05/14] perf report: mark inlined frames in output by " (inlined)" suffix

The original patch that introduced inline frame output in the
various browsers used this suffix already. The new centralized
approach that uses fake symbols for inlined frames was missing
this approach so far.

Instead of changing the symbol name itself, we only print the
suffix where needed. This allows us to efficiently lookup
the symbol for a given name without first having to append the
suffix before the lookup.

Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Yao Jin <[email protected]>
Signed-off-by: Milian Wolff <[email protected]>
---
tools/perf/util/callchain.c | 10 +++++++---
tools/perf/util/sort.c | 3 +++
2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 6afa8ccd75b5..5e099fe0b9dc 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1100,11 +1100,15 @@ char *callchain_list__sym_name(struct callchain_list *cl,
int printed;

if (cl->ms.sym) {
+ const char *inlined = cl->ms.sym->inlined ? " (inlined)" : "";
+
if (show_srcline && cl->srcline)
- printed = scnprintf(bf, bfsize, "%s %s",
- cl->ms.sym->name, cl->srcline);
+ printed = scnprintf(bf, bfsize, "%s %s%s",
+ cl->ms.sym->name, cl->srcline,
+ inlined);
else
- printed = scnprintf(bf, bfsize, "%s", cl->ms.sym->name);
+ printed = scnprintf(bf, bfsize, "%s%s",
+ cl->ms.sym->name, inlined);
} else
printed = scnprintf(bf, bfsize, "%#" PRIx64, cl->ip);

diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 12359bd986db..66fa841717fb 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -283,6 +283,9 @@ static int _hist_entry__sym_snprintf(struct map *map, struct symbol *sym,
ret += repsep_snprintf(bf + ret, size - ret, "%.*s",
width - ret,
sym->name);
+ if (sym->inlined)
+ ret += repsep_snprintf(bf + ret, size - ret,
+ " (inlined)");
}
} else {
size_t len = BITS_PER_LONG / 4;
--
2.13.3

2017-08-06 21:27:30

by Milian Wolff

[permalink] [raw]
Subject: [PATCH v2 07/14] perf report: compare symbol name for inlined frames when matching

The fake symbols we create for inlined frames will represent
different functions but can use the symbol start address. This leads
to issues when different inline branches all lead to the same
function.

Before:
~~~~~
$ perf report -s sym -i perf.inlining.data --inline --stdio -g function
...
--38.86%--_start
__libc_start_main
main
|
--37.57%--std::norm<double> (inlined)
std::_Norm_helper<true>::_S_do_it<double> (inlined)
|
--36.36%--std::abs<double> (inlined)
std::__complex_abs (inlined)
|
--12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
~~~~~

Note that this backtrace representation is completely bogus.
Complex abs does not call the linear congruential engine! It
is just a side-effect of a longer inlined stack being appended
to a shorter, different inlined stack, both of which originate
in the same function (main).

This patch fixes the issue:

~~~~~
$ perf report -s sym -i perf.inlining.data --inline --stdio -g function
...
--38.86%--_start
__libc_start_main
main
|
|--35.59%--std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
| std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
| |
| --34.37%--std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)
| std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)
| |
| --12.24%--std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)
| std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)
| std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)
|
--1.99%--std::norm<double> (inlined)
std::_Norm_helper<true>::_S_do_it<double> (inlined)
std::abs<double> (inlined)
std::__complex_abs (inlined)
~~~~~

Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Yao Jin <[email protected]>
Signed-off-by: Milian Wolff <[email protected]>
---
tools/perf/util/callchain.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 5e099fe0b9dc..45a39260a821 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -663,11 +663,11 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
struct callchain_list *cnode)
{
struct symbol *sym = node->sym;
+ enum match_result match;
u64 left, right;

if (callchain_param.key == CCKEY_SRCLINE) {
- enum match_result match = match_chain_strings(cnode->srcline,
- node->srcline);
+ match = match_chain_strings(cnode->srcline, node->srcline);

// if no srcline is available, fallback to symbol name
if (match == MATCH_ERROR && cnode->ms.sym && node->sym)
@@ -681,6 +681,12 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
}

if (cnode->ms.sym && sym && callchain_param.key == CCKEY_FUNCTION) {
+ // compare inlined frames based on their symbol name because
+ // different inlined frames will have the same symbol start
+ if (cnode->ms.sym->inlined || node->sym->inlined)
+ return match_chain_strings(cnode->ms.sym->name,
+ node->sym->name);
+
left = cnode->ms.sym->start;
right = sym->start;
} else {
--
2.13.3

2017-08-06 21:25:06

by Milian Wolff

[permalink] [raw]
Subject: [PATCH v2 04/14] perf report: fall-back to function name comparison for -g srcline

When a callchain entry has no srcline available, we ended up
comparing the instruction pointer. I consider this to be not
too useful. Rather, I think we should group the entries by
function name, which this patch adds. For people who want to
split the data on the IP boundary, using `-g address` is the
correct choice.

Before:

~~~~~
100.00% 38.86% [.] main
|
|--61.14%--main inlining.cpp:14
| std::norm<double> complex:664
| std::_Norm_helper<true>::_S_do_it<double> complex:654
| std::abs<double> complex:597
| std::__complex_abs complex:589
| |
| |--56.03%--hypot
| | |
| | |--8.45%--__hypot_finite
| | |
| | |--7.62%--__hypot_finite
| | |
| | |--2.29%--__hypot_finite
| | |
| | |--2.24%--__hypot_finite
| | |
| | |--2.06%--__hypot_finite
| | |
| | |--1.81%--__hypot_finite
...
~~~~~

After:

~~~~~
100.00% 38.86% [.] main
|
|--61.14%--main inlining.cpp:14
| std::norm<double> complex:664
| std::_Norm_helper<true>::_S_do_it<double> complex:654
| std::abs<double> complex:597
| std::__complex_abs complex:589
| |
| |--60.29%--hypot
| | |
| | --56.03%--__hypot_finite
| |
| --0.85%--cabs
~~~~~

Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Yao Jin <[email protected]>
Signed-off-by: Milian Wolff <[email protected]>
---
tools/perf/util/callchain.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 9854adb06ac1..6afa8ccd75b5 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -638,11 +638,9 @@ enum match_result {
MATCH_GT,
};

-static enum match_result match_chain_srcline(struct callchain_cursor_node *node,
- struct callchain_list *cnode)
+static enum match_result match_chain_strings(const char *left,
+ const char *right)
{
- const char *left = cnode->srcline;
- const char *right = node->srcline;
enum match_result ret = MATCH_EQ;
int cmp;

@@ -652,10 +650,8 @@ static enum match_result match_chain_srcline(struct callchain_cursor_node *node,
cmp = 1;
else if (left && !right)
cmp = -1;
- else if (cnode->ip == node->ip)
- cmp = 0;
else
- cmp = (cnode->ip < node->ip) ? -1 : 1;
+ return MATCH_ERROR;

if (cmp != 0)
ret = cmp < 0 ? MATCH_LT : MATCH_GT;
@@ -670,10 +666,18 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
u64 left, right;

if (callchain_param.key == CCKEY_SRCLINE) {
- enum match_result match = match_chain_srcline(node, cnode);
+ enum match_result match = match_chain_strings(cnode->srcline,
+ node->srcline);
+
+ // if no srcline is available, fallback to symbol name
+ if (match == MATCH_ERROR && cnode->ms.sym && node->sym)
+ match = match_chain_strings(cnode->ms.sym->name,
+ node->sym->name);

if (match != MATCH_ERROR)
return match;
+
+ // otherwise fall-back to IP-based comparison below
}

if (cnode->ms.sym && sym && callchain_param.key == CCKEY_FUNCTION) {
--
2.13.3

2017-08-06 21:27:59

by Milian Wolff

[permalink] [raw]
Subject: [PATCH v2 06/14] perf script: mark inlined frames and do not print DSO for them

Instead of showing the (repeated) DSO name of the non-inlined
frame, we now show the "(inlined)" suffix instead.

Before:
214f7 __hypot_finite (/usr/lib/libm-2.25.so)
ace3 hypot (/usr/lib/libm-2.25.so)
a4a std::__complex_abs (/home/milian/projects/src/perf-tests/inlining)
a4a std::abs<double> (/home/milian/projects/src/perf-tests/inlining)
a4a std::_Norm_helper<true>::_S_do_it<double> (/home/milian/projects/src/perf-tests/inlining)
a4a std::norm<double> (/home/milian/projects/src/perf-tests/inlining)
a4a main (/home/milian/projects/src/perf-tests/inlining)
20510 __libc_start_main (/usr/lib/libc-2.25.so)
bd9 _start (/home/milian/projects/src/perf-tests/inlining)

After:
214f7 __hypot_finite (/usr/lib/libm-2.25.so)
ace3 hypot (/usr/lib/libm-2.25.so)
a4a std::__complex_abs (inlined)
a4a std::abs<double> (inlined)
a4a std::_Norm_helper<true>::_S_do_it<double> (inlined)
a4a std::norm<double> (inlined)
a4a main (/home/milian/projects/src/perf-tests/inlining)
20510 __libc_start_main (/usr/lib/libc-2.25.so)
bd9 _start (/home/milian/projects/src/perf-tests/inlining)

Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Yao Jin <[email protected]>
Signed-off-by: Milian Wolff <[email protected]>
---
tools/perf/util/evsel_fprintf.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/evsel_fprintf.c b/tools/perf/util/evsel_fprintf.c
index f2c6c5ee11e8..5b9e89257aa7 100644
--- a/tools/perf/util/evsel_fprintf.c
+++ b/tools/perf/util/evsel_fprintf.c
@@ -157,7 +157,7 @@ int sample__fprintf_callchain(struct perf_sample *sample, int left_alignment,
}
}

- if (print_dso) {
+ if (print_dso && (!node->sym || !node->sym->inlined)) {
printed += fprintf(fp, " (");
printed += map__fprintf_dsoname(node->map, fp);
printed += fprintf(fp, ")");
@@ -166,6 +166,9 @@ int sample__fprintf_callchain(struct perf_sample *sample, int left_alignment,
if (print_srcline)
printed += map__fprintf_srcline(node->map, addr, "\n ", fp);

+ if (node->sym && node->sym->inlined)
+ printed += fprintf(fp, " (inlined)");
+
if (!print_oneline)
printed += fprintf(fp, "\n");

--
2.13.3

2017-08-06 21:28:22

by Milian Wolff

[permalink] [raw]
Subject: [PATCH v2 02/14] perf util: take elf_name as const string in dso__demangle_sym

The input string is not modified and thus can be passed
in as a pointer to const data.

Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Yao Jin <[email protected]>
Signed-off-by: Milian Wolff <[email protected]>
---
tools/perf/util/symbol-elf.c | 2 +-
tools/perf/util/symbol-minimal.c | 2 +-
tools/perf/util/symbol.h | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 502505cf236a..7cf18f14e152 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -391,7 +391,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss, struct map *
return 0;
}

-char *dso__demangle_sym(struct dso *dso, int kmodule, char *elf_name)
+char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name)
{
return demangle_sym(dso, kmodule, elf_name);
}
diff --git a/tools/perf/util/symbol-minimal.c b/tools/perf/util/symbol-minimal.c
index 40bf5d4c0bfd..1a5aa35b0100 100644
--- a/tools/perf/util/symbol-minimal.c
+++ b/tools/perf/util/symbol-minimal.c
@@ -377,7 +377,7 @@ void symbol__elf_init(void)

char *dso__demangle_sym(struct dso *dso __maybe_unused,
int kmodule __maybe_unused,
- char *elf_name __maybe_unused)
+ const char *elf_name __maybe_unused)
{
return NULL;
}
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 41ebba9a2eb2..f0b08810d7fa 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -306,7 +306,7 @@ int dso__load_sym(struct dso *dso, struct map *map, struct symsrc *syms_ss,
int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss,
struct map *map);

-char *dso__demangle_sym(struct dso *dso, int kmodule, char *elf_name);
+char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);

void __symbols__insert(struct rb_root *symbols, struct symbol *sym, bool kernel);
void symbols__insert(struct rb_root *symbols, struct symbol *sym);
--
2.13.3

2017-08-07 15:07:18

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v2 01/14] perf report: remove code to handle inline frames from browsers

Em Sun, Aug 06, 2017 at 11:24:33PM +0200, Milian Wolff escreveu:
> The follow-up commits will make inline frames first-class citizens
> in the callchain, thereby obsoleting all of this special code.

So you are removing the feature to then reintroduce it, is that it? That
is not usual :-\

Normally we go on replacing bit by bit or have some ifdef, etc, to then
phase out the old code.

Perhaps in this case your approach is the best one, still have to look
at all of it, and it would help if the people behind the original code
could review this, Yao Jin, can you take a look at this patch series,
please?

- Arnaldo

> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: David Ahern <[email protected]>
> Cc: Namhyung Kim <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Yao Jin <[email protected]>
> Signed-off-by: Milian Wolff <[email protected]>
> ---
> tools/perf/ui/browsers/hists.c | 180 +++-------------------------------------
> tools/perf/ui/stdio/hist.c | 77 +----------------
> tools/perf/util/evsel_fprintf.c | 32 -------
> tools/perf/util/hist.c | 5 --
> tools/perf/util/sort.h | 1 -
> 5 files changed, 13 insertions(+), 282 deletions(-)
>
> diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
> index f4bc2462bc2c..48d056bb3747 100644
> --- a/tools/perf/ui/browsers/hists.c
> +++ b/tools/perf/ui/browsers/hists.c
> @@ -154,57 +154,9 @@ static void callchain_list__set_folding(struct callchain_list *cl, bool unfold)
> cl->unfolded = unfold ? cl->has_children : false;
> }
>
> -static struct inline_node *inline_node__create(struct map *map, u64 ip)
> -{
> - struct dso *dso;
> - struct inline_node *node;
> -
> - if (map == NULL)
> - return NULL;
> -
> - dso = map->dso;
> - if (dso == NULL)
> - return NULL;
> -
> - node = dso__parse_addr_inlines(dso,
> - map__rip_2objdump(map, ip));
> -
> - return node;
> -}
> -
> -static int inline__count_rows(struct inline_node *node)
> -{
> - struct inline_list *ilist;
> - int i = 0;
> -
> - if (node == NULL)
> - return 0;
> -
> - list_for_each_entry(ilist, &node->val, list) {
> - if ((ilist->filename != NULL) || (ilist->funcname != NULL))
> - i++;
> - }
> -
> - return i;
> -}
> -
> -static int callchain_list__inline_rows(struct callchain_list *chain)
> -{
> - struct inline_node *node;
> - int rows;
> -
> - node = inline_node__create(chain->ms.map, chain->ip);
> - if (node == NULL)
> - return 0;
> -
> - rows = inline__count_rows(node);
> - inline_node__delete(node);
> - return rows;
> -}
> -
> static int callchain_node__count_rows_rb_tree(struct callchain_node *node)
> {
> - int n = 0, inline_rows;
> + int n = 0;
> struct rb_node *nd;
>
> for (nd = rb_first(&node->rb_root); nd; nd = rb_next(nd)) {
> @@ -215,12 +167,6 @@ static int callchain_node__count_rows_rb_tree(struct callchain_node *node)
> list_for_each_entry(chain, &child->val, list) {
> ++n;
>
> - if (symbol_conf.inline_name) {
> - inline_rows =
> - callchain_list__inline_rows(chain);
> - n += inline_rows;
> - }
> -
> /* We need this because we may not have children */
> folded_sign = callchain_list__folded(chain);
> if (folded_sign == '+')
> @@ -272,7 +218,7 @@ static int callchain_node__count_rows(struct callchain_node *node)
> {
> struct callchain_list *chain;
> bool unfolded = false;
> - int n = 0, inline_rows;
> + int n = 0;
>
> if (callchain_param.mode == CHAIN_FLAT)
> return callchain_node__count_flat_rows(node);
> @@ -281,10 +227,6 @@ static int callchain_node__count_rows(struct callchain_node *node)
>
> list_for_each_entry(chain, &node->val, list) {
> ++n;
> - if (symbol_conf.inline_name) {
> - inline_rows = callchain_list__inline_rows(chain);
> - n += inline_rows;
> - }
>
> unfolded = chain->unfolded;
> }
> @@ -432,19 +374,6 @@ static void hist_entry__init_have_children(struct hist_entry *he)
> he->init_have_children = true;
> }
>
> -static void hist_entry_init_inline_node(struct hist_entry *he)
> -{
> - if (he->inline_node)
> - return;
> -
> - he->inline_node = inline_node__create(he->ms.map, he->ip);
> -
> - if (he->inline_node == NULL)
> - return;
> -
> - he->has_children = true;
> -}
> -
> static bool hist_browser__toggle_fold(struct hist_browser *browser)
> {
> struct hist_entry *he = browser->he_selection;
> @@ -476,12 +405,8 @@ static bool hist_browser__toggle_fold(struct hist_browser *browser)
>
> if (he->unfolded) {
> if (he->leaf)
> - if (he->inline_node)
> - he->nr_rows = inline__count_rows(
> - he->inline_node);
> - else
> - he->nr_rows = callchain__count_rows(
> - &he->sorted_chain);
> + he->nr_rows = callchain__count_rows(
> + &he->sorted_chain);
> else
> he->nr_rows = hierarchy_count_rows(browser, he, false);
>
> @@ -841,71 +766,6 @@ static bool hist_browser__check_dump_full(struct hist_browser *browser __maybe_u
>
> #define LEVEL_OFFSET_STEP 3
>
> -static int hist_browser__show_inline(struct hist_browser *browser,
> - struct inline_node *node,
> - unsigned short row,
> - int offset)
> -{
> - struct inline_list *ilist;
> - char buf[1024];
> - int color, width, first_row;
> -
> - first_row = row;
> - width = browser->b.width - (LEVEL_OFFSET_STEP + 2);
> - list_for_each_entry(ilist, &node->val, list) {
> - if ((ilist->filename != NULL) || (ilist->funcname != NULL)) {
> - color = HE_COLORSET_NORMAL;
> - if (ui_browser__is_current_entry(&browser->b, row))
> - color = HE_COLORSET_SELECTED;
> -
> - if (callchain_param.key == CCKEY_ADDRESS ||
> - callchain_param.key == CCKEY_SRCLINE) {
> - if (ilist->filename != NULL)
> - scnprintf(buf, sizeof(buf),
> - "%s:%d (inline)",
> - ilist->filename,
> - ilist->line_nr);
> - else
> - scnprintf(buf, sizeof(buf), "??");
> - } else if (ilist->funcname != NULL)
> - scnprintf(buf, sizeof(buf), "%s (inline)",
> - ilist->funcname);
> - else if (ilist->filename != NULL)
> - scnprintf(buf, sizeof(buf),
> - "%s:%d (inline)",
> - ilist->filename,
> - ilist->line_nr);
> - else
> - scnprintf(buf, sizeof(buf), "??");
> -
> - ui_browser__set_color(&browser->b, color);
> - hist_browser__gotorc(browser, row, 0);
> - ui_browser__write_nstring(&browser->b, " ",
> - LEVEL_OFFSET_STEP + offset);
> - ui_browser__write_nstring(&browser->b, buf, width);
> - row++;
> - }
> - }
> -
> - return row - first_row;
> -}
> -
> -static size_t show_inline_list(struct hist_browser *browser, struct map *map,
> - u64 ip, int row, int offset)
> -{
> - struct inline_node *node;
> - int ret;
> -
> - node = inline_node__create(map, ip);
> - if (node == NULL)
> - return 0;
> -
> - ret = hist_browser__show_inline(browser, node, row, offset);
> -
> - inline_node__delete(node);
> - return ret;
> -}
> -
> static int hist_browser__show_callchain_list(struct hist_browser *browser,
> struct callchain_node *node,
> struct callchain_list *chain,
> @@ -917,7 +777,7 @@ static int hist_browser__show_callchain_list(struct hist_browser *browser,
> char bf[1024], *alloc_str;
> char buf[64], *alloc_str2;
> const char *str;
> - int inline_rows = 0, ret = 1;
> + int ret = 1;
>
> if (arg->row_offset != 0) {
> arg->row_offset--;
> @@ -958,12 +818,7 @@ static int hist_browser__show_callchain_list(struct hist_browser *browser,
> free(alloc_str);
> free(alloc_str2);
>
> - if (symbol_conf.inline_name) {
> - inline_rows = show_inline_list(browser, chain->ms.map,
> - chain->ip, row + 1, offset);
> - }
> -
> - return ret + inline_rows;
> + return ret;
> }
>
> static bool check_percent_display(struct rb_node *node, u64 parent_total)
> @@ -1387,12 +1242,6 @@ static int hist_browser__show_entry(struct hist_browser *browser,
> folded_sign = hist_entry__folded(entry);
> }
>
> - if (symbol_conf.inline_name &&
> - (!entry->has_children)) {
> - hist_entry_init_inline_node(entry);
> - folded_sign = hist_entry__folded(entry);
> - }
> -
> if (row_offset == 0) {
> struct hpp_arg arg = {
> .b = &browser->b,
> @@ -1424,8 +1273,7 @@ static int hist_browser__show_entry(struct hist_browser *browser,
> }
>
> if (first) {
> - if (symbol_conf.use_callchain ||
> - symbol_conf.inline_name) {
> + if (symbol_conf.use_callchain) {
> ui_browser__printf(&browser->b, "%c ", folded_sign);
> width -= 2;
> }
> @@ -1467,15 +1315,11 @@ static int hist_browser__show_entry(struct hist_browser *browser,
> .is_current_entry = current_entry,
> };
>
> - if (entry->inline_node)
> - printed += hist_browser__show_inline(browser,
> - entry->inline_node, row, 0);
> - else
> - printed += hist_browser__show_callchain(browser,
> - entry, 1, row,
> - hist_browser__show_callchain_entry,
> - &arg,
> - hist_browser__check_output_full);
> + printed += hist_browser__show_callchain(browser,
> + entry, 1, row,
> + hist_browser__show_callchain_entry,
> + &arg,
> + hist_browser__check_output_full);
> }
>
> return printed;
> diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
> index 5c95b8301c67..8585bc628e0c 100644
> --- a/tools/perf/ui/stdio/hist.c
> +++ b/tools/perf/ui/stdio/hist.c
> @@ -21,64 +21,6 @@ static size_t callchain__fprintf_left_margin(FILE *fp, int left_margin)
> return ret;
> }
>
> -static size_t inline__fprintf(struct map *map, u64 ip, int left_margin,
> - int depth, int depth_mask, FILE *fp)
> -{
> - struct dso *dso;
> - struct inline_node *node;
> - struct inline_list *ilist;
> - int ret = 0, i;
> -
> - if (map == NULL)
> - return 0;
> -
> - dso = map->dso;
> - if (dso == NULL)
> - return 0;
> -
> - node = dso__parse_addr_inlines(dso,
> - map__rip_2objdump(map, ip));
> - if (node == NULL)
> - return 0;
> -
> - list_for_each_entry(ilist, &node->val, list) {
> - if ((ilist->filename != NULL) || (ilist->funcname != NULL)) {
> - ret += callchain__fprintf_left_margin(fp, left_margin);
> -
> - for (i = 0; i < depth; i++) {
> - if (depth_mask & (1 << i))
> - ret += fprintf(fp, "|");
> - else
> - ret += fprintf(fp, " ");
> - ret += fprintf(fp, " ");
> - }
> -
> - if (callchain_param.key == CCKEY_ADDRESS ||
> - callchain_param.key == CCKEY_SRCLINE) {
> - if (ilist->filename != NULL)
> - ret += fprintf(fp, "%s:%d (inline)",
> - ilist->filename,
> - ilist->line_nr);
> - else
> - ret += fprintf(fp, "??");
> - } else if (ilist->funcname != NULL)
> - ret += fprintf(fp, "%s (inline)",
> - ilist->funcname);
> - else if (ilist->filename != NULL)
> - ret += fprintf(fp, "%s:%d (inline)",
> - ilist->filename,
> - ilist->line_nr);
> - else
> - ret += fprintf(fp, "??");
> -
> - ret += fprintf(fp, "\n");
> - }
> - }
> -
> - inline_node__delete(node);
> - return ret;
> -}
> -
> static size_t ipchain__fprintf_graph_line(FILE *fp, int depth, int depth_mask,
> int left_margin)
> {
> @@ -141,9 +83,6 @@ static size_t ipchain__fprintf_graph(FILE *fp, struct callchain_node *node,
> fputc('\n', fp);
> free(alloc_str);
>
> - if (symbol_conf.inline_name)
> - ret += inline__fprintf(chain->ms.map, chain->ip,
> - left_margin, depth, depth_mask, fp);
> return ret;
> }
>
> @@ -318,13 +257,6 @@ static size_t callchain__fprintf_graph(FILE *fp, struct rb_root *root,
>
> if (++entries_printed == callchain_param.print_limit)
> break;
> -
> - if (symbol_conf.inline_name)
> - ret += inline__fprintf(chain->ms.map,
> - chain->ip,
> - left_margin,
> - 0, 0,
> - fp);
> }
> root = &cnode->rb_root;
> }
> @@ -604,7 +536,6 @@ static int hist_entry__fprintf(struct hist_entry *he, size_t size,
> {
> int ret;
> int callchain_ret = 0;
> - int inline_ret = 0;
> struct perf_hpp hpp = {
> .buf = bf,
> .size = size,
> @@ -626,13 +557,7 @@ static int hist_entry__fprintf(struct hist_entry *he, size_t size,
> callchain_ret = hist_entry_callchain__fprintf(he, total_period,
> 0, fp);
>
> - if (callchain_ret == 0 && symbol_conf.inline_name) {
> - inline_ret = inline__fprintf(he->ms.map, he->ip, 0, 0, 0, fp);
> - ret += inline_ret;
> - if (inline_ret > 0)
> - ret += fprintf(fp, "\n");
> - } else
> - ret += callchain_ret;
> + ret += callchain_ret;
>
> return ret;
> }
> diff --git a/tools/perf/util/evsel_fprintf.c b/tools/perf/util/evsel_fprintf.c
> index 583f3a602506..f2c6c5ee11e8 100644
> --- a/tools/perf/util/evsel_fprintf.c
> +++ b/tools/perf/util/evsel_fprintf.c
> @@ -169,38 +169,6 @@ int sample__fprintf_callchain(struct perf_sample *sample, int left_alignment,
> if (!print_oneline)
> printed += fprintf(fp, "\n");
>
> - if (symbol_conf.inline_name && node->map) {
> - struct inline_node *inode;
> -
> - addr = map__rip_2objdump(node->map, node->ip),
> - inode = dso__parse_addr_inlines(node->map->dso, addr);
> -
> - if (inode) {
> - struct inline_list *ilist;
> -
> - list_for_each_entry(ilist, &inode->val, list) {
> - if (print_arrow)
> - printed += fprintf(fp, " <-");
> -
> - /* IP is same, just skip it */
> - if (print_ip)
> - printed += fprintf(fp, "%c%16s",
> - s, "");
> - if (print_sym)
> - printed += fprintf(fp, " %s",
> - ilist->funcname);
> - if (print_srcline)
> - printed += fprintf(fp, "\n %s:%d",
> - ilist->filename,
> - ilist->line_nr);
> - if (!print_oneline)
> - printed += fprintf(fp, "\n");
> - }
> -
> - inline_node__delete(inode);
> - }
> - }
> -
> if (symbol_conf.bt_stop_list &&
> node->sym &&
> strlist__has_entry(symbol_conf.bt_stop_list,
> diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
> index 9453b2e27015..043a54871276 100644
> --- a/tools/perf/util/hist.c
> +++ b/tools/perf/util/hist.c
> @@ -1137,11 +1137,6 @@ void hist_entry__delete(struct hist_entry *he)
> zfree(&he->mem_info);
> }
>
> - if (he->inline_node) {
> - inline_node__delete(he->inline_node);
> - he->inline_node = NULL;
> - }
> -
> zfree(&he->stat_acc);
> free_srcline(he->srcline);
> if (he->srcfile && he->srcfile[0])
> diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
> index b7c75597e18f..2f41416f01fa 100644
> --- a/tools/perf/util/sort.h
> +++ b/tools/perf/util/sort.h
> @@ -129,7 +129,6 @@ struct hist_entry {
> };
> char *srcline;
> char *srcfile;
> - struct inline_node *inline_node;
> struct symbol *parent;
> struct branch_info *branch_info;
> struct hists *hists;
> --
> 2.13.3

2017-08-07 15:10:26

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v2 02/14] perf util: take elf_name as const string in dso__demangle_sym

Em Sun, Aug 06, 2017 at 11:24:34PM +0200, Milian Wolff escreveu:
> The input string is not modified and thus can be passed
> in as a pointer to const data.

Applied.

- Arnaldo

> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: David Ahern <[email protected]>
> Cc: Namhyung Kim <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Yao Jin <[email protected]>
> Signed-off-by: Milian Wolff <[email protected]>
> ---
> tools/perf/util/symbol-elf.c | 2 +-
> tools/perf/util/symbol-minimal.c | 2 +-
> tools/perf/util/symbol.h | 2 +-
> 3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
> index 502505cf236a..7cf18f14e152 100644
> --- a/tools/perf/util/symbol-elf.c
> +++ b/tools/perf/util/symbol-elf.c
> @@ -391,7 +391,7 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss, struct map *
> return 0;
> }
>
> -char *dso__demangle_sym(struct dso *dso, int kmodule, char *elf_name)
> +char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name)
> {
> return demangle_sym(dso, kmodule, elf_name);
> }
> diff --git a/tools/perf/util/symbol-minimal.c b/tools/perf/util/symbol-minimal.c
> index 40bf5d4c0bfd..1a5aa35b0100 100644
> --- a/tools/perf/util/symbol-minimal.c
> +++ b/tools/perf/util/symbol-minimal.c
> @@ -377,7 +377,7 @@ void symbol__elf_init(void)
>
> char *dso__demangle_sym(struct dso *dso __maybe_unused,
> int kmodule __maybe_unused,
> - char *elf_name __maybe_unused)
> + const char *elf_name __maybe_unused)
> {
> return NULL;
> }
> diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> index 41ebba9a2eb2..f0b08810d7fa 100644
> --- a/tools/perf/util/symbol.h
> +++ b/tools/perf/util/symbol.h
> @@ -306,7 +306,7 @@ int dso__load_sym(struct dso *dso, struct map *map, struct symsrc *syms_ss,
> int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss,
> struct map *map);
>
> -char *dso__demangle_sym(struct dso *dso, int kmodule, char *elf_name);
> +char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
>
> void __symbols__insert(struct rb_root *symbols, struct symbol *sym, bool kernel);
> void symbols__insert(struct rb_root *symbols, struct symbol *sym);
> --
> 2.13.3

2017-08-07 15:21:09

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v2 13/14] perf util: do not consider empty files as valid srclines

Em Sun, Aug 06, 2017 at 11:24:45PM +0200, Milian Wolff escreveu:
> Sometimes we get a non-null, but empty, string for the filename from
> bfd. This then results in srclines of the form ":0", which is
> different from the canonical SRCLINE_UNKNOWN in the form "??:0".
> Set the file to NULL if it is empty to fix this.

I cherry picked this one now, i.e. looks useful even before looking at
the other patches in depth.

- Arnaldo

> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: David Ahern <[email protected]>
> Cc: Namhyung Kim <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Yao Jin <[email protected]>
> Signed-off-by: Milian Wolff <[email protected]>
> ---
> tools/perf/util/srcline.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
> index 0c5ee741c515..7bf38b78a1ba 100644
> --- a/tools/perf/util/srcline.c
> +++ b/tools/perf/util/srcline.c
> @@ -168,6 +168,9 @@ static void find_address_in_section(bfd *abfd, asection *section, void *data)
> a2l->found = bfd_find_nearest_line(abfd, section, a2l->syms, pc - vma,
> &a2l->filename, &a2l->funcname,
> &a2l->line);
> +
> + if (a2l->filename && !strlen(a2l->filename))
> + a2l->filename = NULL;
> }
>
> static struct a2l_data *addr2line_init(const char *path)
> @@ -297,6 +300,9 @@ static int addr2line(const char *dso_name, u64 addr,
> &a2l->funcname, &a2l->line) &&
> cnt++ < MAX_INLINE_NEST) {
>
> + if (a2l->filename && !strlen(a2l->filename))
> + a2l->filename = NULL;
> +
> if (node != NULL) {
> if (inline_list__append_dso_a2l(dso, node, sym))
> return 0;
> --
> 2.13.3

2017-08-07 19:22:12

by Milian Wolff

[permalink] [raw]
Subject: Re: [PATCH v2 01/14] perf report: remove code to handle inline frames from browsers

On Montag, 7. August 2017 17:07:10 CEST Arnaldo Carvalho de Melo wrote:
> Em Sun, Aug 06, 2017 at 11:24:33PM +0200, Milian Wolff escreveu:
> > The follow-up commits will make inline frames first-class citizens
> > in the callchain, thereby obsoleting all of this special code.
>
> So you are removing the feature to then reintroduce it, is that it? That
> is not usual :-\
>
> Normally we go on replacing bit by bit or have some ifdef, etc, to then
> phase out the old code.
>
> Perhaps in this case your approach is the best one, still have to look
> at all of it, and it would help if the people behind the original code
> could review this, Yao Jin, can you take a look at this patch series,
> please?

Yes, I also did that in v1 of this patch series. Note that I can easily squash
this, if needed. But I personally think for reviewing purposes, having it as
separate patches is far better.

Cheers

--
Milian Wolff | [email protected] | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

2017-08-07 20:16:31

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v2 01/14] perf report: remove code to handle inline frames from browsers

Em Mon, Aug 07, 2017 at 09:22:08PM +0200, Milian Wolff escreveu:
> On Montag, 7. August 2017 17:07:10 CEST Arnaldo Carvalho de Melo wrote:
> > Em Sun, Aug 06, 2017 at 11:24:33PM +0200, Milian Wolff escreveu:
> > > The follow-up commits will make inline frames first-class citizens
> > > in the callchain, thereby obsoleting all of this special code.

> > So you are removing the feature to then reintroduce it, is that it? That
> > is not usual :-\

> > Normally we go on replacing bit by bit or have some ifdef, etc, to then
> > phase out the old code.

> > Perhaps in this case your approach is the best one, still have to look
> > at all of it, and it would help if the people behind the original code
> > could review this, Yao Jin, can you take a look at this patch series,
> > please?

> Yes, I also did that in v1 of this patch series. Note that I can easily squash
> this, if needed. But I personally think for reviewing purposes, having it as
> separate patches is far better.

Well, if it is a full blown alternative implementation, like it seems to
be (haven't started reviewing yet), probably.

- Arnaldo

2017-08-10 02:13:29

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH v2 11/14] perf report: cache srclines for callchain nodes

Hi Milian,

On Sun, Aug 06, 2017 at 11:24:43PM +0200, Milian Wolff wrote:
> On one hand this ensures that the memory is properly freed when
> the DSO gets freed. On the other hand this significantly speeds up
> the processing of the callchain nodes when lots of srclines are
> requested. For one of my data files e.g.:
>
> Before:
>
> Performance counter stats for 'perf report -s srcline -g srcline --stdio':
>
> 52496.495043 task-clock (msec) # 0.999 CPUs utilized
> 634 context-switches # 0.012 K/sec
> 2 cpu-migrations # 0.000 K/sec
> 191,561 page-faults # 0.004 M/sec
> 165,074,498,235 cycles # 3.144 GHz
> 334,170,832,408 instructions # 2.02 insn per cycle
> 90,220,029,745 branches # 1718.591 M/sec
> 654,525,177 branch-misses # 0.73% of all branches
>
> 52.533273822 seconds time elapsedProcessed 236605 events and lost 40 chunks!
>
> After:
>
> Performance counter stats for 'perf report -s srcline -g srcline --stdio':
>
> 22606.323706 task-clock (msec) # 1.000 CPUs utilized
> 31 context-switches # 0.001 K/sec
> 0 cpu-migrations # 0.000 K/sec
> 185,471 page-faults # 0.008 M/sec
> 71,188,113,681 cycles # 3.149 GHz
> 133,204,943,083 instructions # 1.87 insn per cycle
> 34,886,384,979 branches # 1543.214 M/sec
> 278,214,495 branch-misses # 0.80% of all branches
>
> 22.609857253 seconds time elapsed
>
> Note that the difference is only this large when `--inline` is not
> passed. In such situations, we would use the inliner cache and
> thus do not run this code path that often.
>
> I think that this cache should actually be used in other places, too.
> When looking at the valgrind leak report for perf report, we see tons
> of srclines being leaked, most notably from calls to
> hist_entry__get_srcline. The problem is that get_srcline has many
> different formatting options (show_sym, show_addr, potentially even
> unwind_inlines when calling __get_srcline directly). As such, the
> srcline cannot easily be cached for all calls, or we'd have to add
> caches for all formatting combinations (6 so far). An alternative
> would be to remove the formatting options and handle that on a
> different level - i.e. print the sym/addr on demand wherever we
> actually output something. And the unwind_inlines could be moved into
> a separate function that does not return the srcline.

Agreed. Also I guess no need to unwind anymore to get a srcfile for
an entry with your change.

Thanks,
Namhyung


>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: David Ahern <[email protected]>
> Cc: Namhyung Kim <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Yao Jin <[email protected]>
> Signed-off-by: Milian Wolff <[email protected]>
> ---
> tools/perf/util/dso.c | 1 +
> tools/perf/util/dso.h | 1 +
> tools/perf/util/machine.c | 17 +++++++++---
> tools/perf/util/srcline.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++
> tools/perf/util/srcline.h | 7 +++++
> 5 files changed, 89 insertions(+), 3 deletions(-)
>
> diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
> index 72e6e390fd26..8c7f2862cff2 100644
> --- a/tools/perf/util/dso.c
> +++ b/tools/perf/util/dso.c
> @@ -1235,6 +1235,7 @@ void dso__delete(struct dso *dso)
> for (i = 0; i < MAP__NR_TYPES; ++i)
> symbols__delete(&dso->symbols[i]);
> inlines__tree_delete(&dso->inlined_nodes);
> + srcline__tree_delete(&dso->srclines);
>
> if (dso->short_name_allocated) {
> zfree((char **)&dso->short_name);
> diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
> index 7d1e2b3c1f10..ac3a65a30ff2 100644
> --- a/tools/perf/util/dso.h
> +++ b/tools/perf/util/dso.h
> @@ -142,6 +142,7 @@ struct dso {
> struct rb_root symbols[MAP__NR_TYPES];
> struct rb_root symbol_names[MAP__NR_TYPES];
> struct rb_root inlined_nodes;
> + struct rb_root srclines;
> struct {
> u64 addr;
> struct symbol *symbol;
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 479c42450d6a..c5ee6ba2b9ae 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -1675,11 +1675,22 @@ struct mem_info *sample__resolve_mem(struct perf_sample *sample,
>
> static char *callchain_srcline(struct map *map, struct symbol *sym, u64 ip)
> {
> + char *srcline = NULL;
> +
> if (!map || callchain_param.key == CCKEY_FUNCTION)
> - return NULL;
> + return srcline;
> +
> + srcline = srcline__tree_find(&map->dso->srclines, ip);
> + if (!srcline) {
> + bool show_sym = false;
> + bool show_addr = callchain_param.key == CCKEY_ADDRESS;
> +
> + srcline = get_srcline(map->dso, map__rip_2objdump(map, ip),
> + sym, show_sym, show_addr);
> + srcline__tree_insert(&map->dso->srclines, ip, srcline);
> + }
>
> - return get_srcline(map->dso, map__rip_2objdump(map, ip),
> - sym, false, callchain_param.key == CCKEY_ADDRESS);
> + return srcline;
> }
>
> static int add_callchain_ip(struct thread *thread,
> diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
> index 01b4d5ee51fd..0c5ee741c515 100644
> --- a/tools/perf/util/srcline.c
> +++ b/tools/perf/util/srcline.c
> @@ -531,6 +531,72 @@ char *get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
> return __get_srcline(dso, addr, sym, show_sym, show_addr, false);
> }
>
> +struct srcline_node {
> + u64 addr;
> + char *srcline;
> + struct rb_node rb_node;
> +};
> +
> +void srcline__tree_insert(struct rb_root *tree, u64 addr, char *srcline)
> +{
> + struct rb_node **p = &tree->rb_node;
> + struct rb_node *parent = NULL;
> + struct srcline_node *i, *node;
> +
> + node = zalloc(sizeof(struct srcline_node));
> + if (!node) {
> + perror("not enough memory for the srcline node");
> + return;
> + }
> +
> + node->addr = addr;
> + node->srcline = srcline;
> +
> + while (*p != NULL) {
> + parent = *p;
> + i = rb_entry(parent, struct srcline_node, rb_node);
> + if (addr < i->addr)
> + p = &(*p)->rb_left;
> + else
> + p = &(*p)->rb_right;
> + }
> + rb_link_node(&node->rb_node, parent, p);
> + rb_insert_color(&node->rb_node, tree);
> +}
> +
> +char *srcline__tree_find(struct rb_root *tree, u64 addr)
> +{
> + struct rb_node *n = tree->rb_node;
> +
> + while (n) {
> + struct srcline_node *i = rb_entry(n, struct srcline_node,
> + rb_node);
> +
> + if (addr < i->addr)
> + n = n->rb_left;
> + else if (addr > i->addr)
> + n = n->rb_right;
> + else
> + return i->srcline;
> + }
> +
> + return NULL;
> +}
> +
> +void srcline__tree_delete(struct rb_root *tree)
> +{
> + struct srcline_node *pos;
> + struct rb_node *next = rb_first(tree);
> +
> + while (next) {
> + pos = rb_entry(next, struct srcline_node, rb_node);
> + next = rb_next(&pos->rb_node);
> + rb_erase(&pos->rb_node, tree);
> + free_srcline(pos->srcline);
> + zfree(&pos);
> + }
> +}
> +
> struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr,
> struct symbol *sym)
> {
> diff --git a/tools/perf/util/srcline.h b/tools/perf/util/srcline.h
> index 0d2aca92e8c7..187a71082cb8 100644
> --- a/tools/perf/util/srcline.h
> +++ b/tools/perf/util/srcline.h
> @@ -15,6 +15,13 @@ char *__get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
> bool show_sym, bool show_addr, bool unwind_inlines);
> void free_srcline(char *srcline);
>
> +// insert the srcline into the DSO, which will take ownership
> +void srcline__tree_insert(struct rb_root *tree, u64 addr, char *srcline);
> +// find previously inserted srcline
> +char *srcline__tree_find(struct rb_root *tree, u64 addr);
> +// delete all srclines within the tree
> +void srcline__tree_delete(struct rb_root *tree);
> +
> #define SRCLINE_UNKNOWN ((char *) "??:0")
>
> struct inline_list {
> --
> 2.13.3
>

2017-08-10 11:51:47

by Milian Wolff

[permalink] [raw]
Subject: Re: [PATCH v2 11/14] perf report: cache srclines for callchain nodes

On Donnerstag, 10. August 2017 04:13:25 CEST Namhyung Kim wrote:
> Hi Milian,
>
> On Sun, Aug 06, 2017 at 11:24:43PM +0200, Milian Wolff wrote:
> > On one hand this ensures that the memory is properly freed when
> > the DSO gets freed. On the other hand this significantly speeds up
> > the processing of the callchain nodes when lots of srclines are
> > requested. For one of my data files e.g.:
> >
> > Before:
> > Performance counter stats for 'perf report -s srcline -g srcline --
stdio':
> > 52496.495043 task-clock (msec) # 0.999 CPUs utilized
> >
> > 634 context-switches # 0.012 K/sec
> >
> > 2 cpu-migrations # 0.000 K/sec
> >
> > 191,561 page-faults # 0.004 M/sec
> >
> > 165,074,498,235 cycles # 3.144 GHz
> > 334,170,832,408 instructions # 2.02 insn per
> > cycle
> >
> > 90,220,029,745 branches # 1718.591 M/sec
> >
> > 654,525,177 branch-misses # 0.73% of all
> > branches
> >
> > 52.533273822 seconds time elapsedProcessed 236605 events and lost 40
> > chunks!>
> > After:
> > Performance counter stats for 'perf report -s srcline -g srcline --
stdio':
> > 22606.323706 task-clock (msec) # 1.000 CPUs utilized
> >
> > 31 context-switches # 0.001 K/sec
> >
> > 0 cpu-migrations # 0.000 K/sec
> >
> > 185,471 page-faults # 0.008 M/sec
> >
> > 71,188,113,681 cycles # 3.149 GHz
> >
> > 133,204,943,083 instructions # 1.87 insn per
> > cycle
> >
> > 34,886,384,979 branches # 1543.214 M/sec
> >
> > 278,214,495 branch-misses # 0.80% of all
> > branches
> >
> > 22.609857253 seconds time elapsed
> >
> > Note that the difference is only this large when `--inline` is not
> > passed. In such situations, we would use the inliner cache and
> > thus do not run this code path that often.
> >
> > I think that this cache should actually be used in other places, too.
> > When looking at the valgrind leak report for perf report, we see tons
> > of srclines being leaked, most notably from calls to
> > hist_entry__get_srcline. The problem is that get_srcline has many
> > different formatting options (show_sym, show_addr, potentially even
> > unwind_inlines when calling __get_srcline directly). As such, the
> > srcline cannot easily be cached for all calls, or we'd have to add
> > caches for all formatting combinations (6 so far). An alternative
> > would be to remove the formatting options and handle that on a
> > different level - i.e. print the sym/addr on demand wherever we
> > actually output something. And the unwind_inlines could be moved into
> > a separate function that does not return the srcline.
>
> Agreed. Also I guess no need to unwind anymore to get a srcfile for
> an entry with your change.

Does this mean I should respin the patch series with the above changes
integrated? Or can we get this in first and then continue with the cleanup as
described above later on?

Thanks

--
Milian Wolff | [email protected] | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

2017-08-10 14:56:48

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH v2 11/14] perf report: cache srclines for callchain nodes

On Thu, Aug 10, 2017 at 8:51 PM, Milian Wolff <[email protected]> wrote:
> On Donnerstag, 10. August 2017 04:13:25 CEST Namhyung Kim wrote:
>> Hi Milian,
>>
>> On Sun, Aug 06, 2017 at 11:24:43PM +0200, Milian Wolff wrote:
>> > On one hand this ensures that the memory is properly freed when
>> > the DSO gets freed. On the other hand this significantly speeds up
>> > the processing of the callchain nodes when lots of srclines are
>> > requested. For one of my data files e.g.:
>> >
>> > Before:
>> > Performance counter stats for 'perf report -s srcline -g srcline --
> stdio':
>> > 52496.495043 task-clock (msec) # 0.999 CPUs utilized
>> >
>> > 634 context-switches # 0.012 K/sec
>> >
>> > 2 cpu-migrations # 0.000 K/sec
>> >
>> > 191,561 page-faults # 0.004 M/sec
>> >
>> > 165,074,498,235 cycles # 3.144 GHz
>> > 334,170,832,408 instructions # 2.02 insn per
>> > cycle
>> >
>> > 90,220,029,745 branches # 1718.591 M/sec
>> >
>> > 654,525,177 branch-misses # 0.73% of all
>> > branches
>> >
>> > 52.533273822 seconds time elapsedProcessed 236605 events and lost 40
>> > chunks!>
>> > After:
>> > Performance counter stats for 'perf report -s srcline -g srcline --
> stdio':
>> > 22606.323706 task-clock (msec) # 1.000 CPUs utilized
>> >
>> > 31 context-switches # 0.001 K/sec
>> >
>> > 0 cpu-migrations # 0.000 K/sec
>> >
>> > 185,471 page-faults # 0.008 M/sec
>> >
>> > 71,188,113,681 cycles # 3.149 GHz
>> >
>> > 133,204,943,083 instructions # 1.87 insn per
>> > cycle
>> >
>> > 34,886,384,979 branches # 1543.214 M/sec
>> >
>> > 278,214,495 branch-misses # 0.80% of all
>> > branches
>> >
>> > 22.609857253 seconds time elapsed
>> >
>> > Note that the difference is only this large when `--inline` is not
>> > passed. In such situations, we would use the inliner cache and
>> > thus do not run this code path that often.
>> >
>> > I think that this cache should actually be used in other places, too.
>> > When looking at the valgrind leak report for perf report, we see tons
>> > of srclines being leaked, most notably from calls to
>> > hist_entry__get_srcline. The problem is that get_srcline has many
>> > different formatting options (show_sym, show_addr, potentially even
>> > unwind_inlines when calling __get_srcline directly). As such, the
>> > srcline cannot easily be cached for all calls, or we'd have to add
>> > caches for all formatting combinations (6 so far). An alternative
>> > would be to remove the formatting options and handle that on a
>> > different level - i.e. print the sym/addr on demand wherever we
>> > actually output something. And the unwind_inlines could be moved into
>> > a separate function that does not return the srcline.
>>
>> Agreed. Also I guess no need to unwind anymore to get a srcfile for
>> an entry with your change.
>
> Does this mean I should respin the patch series with the above changes
> integrated? Or can we get this in first and then continue with the cleanup as
> described above later on?

Nop, it can be done later IMHO. I will try to review the code next week.

Thanks,
Namhyung

2017-08-10 17:58:59

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v2 11/14] perf report: cache srclines for callchain nodes

Em Thu, Aug 10, 2017 at 11:56:24PM +0900, Namhyung Kim escreveu:
> On Thu, Aug 10, 2017 at 8:51 PM, Milian Wolff <[email protected]> wrote:
> > On Donnerstag, 10. August 2017 04:13:25 CEST Namhyung Kim wrote:
> >> Agreed. Also I guess no need to unwind anymore to get a srcfile for
> >> an entry with your change.

> > Does this mean I should respin the patch series with the above changes
> > integrated? Or can we get this in first and then continue with the cleanup as
> > described above later on?

> Nop, it can be done later IMHO. I will try to review the code next week.

Ok, I'm preparing a new batch to send to Ingo, will wait a bit more for
your review, ok, Millian?

- Arnaldo

2017-08-11 11:28:31

by Milian Wolff

[permalink] [raw]
Subject: Re: [PATCH v2 11/14] perf report: cache srclines for callchain nodes

On Donnerstag, 10. August 2017 19:58:45 CEST Arnaldo Carvalho de Melo wrote:
> Em Thu, Aug 10, 2017 at 11:56:24PM +0900, Namhyung Kim escreveu:
> > On Thu, Aug 10, 2017 at 8:51 PM, Milian Wolff <[email protected]>
wrote:
> > > On Donnerstag, 10. August 2017 04:13:25 CEST Namhyung Kim wrote:
> > >> Agreed. Also I guess no need to unwind anymore to get a srcfile for
> > >> an entry with your change.
> > >
> > > Does this mean I should respin the patch series with the above changes
> > > integrated? Or can we get this in first and then continue with the
> > > cleanup as described above later on?
> >
> > Nop, it can be done later IMHO. I will try to review the code next week.
>
> Ok, I'm preparing a new batch to send to Ingo, will wait a bit more for
> your review, ok, Millian?

Considering that this is a big and somewhat invasive patch series, I'm all for
being conservative on it. I.e. please take your time on reviewing it. I'm
pretty sure there are some things that need to be improved on, e.g. my usage
of the rbtree code. I'm a C++ dev, this C-style copy'n'paste programming for
containers is foreign to me.

Cheers

--
Milian Wolff | [email protected] | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

Subject: [tip:perf/core] perf util: Take elf_name as const string in dso__demangle_sym

Commit-ID: 80c345b255cbb4e9dfb193bf0bf5536217237f6a
Gitweb: http://git.kernel.org/tip/80c345b255cbb4e9dfb193bf0bf5536217237f6a
Author: Milian Wolff <[email protected]>
AuthorDate: Sun, 6 Aug 2017 23:24:34 +0200
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Fri, 11 Aug 2017 16:06:31 -0300

perf util: Take elf_name as const string in dso__demangle_sym

The input string is not modified and thus can be passed in as a pointer
to const data.

Signed-off-by: Milian Wolff <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Yao Jin <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/symbol-elf.c | 2 +-
tools/perf/util/symbol-minimal.c | 2 +-
tools/perf/util/symbol.h | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 502505c..7cf18f1 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -391,7 +391,7 @@ out_elf_end:
return 0;
}

-char *dso__demangle_sym(struct dso *dso, int kmodule, char *elf_name)
+char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name)
{
return demangle_sym(dso, kmodule, elf_name);
}
diff --git a/tools/perf/util/symbol-minimal.c b/tools/perf/util/symbol-minimal.c
index 40bf5d4..1a5aa35 100644
--- a/tools/perf/util/symbol-minimal.c
+++ b/tools/perf/util/symbol-minimal.c
@@ -377,7 +377,7 @@ void symbol__elf_init(void)

char *dso__demangle_sym(struct dso *dso __maybe_unused,
int kmodule __maybe_unused,
- char *elf_name __maybe_unused)
+ const char *elf_name __maybe_unused)
{
return NULL;
}
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 41ebba9..f0b0881 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -306,7 +306,7 @@ int dso__load_sym(struct dso *dso, struct map *map, struct symsrc *syms_ss,
int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss,
struct map *map);

-char *dso__demangle_sym(struct dso *dso, int kmodule, char *elf_name);
+char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);

void __symbols__insert(struct rb_root *symbols, struct symbol *sym, bool kernel);
void symbols__insert(struct rb_root *symbols, struct symbol *sym);

Subject: [tip:perf/core] perf srcline: Do not consider empty files as valid srclines

Commit-ID: d964b1cdbd94f359f1f65f81440be84ceb45978e
Gitweb: http://git.kernel.org/tip/d964b1cdbd94f359f1f65f81440be84ceb45978e
Author: Milian Wolff <[email protected]>
AuthorDate: Sun, 6 Aug 2017 23:24:45 +0200
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Fri, 11 Aug 2017 16:06:31 -0300

perf srcline: Do not consider empty files as valid srclines

Sometimes we get a non-null, but empty, string for the filename from
bfd. This then results in srclines of the form ":0", which is different
from the canonical SRCLINE_UNKNOWN in the form "??:0". Set the file to
NULL if it is empty to fix this.

Signed-off-by: Milian Wolff <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Yao Jin <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/srcline.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index ebc88a7..ed8e8d2 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -155,6 +155,9 @@ static void find_address_in_section(bfd *abfd, asection *section, void *data)
a2l->found = bfd_find_nearest_line(abfd, section, a2l->syms, pc - vma,
&a2l->filename, &a2l->funcname,
&a2l->line);
+
+ if (a2l->filename && !strlen(a2l->filename))
+ a2l->filename = NULL;
}

static struct a2l_data *addr2line_init(const char *path)
@@ -248,6 +251,9 @@ static int addr2line(const char *dso_name, u64 addr,
&a2l->funcname, &a2l->line) &&
cnt++ < MAX_INLINE_NEST) {

+ if (a2l->filename && !strlen(a2l->filename))
+ a2l->filename = NULL;
+
if (node != NULL) {
if (inline_list__append_dso_a2l(dso, node))
return 0;

2017-08-16 07:53:23

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH v2 03/14] perf report: create real callchain entries for inlined frames

Hi Milian,

On Sun, Aug 06, 2017 at 11:24:35PM +0200, Milian Wolff wrote:
> The inlined frames use a fake symbol that is tracked in a special
> map inside the dso, which is always sorted by name. All other

It seems the above is not true. Fake symbols are maintained by
inline_node which in turn maintained by dso->inlines tree.


> entries of the symbol beside the function name are unused for
> inline frames. The advantage of this approach is that all existing
> users of the callchain API can now transparently display inlined
> frames without having to patch their code.
>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: David Ahern <[email protected]>
> Cc: Namhyung Kim <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Yao Jin <[email protected]>
> Signed-off-by: Milian Wolff <[email protected]>
> ---
> tools/perf/util/callchain.c | 31 +++-----
> tools/perf/util/callchain.h | 6 +-
> tools/perf/util/dso.c | 2 +
> tools/perf/util/dso.h | 1 +
> tools/perf/util/machine.c | 56 +++++++++++++-
> tools/perf/util/srcline.c | 183 ++++++++++++++++++++++++++++++++++----------
> tools/perf/util/srcline.h | 19 ++++-
> tools/perf/util/symbol.h | 1 +
> 8 files changed, 230 insertions(+), 69 deletions(-)
>
> diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
> index f320b0777e0d..9854adb06ac1 100644
> --- a/tools/perf/util/callchain.c
> +++ b/tools/perf/util/callchain.c
> @@ -559,6 +559,7 @@ fill_node(struct callchain_node *node, struct callchain_cursor *cursor)
> call->ip = cursor_node->ip;
> call->ms.sym = cursor_node->sym;
> call->ms.map = map__get(cursor_node->map);
> + call->srcline = cursor_node->srcline;
>
> if (cursor_node->branch) {
> call->branch_count = 1;
> @@ -640,20 +641,11 @@ enum match_result {
> static enum match_result match_chain_srcline(struct callchain_cursor_node *node,
> struct callchain_list *cnode)
> {
> - char *left = NULL;
> - char *right = NULL;
> + const char *left = cnode->srcline;
> + const char *right = node->srcline;
> enum match_result ret = MATCH_EQ;
> int cmp;
>
> - if (cnode->ms.map)
> - left = get_srcline(cnode->ms.map->dso,
> - map__rip_2objdump(cnode->ms.map, cnode->ip),
> - cnode->ms.sym, true, false);
> - if (node->map)
> - right = get_srcline(node->map->dso,
> - map__rip_2objdump(node->map, node->ip),
> - node->sym, true, false);
> -
> if (left && right)
> cmp = strcmp(left, right);
> else if (!left && right)
> @@ -668,8 +660,6 @@ static enum match_result match_chain_srcline(struct callchain_cursor_node *node,
> if (cmp != 0)
> ret = cmp < 0 ? MATCH_LT : MATCH_GT;
>
> - free_srcline(left);
> - free_srcline(right);
> return ret;
> }
>
> @@ -958,7 +948,7 @@ merge_chain_branch(struct callchain_cursor *cursor,
> list_for_each_entry_safe(list, next_list, &src->val, list) {
> callchain_cursor_append(cursor, list->ip,
> list->ms.map, list->ms.sym,
> - false, NULL, 0, 0, 0);
> + false, NULL, 0, 0, 0, list->srcline);
> list_del(&list->list);
> map__zput(list->ms.map);
> free(list);
> @@ -998,7 +988,8 @@ int callchain_merge(struct callchain_cursor *cursor,
> int callchain_cursor_append(struct callchain_cursor *cursor,
> u64 ip, struct map *map, struct symbol *sym,
> bool branch, struct branch_flags *flags,
> - int nr_loop_iter, int samples, u64 branch_from)
> + int nr_loop_iter, int samples, u64 branch_from,
> + const char *srcline)
> {
> struct callchain_cursor_node *node = *cursor->last;
>
> @@ -1017,6 +1008,7 @@ int callchain_cursor_append(struct callchain_cursor *cursor,
> node->branch = branch;
> node->nr_loop_iter = nr_loop_iter;
> node->samples = samples;
> + node->srcline = srcline;
>
> if (flags)
> memcpy(&node->branch_flags, flags,
> @@ -1104,12 +1096,7 @@ char *callchain_list__sym_name(struct callchain_list *cl,
> int printed;
>
> if (cl->ms.sym) {
> - if (show_srcline && cl->ms.map && !cl->srcline)
> - cl->srcline = get_srcline(cl->ms.map->dso,
> - map__rip_2objdump(cl->ms.map,
> - cl->ip),
> - cl->ms.sym, false, show_addr);
> - if (cl->srcline)
> + if (show_srcline && cl->srcline)
> printed = scnprintf(bf, bfsize, "%s %s",
> cl->ms.sym->name, cl->srcline);
> else
> @@ -1524,7 +1511,7 @@ int callchain_cursor__copy(struct callchain_cursor *dst,
> rc = callchain_cursor_append(dst, node->ip, node->map, node->sym,
> node->branch, &node->branch_flags,
> node->nr_loop_iter, node->samples,
> - node->branch_from);
> + node->branch_from, node->srcline);
> if (rc)
> break;
>
> diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
> index 97738201464a..bf81b56f34c3 100644
> --- a/tools/perf/util/callchain.h
> +++ b/tools/perf/util/callchain.h
> @@ -121,7 +121,7 @@ struct callchain_list {
> u64 iter_count;
> u64 samples_count;
> struct branch_type_stat brtype_stat;
> - char *srcline;
> + const char *srcline;
> struct list_head list;
> };
>
> @@ -135,6 +135,7 @@ struct callchain_cursor_node {
> u64 ip;
> struct map *map;
> struct symbol *sym;
> + const char *srcline;
> bool branch;
> struct branch_flags branch_flags;
> u64 branch_from;
> @@ -201,7 +202,8 @@ static inline void callchain_cursor_reset(struct callchain_cursor *cursor)
> int callchain_cursor_append(struct callchain_cursor *cursor, u64 ip,
> struct map *map, struct symbol *sym,
> bool branch, struct branch_flags *flags,
> - int nr_loop_iter, int samples, u64 branch_from);
> + int nr_loop_iter, int samples, u64 branch_from,
> + const char *srcline);
>
> /* Close a cursor writing session. Initialize for the reader */
> static inline void callchain_cursor_commit(struct callchain_cursor *cursor)

I think it'd be better splitting srcline change into a separate
commit.


> diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
> index b9e087fb8247..72e6e390fd26 100644
> --- a/tools/perf/util/dso.c
> +++ b/tools/perf/util/dso.c
> @@ -9,6 +9,7 @@
> #include "compress.h"
> #include "path.h"
> #include "symbol.h"
> +#include "srcline.h"
> #include "dso.h"
> #include "machine.h"
> #include "auxtrace.h"
> @@ -1233,6 +1234,7 @@ void dso__delete(struct dso *dso)
> dso->long_name);
> for (i = 0; i < MAP__NR_TYPES; ++i)
> symbols__delete(&dso->symbols[i]);
> + inlines__tree_delete(&dso->inlined_nodes);

Hmm.. inline_node is released after symbol but it seems to have a
problem. Please see below.

>
> if (dso->short_name_allocated) {
> zfree((char **)&dso->short_name);
> diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
> index f886141678eb..7d1e2b3c1f10 100644
> --- a/tools/perf/util/dso.h
> +++ b/tools/perf/util/dso.h
> @@ -141,6 +141,7 @@ struct dso {
> struct rb_root *root; /* root of rbtree that rb_node is in */
> struct rb_root symbols[MAP__NR_TYPES];
> struct rb_root symbol_names[MAP__NR_TYPES];
> + struct rb_root inlined_nodes;
> struct {
> u64 addr;
> struct symbol *symbol;

[SNIP]
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index d4df353051af..a7f8499c8756 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
> index ebc88a74e67b..a1fdf035d1dd 100644
> --- a/tools/perf/util/srcline.c
> +++ b/tools/perf/util/srcline.c
> @@ -33,28 +33,17 @@ static const char *dso__name(struct dso *dso)
> return dso_name;
> }
>
> -static int inline_list__append(char *filename, char *funcname, int line_nr,
> - struct inline_node *node, struct dso *dso)
> +static int inline_list__append(struct symbol *symbol, char *srcline,
> + struct inline_node *node)
> {
> struct inline_list *ilist;
> - char *demangled;
>
> ilist = zalloc(sizeof(*ilist));
> if (ilist == NULL)
> return -1;
>
> - ilist->filename = filename;
> - ilist->line_nr = line_nr;
> -
> - if (dso != NULL) {
> - demangled = dso__demangle_sym(dso, 0, funcname);
> - if (demangled == NULL) {
> - ilist->funcname = funcname;
> - } else {
> - ilist->funcname = demangled;
> - free(funcname);
> - }
> - }
> + ilist->symbol = symbol;
> + ilist->srcline = srcline;
>
> if (callchain_param.order == ORDER_CALLEE)
> list_add_tail(&ilist->list, &node->val);
> @@ -64,6 +53,30 @@ static int inline_list__append(char *filename, char *funcname, int line_nr,
> return 0;
> }
>
> +// basename version that takes a const input string
> +static const char *gnu_basename(const char *path)
> +{
> + const char *base = strrchr(path, '/');
> +
> + return base ? base + 1 : path;
> +}
> +
> +static char *srcline_from_fileline(const char *file, unsigned int line)
> +{
> + char *srcline;
> +
> + if (!file)
> + return NULL;
> +
> + if (!srcline_full_filename)
> + file = gnu_basename(file);
> +
> + if (asprintf(&srcline, "%s:%u", file, line) < 0)
> + return NULL;
> +
> + return srcline;
> +}
> +
> #ifdef HAVE_LIBBFD_SUPPORT
>
> /*
> @@ -203,19 +216,55 @@ static void addr2line_cleanup(struct a2l_data *a2l)
>
> #define MAX_INLINE_NEST 1024
>
> +static struct symbol *new_inline_sym(struct dso *dso,
> + struct symbol *base_sym,
> + const char *funcname)
> +{
> + struct symbol *inline_sym;
> + char *demangled = NULL;
> +
> + if (dso) {
> + demangled = dso__demangle_sym(dso, 0, funcname);
> + if (demangled)
> + funcname = demangled;
> + }
> +
> + if (strcmp(funcname, base_sym->name) == 0) {
> + // reuse the real, existing symbol
> + inline_sym = base_sym;

So inline_node could refer the existing symbol.


> + } else {
> + // create a fake symbol for the inline frame
> + inline_sym = symbol__new(base_sym ? base_sym->start : 0,
> + base_sym ? base_sym->end : 0,
> + base_sym ? base_sym->binding : 0,
> + funcname);
> + if (inline_sym)
> + inline_sym->inlined = 1;
> + }
> +
> + free(demangled);
> +
> + return inline_sym;
> +}
> +
> static int inline_list__append_dso_a2l(struct dso *dso,
> - struct inline_node *node)
> + struct inline_node *node,
> + struct symbol *sym)
> {
> struct a2l_data *a2l = dso->a2l;
> - char *funcname = a2l->funcname ? strdup(a2l->funcname) : NULL;
> - char *filename = a2l->filename ? strdup(a2l->filename) : NULL;
> + struct symbol *inline_sym = new_inline_sym(dso, sym, a2l->funcname);
> + char *srcline = NULL;
> +
> + if (a2l->filename)
> + srcline = srcline_from_fileline(a2l->filename, a2l->line);
>
> - return inline_list__append(filename, funcname, a2l->line, node, dso);
> + return inline_list__append(inline_sym, srcline, node);
> }
>

[SNIP]
> @@ -511,10 +563,63 @@ void inline_node__delete(struct inline_node *node)
>
> list_for_each_entry_safe(ilist, tmp, &node->val, list) {
> list_del_init(&ilist->list);
> - zfree(&ilist->filename);
> - zfree(&ilist->funcname);
> + zfree(&ilist->srcline);
> + // only the inlined symbols are owned by the list
> + if (ilist->symbol && ilist->symbol->inlined)
> + symbol__delete(ilist->symbol);

Existing symbols are released at this moment.

Thanks,
Namhyung


> free(ilist);
> }
>
> free(node);
> }
> +
> +void inlines__tree_insert(struct rb_root *tree, struct inline_node *inlines)
> +{
> + struct rb_node **p = &tree->rb_node;
> + struct rb_node *parent = NULL;
> + const u64 addr = inlines->addr;
> + struct inline_node *i;
> +
> + while (*p != NULL) {
> + parent = *p;
> + i = rb_entry(parent, struct inline_node, rb_node);
> + if (addr < i->addr)
> + p = &(*p)->rb_left;
> + else
> + p = &(*p)->rb_right;
> + }
> + rb_link_node(&inlines->rb_node, parent, p);
> + rb_insert_color(&inlines->rb_node, tree);
> +}
> +
> +struct inline_node *inlines__tree_find(struct rb_root *tree, u64 addr)
> +{
> + struct rb_node *n = tree->rb_node;
> +
> + while (n) {
> + struct inline_node *i = rb_entry(n, struct inline_node,
> + rb_node);
> +
> + if (addr < i->addr)
> + n = n->rb_left;
> + else if (addr > i->addr)
> + n = n->rb_right;
> + else
> + return i;
> + }
> +
> + return NULL;
> +}
> +
> +void inlines__tree_delete(struct rb_root *tree)
> +{
> + struct inline_node *pos;
> + struct rb_node *next = rb_first(tree);
> +
> + while (next) {
> + pos = rb_entry(next, struct inline_node, rb_node);
> + next = rb_next(&pos->rb_node);
> + rb_erase(&pos->rb_node, tree);
> + inline_node__delete(pos);
> + }
> +}
> diff --git a/tools/perf/util/srcline.h b/tools/perf/util/srcline.h
> index 7b52ba88676e..0d2aca92e8c7 100644
> --- a/tools/perf/util/srcline.h
> +++ b/tools/perf/util/srcline.h
> @@ -2,6 +2,7 @@
> #define PERF_SRCLINE_H
>
> #include <linux/list.h>
> +#include <linux/rbtree.h>
> #include <linux/types.h>
>
> struct dso;
> @@ -17,18 +18,28 @@ void free_srcline(char *srcline);
> #define SRCLINE_UNKNOWN ((char *) "??:0")
>
> struct inline_list {
> - char *filename;
> - char *funcname;
> - unsigned int line_nr;
> + struct symbol *symbol;
> + char *srcline;
> struct list_head list;
> };
>
> struct inline_node {
> u64 addr;
> struct list_head val;
> + struct rb_node rb_node;
> };
>
> -struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr);
> +// parse inlined frames for the given address
> +struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr,
> + struct symbol *sym);
> +// free resources associated to the inline node list
> void inline_node__delete(struct inline_node *node);
>
> +// insert the inline node list into the DSO, which will take ownership
> +void inlines__tree_insert(struct rb_root *tree, struct inline_node *inlines);
> +// find previously inserted inline node list
> +struct inline_node *inlines__tree_find(struct rb_root *tree, u64 addr);
> +// delete all nodes within the tree of inline_node s
> +void inlines__tree_delete(struct rb_root *tree);
> +
> #endif /* PERF_SRCLINE_H */
> diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> index f0b08810d7fa..b358570ce615 100644
> --- a/tools/perf/util/symbol.h
> +++ b/tools/perf/util/symbol.h
> @@ -59,6 +59,7 @@ struct symbol {
> u8 binding;
> u8 idle:1;
> u8 ignore:1;
> + u8 inlined:1;
> u8 arch_sym;
> char name[0];
> };
> --
> 2.13.3
>

2017-08-20 20:57:22

by Milian Wolff

[permalink] [raw]
Subject: Re: [PATCH v2 03/14] perf report: create real callchain entries for inlined frames

On Mittwoch, 16. August 2017 09:53:13 CEST Namhyung Kim wrote:
> Hi Milian,
>
> On Sun, Aug 06, 2017 at 11:24:35PM +0200, Milian Wolff wrote:
> > The inlined frames use a fake symbol that is tracked in a special
> > map inside the dso, which is always sorted by name. All other
>
> It seems the above is not true. Fake symbols are maintained by
> inline_node which in turn maintained by dso->inlines tree.

OK, I'll rephrase this to make it more explicit. But what I call "map" is what
you call "tree", no?

<snip>

> > @@ -201,7 +202,8 @@ static inline void callchain_cursor_reset(struct
> > callchain_cursor *cursor)>
> > int callchain_cursor_append(struct callchain_cursor *cursor, u64 ip,
> >
> > struct map *map, struct symbol *sym,
> > bool branch, struct branch_flags *flags,
> >
> > - int nr_loop_iter, int samples, u64 branch_from);
> > + int nr_loop_iter, int samples, u64 branch_from,
> > + const char *srcline);
> >
> > /* Close a cursor writing session. Initialize for the reader */
> > static inline void callchain_cursor_commit(struct callchain_cursor
> > *cursor)
>
> I think it'd be better splitting srcline change into a separate
> commit.

OK, I'll try to see how this goes. It would simply add the boiler plate code
to pass the srcline around, and then set it from within the callchain code,
right?

> > diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
> > index b9e087fb8247..72e6e390fd26 100644
> > --- a/tools/perf/util/dso.c
> > +++ b/tools/perf/util/dso.c
> > @@ -9,6 +9,7 @@
> >
> > #include "compress.h"
> > #include "path.h"
> > #include "symbol.h"
> >
> > +#include "srcline.h"
> >
> > #include "dso.h"
> > #include "machine.h"
> > #include "auxtrace.h"
> >
> > @@ -1233,6 +1234,7 @@ void dso__delete(struct dso *dso)
> >
> > dso->long_name);
> >
> > for (i = 0; i < MAP__NR_TYPES; ++i)
> >
> > symbols__delete(&dso->symbols[i]);
> >
> > + inlines__tree_delete(&dso->inlined_nodes);
>
> Hmm.. inline_node is released after symbol but it seems to have a
> problem. Please see below.
>
> > if (dso->short_name_allocated) {
> >
> > zfree((char **)&dso->short_name);
> >
> > diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
> > index f886141678eb..7d1e2b3c1f10 100644
> > --- a/tools/perf/util/dso.h
> > +++ b/tools/perf/util/dso.h
> > @@ -141,6 +141,7 @@ struct dso {
> >
> > struct rb_root *root; /* root of rbtree that rb_node is in
*/
> > struct rb_root symbols[MAP__NR_TYPES];
> > struct rb_root symbol_names[MAP__NR_TYPES];
> >
> > + struct rb_root inlined_nodes;
> >
> > struct {
> >
> > u64 addr;
> > struct symbol *symbol;
>
> [SNIP]
>
> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > index d4df353051af..a7f8499c8756 100644
> > --- a/tools/perf/util/machine.c
> > +++ b/tools/perf/util/machine.c
> > diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
> > index ebc88a74e67b..a1fdf035d1dd 100644
> > --- a/tools/perf/util/srcline.c
> > +++ b/tools/perf/util/srcline.c
> > @@ -33,28 +33,17 @@ static const char *dso__name(struct dso *dso)
> >
> > return dso_name;
> >
> > }
> >
> > -static int inline_list__append(char *filename, char *funcname, int
> > line_nr, - struct inline_node *node, struct dso *dso)
> > +static int inline_list__append(struct symbol *symbol, char *srcline,
> > + struct inline_node *node)
> >
> > {
> >
> > struct inline_list *ilist;
> >
> > - char *demangled;
> >
> > ilist = zalloc(sizeof(*ilist));
> > if (ilist == NULL)
> >
> > return -1;
> >
> > - ilist->filename = filename;
> > - ilist->line_nr = line_nr;
> > -
> > - if (dso != NULL) {
> > - demangled = dso__demangle_sym(dso, 0, funcname);
> > - if (demangled == NULL) {
> > - ilist->funcname = funcname;
> > - } else {
> > - ilist->funcname = demangled;
> > - free(funcname);
> > - }
> > - }
> > + ilist->symbol = symbol;
> > + ilist->srcline = srcline;
> >
> > if (callchain_param.order == ORDER_CALLEE)
> >
> > list_add_tail(&ilist->list, &node->val);
> >
> > @@ -64,6 +53,30 @@ static int inline_list__append(char *filename, char
> > *funcname, int line_nr,>
> > return 0;
> >
> > }
> >
> > +// basename version that takes a const input string
> > +static const char *gnu_basename(const char *path)
> > +{
> > + const char *base = strrchr(path, '/');
> > +
> > + return base ? base + 1 : path;
> > +}
> > +
> > +static char *srcline_from_fileline(const char *file, unsigned int line)
> > +{
> > + char *srcline;
> > +
> > + if (!file)
> > + return NULL;
> > +
> > + if (!srcline_full_filename)
> > + file = gnu_basename(file);
> > +
> > + if (asprintf(&srcline, "%s:%u", file, line) < 0)
> > + return NULL;
> > +
> > + return srcline;
> > +}
> > +
> >
> > #ifdef HAVE_LIBBFD_SUPPORT
> >
> > /*
> >
> > @@ -203,19 +216,55 @@ static void addr2line_cleanup(struct a2l_data *a2l)
> >
> > #define MAX_INLINE_NEST 1024
> >
> > +static struct symbol *new_inline_sym(struct dso *dso,
> > + struct symbol *base_sym,
> > + const char *funcname)
> > +{
> > + struct symbol *inline_sym;
> > + char *demangled = NULL;
> > +
> > + if (dso) {
> > + demangled = dso__demangle_sym(dso, 0, funcname);
> > + if (demangled)
> > + funcname = demangled;
> > + }
> > +
> > + if (strcmp(funcname, base_sym->name) == 0) {
> > + // reuse the real, existing symbol
> > + inline_sym = base_sym;
>
> So inline_node could refer the existing symbol.

Yes, that does indeed seem to be an issue. I wonder why I'm not seeing any
crashes or valgrind/ASAN reports though. I'll have to dig into this.

> > + } else {
> > + // create a fake symbol for the inline frame
> > + inline_sym = symbol__new(base_sym ? base_sym->start : 0,
> > + base_sym ? base_sym->end : 0,
> > + base_sym ? base_sym->binding : 0,
> > + funcname);
> > + if (inline_sym)
> > + inline_sym->inlined = 1;
> > + }
> > +
> > + free(demangled);
> > +
> > + return inline_sym;
> > +}
> > +
> >
> > static int inline_list__append_dso_a2l(struct dso *dso,
> >
> > - struct inline_node *node)
> > + struct inline_node *node,
> > + struct symbol *sym)
> >
> > {
> >
> > struct a2l_data *a2l = dso->a2l;
> >
> > - char *funcname = a2l->funcname ? strdup(a2l->funcname) : NULL;
> > - char *filename = a2l->filename ? strdup(a2l->filename) : NULL;
> > + struct symbol *inline_sym = new_inline_sym(dso, sym, a2l->funcname);
> > + char *srcline = NULL;
> > +
> > + if (a2l->filename)
> > + srcline = srcline_from_fileline(a2l->filename, a2l->line);
> >
> > - return inline_list__append(filename, funcname, a2l->line, node, dso);
> > + return inline_list__append(inline_sym, srcline, node);
> >
> > }
>
> [SNIP]
>
> > @@ -511,10 +563,63 @@ void inline_node__delete(struct inline_node *node)
> >
> > list_for_each_entry_safe(ilist, tmp, &node->val, list) {
> >
> > list_del_init(&ilist->list);
> >
> > - zfree(&ilist->filename);
> > - zfree(&ilist->funcname);
> > + zfree(&ilist->srcline);
> > + // only the inlined symbols are owned by the list
> > + if (ilist->symbol && ilist->symbol->inlined)
> > + symbol__delete(ilist->symbol);
>
> Existing symbols are released at this moment.

Thanks for the review, I'll try to look into these issues once I have more
time again.

Cheers

--
Milian Wolff | [email protected] | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

2017-08-28 12:19:08

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH v2 03/14] perf report: create real callchain entries for inlined frames

Hi Milian,

Sorry for late reply, I was on vacation.


On Mon, Aug 21, 2017 at 5:57 AM, Milian Wolff <[email protected]> wrote:
> On Mittwoch, 16. August 2017 09:53:13 CEST Namhyung Kim wrote:
>> Hi Milian,
>>
>> On Sun, Aug 06, 2017 at 11:24:35PM +0200, Milian Wolff wrote:
>> > The inlined frames use a fake symbol that is tracked in a special
>> > map inside the dso, which is always sorted by name. All other
>>
>> It seems the above is not true. Fake symbols are maintained by
>> inline_node which in turn maintained by dso->inlines tree.
>
> OK, I'll rephrase this to make it more explicit. But what I call "map" is what
> you call "tree", no?

Right, but as we have "map" in perf code base, I think it's better to
use "tree".

>
> <snip>
>
>> > @@ -201,7 +202,8 @@ static inline void callchain_cursor_reset(struct
>> > callchain_cursor *cursor)>
>> > int callchain_cursor_append(struct callchain_cursor *cursor, u64 ip,
>> >
>> > struct map *map, struct symbol *sym,
>> > bool branch, struct branch_flags *flags,
>> >
>> > - int nr_loop_iter, int samples, u64 branch_from);
>> > + int nr_loop_iter, int samples, u64 branch_from,
>> > + const char *srcline);
>> >
>> > /* Close a cursor writing session. Initialize for the reader */
>> > static inline void callchain_cursor_commit(struct callchain_cursor
>> > *cursor)
>>
>> I think it'd be better splitting srcline change into a separate
>> commit.
>
> OK, I'll try to see how this goes. It would simply add the boiler plate code
> to pass the srcline around, and then set it from within the callchain code,
> right?

Yes. It'd help reviewers to concentrate on the logic IMHO.

Thanks,
Namhyung


>
>> > diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
>> > index b9e087fb8247..72e6e390fd26 100644
>> > --- a/tools/perf/util/dso.c
>> > +++ b/tools/perf/util/dso.c
>> > @@ -9,6 +9,7 @@
>> >
>> > #include "compress.h"
>> > #include "path.h"
>> > #include "symbol.h"
>> >
>> > +#include "srcline.h"
>> >
>> > #include "dso.h"
>> > #include "machine.h"
>> > #include "auxtrace.h"
>> >
>> > @@ -1233,6 +1234,7 @@ void dso__delete(struct dso *dso)
>> >
>> > dso->long_name);
>> >
>> > for (i = 0; i < MAP__NR_TYPES; ++i)
>> >
>> > symbols__delete(&dso->symbols[i]);
>> >
>> > + inlines__tree_delete(&dso->inlined_nodes);
>>
>> Hmm.. inline_node is released after symbol but it seems to have a
>> problem. Please see below.
>>
>> > if (dso->short_name_allocated) {
>> >
>> > zfree((char **)&dso->short_name);
>> >
>> > diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
>> > index f886141678eb..7d1e2b3c1f10 100644
>> > --- a/tools/perf/util/dso.h
>> > +++ b/tools/perf/util/dso.h
>> > @@ -141,6 +141,7 @@ struct dso {
>> >
>> > struct rb_root *root; /* root of rbtree that rb_node is in
> */
>> > struct rb_root symbols[MAP__NR_TYPES];
>> > struct rb_root symbol_names[MAP__NR_TYPES];
>> >
>> > + struct rb_root inlined_nodes;
>> >
>> > struct {
>> >
>> > u64 addr;
>> > struct symbol *symbol;
>>
>> [SNIP]
>>
>> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
>> > index d4df353051af..a7f8499c8756 100644
>> > --- a/tools/perf/util/machine.c
>> > +++ b/tools/perf/util/machine.c
>> > diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
>> > index ebc88a74e67b..a1fdf035d1dd 100644
>> > --- a/tools/perf/util/srcline.c
>> > +++ b/tools/perf/util/srcline.c
>> > @@ -33,28 +33,17 @@ static const char *dso__name(struct dso *dso)
>> >
>> > return dso_name;
>> >
>> > }
>> >
>> > -static int inline_list__append(char *filename, char *funcname, int
>> > line_nr, - struct inline_node *node, struct dso *dso)
>> > +static int inline_list__append(struct symbol *symbol, char *srcline,
>> > + struct inline_node *node)
>> >
>> > {
>> >
>> > struct inline_list *ilist;
>> >
>> > - char *demangled;
>> >
>> > ilist = zalloc(sizeof(*ilist));
>> > if (ilist == NULL)
>> >
>> > return -1;
>> >
>> > - ilist->filename = filename;
>> > - ilist->line_nr = line_nr;
>> > -
>> > - if (dso != NULL) {
>> > - demangled = dso__demangle_sym(dso, 0, funcname);
>> > - if (demangled == NULL) {
>> > - ilist->funcname = funcname;
>> > - } else {
>> > - ilist->funcname = demangled;
>> > - free(funcname);
>> > - }
>> > - }
>> > + ilist->symbol = symbol;
>> > + ilist->srcline = srcline;
>> >
>> > if (callchain_param.order == ORDER_CALLEE)
>> >
>> > list_add_tail(&ilist->list, &node->val);
>> >
>> > @@ -64,6 +53,30 @@ static int inline_list__append(char *filename, char
>> > *funcname, int line_nr,>
>> > return 0;
>> >
>> > }
>> >
>> > +// basename version that takes a const input string
>> > +static const char *gnu_basename(const char *path)
>> > +{
>> > + const char *base = strrchr(path, '/');
>> > +
>> > + return base ? base + 1 : path;
>> > +}
>> > +
>> > +static char *srcline_from_fileline(const char *file, unsigned int line)
>> > +{
>> > + char *srcline;
>> > +
>> > + if (!file)
>> > + return NULL;
>> > +
>> > + if (!srcline_full_filename)
>> > + file = gnu_basename(file);
>> > +
>> > + if (asprintf(&srcline, "%s:%u", file, line) < 0)
>> > + return NULL;
>> > +
>> > + return srcline;
>> > +}
>> > +
>> >
>> > #ifdef HAVE_LIBBFD_SUPPORT
>> >
>> > /*
>> >
>> > @@ -203,19 +216,55 @@ static void addr2line_cleanup(struct a2l_data *a2l)
>> >
>> > #define MAX_INLINE_NEST 1024
>> >
>> > +static struct symbol *new_inline_sym(struct dso *dso,
>> > + struct symbol *base_sym,
>> > + const char *funcname)
>> > +{
>> > + struct symbol *inline_sym;
>> > + char *demangled = NULL;
>> > +
>> > + if (dso) {
>> > + demangled = dso__demangle_sym(dso, 0, funcname);
>> > + if (demangled)
>> > + funcname = demangled;
>> > + }
>> > +
>> > + if (strcmp(funcname, base_sym->name) == 0) {
>> > + // reuse the real, existing symbol
>> > + inline_sym = base_sym;
>>
>> So inline_node could refer the existing symbol.
>
> Yes, that does indeed seem to be an issue. I wonder why I'm not seeing any
> crashes or valgrind/ASAN reports though. I'll have to dig into this.
>
>> > + } else {
>> > + // create a fake symbol for the inline frame
>> > + inline_sym = symbol__new(base_sym ? base_sym->start : 0,
>> > + base_sym ? base_sym->end : 0,
>> > + base_sym ? base_sym->binding : 0,
>> > + funcname);
>> > + if (inline_sym)
>> > + inline_sym->inlined = 1;
>> > + }
>> > +
>> > + free(demangled);
>> > +
>> > + return inline_sym;
>> > +}
>> > +
>> >
>> > static int inline_list__append_dso_a2l(struct dso *dso,
>> >
>> > - struct inline_node *node)
>> > + struct inline_node *node,
>> > + struct symbol *sym)
>> >
>> > {
>> >
>> > struct a2l_data *a2l = dso->a2l;
>> >
>> > - char *funcname = a2l->funcname ? strdup(a2l->funcname) : NULL;
>> > - char *filename = a2l->filename ? strdup(a2l->filename) : NULL;
>> > + struct symbol *inline_sym = new_inline_sym(dso, sym, a2l->funcname);
>> > + char *srcline = NULL;
>> > +
>> > + if (a2l->filename)
>> > + srcline = srcline_from_fileline(a2l->filename, a2l->line);
>> >
>> > - return inline_list__append(filename, funcname, a2l->line, node, dso);
>> > + return inline_list__append(inline_sym, srcline, node);
>> >
>> > }
>>
>> [SNIP]
>>
>> > @@ -511,10 +563,63 @@ void inline_node__delete(struct inline_node *node)
>> >
>> > list_for_each_entry_safe(ilist, tmp, &node->val, list) {
>> >
>> > list_del_init(&ilist->list);
>> >
>> > - zfree(&ilist->filename);
>> > - zfree(&ilist->funcname);
>> > + zfree(&ilist->srcline);
>> > + // only the inlined symbols are owned by the list
>> > + if (ilist->symbol && ilist->symbol->inlined)
>> > + symbol__delete(ilist->symbol);
>>
>> Existing symbols are released at this moment.
>
> Thanks for the review, I'll try to look into these issues once I have more
> time again.
>
> Cheers
>
> --
> Milian Wolff | [email protected] | Senior Software Engineer
> KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
> Tel: +49-30-521325470
> KDAB - The Qt Experts



--
Thanks,
Namhyung

2017-09-06 13:13:40

by Milian Wolff

[permalink] [raw]
Subject: Re: [PATCH v2 03/14] perf report: create real callchain entries for inlined frames

On Sonntag, 20. August 2017 22:57:17 CEST Milian Wolff wrote:
> On Mittwoch, 16. August 2017 09:53:13 CEST Namhyung Kim wrote:
> > Hi Milian,
> >
> > On Sun, Aug 06, 2017 at 11:24:35PM +0200, Milian Wolff wrote:
> > > The inlined frames use a fake symbol that is tracked in a special
> > > map inside the dso, which is always sorted by name. All other
> >
> > It seems the above is not true. Fake symbols are maintained by
> > inline_node which in turn maintained by dso->inlines tree.
>
> OK, I'll rephrase this to make it more explicit. But what I call "map" is
> what you call "tree", no?
>
> <snip>
>
> > > @@ -201,7 +202,8 @@ static inline void callchain_cursor_reset(struct
> > > callchain_cursor *cursor)>
> > >
> > > int callchain_cursor_append(struct callchain_cursor *cursor, u64 ip,
> > >
> > > struct map *map, struct symbol *sym,
> > > bool branch, struct branch_flags *flags,
> > >
> > > - int nr_loop_iter, int samples, u64 branch_from);
> > > + int nr_loop_iter, int samples, u64 branch_from,
> > > + const char *srcline);
> > >
> > > /* Close a cursor writing session. Initialize for the reader */
> > > static inline void callchain_cursor_commit(struct callchain_cursor
> > > *cursor)
> >
> > I think it'd be better splitting srcline change into a separate
> > commit.
>
> OK, I'll try to see how this goes. It would simply add the boiler plate code
> to pass the srcline around, and then set it from within the callchain code,
> right?
>
> > > diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
> > > index b9e087fb8247..72e6e390fd26 100644
> > > --- a/tools/perf/util/dso.c
> > > +++ b/tools/perf/util/dso.c
> > > @@ -9,6 +9,7 @@
> > >
> > > #include "compress.h"
> > > #include "path.h"
> > > #include "symbol.h"
> > >
> > > +#include "srcline.h"
> > >
> > > #include "dso.h"
> > > #include "machine.h"
> > > #include "auxtrace.h"
> > >
> > > @@ -1233,6 +1234,7 @@ void dso__delete(struct dso *dso)
> > >
> > > dso->long_name);
> > >
> > > for (i = 0; i < MAP__NR_TYPES; ++i)
> > >
> > > symbols__delete(&dso->symbols[i]);
> > >
> > > + inlines__tree_delete(&dso->inlined_nodes);
> >
> > Hmm.. inline_node is released after symbol but it seems to have a
> > problem. Please see below.
> >
> > > if (dso->short_name_allocated) {
> > >
> > > zfree((char **)&dso->short_name);
> > >
> > > diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
> > > index f886141678eb..7d1e2b3c1f10 100644
> > > --- a/tools/perf/util/dso.h
> > > +++ b/tools/perf/util/dso.h
> > > @@ -141,6 +141,7 @@ struct dso {
> > >
> > > struct rb_root *root; /* root of rbtree that rb_node is in
>
> */
>
> > > struct rb_root symbols[MAP__NR_TYPES];
> > > struct rb_root symbol_names[MAP__NR_TYPES];
> > >
> > > + struct rb_root inlined_nodes;
> > >
> > > struct {
> > >
> > > u64 addr;
> > > struct symbol *symbol;
> >
> > [SNIP]
> >
> > > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > > index d4df353051af..a7f8499c8756 100644
> > > --- a/tools/perf/util/machine.c
> > > +++ b/tools/perf/util/machine.c
> > > diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
> > > index ebc88a74e67b..a1fdf035d1dd 100644
> > > --- a/tools/perf/util/srcline.c
> > > +++ b/tools/perf/util/srcline.c
> > > @@ -33,28 +33,17 @@ static const char *dso__name(struct dso *dso)
> > >
> > > return dso_name;
> > >
> > > }
> > >
> > > -static int inline_list__append(char *filename, char *funcname, int
> > > line_nr, - struct inline_node *node, struct dso *dso)
> > > +static int inline_list__append(struct symbol *symbol, char *srcline,
> > > + struct inline_node *node)
> > >
> > > {
> > >
> > > struct inline_list *ilist;
> > >
> > > - char *demangled;
> > >
> > > ilist = zalloc(sizeof(*ilist));
> > > if (ilist == NULL)
> > >
> > > return -1;
> > >
> > > - ilist->filename = filename;
> > > - ilist->line_nr = line_nr;
> > > -
> > > - if (dso != NULL) {
> > > - demangled = dso__demangle_sym(dso, 0, funcname);
> > > - if (demangled == NULL) {
> > > - ilist->funcname = funcname;
> > > - } else {
> > > - ilist->funcname = demangled;
> > > - free(funcname);
> > > - }
> > > - }
> > > + ilist->symbol = symbol;
> > > + ilist->srcline = srcline;
> > >
> > > if (callchain_param.order == ORDER_CALLEE)
> > >
> > > list_add_tail(&ilist->list, &node->val);
> > >
> > > @@ -64,6 +53,30 @@ static int inline_list__append(char *filename, char
> > > *funcname, int line_nr,>
> > >
> > > return 0;
> > >
> > > }
> > >
> > > +// basename version that takes a const input string
> > > +static const char *gnu_basename(const char *path)
> > > +{
> > > + const char *base = strrchr(path, '/');
> > > +
> > > + return base ? base + 1 : path;
> > > +}
> > > +
> > > +static char *srcline_from_fileline(const char *file, unsigned int line)
> > > +{
> > > + char *srcline;
> > > +
> > > + if (!file)
> > > + return NULL;
> > > +
> > > + if (!srcline_full_filename)
> > > + file = gnu_basename(file);
> > > +
> > > + if (asprintf(&srcline, "%s:%u", file, line) < 0)
> > > + return NULL;
> > > +
> > > + return srcline;
> > > +}
> > > +
> > >
> > > #ifdef HAVE_LIBBFD_SUPPORT
> > >
> > > /*
> > >
> > > @@ -203,19 +216,55 @@ static void addr2line_cleanup(struct a2l_data
> > > *a2l)
> > >
> > > #define MAX_INLINE_NEST 1024
> > >
> > > +static struct symbol *new_inline_sym(struct dso *dso,
> > > + struct symbol *base_sym,
> > > + const char *funcname)
> > > +{
> > > + struct symbol *inline_sym;
> > > + char *demangled = NULL;
> > > +
> > > + if (dso) {
> > > + demangled = dso__demangle_sym(dso, 0, funcname);
> > > + if (demangled)
> > > + funcname = demangled;
> > > + }
> > > +
> > > + if (strcmp(funcname, base_sym->name) == 0) {
> > > + // reuse the real, existing symbol
> > > + inline_sym = base_sym;
> >
> > So inline_node could refer the existing symbol.
>
> Yes, that does indeed seem to be an issue. I wonder why I'm not seeing any
> crashes or valgrind/ASAN reports though. I'll have to dig into this.
>
> > > + } else {
> > > + // create a fake symbol for the inline frame
> > > + inline_sym = symbol__new(base_sym ? base_sym->start : 0,
> > > + base_sym ? base_sym->end : 0,
> > > + base_sym ? base_sym->binding : 0,
> > > + funcname);
> > > + if (inline_sym)
> > > + inline_sym->inlined = 1;
> > > + }
> > > +
> > > + free(demangled);
> > > +
> > > + return inline_sym;
> > > +}
> > > +
> > >
> > > static int inline_list__append_dso_a2l(struct dso *dso,
> > >
> > > - struct inline_node *node)
> > > + struct inline_node *node,
> > > + struct symbol *sym)
> > >
> > > {
> > >
> > > struct a2l_data *a2l = dso->a2l;
> > >
> > > - char *funcname = a2l->funcname ? strdup(a2l->funcname) : NULL;
> > > - char *filename = a2l->filename ? strdup(a2l->filename) : NULL;
> > > + struct symbol *inline_sym = new_inline_sym(dso, sym, a2l->funcname);
> > > + char *srcline = NULL;
> > > +
> > > + if (a2l->filename)
> > > + srcline = srcline_from_fileline(a2l->filename, a2l->line);
> > >
> > > - return inline_list__append(filename, funcname, a2l->line, node, dso);
> > > + return inline_list__append(inline_sym, srcline, node);
> > >
> > > }
> >
> > [SNIP]
> >
> > > @@ -511,10 +563,63 @@ void inline_node__delete(struct inline_node *node)
> > >
> > > list_for_each_entry_safe(ilist, tmp, &node->val, list) {
> > >
> > > list_del_init(&ilist->list);
> > >
> > > - zfree(&ilist->filename);
> > > - zfree(&ilist->funcname);
> > > + zfree(&ilist->srcline);
> > > + // only the inlined symbols are owned by the list
> > > + if (ilist->symbol && ilist->symbol->inlined)
> > > + symbol__delete(ilist->symbol);
> >
> > Existing symbols are released at this moment.
>
> Thanks for the review, I'll try to look into these issues once I have more
> time again.

OK, so I just dug into this part of the patch again. I don't think it's
actually a problem after all:

When an inline node reuses the real symbol, that symbol won't have its
`inlined` member set to true. Thus these symbols will never get deleted by
inline_node__delete. If you have suggestions on how to make this clearer, I'm
all ears. For now, I'll add a comment to where we alias/reuse the symbol.

I'll try to split up the patch now to make it somehow easier to review.

Cheers

--
Milian Wolff | [email protected] | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


2017-09-07 15:00:44

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH v2 03/14] perf report: create real callchain entries for inlined frames

Hi Milian,

On Wed, Sep 06, 2017 at 03:13:35PM +0200, Milian Wolff wrote:
> On Sonntag, 20. August 2017 22:57:17 CEST Milian Wolff wrote:
> > On Mittwoch, 16. August 2017 09:53:13 CEST Namhyung Kim wrote:
> > > Hi Milian,
> > >
> > > On Sun, Aug 06, 2017 at 11:24:35PM +0200, Milian Wolff wrote:
> > > > The inlined frames use a fake symbol that is tracked in a special
> > > > map inside the dso, which is always sorted by name. All other
> > >
> > > It seems the above is not true. Fake symbols are maintained by
> > > inline_node which in turn maintained by dso->inlines tree.
> >
> > OK, I'll rephrase this to make it more explicit. But what I call "map" is
> > what you call "tree", no?
> >
> > <snip>
> >
> > > > @@ -201,7 +202,8 @@ static inline void callchain_cursor_reset(struct
> > > > callchain_cursor *cursor)>
> > > >
> > > > int callchain_cursor_append(struct callchain_cursor *cursor, u64 ip,
> > > >
> > > > struct map *map, struct symbol *sym,
> > > > bool branch, struct branch_flags *flags,
> > > >
> > > > - int nr_loop_iter, int samples, u64 branch_from);
> > > > + int nr_loop_iter, int samples, u64 branch_from,
> > > > + const char *srcline);
> > > >
> > > > /* Close a cursor writing session. Initialize for the reader */
> > > > static inline void callchain_cursor_commit(struct callchain_cursor
> > > > *cursor)
> > >
> > > I think it'd be better splitting srcline change into a separate
> > > commit.
> >
> > OK, I'll try to see how this goes. It would simply add the boiler plate code
> > to pass the srcline around, and then set it from within the callchain code,
> > right?
> >
> > > > diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
> > > > index b9e087fb8247..72e6e390fd26 100644
> > > > --- a/tools/perf/util/dso.c
> > > > +++ b/tools/perf/util/dso.c
> > > > @@ -9,6 +9,7 @@
> > > >
> > > > #include "compress.h"
> > > > #include "path.h"
> > > > #include "symbol.h"
> > > >
> > > > +#include "srcline.h"
> > > >
> > > > #include "dso.h"
> > > > #include "machine.h"
> > > > #include "auxtrace.h"
> > > >
> > > > @@ -1233,6 +1234,7 @@ void dso__delete(struct dso *dso)
> > > >
> > > > dso->long_name);
> > > >
> > > > for (i = 0; i < MAP__NR_TYPES; ++i)
> > > >
> > > > symbols__delete(&dso->symbols[i]);
> > > >
> > > > + inlines__tree_delete(&dso->inlined_nodes);
> > >
> > > Hmm.. inline_node is released after symbol but it seems to have a
> > > problem. Please see below.
> > >
> > > > if (dso->short_name_allocated) {
> > > >
> > > > zfree((char **)&dso->short_name);
> > > >
> > > > diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
> > > > index f886141678eb..7d1e2b3c1f10 100644
> > > > --- a/tools/perf/util/dso.h
> > > > +++ b/tools/perf/util/dso.h
> > > > @@ -141,6 +141,7 @@ struct dso {
> > > >
> > > > struct rb_root *root; /* root of rbtree that rb_node is in
> >
> > */
> >
> > > > struct rb_root symbols[MAP__NR_TYPES];
> > > > struct rb_root symbol_names[MAP__NR_TYPES];
> > > >
> > > > + struct rb_root inlined_nodes;
> > > >
> > > > struct {
> > > >
> > > > u64 addr;
> > > > struct symbol *symbol;
> > >
> > > [SNIP]
> > >
> > > > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > > > index d4df353051af..a7f8499c8756 100644
> > > > --- a/tools/perf/util/machine.c
> > > > +++ b/tools/perf/util/machine.c
> > > > diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
> > > > index ebc88a74e67b..a1fdf035d1dd 100644
> > > > --- a/tools/perf/util/srcline.c
> > > > +++ b/tools/perf/util/srcline.c
> > > > @@ -33,28 +33,17 @@ static const char *dso__name(struct dso *dso)
> > > >
> > > > return dso_name;
> > > >
> > > > }
> > > >
> > > > -static int inline_list__append(char *filename, char *funcname, int
> > > > line_nr, - struct inline_node *node, struct dso *dso)
> > > > +static int inline_list__append(struct symbol *symbol, char *srcline,
> > > > + struct inline_node *node)
> > > >
> > > > {
> > > >
> > > > struct inline_list *ilist;
> > > >
> > > > - char *demangled;
> > > >
> > > > ilist = zalloc(sizeof(*ilist));
> > > > if (ilist == NULL)
> > > >
> > > > return -1;
> > > >
> > > > - ilist->filename = filename;
> > > > - ilist->line_nr = line_nr;
> > > > -
> > > > - if (dso != NULL) {
> > > > - demangled = dso__demangle_sym(dso, 0, funcname);
> > > > - if (demangled == NULL) {
> > > > - ilist->funcname = funcname;
> > > > - } else {
> > > > - ilist->funcname = demangled;
> > > > - free(funcname);
> > > > - }
> > > > - }
> > > > + ilist->symbol = symbol;
> > > > + ilist->srcline = srcline;
> > > >
> > > > if (callchain_param.order == ORDER_CALLEE)
> > > >
> > > > list_add_tail(&ilist->list, &node->val);
> > > >
> > > > @@ -64,6 +53,30 @@ static int inline_list__append(char *filename, char
> > > > *funcname, int line_nr,>
> > > >
> > > > return 0;
> > > >
> > > > }
> > > >
> > > > +// basename version that takes a const input string
> > > > +static const char *gnu_basename(const char *path)
> > > > +{
> > > > + const char *base = strrchr(path, '/');
> > > > +
> > > > + return base ? base + 1 : path;
> > > > +}
> > > > +
> > > > +static char *srcline_from_fileline(const char *file, unsigned int line)
> > > > +{
> > > > + char *srcline;
> > > > +
> > > > + if (!file)
> > > > + return NULL;
> > > > +
> > > > + if (!srcline_full_filename)
> > > > + file = gnu_basename(file);
> > > > +
> > > > + if (asprintf(&srcline, "%s:%u", file, line) < 0)
> > > > + return NULL;
> > > > +
> > > > + return srcline;
> > > > +}
> > > > +
> > > >
> > > > #ifdef HAVE_LIBBFD_SUPPORT
> > > >
> > > > /*
> > > >
> > > > @@ -203,19 +216,55 @@ static void addr2line_cleanup(struct a2l_data
> > > > *a2l)
> > > >
> > > > #define MAX_INLINE_NEST 1024
> > > >
> > > > +static struct symbol *new_inline_sym(struct dso *dso,
> > > > + struct symbol *base_sym,
> > > > + const char *funcname)
> > > > +{
> > > > + struct symbol *inline_sym;
> > > > + char *demangled = NULL;
> > > > +
> > > > + if (dso) {
> > > > + demangled = dso__demangle_sym(dso, 0, funcname);
> > > > + if (demangled)
> > > > + funcname = demangled;
> > > > + }
> > > > +
> > > > + if (strcmp(funcname, base_sym->name) == 0) {
> > > > + // reuse the real, existing symbol
> > > > + inline_sym = base_sym;
> > >
> > > So inline_node could refer the existing symbol.
> >
> > Yes, that does indeed seem to be an issue. I wonder why I'm not seeing any
> > crashes or valgrind/ASAN reports though. I'll have to dig into this.
> >
> > > > + } else {
> > > > + // create a fake symbol for the inline frame
> > > > + inline_sym = symbol__new(base_sym ? base_sym->start : 0,
> > > > + base_sym ? base_sym->end : 0,
> > > > + base_sym ? base_sym->binding : 0,
> > > > + funcname);
> > > > + if (inline_sym)
> > > > + inline_sym->inlined = 1;
> > > > + }
> > > > +
> > > > + free(demangled);
> > > > +
> > > > + return inline_sym;
> > > > +}
> > > > +
> > > >
> > > > static int inline_list__append_dso_a2l(struct dso *dso,
> > > >
> > > > - struct inline_node *node)
> > > > + struct inline_node *node,
> > > > + struct symbol *sym)
> > > >
> > > > {
> > > >
> > > > struct a2l_data *a2l = dso->a2l;
> > > >
> > > > - char *funcname = a2l->funcname ? strdup(a2l->funcname) : NULL;
> > > > - char *filename = a2l->filename ? strdup(a2l->filename) : NULL;
> > > > + struct symbol *inline_sym = new_inline_sym(dso, sym, a2l->funcname);
> > > > + char *srcline = NULL;
> > > > +
> > > > + if (a2l->filename)
> > > > + srcline = srcline_from_fileline(a2l->filename, a2l->line);
> > > >
> > > > - return inline_list__append(filename, funcname, a2l->line, node, dso);
> > > > + return inline_list__append(inline_sym, srcline, node);
> > > >
> > > > }
> > >
> > > [SNIP]
> > >
> > > > @@ -511,10 +563,63 @@ void inline_node__delete(struct inline_node *node)
> > > >
> > > > list_for_each_entry_safe(ilist, tmp, &node->val, list) {
> > > >
> > > > list_del_init(&ilist->list);
> > > >
> > > > - zfree(&ilist->filename);
> > > > - zfree(&ilist->funcname);
> > > > + zfree(&ilist->srcline);
> > > > + // only the inlined symbols are owned by the list
> > > > + if (ilist->symbol && ilist->symbol->inlined)
> > > > + symbol__delete(ilist->symbol);
> > >
> > > Existing symbols are released at this moment.
> >
> > Thanks for the review, I'll try to look into these issues once I have more
> > time again.
>
> OK, so I just dug into this part of the patch again. I don't think it's
> actually a problem after all:
>
> When an inline node reuses the real symbol, that symbol won't have its
> `inlined` member set to true. Thus these symbols will never get deleted by
> inline_node__delete.

But ilist->symbol is a dangling pointer so accessing ->inlined would
be a problem, no?


> If you have suggestions on how to make this clearer, I'm
> all ears. For now, I'll add a comment to where we alias/reuse the symbol.
>
> I'll try to split up the patch now to make it somehow easier to review.

Thanks for doing this, but I'm afraid I don't have time to review
before going to OSS-NA.

Thanks,
Namhyung

2017-09-07 15:05:39

by Milian Wolff

[permalink] [raw]
Subject: Re: [PATCH v2 03/14] perf report: create real callchain entries for inlined frames

On Thursday, September 7, 2017 4:58:53 PM CEST Namhyung Kim wrote:
> Hi Milian,

Hey Namhyung!

> > > > > @@ -511,10 +563,63 @@ void inline_node__delete(struct inline_node
> > > > > *node)
> > > > >
> > > > > list_for_each_entry_safe(ilist, tmp, &node->val, list) {
> > > > >
> > > > > list_del_init(&ilist->list);
> > > > >
> > > > > - zfree(&ilist->filename);
> > > > > - zfree(&ilist->funcname);
> > > > > + zfree(&ilist->srcline);
> > > > > + // only the inlined symbols are owned by the list
> > > > > + if (ilist->symbol && ilist->symbol->inlined)
> > > > > + symbol__delete(ilist->symbol);
> > > >
> > > > Existing symbols are released at this moment.
> > >
> > > Thanks for the review, I'll try to look into these issues once I have
> > > more
> > > time again.
> >
> > OK, so I just dug into this part of the patch again. I don't think it's
> > actually a problem after all:
> >
> > When an inline node reuses the real symbol, that symbol won't have its
> > `inlined` member set to true. Thus these symbols will never get deleted by
> > inline_node__delete.
>
> But ilist->symbol is a dangling pointer so accessing ->inlined would
> be a problem, no?

Sorry, but I can't follow. Why would it be a dangling pointer? Note, again,
that I've tested this with both valgrind and ASAN and neither reports any
issues about this code.

> > If you have suggestions on how to make this clearer, I'm
> > all ears. For now, I'll add a comment to where we alias/reuse the symbol.
> >
> > I'll try to split up the patch now to make it somehow easier to review.
>
> Thanks for doing this, but I'm afraid I don't have time to review
> before going to OSS-NA.

No problem, enjoy!
--
Milian Wolff | [email protected] | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


Attachments:
smime.p7s (3.74 kB)

2017-09-07 15:16:57

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH v2 03/14] perf report: create real callchain entries for inlined frames

On Fri, Sep 8, 2017 at 12:05 AM, Milian Wolff <[email protected]> wrote:
> On Thursday, September 7, 2017 4:58:53 PM CEST Namhyung Kim wrote:
>> Hi Milian,
>
> Hey Namhyung!
>
>> > > > > @@ -511,10 +563,63 @@ void inline_node__delete(struct inline_node
>> > > > > *node)
>> > > > >
>> > > > > list_for_each_entry_safe(ilist, tmp, &node->val, list) {
>> > > > >
>> > > > > list_del_init(&ilist->list);
>> > > > >
>> > > > > - zfree(&ilist->filename);
>> > > > > - zfree(&ilist->funcname);
>> > > > > + zfree(&ilist->srcline);
>> > > > > + // only the inlined symbols are owned by the list
>> > > > > + if (ilist->symbol && ilist->symbol->inlined)
>> > > > > + symbol__delete(ilist->symbol);
>> > > >
>> > > > Existing symbols are released at this moment.
>> > >
>> > > Thanks for the review, I'll try to look into these issues once I have
>> > > more
>> > > time again.
>> >
>> > OK, so I just dug into this part of the patch again. I don't think it's
>> > actually a problem after all:
>> >
>> > When an inline node reuses the real symbol, that symbol won't have its
>> > `inlined` member set to true. Thus these symbols will never get deleted by
>> > inline_node__delete.
>>
>> But ilist->symbol is a dangling pointer so accessing ->inlined would
>> be a problem, no?
>
> Sorry, but I can't follow. Why would it be a dangling pointer? Note, again,
> that I've tested this with both valgrind and ASAN and neither reports any
> issues about this code.

IIUC, ilist->symbol can point an existing symbol. And all existing
symbols are freed before calling inline_node__delete(). I don't know
why valgrind or asan didn't catch anything.. maybe I'm missing
something.

--
Thanks,
Namhyung