2019-02-21 16:08:02

by Jonas Rabenstein

[permalink] [raw]
Subject: [PATCH 0/3] Support inline symbols in callchains

Hi,
This patchset supersedes my previous attempt to add inline symbols to
callchain of perf-script [1] by a more generic attempt to not hook in
the output stage but directly into the callchain generation. By a matter
of fact this adds those inline symbols automatically to other commands
like perf-report.
Additionally this fixes the regression reported by Jiri Olsa [2] that
some entries from previous outputs had been vanished and now only new
lines are added if symbols had been found.

The integration for perf-report is not completely done as there is an
issue if the root for an hist_entry as for inlined symbols there may be
multiple instances (for each address-range) with the same name. But in
util/sort.c:233 only the name is compared for inlined symbols. As a
consequence the returned hist_entry may hold a reference to another
instance for this inlined symbol (with another address range than
requested) we later on fail with -ERANGE in __symbol__inc_addr_samples
(util/annotate.c:857).
This issue does still permit perf-report to be executed without any
problems and the inlined symbols do show up but none of the samples is
actually accounted to them but to the original symbol :(

To further provide information what this changeset is doing, here is a
script-session to show the differences in the output:

[jonas@x60s]$ git reset --hard v5.0-rc7; make -C tools/perf >/dev/null; \
HEAD is now at a3b22b9f11d9 Linux 5.0-rc7
[jonas@x60s]$ git am *.patch; make -C tools/perf >/dev/null; \
mv tools /perf/perf perf-new
Applying: perf map: add function to lookup inline symbols
Applying: perf machine: use map__inlines in append_inlines
Applying: perf machine: add inline symbols to callchains
[jonas@x60s]$ cat test.c
static int foo(int m){int r=0; for(int i=0;i<m;++i)r+=i; return r;}
static int bar(int m){int r=0; for(int i=0;i<m;++i)r+=foo(i); return r;}
static int baz(int m){int r=0; for(int i=0;i<m;++i)r+=foo(i)*bar(i); return r;}
int main() { return baz(421); }
[jonas@x60s]$ gcc -O2 -fno-omit-frame-pointer -g -o test test.c
[jonas@x60s]$ ./perf-new record --call-graph fp ./test.c
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data (19 samples) ]
[jonas@x60s]$ time ./perf-old script >old.script
real 0m0.039s
user 0m0.012s
sys 0m0.022s
[jonas@x60s]$ time ./perf-new script >new.script
real 0m0.045s
user 0m0.010s
sys 0m0.032s
[jonas@x60s]$ diff -u old.script new.script
--- old.script 2019-02-21 16:30:47.997194180 +0100
+++ new.script 2019-02-21 16:30:51.757309790 +0100
@@ -23,66 +23,101 @@
7f43b1148090 _start+0x0 (/lib/x86_64-linux-gnu/ld-2.27.so)

test 7579 1470536.968092: 354758 cycles:uppp:
+ 7f43b1148f4b elf_get_dynamic_info+0xab (inlined)
7f43b1148f4b _dl_start+0xab (/lib/x86_64-linux-gnu/ld-2.27.so)
7f43b1148098 _dl_start_user+0x0 (/lib/x86_64-linux-gnu/ld-2.27.so)
[...]

test 7579 1470536.969210: 1922435 cycles:uppp:
+ 563f0df9c53f foo+0x4f (inlined)
+ 563f0df9c53f bar+0x4f (inlined)
+ 563f0df9c53f baz+0x4f (inlined)
563f0df9c53f main+0x4f (/home/jonas/linux/test)
7f43b0d77b97 __libc_start_main+0xe7 (/lib/x86_64-linux-gnu/libc-2.27.so)
75e258d4c544155 [unknown] ([unknown])
[jonas@x60s]$ time ./perf-new script --no-inline >new.noinline.script
real 0m0.035s
user 0m0.012s
sys 0m0.020s
[jonas@x60s]$ diff -u old.script new.noinline.script
[jonas@x60s]$ ./perf-old report --stdio --quiet | sed '/^$/Q'
88.80% 88.80% test test [.] main
|
---0x75e258d4c544155
__libc_start_main
main
[jonas@x60s]$ ./perf-new report --stdio --quiet | sed '/^$/Q'
88.80% 88.80% test test [.] main
|
---0x75e258d4c544155
__libc_start_main
main
baz (inlined)
bar (inlined)
foo (inlined)
[jonas@x60s]$ ./perf-new report --stdio --quiet --no-inline | sed '/^$/Q'
88.80% 88.80% test test [.] main
|
---0x75e258d4c544155
__libc_start_main
main

I am still trying to find a way that in the new output of report the
88.80% 'Self' are not accounted to main itself but split up for the
inlined baz, bar and foo symbols.

I'm open to any help as well as feedback,
Jonas

[1] https://www.spinics.net/lists/linux-perf-users/msg07792.html
[2] https://www.spinics.net/lists/linux-perf-users/msg07798.html

Jonas Rabenstein (3):
perf map: add function to lookup inline symbols
perf machine: use map__inlines in append_inlines
perf machine: add inline symbols to callchains

tools/perf/util/machine.c | 130 ++++++++++++++++++++++++++------------
tools/perf/util/map.c | 23 +++++++
tools/perf/util/map.h | 3 +
3 files changed, 115 insertions(+), 41 deletions(-)

--
2.19.2



2019-02-21 16:08:15

by Jonas Rabenstein

[permalink] [raw]
Subject: [PATCH 3/3] perf machine: add inline symbols to callchains

Use map__inlines to resolve inlined functions for every entry with
an symbol that should be added to a callchain.

Signed-off-by: Jonas Rabenstein <[email protected]>
---
tools/perf/util/machine.c | 115 ++++++++++++++++++++++++++++----------
1 file changed, 87 insertions(+), 28 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index dce29c21e4ea..070d074482b4 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1908,6 +1908,91 @@ struct iterations {
u64 cycles;
};

+static int __add_callchain_location(struct callchain_cursor *cursor,
+ struct symbol **parent,
+ struct addr_location *root_al,
+ u64 addr, struct addr_location *al,
+ bool branch, struct branch_flags *flags,
+ u64 branch_from, struct iterations *iter)
+{
+ int nr_loop_iter = 0;
+ u64 iter_cycles = 0;
+
+ if (symbol_conf.hide_unresolved && al->sym == NULL)
+ return 0;
+
+ if (al->sym) {
+ if (perf_hpp_list.parent && !*parent &&
+ symbol__match_regex(al->sym, &parent_regex))
+ *parent = al->sym;
+ else if (have_ignore_callees && root_al &&
+ symbol__match_regex(al->sym, &ignore_callees_regex)) {
+ /* Treat this symbol as the root,
+ forgetting its callees. */
+ *root_al = *al;
+ callchain_cursor_reset(cursor);
+ }
+ }
+
+ if (iter) {
+ nr_loop_iter = iter->nr_loop_iter;
+ iter_cycles = iter->cycles;
+ }
+ return callchain_cursor_append(cursor, addr, al->map, al->sym, branch,
+ flags, nr_loop_iter, iter_cycles,
+ branch_from, al->srcline);
+}
+
+static int __add_callchain_ip(struct callchain_cursor *cursor, u64 ip,
+ struct addr_location *al, bool branch,
+ struct branch_flags *flags, u64 branch_from,
+ struct iterations *iter, struct symbol **parent,
+ struct addr_location *root_al)
+{
+ struct inline_node *inline_node;
+ struct inline_list *inline_list;
+ const char *srcline;
+ struct symbol *symbol;
+ int err = 0;
+
+ al->srcline = callchain_srcline(al->map, al->sym, al->addr);
+ if (callchain_param.order == ORDER_CALLER)
+ err = __add_callchain_location(cursor, parent, root_al, ip, al,
+ branch, flags, branch_from, iter);
+ if (err || !al->map || !al->sym)
+ goto no_inline;
+
+ inline_node = map__inlines(al->map, ip, al->sym);
+ if (!inline_node || list_empty(&inline_node->val))
+ goto no_inline;
+
+ symbol = al->sym;
+ srcline = al->srcline;
+ list_for_each_entry(inline_list, &inline_node->val, list) {
+ if (inline_list->symbol == symbol)
+ continue;
+ al->sym = inline_list->symbol;
+ al->srcline = inline_list->srcline;
+ err = __add_callchain_location(cursor, parent, root_al, ip,
+ al, branch, flags,
+ branch_from, iter);
+ if (err)
+ break;
+ }
+
+ if (callchain_param.order == ORDER_CALLEE) {
+ al->srcline = srcline;
+ al->sym = symbol;
+ }
+
+no_inline:
+ if (!err && callchain_param.order == ORDER_CALLEE)
+ err = __add_callchain_location(cursor, parent, root_al, ip, al,
+ branch, flags, branch_from, iter);
+ return err;
+}
+
+
static int add_callchain_ip(struct thread *thread,
struct callchain_cursor *cursor,
struct symbol **parent,
@@ -1920,9 +2005,6 @@ static int add_callchain_ip(struct thread *thread,
u64 branch_from)
{
struct addr_location al;
- int nr_loop_iter = 0;
- u64 iter_cycles = 0;
- const char *srcline = NULL;

al.filtered = 0;
al.sym = NULL;
@@ -1955,31 +2037,8 @@ static int add_callchain_ip(struct thread *thread,
thread__find_symbol(thread, *cpumode, ip, &al);
}

- if (al.sym != NULL) {
- if (perf_hpp_list.parent && !*parent &&
- symbol__match_regex(al.sym, &parent_regex))
- *parent = al.sym;
- else if (have_ignore_callees && root_al &&
- symbol__match_regex(al.sym, &ignore_callees_regex)) {
- /* Treat this symbol as the root,
- forgetting its callees. */
- *root_al = al;
- callchain_cursor_reset(cursor);
- }
- }
-
- if (symbol_conf.hide_unresolved && al.sym == NULL)
- return 0;
-
- if (iter) {
- nr_loop_iter = iter->nr_loop_iter;
- iter_cycles = iter->cycles;
- }
-
- srcline = callchain_srcline(al.map, al.sym, al.addr);
- return callchain_cursor_append(cursor, ip, al.map, al.sym,
- branch, flags, nr_loop_iter,
- iter_cycles, branch_from, srcline);
+ return __add_callchain_ip(cursor, ip, &al, branch, flags, branch_from,
+ iter, parent, root_al);
}

struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
--
2.19.2


2019-02-21 16:08:50

by Jonas Rabenstein

[permalink] [raw]
Subject: [PATCH 2/3] perf machine: use map__inlines in append_inlines

The previous patch provides a generic way to lookup the root node of the
inlined symbols for a specific address. Reuse that implementation
instead of a duplicated version of the required steps.

Signed-off-by: Jonas Rabenstein <[email protected]>
---
tools/perf/util/machine.c | 15 ++-------------
1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 143f7057d581..dce29c21e4ea 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2330,23 +2330,12 @@ static int append_inlines(struct callchain_cursor *cursor,
{
struct inline_node *inline_node;
struct inline_list *ilist;
- u64 addr;
int ret = 1;

- if (!symbol_conf.inline_name || !map || !sym)
+ inline_node = map__inlines(map, ip, sym);
+ if (!inline_node)
return ret;

- addr = map__map_ip(map, ip);
- addr = map__rip_2objdump(map, addr);
-
- inline_node = inlines__tree_find(&map->dso->inlined_nodes, addr);
- if (!inline_node) {
- inline_node = dso__parse_addr_inlines(map->dso, addr, sym);
- if (!inline_node)
- return ret;
- inlines__tree_insert(&map->dso->inlined_nodes, inline_node);
- }
-
list_for_each_entry(ilist, &inline_node->val, list) {
ret = callchain_cursor_append(cursor, ip, map,
ilist->symbol, false,
--
2.19.2


2019-02-21 16:09:09

by Jonas Rabenstein

[permalink] [raw]
Subject: [PATCH 1/3] perf map: add function to lookup inline symbols

Inlined symbols should always be added to the dso's inlined_nodes tree
in order to reuse them for a later lookup for the same address. Instead
of repeating those steps at the different users, provide a central
method to lookup and register inline symbols for a map.

Signed-off-by: Jonas Rabenstein <[email protected]>
---
tools/perf/util/map.c | 23 +++++++++++++++++++++++
tools/perf/util/map.h | 3 +++
2 files changed, 26 insertions(+)

diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 6751301a755c..0fe74e83ca8d 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -988,3 +988,26 @@ struct map_groups *map__kmaps(struct map *map)
}
return kmap->kmaps;
}
+
+struct inline_node *map__inlines(struct map *map, u64 addr, struct symbol *sym)
+{
+ struct inline_node *node;
+
+ if (!symbol_conf.inline_name)
+ return NULL;
+
+ if (!map->dso)
+ return NULL;
+
+ addr = map->map_ip(map, addr);
+ addr = map__rip_2objdump(map, addr);
+
+ node = inlines__tree_find(&map->dso->inlined_nodes, addr);
+ if (node)
+ return node;
+
+ node = dso__parse_addr_inlines(map->dso, addr, sym);
+ if (node)
+ inlines__tree_insert(&map->dso->inlined_nodes, node);
+ return node;
+}
diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
index 09282aa45c80..bdcc6e77e26e 100644
--- a/tools/perf/util/map.h
+++ b/tools/perf/util/map.h
@@ -270,4 +270,7 @@ static inline bool is_entry_trampoline(const char *name)
return !strcmp(name, ENTRY_TRAMPOLINE_NAME);
}

+struct inline_node *map__inlines(struct map *map, u64 addr,
+ struct symbol *sym);
+
#endif /* __PERF_MAP_H */
--
2.19.2


2019-02-23 04:22:46

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 3/3] perf machine: add inline symbols to callchains

Hello,

On Fri, Feb 22, 2019 at 1:07 AM Jonas Rabenstein
<[email protected]> wrote:
>
> Use map__inlines to resolve inlined functions for every entry with
> an symbol that should be added to a callchain.
>
> Signed-off-by: Jonas Rabenstein <[email protected]>
> ---
> tools/perf/util/machine.c | 115 ++++++++++++++++++++++++++++----------
> 1 file changed, 87 insertions(+), 28 deletions(-)
>
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index dce29c21e4ea..070d074482b4 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -1908,6 +1908,91 @@ struct iterations {
> u64 cycles;
> };
>
> +static int __add_callchain_location(struct callchain_cursor *cursor,
> + struct symbol **parent,
> + struct addr_location *root_al,
> + u64 addr, struct addr_location *al,
> + bool branch, struct branch_flags *flags,
> + u64 branch_from, struct iterations *iter)
> +{
> + int nr_loop_iter = 0;
> + u64 iter_cycles = 0;
> +
> + if (symbol_conf.hide_unresolved && al->sym == NULL)
> + return 0;
> +
> + if (al->sym) {
> + if (perf_hpp_list.parent && !*parent &&
> + symbol__match_regex(al->sym, &parent_regex))
> + *parent = al->sym;
> + else if (have_ignore_callees && root_al &&
> + symbol__match_regex(al->sym, &ignore_callees_regex)) {
> + /* Treat this symbol as the root,
> + forgetting its callees. */
> + *root_al = *al;
> + callchain_cursor_reset(cursor);
> + }
> + }
> +
> + if (iter) {
> + nr_loop_iter = iter->nr_loop_iter;
> + iter_cycles = iter->cycles;
> + }
> + return callchain_cursor_append(cursor, addr, al->map, al->sym, branch,
> + flags, nr_loop_iter, iter_cycles,
> + branch_from, al->srcline);
> +}
> +
> +static int __add_callchain_ip(struct callchain_cursor *cursor, u64 ip,
> + struct addr_location *al, bool branch,
> + struct branch_flags *flags, u64 branch_from,
> + struct iterations *iter, struct symbol **parent,
> + struct addr_location *root_al)
> +{
> + struct inline_node *inline_node;
> + struct inline_list *inline_list;
> + const char *srcline;
> + struct symbol *symbol;
> + int err = 0;
> +
> + al->srcline = callchain_srcline(al->map, al->sym, al->addr);
> + if (callchain_param.order == ORDER_CALLER)
> + err = __add_callchain_location(cursor, parent, root_al, ip, al,
> + branch, flags, branch_from, iter);
> + if (err || !al->map || !al->sym)
> + goto no_inline;
> +
> + inline_node = map__inlines(al->map, ip, al->sym);
> + if (!inline_node || list_empty(&inline_node->val))
> + goto no_inline;
> +
> + symbol = al->sym;
> + srcline = al->srcline;
> + list_for_each_entry(inline_list, &inline_node->val, list) {
> + if (inline_list->symbol == symbol)
> + continue;
> + al->sym = inline_list->symbol;
> + al->srcline = inline_list->srcline;
> + err = __add_callchain_location(cursor, parent, root_al, ip,
> + al, branch, flags,
> + branch_from, iter);
> + if (err)
> + break;
> + }

Does this loop do the job both for ORDER_CALLER and ORDER_CALLEE?

Thanks,
Namhyung


> +
> + if (callchain_param.order == ORDER_CALLEE) {
> + al->srcline = srcline;
> + al->sym = symbol;
> + }
> +
> +no_inline:
> + if (!err && callchain_param.order == ORDER_CALLEE)
> + err = __add_callchain_location(cursor, parent, root_al, ip, al,
> + branch, flags, branch_from, iter);
> + return err;
> +}
> +
> +
> static int add_callchain_ip(struct thread *thread,
> struct callchain_cursor *cursor,
> struct symbol **parent,
> @@ -1920,9 +2005,6 @@ static int add_callchain_ip(struct thread *thread,
> u64 branch_from)
> {
> struct addr_location al;
> - int nr_loop_iter = 0;
> - u64 iter_cycles = 0;
> - const char *srcline = NULL;
>
> al.filtered = 0;
> al.sym = NULL;
> @@ -1955,31 +2037,8 @@ static int add_callchain_ip(struct thread *thread,
> thread__find_symbol(thread, *cpumode, ip, &al);
> }
>
> - if (al.sym != NULL) {
> - if (perf_hpp_list.parent && !*parent &&
> - symbol__match_regex(al.sym, &parent_regex))
> - *parent = al.sym;
> - else if (have_ignore_callees && root_al &&
> - symbol__match_regex(al.sym, &ignore_callees_regex)) {
> - /* Treat this symbol as the root,
> - forgetting its callees. */
> - *root_al = al;
> - callchain_cursor_reset(cursor);
> - }
> - }
> -
> - if (symbol_conf.hide_unresolved && al.sym == NULL)
> - return 0;
> -
> - if (iter) {
> - nr_loop_iter = iter->nr_loop_iter;
> - iter_cycles = iter->cycles;
> - }
> -
> - srcline = callchain_srcline(al.map, al.sym, al.addr);
> - return callchain_cursor_append(cursor, ip, al.map, al.sym,
> - branch, flags, nr_loop_iter,
> - iter_cycles, branch_from, srcline);
> + return __add_callchain_ip(cursor, ip, &al, branch, flags, branch_from,
> + iter, parent, root_al);
> }
>
> struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
> --
> 2.19.2
>

2019-02-23 04:57:25

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH 0/3] Support inline symbols in callchains

On Fri, Feb 22, 2019 at 1:07 AM Jonas Rabenstein
<[email protected]> wrote:
>
> Hi,
> This patchset supersedes my previous attempt to add inline symbols to
> callchain of perf-script [1] by a more generic attempt to not hook in
> the output stage but directly into the callchain generation. By a matter
> of fact this adds those inline symbols automatically to other commands
> like perf-report.
> Additionally this fixes the regression reported by Jiri Olsa [2] that
> some entries from previous outputs had been vanished and now only new
> lines are added if symbols had been found.
>
> The integration for perf-report is not completely done as there is an
> issue if the root for an hist_entry as for inlined symbols there may be
> multiple instances (for each address-range) with the same name. But in
> util/sort.c:233 only the name is compared for inlined symbols. As a
> consequence the returned hist_entry may hold a reference to another
> instance for this inlined symbol (with another address range than
> requested) we later on fail with -ERANGE in __symbol__inc_addr_samples
> (util/annotate.c:857).
> This issue does still permit perf-report to be executed without any
> problems and the inlined symbols do show up but none of the samples is
> actually accounted to them but to the original symbol :(
>
> To further provide information what this changeset is doing, here is a
> script-session to show the differences in the output:
>
> [jonas@x60s]$ git reset --hard v5.0-rc7; make -C tools/perf >/dev/null; \
> HEAD is now at a3b22b9f11d9 Linux 5.0-rc7
> [jonas@x60s]$ git am *.patch; make -C tools/perf >/dev/null; \
> mv tools /perf/perf perf-new
> Applying: perf map: add function to lookup inline symbols
> Applying: perf machine: use map__inlines in append_inlines
> Applying: perf machine: add inline symbols to callchains
> [jonas@x60s]$ cat test.c
> static int foo(int m){int r=0; for(int i=0;i<m;++i)r+=i; return r;}
> static int bar(int m){int r=0; for(int i=0;i<m;++i)r+=foo(i); return r;}
> static int baz(int m){int r=0; for(int i=0;i<m;++i)r+=foo(i)*bar(i); return r;}
> int main() { return baz(421); }
> [jonas@x60s]$ gcc -O2 -fno-omit-frame-pointer -g -o test test.c
> [jonas@x60s]$ ./perf-new record --call-graph fp ./test.c
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.002 MB perf.data (19 samples) ]
> [jonas@x60s]$ time ./perf-old script >old.script
> real 0m0.039s
> user 0m0.012s
> sys 0m0.022s
> [jonas@x60s]$ time ./perf-new script >new.script
> real 0m0.045s
> user 0m0.010s
> sys 0m0.032s
> [jonas@x60s]$ diff -u old.script new.script
> --- old.script 2019-02-21 16:30:47.997194180 +0100
> +++ new.script 2019-02-21 16:30:51.757309790 +0100
> @@ -23,66 +23,101 @@
> 7f43b1148090 _start+0x0 (/lib/x86_64-linux-gnu/ld-2.27.so)
>
> test 7579 1470536.968092: 354758 cycles:uppp:
> + 7f43b1148f4b elf_get_dynamic_info+0xab (inlined)
> 7f43b1148f4b _dl_start+0xab (/lib/x86_64-linux-gnu/ld-2.27.so)
> 7f43b1148098 _dl_start_user+0x0 (/lib/x86_64-linux-gnu/ld-2.27.so)
> [...]
>
> test 7579 1470536.969210: 1922435 cycles:uppp:
> + 563f0df9c53f foo+0x4f (inlined)
> + 563f0df9c53f bar+0x4f (inlined)
> + 563f0df9c53f baz+0x4f (inlined)
> 563f0df9c53f main+0x4f (/home/jonas/linux/test)
> 7f43b0d77b97 __libc_start_main+0xe7 (/lib/x86_64-linux-gnu/libc-2.27.so)
> 75e258d4c544155 [unknown] ([unknown])
> [jonas@x60s]$ time ./perf-new script --no-inline >new.noinline.script
> real 0m0.035s
> user 0m0.012s
> sys 0m0.020s
> [jonas@x60s]$ diff -u old.script new.noinline.script
> [jonas@x60s]$ ./perf-old report --stdio --quiet | sed '/^$/Q'
> 88.80% 88.80% test test [.] main
> |
> ---0x75e258d4c544155
> __libc_start_main
> main
> [jonas@x60s]$ ./perf-new report --stdio --quiet | sed '/^$/Q'
> 88.80% 88.80% test test [.] main
> |
> ---0x75e258d4c544155
> __libc_start_main
> main
> baz (inlined)
> bar (inlined)
> foo (inlined)
> [jonas@x60s]$ ./perf-new report --stdio --quiet --no-inline | sed '/^$/Q'
> 88.80% 88.80% test test [.] main
> |
> ---0x75e258d4c544155
> __libc_start_main
> main
>
> I am still trying to find a way that in the new output of report the
> 88.80% 'Self' are not accounted to main itself but split up for the
> inlined baz, bar and foo symbols.
>
> I'm open to any help as well as feedback,

I guess you can set al->sym to a symbol of the last callchain entry in
the iter_add_single_cumulative_entry(). For annotation, al->addr
needs to be updated from the start of the inlined subroutine if DWARF
is used, but not sure for addr2line though.

Thanks,
Namhyung